Parse.ly Blog


Bigger Data, Smarter Scaling

Parse.ly’s CTO, Andrew Montalenti, was one of three presenters at the Times Open at the New York Times HQ on Wednesday, October 17. The theme of the event was “Bigger Data, Smarter Scaling,” and who better to talk about scaling big data than the guys ingesting data from some of the web’s most highly trafficked websites?

To give you a sense of what we’re dealing with at Parse.ly, just take a look at a couple of Andrew’s slides posted below.

What did our growth look like during our first year? Well, we started with a handful of beta partners and then skyrocketed. Kudos to our engineers for keeping the infrastructure running better than ever during a time of crazy growth.

Scaling, and scaling as quickly as we’ve had to, is not easy, but luckily we’ve managed it quite well thanks to some smart decisions and technology choices. Big data is great and all, but without speed, you’re not really scaling effectively.

Users of Dash and our API particularly value speed. Waiting minutes or even seconds for reports to load is unacceptable in fast-paced newsrooms. Some of the best praise we’ve gotten is actually around the speed of Dash compared to the usual suspects (primarily Google Analytics and Omniture). Here are some of the technologies that make it all possible.

Lastly, Andrew presented on some competing news metadata standards, including rNews, OpenGraph, and Schema.org. To enrich normal analytics data, Parse.ly scrapes publisher pages for additional metadata such as author, section, publish date, thumbnail image, and headline.

When we first started, we had to write a custom crawler for every single publisher we brought on board. Our team got really, really good at writing these spiders quickly and effectively, but we knew it couldn’t scale for the long haul.

We therefore participated in the development of the rNews metadata standard, and through our work there, discovered that our product provides a great impetus for publishers to adopt open metadata standards.
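To see why standard metadata removes the need for a custom crawler per publisher, here is a minimal sketch in Python. It is an illustration only, not the code Parse.ly runs: it assumes a page that exposes Open Graph `<meta property="...">` tags, and the helper names (`MetaTagParser`, `extract_metadata`) and the sample page with its values are made up for the example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("property")
        content = attr_map.get("content")
        if name and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(name, content)


def extract_metadata(html):
    """Map Open Graph properties onto the fields an analytics tool needs.

    Hypothetical helper: the output field names are chosen for this example.
    Missing properties come back as None.
    """
    parser = MetaTagParser()
    parser.feed(html)
    props = parser.properties
    return {
        "headline": props.get("og:title"),
        "thumbnail": props.get("og:image"),
        "author": props.get("article:author"),
        "section": props.get("article:section"),
        "publish_date": props.get("article:published_time"),
    }


# Made-up sample page; a real publisher page would be fetched over HTTP.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Bigger Data, Smarter Scaling" />
<meta property="og:image" content="http://example.com/thumb.png" />
<meta property="article:author" content="Jane Doe" />
<meta property="article:section" content="Technology" />
<meta property="article:published_time" content="2012-10-17T09:00:00Z" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for field, value in extract_metadata(SAMPLE_PAGE).items():
        print(field, "=", value)
```

The same function works on any page that publishes these tags, which is the point: one generic extractor replaces a hand-written spider per site. Supporting rNews or Schema.org markup would require parsing RDFa or microdata attributes, which this sketch does not attempt.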

On stage at the Times Open event, our awesome full-stack engineer Emmett Butler also released Mr. Schemato into the wild. Described as “a friendly semantic web validator and distiller that is making metadata cool again”, it eases the adoption and use of open standards. We are hosting an open GitHub project to collaborate on this effort across the industry.

Andrew was kind enough to post all of his slides online. You can find them here: https://speakerdeck.com/u/amontalenti/p/semantic-analytics-at-scale

Let us know if you’d like to talk more. You can find us on Twitter: @parsely

For info on future Times Open events, here’s the upcoming schedule: http://open.blogs.nytimes.com/timesopen-schedule/

- John Levitt, Director of Sales & Marketing

New York Times Makes Up Ad Money In Circulation

Not surprisingly, ad revenue is down 8.1% for the New York Times in the first quarter. What is surprising, however, is that the Times is reporting an increase in circulation revenue of 9.7%. According to Forbes, the Times beat first quarter estimates by 300%, earning 8 cents a share. Refinements in the Times’s digital subscription model, specifically the decrease in free articles from 20 to 10, have improved subscription conversion rates. Like The Economist, the Times is proving that conventional circulation systems, if effectively replicated in a digital space, can drive big revenue. I stand corrected; apparently the paywall is working.

Why Do We Care About NYT-Flipboard If the Paywall Is Leakier Than the Titanic?

On Tuesday, GigaOM’s Mathew Ingram wrote about “why the NYT-Flipboard deal is a smart move.” To summarize, Ingram believes that the deal is an important step for the Times towards adapting to digital content distribution. Flipboard is a news aggregating service, and before striking the deal the Times found that 20% of its digital subscribers use similar aggregation platforms. As Ingram titles one of his article’s subheadings, “people discover content differently now.” Clearly, the Times needs to distribute its content across third-party apps and reading platforms in order to capture lost revenue and potential readers. Furthermore, the deal with Flipboard allows the Times to better understand how content discovery unfolds in third-party spaces. Yet, I am skeptical that the deal heralds a radical transformation of the Times’s distribution and consumption structures, especially because the Times has yet to figure out, in any meaningful way, how to keep subscribers in and moochers out.

To access Times content now as a non-subscriber, you can just browse through the Times website, copying and pasting interesting headlines into a Google search. Since the Times does not keep click-through traffic out, you can access almost all the Times content you want by spending an extra 20 seconds per article to navigate through this gaping hole in the paywall hull. The main problem is that the Times website is completely open to traffic, whereas the article content itself isn’t, which allows malfeasant users (myself, cough cough) to identify desired content and then exploit the hole. Surely, the Times has conducted extensive research on its paywall/website interfaces and understands best how to optimize revenue. Nevertheless, it seems as though there must be a better way to regulate content access and to incentivize subscription. In fact, if I could not conduct my very brief evasion tactic, I would definitely subscribe, because I like reading the Times enough to pay the nominal fee. But the hassle of signing up, combined with the monetary cost of a subscription, makes my maneuver worth it.

The Economist is an example of how you can make a digital subscription system profitable. Of course, The Economist produces content of a very different sort from the Times: even heavier on analysis, anonymous, and skewed towards a particular and somewhat bounded audience. Fundamentally, The Economist is a magazine, and it remains unproven whether newspapers can implement digital subscription models successfully. I would contend that the Times has not.

Therefore, the NYT-Flipboard deal is an interesting but not especially revolutionary development in the history of digital newspapers. The deal demonstrates how newspaper companies are being forced to adapt, at least on a constrained scale; however, it does not indicate a sudden shift in direction, a 180 wherein the Times embraces revenue models other than a paywall. Ultimately, the problem is that the Times is not innovating independently, but rather responding to offsite innovation. No wonder it looks like newspapers are playing catch-up: there’s no native innovation to speak of. If the Times produced an alternative, competitive reading platform to news aggregators, or worked to radically democratize content distribution (beyond the pseudo-democratization that revenue-gobbling titans advance as an illusory “value”), it could avoid having to fit its old-fashioned newsprint into incompatible, newfangled formats.

APIs Make News Media More Flexible

On June 12th, publishers met at the New York Times Conference Center for International Press Telecommunications Council Business and Technology Day. Whew—that’s a word count killer. NewsCred, the Associated Press, and the New York Times presented on a panel about news media APIs. All three publishers have released an API. They shared how their API increases the flexibility of content creation:

Read more

Commercializing Art and Democratizing Culture: Incompatible Objectives?

In her article for the New York Times, “Web Sites Illuminate Unknown Artists,” Melena Ryzik describes how startups like ArtistsWanted.org, Behance.net, and EveryArt.com “talk not only about making money but also about democratizing culture.” These sites help discover new and unheard-of artists and then sell their work. The business model seems sound, so art market startups have attracted sizable investments. Can a startup really democratize culture and turn a profit at the same time?

Read more