Considered a creative skill, writing has long been seen as mostly immune to automation and commoditization — the seemingly inevitable end-state of anything touched by the Internet. Perhaps no longer.
What’s the score?
One of the more ubiquitous writing genres is sports reporting. Countless publications, portals, aggregators and distributors in print, radio, TV and on the Internet cover team rosters, game previews, schedules, results and all manner of short notices, from Little League to college games to professional sports. An army of writers is routinely tasked with generating the base content for this wide spectrum of sports coverage.
Here’s a recent example. Promoted as championship contenders this year but currently sitting at the very bottom of the NBA standings, the Brooklyn Nets and the New York Knicks recently met. The day before the game, as is customary, a “preview” of the upcoming game had to be written for general syndication. Something with a lede like this:
Now remember, there are games in all sports. At all levels. Across the entire world. Every single day. There are also daily and hourly developments to be covered in finance, weather, healthcare, marketing, real estate, politics, entertainment, transportation, technology and myriad other fields. There’s always been an insatiable demand for expository writing across the board. While the domains are very different, to an analytical eye all such data-driven writing shares two important traits: it’s very structured and highly automatable. Everything in the game preview above is simple prose, wrapped around stored data, shown in blue here:
It turns out one NBA game preview is pretty much the same as any other. We could structurally separate the parts that can be swapped for data about the other 28 teams while keeping roughly the same compositional logic:
If we can now plug in team-specific names, places and data wherever there’s one of those blue-bracketed placeholders above, we could customize a game preview so specifically to a given event that I’m confident 95% of the reading public couldn’t tell if those sentences were composed by a human writer or an algorithm, like the one I pseudo-reverse-engineered and highly simplified below:
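A minimal Python sketch of that kind of template-plus-conditions logic might look like the following. The team names, field names and thresholds are invented for illustration; they are assumptions, not taken from Automated Insights or any real system.

```python
from string import Template

# Hypothetical preview template: prose wrapped around data placeholders.
PREVIEW = Template(
    "The $home_city $home_name ($home_wins-$home_losses) host the "
    "$away_city $away_name ($away_wins-$away_losses) on $day. $form_sentence"
)


def form_sentence(name, last_ten_wins):
    """Choose a sentence based on wins in the last ten games (thresholds are assumed)."""
    if last_ten_wins <= 3:
        return f"The {name} have struggled lately, winning {last_ten_wins} of their last 10."
    if last_ten_wins >= 7:
        return f"The {name} have been on a roll, winning {last_ten_wins} of their last 10."
    return f"The {name} have won {last_ten_wins} of their last 10."


def render_preview(game):
    """Fill the template from one game's structured data."""
    home, away = game["home"], game["away"]
    return PREVIEW.substitute(
        home_city=home["city"], home_name=home["name"],
        home_wins=home["wins"], home_losses=home["losses"],
        away_city=away["city"], away_name=away["name"],
        away_wins=away["wins"], away_losses=away["losses"],
        day=game["day"],
        form_sentence=form_sentence(home["name"], home["last_ten_wins"]),
    )


# Made-up sample data with fictional teams, purely for demonstration.
SAMPLE_GAME = {
    "day": "Thursday",
    "home": {"city": "Springfield", "name": "Comets",
             "wins": 3, "losses": 12, "last_ten_wins": 2},
    "away": {"city": "Shelbyville", "name": "Pilots",
             "wins": 10, "losses": 5, "last_ten_wins": 7},
}

if __name__ == "__main__":
    print(render_preview(SAMPLE_GAME))
```

Swapping in a different game record changes every data-bound word while the compositional logic stays the same, which is the whole point of separating the two.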
Fortunately, or unfortunately depending on your perspective and profession, such algorithmic writing is not some hovering, hyperlooping fantasy. Here’s the actual preview that ran across many sites on the Internet and elsewhere before the game:
And syndicated in one of the biggest such venues, Yahoo Sports:
Who’s your daddy?
See the non-human byline below the headline, Automated Insights? That’s one of the new generation of companies involved in algorithmic writing. There are, and will be, others. For the initiated, the technology is quite straightforward. Often structured data is the gating factor, not compositional technology. Parsing and conditional templating technology is well understood by now. It’s tedious, but small-scale pieces could be done with procedural programming, larger ones with rules engines, and truly scalable and flexible ones with semantic coupling of the domain-specific data.
In fact, many aspects of the writing itself are amenable to conditional embellishment of the parts of speech. For example, in the piece above, we could have pre-programmed a list of synonyms for “struggled” and picked a substitute randomly, or one specific to geography, audience or sport. Lexical stylization can indeed get very sophisticated through contextual or randomized algorithms. Management of such conditional logic and metadata at scale has been possible for a couple of decades. When composing a personalized investment report or answering a question on your iPhone, your broker and Siri (though using different technologies underneath) already do something similar.
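As a rough illustration of that synonym substitution, here is a small Python sketch. The synonym table, the context keys and the function name are all hypothetical, chosen only to show the idea of picking a word randomly or by context.

```python
import random

# Hypothetical synonym table: each word maps context -> candidate phrasings.
SYNONYMS = {
    "struggled": {
        "default": ["struggled", "stumbled", "faltered"],
        "basketball": ["gone cold", "hit a slump"],
    }
}


def pick_synonym(word, context="default", rng=None):
    """Return a substitute for `word`, preferring context-specific options.

    Falls back to the default list, and finally to the word itself
    when no synonyms are registered.
    """
    if rng is None:
        rng = random  # module-level generator; pass random.Random(seed) to reproduce
    options = SYNONYMS.get(word, {})
    choices = options.get(context) or options.get("default") or [word]
    return rng.choice(choices)


if __name__ == "__main__":
    verb = pick_synonym("struggled", context="basketball")
    print(f"The Comets have {verb} lately.")
```

Passing a seeded `random.Random` keeps the output repeatable, which matters if the same article has to be regenerated identically later.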
In our example, the day before the game there was another “Knicks-Nets Preview” written by a human, Associated Press basketball writer Brian Mahoney, also syndicated in Yahoo Sports. The two pieces clearly serve different purposes. Mahoney’s article is much longer, as well as significantly more detailed, colorful and analytical. Automated Insights’ preview is all about brevity, information, timeliness and, ultimately, volume, coverage and cost-effectiveness. In one millionth of the time it takes Mahoney to write one of his NBA previews, Automated Insights can generate previews for all the games, not just in the NBA but in all sports, anywhere on the planet, as long as there’s underlying data. And in a domain like sports, there’s plenty of data.
The differentiating cost of algorithmic writing is nearly all front-loaded on template and conditional logic programming. When done properly, this can obviate post-production fact checking and proofreading. Once set up, these pieces can be auto-produced whenever the underlying data changes or a schedule is triggered. Thus the marginal cost of each additional article approaches zero.
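One way to picture the "auto-produce when the data changes" step is a small cache that re-renders an article only when its underlying record is different from last time. This is a sketch under my own assumptions; the class and function names are hypothetical, and a production system would more likely react to database or feed events.

```python
import hashlib
import json


def fingerprint(data):
    """Stable SHA-256 hash of a JSON-serializable record."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ArticleCache:
    """Re-render an article only when the data behind it has changed."""

    def __init__(self, render):
        self._render = render   # any function mapping data -> article text
        self._seen = {}         # article id -> (fingerprint, text)
        self.renders = 0        # counts how often real rendering work was done

    def get(self, article_id, data):
        fp = fingerprint(data)
        cached = self._seen.get(article_id)
        if cached is not None and cached[0] == fp:
            return cached[1]    # data unchanged: reuse the existing text
        text = self._render(data)
        self.renders += 1
        self._seen[article_id] = (fp, text)
        return text


if __name__ == "__main__":
    cache = ArticleCache(lambda d: f"{d['team']} have won {d['wins']} games.")
    print(cache.get("preview-1", {"team": "Comets", "wins": 3}))
    print(cache.get("preview-1", {"team": "Comets", "wins": 4}))  # re-rendered
```

Once the template and this kind of trigger exist, each regenerated piece costs only the rendering call.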
The day has arrived
Clearly, programmed robots can in fact write sports previews, and many other types of writing are just as suitable for algorithmic automation. As has been the case with the Internet, this will displace a lot of writers and also create concomitant technology jobs elsewhere.
It would be easy to dismiss this as procedural, utilitarian writing that doesn’t share much with literary prose. Granted. But such competition is not the focus of algorithmic writing. Not yet, anyway. Given enough nouns, verbs and associations in a specific knowledge domain, you’d be surprised how close you can come in compositional “believability” even today. Tomorrow, don’t be surprised if your next textbook or travel guide or cookbook is written mostly by domain-specific algorithms. And welcome to the [“brave” | “splendid” | “efficient” | “fearful” | “faceless” | “decimating”] new world of algorithms…eating yet another profession!