News Organizations Are Using Artificial Intelligence to Generate Data-driven Articles


Will AI algorithms replace trained human reporters? The automated journalism revolution may already be in its infancy. henrik5000/Getty Images

Maybe you're a fan of minor league baseball. Or perhaps you're an investor interested in a company's performance last quarter, or a politics junkie who wants to dig into the numbers of local election results. Whatever type of data devourer you are, you may be surprised to learn that the news story you just read wasn't written by a human being.

Automated journalism, in which software programs with artificial intelligence capabilities use algorithms to fashion stories from raw data such as sports box scores and corporate earnings reports, seems to be one of the hotter new trends in the media business. Proponents say that using robotic writers can help news organizations to produce vastly more coverage of topics where the news is mostly numbers, while freeing human journalists from the drudgery of churning out formulaic articles and allowing them the time to report and write stories on more important, complicated subjects.

The Associated Press (AP) helped launch the trend back in 2014 when it started using article-generating software created by a startup called Automated Insights to automatically produce stories on U.S. corporate earnings. Previously, human reporters had cranked out those by-the-numbers articles on deadline, as soon as possible following company announcements. The software enabled AP to increase its output of the stories — typically 150 to 300 words long — by a factor of 12, from 300 per quarter to about 3,700, according to an Automated Insights case study.

It worked so well that in 2016, AP began using the software to cover 10,000 minor league baseball games across the nation each season, using data from box scores to pump out stories that go out on the wire within minutes of the umpire's final call. The copy isn't anything that you'd confuse with the work of sports essayist Roger Angell. The pieces include no quotes from players or colorful descriptions of plays, and it's possible that a newsworthy moment such as a bench-clearing brawl might not make it into the story if no players were ejected. But the stories do provide insights such as whether a player is having a career-defining performance, or if the team has extended a winning streak against its opponent. It's the kind of deep statistical reporting that is the bread and butter of fantasy sports gaming.
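
To make the idea concrete, here is a minimal sketch of how a template-driven recap might be assembled from box score data. It is an illustration only, not the software AP uses: the `BoxScore` fields, the team and player names, and the wording rules (what counts as a "rout," when a streak is worth mentioning) are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """Hypothetical box score record; the field names are assumptions."""
    home: str
    away: str
    home_runs: int
    away_runs: int
    top_hitter: str
    top_hitter_hits: int
    top_hitter_at_bats: int
    winner_streak: int  # winner's consecutive wins over this opponent


def write_recap(box: BoxScore) -> str:
    """Turn one box score into a short recap using fixed sentence templates."""
    if box.home_runs == box.away_runs:
        raise ValueError("this sketch only handles games with a winner")

    if box.home_runs > box.away_runs:
        winner, loser = box.home, box.away
        win_runs, lose_runs = box.home_runs, box.away_runs
    else:
        winner, loser = box.away, box.home
        win_runs, lose_runs = box.away_runs, box.home_runs

    # Pick a verb from the margin of victory (thresholds are arbitrary).
    margin = win_runs - lose_runs
    if margin == 1:
        verb = "edged"
    elif margin >= 5:
        verb = "routed"
    else:
        verb = "beat"

    sentences = [
        f"{winner} {verb} {loser} {win_runs}-{lose_runs}.",
        f"{box.top_hitter} went {box.top_hitter_hits}-for-{box.top_hitter_at_bats}.",
    ]

    # Only mention a streak when the data says it is notable.
    if box.winner_streak >= 3:
        sentences.append(
            f"{winner} has now won {box.winner_streak} straight against {loser}."
        )
    return " ".join(sentences)


game = BoxScore("Riverside", "Lakeview", 7, 2, "Lee", 3, 4, 4)
print(write_recap(game))
```

A sketch like this also shows the limitation described above: anything that never shows up in the data, such as a brawl with no ejections, has no way into the story.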

The trend has spread to other news organizations. The Washington Post began using a program called Heliograf to generate updates at the 2016 Rio Olympics, and then employed the technology to cover U.S. House, Senate and gubernatorial races in all 50 states (nearly 500 elections altogether) in the 2016 election cycle. Bloomberg editor-in-chief John Micklethwait told the news organization's staffers in a memo last year that automation "is crucial to the future of journalism in a much broader way than many of us realize," according to journalism think tank The Poynter Institute.

So far, automation has been confined to "some of the more cookie-cutter, stats-heavy kind of stories, such as earnings reports and sports," says Ricardo Bilton, a staffer at Nieman Journalism Lab, via email. The organization tracks news media innovation as part of the Nieman Foundation for Journalism at Harvard University.

"This is generally where much of the early action is in the space," says Bilton, "because these stories have a clear structure that can be easily parsed and replicated by software."

Bilton says that robot writers can provide a big competitive edge in fields such as financial reporting, where getting out information fast increases its value. "Investors pay them for access to information that will help them put their money in the right place," he says. "If those customers can get that information even a few seconds faster than their competition, that's a huge advantage, and core to the promise of automated stories."

Automated journalism isn't without its potential downsides. "It's easy to put complete trust in automated stories, where 'automation' in our minds gets equated with 'infallible,'" Bilton adds. "But we know that code can also make mistakes."

In June, for example, an uncorrected Y2K bug in U.S. Geological Survey software led the agency to send out an erroneous alert about a magnitude 6.8 earthquake in California that had actually occurred in 1925. That mistake, in turn, led the Los Angeles Times' Quakebot algorithm to generate a web story and send out a tweet about the nonexistent quake. (The errors were quickly corrected.)
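
One common safeguard against this kind of error is a sanity check on incoming data before anything is published. The sketch below is a hypothetical illustration, not a description of how Quakebot or the USGS feed works: the function name, the 24-hour freshness window and the five-minute clock tolerance are all assumptions, and the check only helps if the upstream timestamp itself is implausible.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds; a real newsroom would tune these.
MAX_EVENT_AGE = timedelta(hours=24)    # older events are not breaking news
MAX_CLOCK_SKEW = timedelta(minutes=5)  # tolerate slightly "future" timestamps


def should_auto_publish(event_time: datetime, now: datetime) -> bool:
    """Return True only if the event timestamp is plausibly recent."""
    age = now - event_time
    return -MAX_CLOCK_SKEW <= age <= MAX_EVENT_AGE


# Placeholder timestamps chosen for illustration only.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=10)
decades_old = datetime(1925, 1, 1, tzinfo=timezone.utc)
far_future = now + timedelta(days=365)

for label, stamp in [("recent", recent), ("decades old", decades_old), ("far future", far_future)]:
    print(label, should_auto_publish(stamp, now))
```

An alert that fails a check like this could be held for a human editor to review rather than discarded.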

Additionally, Bilton questions whether bringing in robots to churn out copy will necessarily result in better news coverage. "One of the arguments for automated journalism — much like the argument for automation itself — is that leaving low-level rote stories to robots frees up reporters to focus on more enterprising work," he says. "That's clearly true on many levels. But I think that, with reporting, there's a case to be made for reporters doing that daily 'stock' reporting to help inform their more infrequent enterprising work. Those two things are two sides of the same coin, and it's not clear what happens if reporters hand over a share of that workload to automated tools."

He's also concerned that the spread of automated journalism eventually could wipe out more jobs in a newspaper profession where employment declined by 42 percent between 1990 and 2015. "I can't say for certain what percentage we're talking about, but this, too, seems like a sad inevitability," Bilton adds.