Analysis

Google and Facebook Have a News Labeling Problem

October 9, 2020
Photo: Adobe Stock


A version of this article appeared in the Tow Center’s weekly newsletter. To stay updated about the Tow Center’s work on how technology is changing journalism, subscribe here.

When the Institute for Nonprofit News was approached several years ago by a news organization targeting African American readers, executive director Sue Cross was intrigued. Although the organization did not pass the transparency of funding standards for INN membership, Cross followed their journalism closely over a period of months. While much of the reporting seemed solid, Cross noted “a pattern of stories emerging; particularly positive stories about coal and stories about how particular types of new energy have a racist impact.” The undisclosed money funding the site was clearly aligned with the interests of the fossil-fuel industry, but as Cross put it recently, “You had to read the output of a number of months to even detect it.”

Sites like this one were the subject of a recent Tow Center discussion on the phenomenon of covertly partisan money funding local news. Tow Center research into how partisan online news networks operate ahead of elections revealed more than a thousand politically backed sites cropping up across the US, producing largely automatically generated stories. The Metric Media network at the center of the study is the largest, but by no means the only, example of organizations serving lobbying or political interests by producing what appears to be local news content. As Cross pointed out, the lack of transparency around funding sources is designed to deceive readers by making it difficult to detect political or commercial motives.

Of course, the phenomenon of partisan news is not new. Is the Metric Media network (which insists it is not partisan) substantially different from, say, Sinclair Media, which operates controversial right-leaning local TV news franchises, or blatantly partisan cable news channels like Fox News? In the UK, both left- and right-leaning press chains, from the Daily Mail to the Daily Mirror, have operated local newspapers for decades. Yet for online outlets, the facility of digital composition and the lack of clarity around funding push us to make tighter distinctions around what a “news source” might look like. And for third-party platforms like Google and Facebook that aggregate news content, there is an increasing need to flag instances where news production becomes lobbying or advertising.

Facebook and Google classify publishers as news sources in different ways. Facebook relies on self-identification—meaning, effectively, anyone can register as a “Media/News Company.” Google, on the other hand, proactively identifies news sources by including them in Google News, and has made several statements about the need to support high-quality, original reporting in its search results. 

However, research by the Tow Center has found that despite clear guidelines about inclusion in Google News, standards for identifying outlets as “news sources” are inconsistently applied.  


In a January 2019 post on Google’s Official Webmaster Central, Public Liaison for Search Danny Sullivan outlined a list of best practices for websites looking to be officially classified as “news sources” on Google News. The blog post, titled “Ways to succeed in Google News,” offered several tips on how to properly format dates and bylines, how to structure data, and how to optimize headlines with SEO-friendly keywords. The article also provided information on what news sites should avoid in order to stay within the parameters of Google News’ guidelines. Sullivan warned publishers against duplicating content, noting that Google News “seeks to reward independent, original journalistic content by giving credit to the originating publisher, as both users and publishers would prefer.” For Google, duplicate content includes news that relies on scraping, rewriting, or republishing stories. In theory, this would mean that algorithmically generated news would not rank highly on Google News. Yet despite clear policies, research by the Tow Center found that a number of shadowy, politically backed “local news websites” are indexed as news sources by Google News.

The organizational structure of the network of shadowy, politically backed “local news websites” designed to promote partisan talking points analyzed in a recent Tow Report

We sampled roughly one-third of over 1,200 sites identified in the Metric Media network, 90 percent of whose stories are automated. We found that the “news source” label was inconsistently applied in Google News. For example, only one of the 268 Metric Media sites surveyed on Google News was indexed as a news source, yet 35 of the 36 Local Government Information Services (LGIS) sites, also linked to Metric Media, were indexed. In total, 13 percent of the “news sites” sampled were found to be indexed as news sources by Google. 
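To make the gap between the two groups concrete, the short sketch below works out the share of sites indexed as news sources from the two counts reported above. It is only an illustration of the arithmetic, not part of the Tow Center's methodology; the variable and function names are our own. The overall 13 percent figure cannot be recomputed here, because the exact size of the full sample is not given.

```python
# Counts reported in the paragraph above:
# (sites indexed as a "news source", sites surveyed on Google News)
reported = {
    "Metric Media": (1, 268),
    "LGIS": (35, 36),
}


def indexed_share(indexed: int, surveyed: int) -> float:
    """Return the percentage of surveyed sites labeled as a news source."""
    return 100 * indexed / surveyed


# Round to one decimal place for readability.
shares = {
    name: round(indexed_share(*counts), 1) for name, counts in reported.items()
}

for name, share in shares.items():
    print(f"{name}: {share}% indexed as news sources")
```

On these counts, fewer than half a percent of the surveyed Metric Media sites carried the label, against roughly 97 percent of the LGIS sites.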

Determining whether a website is identified as a news source is straightforward. In searches run from the search bar within Google News, some sites appear with the italicized modifier “news source,” meaning they are indexed as news publications. For example:

The Carbondale Reporter is owned by LGIS, which is in turn owned by Dan Proft, a Republican politician and columnist in Illinois. 

Metric Media and LGIS websites often generate stories through the automated publishing of press releases or public-data feeds, meaning there is a high degree of replication within the story headlines across a number of publications:

According to the official guidelines of the Google News Initiative, the company uses seven standards to rate, rank, and categorize news: relevance, interests, location, prominence, authoritativeness, freshness, and usability. While those guidelines explicitly state that “Google does not make editorial decisions,” each of these categories is theoretically designed to prevent a rash of aggregated news when people attempt to search for timely, relevant, and accurate information on the platform. 

What’s more, in the 2019 article by Google that outlined publishers’ best practices, the site explicitly warned would-be news providers to avoid repurposing stories “without adding significant information or some other compelling reason for freshening.” 

It is unclear why LGIS sites are predominantly labeled as news sources while Metric Media sites are not, although it could be related to the relatively recent genesis of the latter. (Most LGIS sites have been around since before 2016, whereas most Metric Media sites were created in the last 12 months.) What is clear, however, is that there is a lack of consistency in how Google indexes news sites, which might feed public confusion as to the provenance or value of the news that appears in search results. If the indexing of a publisher as a Google-certified news source translates to increased credibility and more prominent placement in search rankings, it is vital that Google gets it right.

Facebook has also struggled with properly labeling politically backed local news outlets like the ones described above. The company recently announced it was cracking down on news pages with “direct and meaningful ties” to political organizations, effectively removing their publisher privileges and ensuring their promoted posts will be identified as political advertising. As with Google, there are inherent advantages for publishers to be categorized as “news.” But scrutinizing these designations, given to tens of thousands of titles, is laborious and difficult. Facebook does not offer a public register or an API where researchers, journalists, or the public can easily see what is indexed as a “Media/News Company.”

With both Facebook and Google recently stepping up their efforts to financially support local news, it is reasonable to ask whether both companies could do more on their own sites to ensure credible local news sources are privileged over mysterious, politically backed websites that rely on low-cost automated story generation.


Emily Bell and Sara Sheridan