Every year, Google makes hundreds of changes to the code behind its search engine, but in an attempt to combat the scourge of fake news and offensive content, its engineers are beginning to collect data from a new source: humans.
"It's become very apparent that a small set of queries in our daily traffic (around 0.25 per cent) have been returning offensive or clearly misleading content," writes Ben Gomes, vice-president of engineering for Google, in a blog post outlining some policy changes that will seek more user feedback in an effort to clean up some of the scandals related to automatically generated sections of its search results.
Google's troubles with offensive content have been popping up with more frequency in recent months. In October, 2016, users noticed Google would sometimes autocomplete the phrase "are jews …" with the word "evil." After a public outcry the company made changes to remove the offending lines, adding more scrutiny in its algorithm to so-called "sensitive" topics. But even after the fixes, its search engine still regularly turns up offensive results.
For example, right now, users who type "are black" into a Google search bar might see autocomplete suggestions such as "are black people smart," which leads to a search page topped by a story about the offensiveness of that autocomplete suggestion, followed by a Fox News article claiming a DNA connection to intelligence and a fourth article with the headline: "Black people aren't human." That last article is from an organization called National Vanguard, which is identified as a U.S. neo-Nazi white nationalist splinter group by the Southern Poverty Law Centre.
To combat the problem, Google is giving regular users a new "report" button on its search-bar autocomplete feature so people can more easily alert Google to problematic results. A similar button will be added to the "featured snippets" section of its results pages. Autocomplete and featured snippets – previews of search results – have both been the subject of controversies that involved the promotion of conspiracy theories, fake news and racist slurs on the hugely popular website.
After Tuesday, a user who spots an offensive autocomplete result will be able to flag it for Google's engineers to review.
But even these high-profile anecdotes don't capture the scale of the problem Google faces. The company doesn't say how many searches it processes each day; it says only that it handles "trillions" of search requests a year. So while a bad-content rate of one-quarter of 1 per cent might be a good result for almost any other enterprise, at Google's scale it could mean responding to billions of user requests a year with these "low quality" results. Small for Google is still a potential avalanche of unwelcome content for users.
Mr. Gomes explained that content promoting hate is also being given the lowest possible search weighting, and an increased importance will be given to "high-quality" sources of information, particularly on sensitive topics. The process of sifting through search results involves a mix of algorithmic and human-curation efforts.
For instance, Google has seen posts containing Holocaust-denying falsehoods ranking high in its searches – an absurd condition when there is excellent scholarship and documentation of the horrors of the Holocaust available online.
Google is also releasing more details about its human "raters" – a hand-picked group of 10,000 users who already give Google feedback on the hundreds of tests it runs to improve search results. In addition to this extra monitoring, Mr. Gomes and Google believe making feedback tools easier to find for the rest of the general public could expand the effectiveness of this human-curation effort.
Mr. Gomes and Pandu Nayak (a Google research fellow in search quality) said on Monday that some of Google's problems come from users trying to game the system to gain a higher ranking for their content (which can lead to more ad dollars, among other effects). The company blog described "low-quality 'content farms,' hidden text and other deceptive practices," among the tactics. In that environment, Google's challenge is to guard against abuse of the new feedback buttons. For instance, if Google guaranteed that flagging content would remove a search result, unethical users could wield a "banhammer" to block content they didn't like or to favour their own content.
"There is likely to be [helpful] signal in there, even through all the noise through abuse," Mr. Nayak says. "We don't expect the problem will completely disappear."
Content problems with Google's featured snippets may be even more serious. According to the search-engine-optimization tracking service MozCast, about 15 per cent of Google searches currently return a result that includes a featured snippet, which on Google.com just looks like a text box – one of many results – off to the right side. However, if a user searches with one of Google's voice-assistant or smart-home products and a snippet is returned, the context of the other results on a Web page is missing, and the service reads the snippet aloud as if it were the one true answer.
In recent months, users have posted videos of a "smart device" responding with answers sponsored by racist or conspiratorial sites, such as false claims that former president Barack Obama was plotting a coup, or allegations that four U.S. presidents had been members of the Ku Klux Klan (there is little evidence to suggest any U.S. president was an active or former KKK member).
"There are people who are writing all kinds of things on the Web," says Mr. Gomes, who added that one issue Google is having is finding high-quality sources to promote in place of some of the more explicit content. "Journalists are not covering some of these conspiracy theories."
As Mr. Gomes points out in his blog post, it's both a strength and a limitation of its service that Google doesn't create its own content.
The trouble is that some users go to Google to find others who agree with the latest reality-challenged statement from their preferred political leader … Other users who find the same counterfactual content might assume the world – or Google – has gone mad.
"The content that appears in these features is generated algorithmically and is a reflection of what people are searching for and what's available on the Web. This can sometimes lead to results that are unexpected," Mr. Gomes writes.
The changes began rolling out in the U.S. and international markets today.