Google Inc. has (sort of) poached a University of Toronto professor whose area of expertise is helping computers understand common sense.
Geoffrey Hinton works on “deep learning” networks, a field that tackles the difficult task of explaining context to machines.
“If I say a sentence like ‘I saw the Grand Canyon flying to Chicago,’ you know that doesn’t mean the Grand Canyon was flying to Chicago, because you know what kind of thing the Grand Canyon is,” Prof. Hinton said.
In effect, he and his students, Alex Krizhevsky and Ilya Sutskever, are researching how to build a similar kind of understanding digitally.
For Google, such research is of critical importance, as more people try to search using images and voice from their smartphones.
Currently, Google and other search engines do a relatively good job of helping someone find information when they know exactly what to search for. That’s in large part because they use a barrage of indicators to rank how relevant a web page is to a specific search query. Such indicators include the number of people who click on a site after entering a particular term, the number of other sites that link to a page and, in cases where timeliness is relevant – such as news stories – the date when the page was published.
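As a rough illustration of how indicators like these could be combined into a single ranking, here is a minimal Python sketch. It is not Google's method: the `Page` fields, the weights, the logarithmic damping and the freshness formula are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    clicks_for_query: int   # people who clicked this page after entering the term
    inbound_links: int      # other sites that link to the page
    published: date         # publication date, used only for timely queries


def relevance_score(page, today, time_sensitive,
                    w_clicks=1.0, w_links=1.0, w_fresh=1.0):
    """Combine the indicators into one number (hypothetical weights)."""
    # log1p dampens very large counts so one signal cannot swamp the others.
    score = (w_clicks * math.log1p(page.clicks_for_query)
             + w_links * math.log1p(page.inbound_links))
    if time_sensitive:
        # Newer pages get a larger bonus; the bonus shrinks as the page ages.
        age_days = max((today - page.published).days, 0)
        score += w_fresh / (1 + age_days)
    return score


def rank(pages, today, time_sensitive):
    """Return pages ordered from most to least relevant."""
    return sorted(pages,
                  key=lambda p: relevance_score(p, today, time_sensitive),
                  reverse=True)
```

In this sketch the publication date matters only when the query is flagged as time-sensitive, mirroring the news-story case described above.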
But in areas where the user doesn’t know an exact definition, or is looking for concepts that are similar but don’t share common wording, search engines rarely perform adequately.
“One document might say the Maple Leafs lost, and the other says the Canucks were destroyed,” Prof. Hinton said. “There’s a lot of similarities in those docs, but they don’t share the same words.”
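The gap Prof. Hinton describes can be seen with a simple word-overlap measure. The sketch below uses Jaccard similarity, a standard bag-of-words comparison chosen here purely for illustration; the two sentences are hypothetical stand-ins for the documents in his example, not text from any real page.

```python
import re


def words(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words across both texts."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# Hypothetical stand-ins for the two documents in the quote.
doc_a = "The Maple Leafs lost"
doc_b = "The Canucks were destroyed"

# The only shared word is "the", so the overlap score is low even though
# both sentences report a hockey team losing a game.
overlap = jaccard(doc_a, doc_b)
```

A measure like this treats the two sentences as nearly unrelated, which is why matching on exact wording falls short for concept-level search.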
Jeff Dean, a Google Fellow in the company’s systems infrastructure group, said Prof. Hinton’s work has applications for voice- and image-based searches, two small but growing segments of Google’s overall search traffic. As more and more users send search queries by snapping a picture with, or speaking to, their smartphones, Google has spent more research dollars trying to figure out ways to automatically derive contextual clues from images and sound.
Such searches are more difficult to parse for a number of reasons. First, the computer must figure out what the person is saying, or what a picture actually represents. Computer-science researchers have been working on areas such as voice recognition for decades, but the computational power to store and learn from massive amounts of audio data is a fairly recent development.
After a computer determines what the person is asking for, it must then apply the kind of contextual clues that allow it to return a relevant result.
In a rarity for the search giant, which regularly lures academics to its campus in Mountain View, Calif., Google is allowing Prof. Hinton to keep his day job. The computer scientist and two of his postdoctoral students will split their time between the university and Google headquarters. The company has also agreed to help fund the professor’s university research, but Prof. Hinton said Google will have no influence over its focus.