Creating a global brain

Google? That's old school. Intelligent new Web 3.0 applications will revolutionize the way we interact with the world's data

Ken Hunt

Globe and Mail Update

The World Wide Web as we know it is a mess. There is no organizational scheme, no oversight committee, no editorial standards, nothing to limit its growth—just millions of people adding new documents and data all the time, wherever and whenever they please, to an ever-expanding pile.

And what a glorious mess it is! The democracy and freedom that reign on the Web today have led to an explosion in creativity perhaps unrivalled in history. Never before have so many been able to publish so much so easily. Never have we had so much data available to us. Of course, this surfeit of information can make finding what you want a bit of a hassle, and things are becoming more difficult all the time as the signal-to-noise ratio continues to diminish.

Luckily, a growing community of entrepreneurs and scientists is helping us sift through all this data and, most importantly, bring context to it.

Right now, the information that we find on the Web is flat. We can search a company's website to find the name of its CEO, but from that point forward we have to remember that relationship ourselves. When we see that CEO's name mentioned on another website, perhaps funding some new business venture, we have the ability to put that information in context only because we know and understand who the person is. The next generation of intelligent web applications will be able to discover and remember relationships like that automatically. In effect, there will be a new layer of meaning on top of the information on the Web today. This evolutionary shift is commonly referred to as "Web 3.0," and it will revolutionize the way we interact with the world's data. This is the semantic Web, and its goal is nothing less than to create a global brain.
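
One way to picture that new layer of meaning is as a store of simple subject-relationship-object facts that software can look up later. The sketch below is purely illustrative: the company and people names, the relationship labels and the `FactStore` class are all hypothetical, and it is not drawn from any particular semantic Web standard or product.

```python
from collections import defaultdict


class FactStore:
    """Toy store of (subject, relation, object) facts. Hypothetical, for illustration only."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, relation, obj):
        # Index each fact from both ends so a name can be found
        # whether it appears as the subject or the object.
        self._by_subject[subject].add((relation, obj))
        self._by_object[obj].add((subject, relation))

    def context_for(self, name):
        """Return every known fact that mentions `name`, sorted."""
        facts = {(name, rel, obj) for rel, obj in self._by_subject[name]}
        facts |= {(subj, rel, name) for subj, rel in self._by_object[name]}
        return sorted(facts)


store = FactStore()
# Fact learned from a (made-up) company website.
store.add("Acme Corp", "has_ceo", "Jane Doe")
# Fact learned later from a (made-up) news story on another site.
store.add("Jane Doe", "funds", "NewCo")

# Seeing the name again, the software can recall the context on its own.
for fact in store.context_for("Jane Doe"):
    print(fact)
```

The point of the sketch is only that the relationship is stored once and recalled automatically, rather than being left for the reader to remember.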

The visionary behind this is Sir Tim Berners-Lee, the same man who had the vision to combine hypertext with Internet protocols back in 1989, inventing the World Wide Web. Rather than going into business and potentially becoming one of the Web's billionaires, Berners-Lee has remained an academic and continues to seek ways to improve upon his invention. His vision (along with that of colleagues James Hendler and Ora Lassila) for a semantic Web was laid out in an article in Scientific American in May, 2001. Ever since then, semantic technologies and standards have slowly been maturing, and this year will be a major turning point as several new companies come online and finally start to turn Berners-Lee's vision into a reality.

Which of these companies will find a market and survive the competition is impossible to determine, but the fact that so many of them are finally bringing their versions online is encouraging. If these concepts catch on and start to gain acceptance, then 2008 might turn out to be the greatest year for information organization since 1876—the year a young assistant librarian at Amherst College named Melvil Dewey got so fed up with the disorganization that reigned in libraries that he devised a system for classifying and ordering books using a series of decimal numbers.

Collective Knowledge Systems
The real power of the semantic Web will be realized in applications that bring together people to add content, organize information and build connections between different kinds of data. These applications will build on the success of Web 2.0 social technologies and become more intelligent as their user bases grow.

Freebase The first major Web 3.0 application open to the general public is Freebase. Founded by artificial intelligence guru Danny Hillis (best known for developing the Connection Machine, the world's first massively parallel supercomputer, at MIT in the early 1980s), Freebase aims to become "an open, shared database of the world's knowledge." It already covers more than three million topics, leveraging dozens of freely accessible databases from sources as diverse as the Securities and Exchange Commission archives, U.S. census data, and MusicBrainz's massive collection of information about bands and albums.

Freebase users can add their own databases to the system and create new connections between different types of data. It might be easiest to think of Freebase as a highly structured, database-powered Wikipedia. The main difference is that while Wikipedia stores information in the form of articles, Freebase stores specific facts and statistics. It also features an open application programming interface (API) that allows developers to build applications against these data sets and recombine them in interesting ways. A small company called Dipity is using this API to allow people to use the information in Freebase to create timelines of events, such as the life of Douglas Adams or the history of film noir. Another company, Archiportal, combines Freebase with Google Maps to allow users to search through the work of dozens of famous architects.

Twine Though still in an invite-only beta phase, Twine is generating more buzz than any other semantic web application. Like Freebase, Twine allows users to organize and find the connections between different kinds of information. Twine can be set up to automatically collect your e-mail or the web pages that you visit, as well as just about any other kind of data you want to feed into it. As it collects this information, Twine begins to organize it and look for connections, learning about the things that interest you. As Twine learns more about you, it starts to recommend people or topics that you might find interesting.

Twine is set up to be highly social, and it allows users to create connections between one another. If Freebase is like Wikipedia, then Twine might most easily be compared to Facebook. It not only builds connections between different kinds of data and understands your interests, but it also tries to understand your social or professional network so it can organize information in a manner that's of specific interest to you and your colleagues or friends. Founder Nova Spivack describes Twine as a "knowledge networking" application, and it will be particularly useful to teams that need to share and organize a great deal of information among themselves.

Anyone can request an invitation to become part of Twine's private beta as the company tests the system and improves its functionality. The site is expected to open to the public later this year.

Contextual browsing
The idea behind contextual browsing is that our browsers should be able to identify the topics we're looking at and automatically suggest other relevant pages. Rather than making you leave a page or open a new window to search for more information, it lets you explore topics in a more natural way. Two companies are focusing on building semantic concepts directly into the way we use the Web today.

AdaptiveBlue AdaptiveBlue has developed an add-on for the Firefox browser called BlueOrganizer. As you surf the Web with this plug-in, it tries to recognize the main subject that a page is discussing. It then creates shortcuts to other pages that cover the same topic. If, for example, you're looking at a page that discusses a movie, BlueOrganizer automatically recognizes the film and suggests links to more information on the Internet Movie Database, clips of the movie on YouTube and reviews on Metacritic, and lets you add the film automatically to your Amazon wish list.

ClearForest This company also uses a Firefox plug-in, but rather than identifying the main topic of a page, ClearForest's Gnosis software scours a page for every noun mentioned and organizes them into categories. It recognizes, for example, that Bill Gates is a person, that Toronto is a city, and that Yahoo is a company. Gnosis highlights those concepts on the page and points you to sources for further information. For Toronto, for instance, it will suggest looking the city up on Google Maps. For Bill Gates, it will direct you to his Wikipedia page, and for Yahoo it will recommend checking out the Reuters news website.
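
The categorize-and-suggest idea can be sketched with a small hand-built lookup table. This is an assumption made for illustration, not a description of how Gnosis actually works; a real system would need to recognize names it has never been told about. The `GAZETTEER` table and `annotate` function are hypothetical names, and the sample sentence is invented.

```python
# Hypothetical lookup table: entity name -> (category, suggested source).
# The three entries mirror the examples in the paragraph above.
GAZETTEER = {
    "Bill Gates": ("person", "Wikipedia"),
    "Toronto": ("city", "Google Maps"),
    "Yahoo": ("company", "Reuters"),
}


def annotate(text):
    """Return (entity, category, source) for each known entity in `text`,
    in the order the entities first appear."""
    found = []
    for name, (category, source) in GAZETTEER.items():
        position = text.find(name)
        if position != -1:
            found.append((position, name, category, source))
    # Sort by position in the text, then drop the position.
    return [(name, category, source) for _, name, category, source in sorted(found)]


sample = "Yahoo opened an office in Toronto after a visit from Bill Gates."
for row in annotate(sample):
    print(row)
```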

Natural Language Search
Search has always been the killer app of the Web, and the first company to get semantic searching right stands to become a major player, potentially even challenging Google for supremacy. Google searches by keyword: Its powerful algorithms essentially produce a statistical analysis of which web pages in its database best match the keywords entered in the search box. The results are then returned in an order that Google determines based on the perceived relevance of the website and its apparent level of authority on the Internet. Finding the exact information you want using keyword searching is an iterative process of choosing the right keywords, looking over results and refining the search.

Natural language searching, however, seeks to understand exactly what you are asking and return a specific answer. While keyword searches generally ignore prepositions and articles such as "in," "on," "of," "a" and "the," in natural language searching these words hold real meaning. To an actor, the phrase "parts in television" means something very different from "parts of a television," but keyword systems have a great deal of difficulty telling them apart. Keyword searches also aren't adept at understanding synonyms or homonyms. "Best movie of 2007" would return different results than "best film of 2007" or "best picture of 2007." "Soap," on the other hand, can return results about an XML messaging protocol, a combination of fat and lye that is useful in cleaning, or a groundbreaking 1970s sitcom starring Billy Crystal and Robert Guillaume. While keyword searches return a word-to-word match, natural language attempts to return a meaning-to-meaning match. Three companies are all using natural language processing to understand and search the Web.
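
A few lines of code show why the two "parts" phrases look alike to a keyword system. The sketch assumes a deliberately crude model in which a query is reduced to its set of words after small connecting words are dropped; the `STOP_WORDS` list and the `keyword_view` function are hypothetical, and real search engines are far more sophisticated than this.

```python
# Small connecting words that this toy keyword model throws away.
STOP_WORDS = {"in", "on", "of", "a", "the"}


def keyword_view(phrase):
    """Reduce a phrase to the set of keywords left after dropping stop words."""
    return {word for word in phrase.lower().split() if word not in STOP_WORDS}


actor_query = "parts in television"
repair_query = "parts of a television"

# Both phrases collapse to the same two keywords, so the
# difference in meaning is lost.
print(keyword_view(actor_query) == keyword_view(repair_query))
```

Once "in," "of" and "a" are gone, nothing remains to separate the actor's question from the repair technician's.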

Powerset Natural language search is an immensely difficult problem from a computational point of view. Not only do you have to deal with the massive amount of information on the Web, but you also have to contend with the subtleties of human language. Powerset is perhaps the company best equipped to deal with this complexity.

Founded by experts in natural language, artificial intelligence and computational linguistics, Powerset has licensed technology from Xerox's Palo Alto Research Center to serve as the computational backbone for its search engine. The company is still in invite-only beta phase, but demonstrations have shown that its method is adept at understanding such questions as "Who acquired PeopleSoft?" The results returned by this search highlight the word "Oracle" in news stories of the takeover. The system clearly recognizes "take over," "acquire," "purchase" and "buy" as synonyms, but perhaps more impressive is the fact that Powerset's search results manage to ignore stories of PeopleSoft acquiring smaller enterprise applications maker J.D. Edwards. This shows that it understands the major difference between the questions "Who did PeopleSoft acquire?" and "Who acquired PeopleSoft?" On a keyword basis, these two questions are practically identical.
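
The difference between those two questions comes down to which role PeopleSoft plays in the sentence. As a rough sketch, and not a description of Powerset's actual technology, the two takeovers mentioned above can be stored as buyer-target pairs and queried from either side. The `Acquisition` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Acquisition(NamedTuple):
    buyer: str
    target: str


# The two takeovers mentioned in the article.
FACTS = [
    Acquisition(buyer="Oracle", target="PeopleSoft"),
    Acquisition(buyer="PeopleSoft", target="J.D. Edwards"),
]


def acquirers_of(company):
    """Answer 'Who acquired <company>?': the company is the target."""
    return [fact.buyer for fact in FACTS if fact.target == company]


def acquired_by(company):
    """Answer 'Who did <company> acquire?': the company is the buyer."""
    return [fact.target for fact in FACTS if fact.buyer == company]


print(acquirers_of("PeopleSoft"))
print(acquired_by("PeopleSoft"))
```

The same three words appear in both questions, but keeping track of who is the buyer and who is the target produces two different answers.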

True Knowledge This company, based in Cambridge, U.K., and led by experts in artificial intelligence from the Oxbridge community, is also still in an invite-only beta. True Knowledge is building a knowledge base founded on a natural-language understanding of the Web. Rather than simply returning a list of search results, True Knowledge seeks to answer a question outright. On Google, the question "Is Jennifer Lopez single?" returns a hodgepodge of gossip from two marriages and other relationships. True Knowledge, meanwhile, produces a single one-word answer: "No." It then goes on to explain how it knows this, the relevant information being that "Jennifer Lopez has been married to Marc Anthony since the 5th of June, 2004." After that is a list of links where more information can be found.

Hakia While both True Knowledge and Powerset have dominated the buzz around natural search and raised a lot of capital, this smaller player has managed to beat them both to market. Hakia's public beta launched late last year and, while the technology on display is not as slick as the demos from Powerset and True Knowledge, Hakia does a good job of answering such questions as "What is the population of Florida?" and "What is the weather in Vancouver?"

It has also assembled "galleries" on thousands of topics that arrange search information into useful subcategories. The gallery for Canada, for example, includes basic information on population and geography, as well as links to news, history, major cities and tourist attractions. Hakia is also combining semantic technology with social tools. Any question can easily be turned into a discussion forum that those searching for the same information can find. It allows users with similar interests or problems to find each other quickly and combine their resources—and that, of course, is the essence of Web 3.0.
