computers tend to take the string of text that users type into a box and scour their vast indexes of copies of Web pages for matches. Among the matches, each page is instantly ranked by a system that judges “relevance.” Google calls its ranking system PageRank: pages rise to the top of the list of search results by attracting a large number of incoming links from other pages. The more significant or highly ranked a recommending page is, the more weight a link from it carries within the PageRank scoring system.17 Each website copied into Google’s servers thus carries with it a set of relative scores, instantly calculated to determine its position on a results page, and this ranking is presumed to reflect its relevance to the search query. Relevance thus tends to mean something akin to value, but it is a relative and contingent value, because relevance is also calculated in a way that is specific not just to the search itself but also to the search history of the user. For this reason, most Web search companies retain records of previous searches and note the geographic location of the user.
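Stripped to its core, the mechanism is recursive: a page’s score depends on how many pages link to it and on the scores of those linking pages. The short Python sketch below is purely illustrative; the toy link graph, the damping factor of 0.85, and the fixed iteration count are textbook conventions, not Google’s actual code or parameters.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start every page with an equal score
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # a page with no links spreads its score evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each outgoing link passes on a share of the page's own rank,
                # so a link from a highly ranked page carries more weight.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Both A and B link to C, so C ends up with the highest score.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```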
While this approach is standard, and works fairly well in most situations for most users, a number of search-engine companies have been working furiously to deepen the “thinking” that computers do when queried. Since 2008, we have seen the debut of a number of new search engines that offer a different way of searching and depend heavily on the ability to understand the context and purpose of the search query. And Google, understandably, refines and alters its search principles with regularity.
Cuil, which debuted ignominiously in 2008, was founded by a group of former Google employees. Its launch was marred by a burst of publicity and attention that its infrastructure could not handle: the first users found the system terribly slow and fragile. Cuil boasted of searching a larger index of sources than either Google or Microsoft’s search engine, Bing. It also claimed to conduct rudimentary semantic analyses of potential results pages, assessing relevance better than the popularity method of PageRank could. By the summer of 2009, Cuil delivered consistently good results for basic queries, but no one seemed to notice. Most importantly, because Cuil is more interested in what potential results pages mean than in what the user might be thinking about, it pledged not to collect user data via logs or cookies, the small files with identifying information that Google and other search engines leave in every user’s Web browser. Cuil is a clever and innovative search service that has suffered from terrible business and public-relations decisions.18
In early 2009, the eccentric entrepreneur and scientist Stephen Wolfram released what he called a “computational knowledge engine,” Wolfram Alpha. By staging a series of small-scale demonstrations for the most elite Web thinkers in the United States, Wolfram was able to seed curiosity and attract attention for his service. Unlike a commercial search engine, Alpha is designed not so much to find pages and videos on the Web as to answer research questions by mining publicly available data sets. It does not even attempt to index Web sites. Its utility to users and advertisers, therefore, is narrow. But as a concept in knowledge management and discovery, it is potentially revolutionary. If you ask Alpha, “How many atoms are in a molecule of ammonia?” it will tell you the answer. It finds facts. It even generates facts, in a sense, by computing new information from distinct data sets. Wolfram Alpha is not intended to compete with Google in any way or in any market (although Google’s Web search can answer the same question by directing users to the top link: a page from Yahoo! Answers). However, if it succeeds, Alpha will remove a small set of scientific queries from the mass of Google searches. Google will hardly notice—unless it decides to adopt elements of Alpha technology for its own services. Wolfram Alpha is certain to serve as a useful experiment in machine-based knowledge discovery. But it’s not for shopping.19 It won’t have anything like Google’s effect on people worldwide, and it, too, is designed to remain a clever resource but never to become a major player in general information or Web searching.
Currently, the major search engines do not “read” the query for meaning. They are purely navigational: they point. However, all the big search companies (and most of the small ones, as well) are working on what is known in the industry as “semantic search”: searches that take account of the contextual meaning of the search terms. For example, in 2001, if a user typed “What is the capital of Norway?” into Google, the results would have been a set of pages that included the string of text “What is the capital of Norway?” By contrast, a semantic search engine that reads what computer scientists and linguists call “natural language” can understand the patterns of human diction well enough to predict that a user expects the result of this search to be the answer to the question, not a set of pages asking the same question. To build a natural-language, or semantic, search system, search companies need two things: brilliant thinkers in the areas of linguistics, logic, and computer science, and massive collections of human-produced language on which computers can conduct complex statistical analysis. Many companies have the former. Only Google, Yahoo, and Microsoft have the latter. Of those, Google leads the pack.
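The contrast can be sketched in miniature. In the toy Python example below, the two “pages,” the tiny fact table, and the crude question pattern are all invented for illustration; real semantic search relies on large-scale statistical models of language, not a handful of hand-written rules.

```python
import re

# Two invented "pages": one repeats the question, one contains the answer.
PAGES = [
    "What is the capital of Norway? Please help with my homework.",
    "Oslo is the capital and most populous city of Norway.",
]

# A hypothetical scrap of structured knowledge.
FACTS = {("capital", "norway"): "Oslo"}

def literal_search(query):
    """Circa-2001 behavior: return pages containing the query string."""
    return [p for p in PAGES if query.lower() in p.lower()]

def semantic_search(query):
    """Parse the question's intent and answer from structured facts."""
    m = re.match(r"what is the (\w+) of (\w+)\??$", query.lower())
    if m and (m.group(1), m.group(2)) in FACTS:
        return FACTS[(m.group(1), m.group(2))]
    return literal_search(query)  # fall back to literal matching

query = "What is the capital of Norway?"
print(literal_search(query))   # a page that merely repeats the question
print(semantic_search(query))  # the answer itself: 'Oslo'
```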
It’s no accident that Google has enthusiastically scanned and “read” millions of books from some of the world’s largest libraries. It wants to collect enough examples of grammar and diction in enough languages from enough places to generate the algorithms that can conduct natural-language searches. Google already deploys some elements of semantic analysis in its search process. PageRank is no longer flat and democratic. When I typed “What is the capital of Norway?” into Google in August 2010, the top result was “Oslo” from the Web Definitions site hosted by Princeton University. The second result was “Oslo” from Wikipedia.
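At a minuscule scale, the kind of statistical analysis such a corpus enables looks like the Python sketch below, which counts word-pair (bigram) frequencies in a two-sentence “corpus.” The corpus and the example words are invented; real systems compute these statistics over billions of words in dozens of languages.

```python
from collections import Counter

# An invented stand-in for millions of scanned books.
corpus = (
    "oslo is the capital of norway . "
    "paris is the capital of france ."
).split()

# Count every adjacent pair of words.
bigrams = Counter(zip(corpus, corpus[1:]))

# Which words follow "of" in this tiny corpus? Place names,
# a pattern a natural-language system could learn and exploit.
print([pair for pair in bigrams if pair[0] == "of"])
```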
One search company is trying to combine the two approaches, blending semantic search with community-based assessment of the quality of sources. By those standards, Hakia should be the best search engine in the world. Hakia specializes in medical information, and it invited medical professionals to help assess the value and validity of potential result sites. The results, however, are not clearly superior to Google’s. Hakia does place medical-journal results higher in many searches.20 But a search for “IT band” on Google and Hakia conducted in July 2009 yielded excellent results on Google and inappropriate results on Hakia. Google directed me to sites such as the Mayo Clinic’s orthopedic pages, where I learned about the malady known clinically as iliotibial band syndrome, which involves chronic tightness and pain in a band of connective tissue that runs from the hip to the knee. Hakia, supposedly specializing in medical searches, directed me to the Wikipedia page for the Band, the musical group that first gained international acclaim by backing up Bob Dylan in 1965 and 1966 and went on to deliver some of the greatest American music until it broke up in 1976.21
While Yahoo struggles to keep itself in the game, the two behemoths in the search-engine competition, Google and Microsoft, continue to battle each other, not just in the search-engine field but increasingly across the whole domain of computer software and online services. In hopes of keeping Google off its guard, in June 2009 Microsoft released Bing, a completely revised version of its Live Search engine developed in partnership with Yahoo. To differentiate itself from Google, Microsoft has advertised Bing as a “decision engine” rather than a search engine. It specializes in searches about travel, shopping, health, and local knowledge. In other words, while Wolfram Alpha is experimenting with ways to peel off from Google some searches that concern factual data, Microsoft hopes to attract consumers. The advertisements Microsoft ran ridiculed Google for offering too much information when users just want to buy stuff. Early on, Bing seemed able to pry some users away from Yahoo but posed no major threat to Google in the U.S. search market.22
In July 2009, just after Microsoft announced Bing in an attempt to force Google to refocus on its core moneymaking activity—Web searches and the advertising they generate—Google countered by announcing the development of a light, clean operating system that would run on a small, cheap computer, a netbook. This operating system, to be known as Chrome OS (sharing its name with Google’s Web browser, Chrome), would do little more than run a browser. It would facilitate Web-based services, thus pushing more users away from bulky, expensive, poorly designed programs such as Microsoft Windows and Office and toward programs that operate via the Web (“in the cloud”), such as Google Docs. Realistically, Google’s initiative is no short-term or direct threat to Microsoft’s dominance in the personal-computer software market. But over time it could chip away at new markets in the developing world that are much more price sensitive and whose