Semantic Web for the Working Ontologist. Dean Allemang
the Web of data with very different values from the involved parties. There will always be multiple ontologies and diverging statements, just as there will always be multiple web pages on any given topic. One of the features of the Web that has made it so successful, and so different from anything that came before it, is that it, from the start, has always allowed all these multiple viewpoints to coexist.
To each their own
How can the Web architecture support this sort of variation of opinion? That is, how can two people say different things about the same topic? There are two approaches to this issue. First, we have to talk a bit about how one can make any statement at all in a web context.
The IAU can make a statement in plain English about Pluto, such as “Pluto is a dwarf planet,” but such a statement is fraught with all the ambiguities and contextual dependencies inherent in natural language. We think we know what “Pluto” refers to, but how about “dwarf planet”? Is there any possibility that someone might disagree on what a “dwarf planet” is? How can we even discuss such things?
The first requirement for making statements on a global web is to have a global way of identifying the entities we are talking about. We need to be able to refer to “the notion of Pluto as used by the IAU” and “the notion of Pluto as used by the American Federation of Astrologers” if we even want to be able to discuss whether the two organizations are referring to the same thing by these names.
In addition to Pluto, another object was also classified as a “dwarf planet.” This object is sometimes known as UB313 and sometimes known by the name Xena. How can we say that the object known to the IAU as UB313 is the same object that its discoverer Michael Brown calls “Xena”?
One way to do this would be to have a global arbiter of names decide how to refer to the object. Then Brown and the IAU can both refer to that “official” name and say that they use a private “nickname” for it. Of course, the IAU itself is a good candidate for such a body, but the process to name the object took over two years. Coming up with good, agreed-on global names is not always easy business.
On the Web, we name things with URIs. The URI standard provides rules to mint identifiers for anything around us. The most common form of URIs are the URLs commonly called Web addresses (for example, http://www.inria.fr/
) that locate a specific resource on the Web. In the absence of an agreement, different Web authors will select different URIs for the same real-world resource. Brown’s Xena is IAU’s UB313. When information from these different sources is brought together in the distributed network of data, the Web infrastructure has no way of knowing that these need to be treated as the same entity. The flip side of this is that we cannot assume that just because two URIs are distinct, they refer to distinct resources. This feature of the Semantic Web is called the Nonunique Naming Assumption; that is, we have to assume (until told otherwise) that some Web resource might be referred to using different names by different people. It’s also crucial to note that there are times when unique names might be nice, but it may be impossible. Some other organization than the IAU, for example, might decide they are unwilling to accept the new nomenclature.
There’s always one more
In a distributed network of information, as a rule we cannot assume at any time that we have seen all the information in the network, or even that we know everything that has been asserted about one single topic. This is evident in the history of Pluto and UB313. For many years, it was sufficient to say that a planet was defined as “any object of a particular size orbiting the sun.” Given the information available during that time, it was easy to say that there were nine planets around the sun. But the new information about UB313 changed that; if a planet is defined to be any body that orbits the sun of a particular size, then UB313 had to be considered a planet, too. Careful speakers in the late twentieth century, of course, spoke of the “known” planets, since they were aware that another planet was not only possible but even suspected (the so-called “Planet X,” which stood in for the unknown but suspected planet for many years).
The same situation holds for the Semantic Web. Not only might new information be discovered at any time (as is the case in solar system astronomy), but, because of the networked nature of the Web, at any one time a particular server that holds some unique information might be unavailable. For this reason, on the Semantic Web, we can rarely conclude things like “there are nine planets,” since we don’t know what new information might come to light.
In general, this aspect of a Web has a subtle but profound impact on how we draw conclusions from the information we have. It forces us to consider the Web as an Open World and to treat it using the Open World Assumption. An Open World in this sense is one in which we must assume at any time that new information could come to light, and we may draw no conclusions that rely on assuming that the information available at any one point is all the information available.
For many applications, the Open World Assumption makes no difference; if we draw a map of all the Mongotel hotels in Boston, we get a map of all the ones we know of at the time. The fact that Mongotel might have more hotels in Boston (or might open a new one) does not invalidate the fact that it has the ones it already lists. In fact, for a great deal of Semantic Web applications, we can ignore the Open World Assumption and simply understand that a semantic application, like any other web page, is simply reporting on the information it was able to access at one time.
The openness of the Web only becomes an issue when we want to draw conclusions based on distributed data. If we want to place Boston in the list of cities that are not served by Mongotel (for example, as part of a market study of new places to target Mongotels), then we cannot assume that just because we haven’t found a Mongotel listing in Boston, no such hotel exists.
As we shall see in the following chapters, the Semantic Web includes features that correspond to all the ways of working with Open Worlds that we have seen in the real world. We can draw conclusions about missing Mongotels if we say that some list is a comprehensive list of all Mongotels. We can have an anonymous “Planet X” stand in for an unknown but anticipated entity. These techniques allow us to cope with the Open World Assumption in the Semantic Web, just as they do in the Open World of human knowledge.
In contrast to the Open World Assumption, most data systems operate under the Closed World Assumption, that is, if we are missing some data in a document or a record, then that data is simply not available. In many situations (such as when evaluating documents that have a set format or records that conform to a particular database schema), the Closed World Assumption is appropriate. The Semantic Web standards have provisions for working with the Closed World Assumption when it is appropriate.
The nonunique name of the Semantic Web
One problem the first time you discover linked data on the Web and Semantic Web is that this evolution of the Web is perceived and presented under different names, each name insisting on a different facet of the overall architecture of this evolution. In the title of this book, we refer to the Semantic Web, emphasizing the importance of meaning to data sharing. The Semantic Web is known by many other names. The name “Web of data” refers to the opportunity now available on the Web to open silos of data of all sizes, from the small dataset of a personal hotel list up to immense astronomic databases, and to exchange, connect, and combine them on the Web according to our needs. The name “linked data” refers to the fact that we can use the Web addressing and linking capabilities to link data pieces inside and between datasets across the Web much in the same way we reference and link Web pages on the hypertext Web. Only this time, because we are dealing with structured data, applications can process these data and follow the links to discover new data in many more automated ways. The name “linked open data” focuses on the opportunity to exploit open data from the Web in our applications and the high benefit there is in using and reusing URIs to join assertions from different sources. This name also reminds us that linked data are not necessarily open and that all the techniques we are introducing here can also be used in private spaces (intranets, intrawebs, extranets, etc.). In an enterprise, we often refer to a “Knowledge Graph,” which is specific to that enterprise, but can include any information that the enterprise needs to track (including information