Ontology Engineering. Elisa F. Kendall
a bit academic, we believe that by the time you finish reading this book, you’ll understand what it means and how to use it. It is, in fact, the most terse and most precise definition of ontology that we have encountered. Having said this, some people may find a more operational definition helpful:
“An ontology is a formal, explicit description of concepts in a domain of discourse (classes (sometimes called concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or properties)), and restrictions on slots (facets (sometimes called role restrictions)).” (Noy and McGuinness, 2001)
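To make the operational definition concrete, the following is a minimal Python sketch of classes, slots, and facets. It is not taken from Noy and McGuinness. The names `Slot`, `OntologyClass`, `violations`, and the `FloweringPlant` example data are hypothetical, chosen only for illustration, and the facets are limited to a value type and a cardinality range.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    """A property of a class; its remaining fields are the facets."""
    name: str
    value_type: type                       # facet: required type of each value
    min_cardinality: int = 0               # facet: fewest values allowed
    max_cardinality: Optional[int] = None  # facet: most values allowed (None = unbounded)


@dataclass
class OntologyClass:
    """A concept in the domain, described by its slots."""
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add_slot(self, slot: Slot) -> None:
        self.slots[slot.name] = slot

    def violations(self, instance: Dict[str, list]) -> List[str]:
        """Return the facet violations found in one instance's slot values."""
        problems = []
        for slot in self.slots.values():
            values = instance.get(slot.name, [])
            if len(values) < slot.min_cardinality:
                problems.append(f"{slot.name}: too few values")
            if slot.max_cardinality is not None and len(values) > slot.max_cardinality:
                problems.append(f"{slot.name}: too many values")
            if any(not isinstance(v, slot.value_type) for v in values):
                problems.append(f"{slot.name}: wrong value type")
        return problems


# Hypothetical class with two slots and their facets.
plant_class = OntologyClass("FloweringPlant")
plant_class.add_slot(Slot("commonName", str, min_cardinality=1, max_cardinality=1))
plant_class.add_slot(Slot("bloomColor", str, min_cardinality=1))

ok = {"commonName": ["rose"], "bloomColor": ["red", "white"]}
bad = {"commonName": [], "bloomColor": [42]}

print(plant_class.violations(ok))
print(plant_class.violations(bad))
```

Here the class corresponds to a concept, each `Slot` to a property of that concept, and the type and cardinality fields to facets restricting the slot. A knowledge representation language expresses the same ideas declaratively rather than in procedural code.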
The most common term for the discipline of ontology engineering is “knowledge engineering,” as defined by John Sowa years ago:
“Knowledge engineering is the application of logic and ontology to the task of building computable models of some domain for some purpose.” (Sowa, 1999)
Any knowledge engineering activity absolutely must be grounded in a domain and must be driven by requirements. We will repeat this theme throughout the book and hope that the “of some domain for some purpose” part of John’s definition will compel our readers to specify the context and use cases for every ontology project they undertake. Examples of what we mean by context and use cases will be scattered throughout the sections that follow, and will be covered in depth in Chapter 3.
Here are a few other classic definitions and quotes that may be useful as we consider how to model knowledge and then reason with that knowledge:
“Artificial Intelligence can be viewed as the study of intelligent behavior achieved through computational means. Knowledge Representation then is the part of AI that is concerned with how an agent uses what it knows in deciding what to do.” (Brachman and Levesque, 2004)
“Knowledge representation means that knowledge is formalized in a symbolic form, that is, to find a symbolic expression that can be interpreted.” (Klein and Methlie, 1995)
“The task of classifying all the words of language, or what’s the same thing, all the ideas that seek expression, is the most stupendous of logical tasks. Anybody but the most accomplished logician must break down in it utterly; and even for the strongest man, it is the severest possible tax on the logical equipment and faculty.” (Charles Sanders Peirce, in a letter to editor B. E. Smith of the Century Dictionary)
Our own definition of ontology is based on applied experience over the last 25–30 years of working in the field, and stems from a combination of cognitive science, computer science, enterprise architecture, and formal linguistics perspectives.
An ontology specifies a rich description of the following, as relevant to a particular domain or area of interest:
• terminology, concepts, nomenclature;
• relationships among and between concepts and individuals; and
• sentences distinguishing concepts, refining definitions and relationships (constraints, restrictions, regular expressions)
Figure 1.1: Ontology definition and expressivity spectrum.
Figure 1.1 provides an abbreviated view of what we, and many colleagues, call the “ontology spectrum”—the range of models of information that practitioners commonly refer to as ontologies. It covers models that may be as simple as an acronym list, index, catalog, or glossary, or as expressive as a set of micro theories supporting sophisticated analytics.
The spectrum was developed during preparation for a panel discussion in 1999 at an Association for the Advancement of Artificial Intelligence (AAAI) conference, where a number of well-known researchers in the field attempted to arrive at a consensus on a definition of ontology. This spectrum is described in detail in McGuinness, Ontologies Come of Age (2003). We believe that an ontology can add value when defined at any level along the spectrum, which is usually determined by business or application requirements. Most of the ontologies we have developed, whether conceptual or application oriented, include at least a formal “is-a” or subclass hierarchy, and often additional expressions, such as restrictions on the number or type of values for a property (i.e., they fall to the right of the red “squiggle” in the diagram).
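As a rough illustration of what a formal “is-a” hierarchy provides, the Python sketch below stores direct subclass links and derives the indirect ones by following them upward. The class names and the helpers `ancestors` and `is_subclass_of` are hypothetical. The sketch assumes each class has at most one direct parent, which real ontology languages do not require.

```python
# Direct is-a links, child -> parent (hypothetical class names).
IS_A = {
    "Rose": "FloweringPlant",
    "FloweringPlant": "Plant",
    "Plant": "Organism",
}


def ancestors(cls, is_a=IS_A):
    """Return every superclass of cls, nearest first, by walking the is-a links."""
    result = []
    seen = {cls}
    while cls in is_a:
        cls = is_a[cls]
        if cls in seen:
            # A cycle would make the hierarchy ill-formed.
            raise ValueError(f"cycle in is-a hierarchy at {cls}")
        seen.add(cls)
        result.append(cls)
    return result


def is_subclass_of(child, parent, is_a=IS_A):
    """True if child is the same class as parent or a direct or indirect subclass."""
    return child == parent or parent in ancestors(child, is_a)


print(ancestors("Rose"))
print(is_subclass_of("Rose", "Plant"))
print(is_subclass_of("Plant", "Rose"))
```

Because the subclass relation is transitive, asserting only the direct links is enough for a tool to conclude that every `Rose` is also a `Plant` and an `Organism`.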
Regardless of the level of expressivity and whether the ontology is conceptual in nature or application focused, we expect that an ontology will be: (1) encoded formally in a declarative knowledge representation language; (2) syntactically well-formed for the language, as verified by an appropriate syntax checker or parser; (3) logically consistent, as verified by a language-appropriate reasoner or theorem prover; and (4) shown through extensive testing to meet business or application requirements. The process of evaluating and testing an ontology is both science and art, with increasingly sophisticated methods available in commercial tools, but because no single tool fits every case, we typically need multiple tools to fully vet most ontologies. We will discuss some of the more practical and more readily available approaches to ontology evaluation in later chapters of this book.
1.2 LOGIC AND ONTOLOGICAL COMMITMENT
The primary reason for developing an ontology is to make the meaning of a set of concepts, terms, and relationships explicit, so that both humans and machines can understand what those concepts mean. The level of precision, breadth, depth, and expressivity encoded in a given ontology depends on the application: search applications over linked data tend to require broader ontologies and tolerate less precision than those that support data interoperability; some machine learning and natural language processing applications require more depth than others. Ontologies that are intended to be used as business vocabularies or to support data governance and interoperability require more metadata than machine learning applications may need, including clearly stated definitions, provenance, and pedigree, as well as explanatory notes and other usage information. The foundation for the machine-interpretable aspects of knowledge representation lies in a combination of set theory and formal logic. The basis for the metadata stems from library science and terminology work, which we discuss in Chapter 4.
Most people who are interested in knowledge representation took a course in logic at some point, whether from a philosophical, mathematical, or linguistic perspective. Many of us also have basic knowledge of set theory, and can draw Venn diagrams showing set intersection when needed, but a little refresher may be helpful.
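As a brief refresher, the Python sketch below uses built-in sets to show the operations a Venn diagram depicts. The set names and members are toy data invented for this illustration, and each set stands for the extension of a class, that is, the individuals that belong to it.

```python
# Toy extensions of three classes (illustrative data only).
flowering_plants = {"rose", "tulip", "magnolia"}
trees = {"magnolia", "pine"}
plants = flowering_plants | trees | {"fern"}   # union, plus one more member

# Intersection: individuals that are both flowering plants and trees.
flowering_trees = flowering_plants & trees

# Difference: trees that are not flowering plants.
non_flowering_trees = trees - flowering_plants

# Subset: every flowering plant is a plant (the set-theoretic reading of "is-a").
flowering_is_subset = flowering_plants <= plants

# Membership: is a given individual an instance of the class?
fern_is_flowering = "fern" in flowering_plants

print(flowering_trees)
print(non_flowering_trees)
print(flowering_is_subset)
print(fern_is_flowering)
```

Read this way, a subclass statement is a subset statement, and an individual's class membership is ordinary set membership.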
Logic can be more difficult to read than English, but is clearly more precise:
(forall ((x FloweringPlant))
  (exists ((y Bloom) (z BloomColor))
    (and (hasPart x y) (hasCharacteristic y z))))
Translation: Every flowering plant has a bloom which is a part of it, and which has a characteristic bloom color.
Language: Common Logic, CLIF syntax (ISO/IEC 24707:2018, 2018)
Logic is a simple language with few basic symbols. The level of detail depends on the choice of predicates made by the ontologist (e.g., FloweringPlant, hasPart, hasCharacteristic, in the logic, above); these predicates represent an ontology of the relevant concepts in the domain.
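To show how such a sentence can be evaluated mechanically, the Python sketch below checks the flowering plant axiom against a small finite model. It is an illustration only, not how a Common Logic reasoner works. The individuals, the pair sets standing in for hasPart and hasCharacteristic, and the function name `axiom_holds` are all assumptions made for the example.

```python
# Individuals of a tiny, made-up model, grouped by type.
flowering_plants = {"rose1", "tulip1"}
blooms = {"bloom_a", "bloom_b"}
bloom_colors = {"red", "yellow"}

# Binary predicates represented as sets of (subject, object) pairs.
has_part = {("rose1", "bloom_a"), ("tulip1", "bloom_b")}
has_characteristic = {("bloom_a", "red")}


def axiom_holds(plants, blooms, colors, has_part, has_characteristic):
    """Check: for every flowering plant x there exist a bloom y and a bloom
    color z such that hasPart(x, y) and hasCharacteristic(y, z)."""
    return all(
        any(
            (x, y) in has_part and (y, z) in has_characteristic
            for y in blooms
            for z in colors
        )
        for x in plants
    )


# tulip1's bloom has no color yet, so the axiom fails in this model.
print(axiom_holds(flowering_plants, blooms, bloom_colors,
                  has_part, has_characteristic))

# Adding the missing fact gives a model that satisfies the axiom.
completed = has_characteristic | {("bloom_b", "yellow")}
print(axiom_holds(flowering_plants, blooms, bloom_colors,
                  has_part, completed))
```

The nested `all` and `any` mirror the universal and existential quantifiers, and the choice of sets and pair relations mirrors the ontologist's choice of predicates.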
1.3 ONTOLOGY-BASED CAPABILITIES
An ontology defines the vocabulary that may be used to specify queries and assertions for use by independently developed resources, processes, and applications. “Ontological commitments are agreements to use a shared vocabulary in a coherent and consistent manner.”1 Agreements can be specified as formal ontologies, or ontologies with additional rules, to enforce the policies stated in those agreements. The meaning of the concepts included in the agreements can be defined precisely and unambiguously, sufficient to support machine interpretation of the assertions. By composing or mapping the terms