Ontology Engineering. Elisa F. Kendall
to analysis. It is essential to know which information sources contributed which parts of your results, particularly for reconciliation and for understanding the results when multiple sources are involved and their information differs. For example, most large companies have multiple databases containing customer and account information. In some cases there will be a “master” or “golden” source, with other databases considered either derivative or “not as golden,” meaning that the data in those databases is less reliable. If information comes from outside an organization, its reliability will depend on the publisher and the recency of the content, among other factors.
Some of the kinds of provenance information that have proven most important for interpreting and using the information inferred by the reasoner include:
• identifying the information sources that were used (source);
• understanding how recently they were updated (currency);
• having an idea regarding how reliable these sources are (authoritativeness); and
• knowing whether the information was directly available or derived, and if derived, how (method of reasoning).
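One way to make these four kinds of provenance concrete is to attach them to every value a source supplies, and then use them when sources disagree. The following is a minimal sketch, not a standard vocabulary: the `ProvenanceRecord` class, the `reconcile` function, the source names `CRM` and `CustomerMaster`, and the numeric authoritativeness scale are all hypothetical, and the reconciliation policy (prefer the golden source, then the most recent one) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance metadata attached to one asserted value."""
    value: str
    source: str             # source: which system or publisher supplied the value
    last_updated: date      # currency: when that source was last refreshed
    authoritativeness: int  # assumed scale: 2 = "golden" master, 1 = derivative, 0 = external/unknown
    method: str             # "asserted" if directly available, otherwise how it was derived


def reconcile(candidates):
    """Choose one value when sources disagree.

    Assumed policy: prefer the most authoritative source, and break ties
    by preferring the most recently updated source.
    """
    if not candidates:
        raise ValueError("no candidate values to reconcile")
    return max(candidates, key=lambda r: (r.authoritativeness, r.last_updated))


crm = ProvenanceRecord("12 Main St", "CRM", date(2023, 5, 1), 1, "asserted")
master = ProvenanceRecord("14 Main St", "CustomerMaster", date(2022, 11, 3), 2, "asserted")
best = reconcile([crm, master])
print(best.source, best.value)  # CustomerMaster 14 Main St
```

Here the older value from the golden source wins over the newer value from the derivative source; a different organization might reasonably weight currency above authoritativeness, which is exactly why the metadata should be kept rather than discarded.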
The methods used to explain why a reasoner reached a particular conclusion include explanation generation and proof specification. We provide more in-depth guidance on metadata to support provenance, and on explanations in general, in the chapters that follow.
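As a rough illustration of explanation generation, a reasoner can record a justification (the rule applied and the premises used) each time it derives a new fact, and then walk those justifications back to asserted facts. The sketch below assumes a toy forward-chaining engine over triples with two RDFS-style rules; the facts, the `infer` and `explain` functions, and the rule names are hypothetical, and this is not a proof specification language.

```python
from itertools import product

# Hypothetical facts as (subject, predicate, object) triples.
facts = {
    ("CheckingAccount", "subClassOf", "Account"),
    ("Account", "subClassOf", "FinancialProduct"),
    ("acct42", "type", "CheckingAccount"),
}

# For each derived fact: (rule name, list of premise facts).
justification = {}


def infer(asserted):
    """Forward-chain two RDFS-style rules to a fixed point, recording a justification for each new fact."""
    derived = set(asserted)
    changed = True
    while changed:
        changed = False
        for (a, p1, b), (c, p2, d) in product(list(derived), repeat=2):
            if b != c or p2 != "subClassOf":
                continue
            if p1 == "subClassOf":
                new, rule = (a, "subClassOf", d), "subClassOf transitivity"
            elif p1 == "type":
                new, rule = (a, "type", d), "type inheritance"
            else:
                continue
            if new not in derived:
                derived.add(new)
                justification[new] = (rule, [(a, p1, b), (c, p2, d)])
                changed = True
    return derived


def explain(fact, depth=0):
    """Return an indented proof tree showing the rule and premises behind a fact."""
    indent = "  " * depth
    if fact not in justification:
        return [f"{indent}{fact} [asserted]"]
    rule, premises = justification[fact]
    lines = [f"{indent}{fact} by {rule}"]
    for premise in premises:
        lines.extend(explain(premise, depth + 1))
    return lines


closure = infer(facts)
print("\n".join(explain(("acct42", "type", "FinancialProduct"))))
```

Because several derivation paths can lead to the same conclusion, the particular proof tree printed may vary between runs; each one, however, bottoms out in asserted facts, which is what lets a user trace a conclusion back to its sources.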
CHAPTER 2
Before You Begin
In this chapter we provide an introduction to domain analysis and conceptual modeling, discuss some of the methods used to evaluate ontologies for reusability and fit for purpose, identify some common patterns, and give some high-level analysis considerations for language selection when starting a knowledge representation project.
2.1 DOMAIN ANALYSIS
Domain analysis involves the systematic development of a model of some area of interest for a particular purpose. The analysis process, including the specific methodology and level of effort, depends on the context of the work, including the requirements and use cases relevant to the project, as well as the target set of deliverables. Typical approaches range from brainstorming and high-level diagramming, such as mind mapping, to detailed, collaborative knowledge and information modeling supported by extensive testing for more formal knowledge engineering projects. The tools that people use for this purpose are equally diverse, from free or inexpensive brainstorming tools to sophisticated ontology and software model development environments. The most common capabilities of these kinds of tools include:
• “drawing a picture” that includes concepts and relationships between them, and
• producing sharable artifacts that vary depending on the tool, often including drawings that can be shared on the web.
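Even without a dedicated tool, the two capabilities above can be approximated by keeping concepts and relationships as simple triples and rendering them in a shareable text format. The sketch below assumes Graphviz DOT as the output format; the concept map and the `to_dot` helper are hypothetical, and labels containing double quotes would need escaping, which is omitted here.

```python
# Hypothetical concept map: (concept, relationship, concept) edges.
concept_map = [
    ("Customer", "holds", "Account"),
    ("Account", "is serviced by", "Branch"),
    ("CheckingAccount", "is a", "Account"),
]


def to_dot(edges, name="ConceptMap"):
    """Render concept/relationship triples as Graphviz DOT text."""
    lines = [f'digraph "{name}" {{']
    for subject, relation, obj in edges:
        # Each relationship becomes a labeled, directed edge between two concepts.
        lines.append(f'  "{subject}" -> "{obj}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(concept_map))
```

Saving the output to a file such as `map.dot` and running `dot -Tsvg map.dot -o map.svg` produces a drawing that can be published on the web, while the triples themselves remain available as a starting point for more formal modeling.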
Analysis for ontology development leverages domain analysis approaches from several related fields. In a software or data engineering context, domain analysis may involve a review of existing software, repositories, and services to find commonality and to develop a higher-level model for use in re-engineering or to facilitate integration (de Champeaux, Lea, and Faure, 1993; Kang et al., 1990). In an artificial intelligence and knowledge representation context, the focus is on defining structural concepts, organizing them into taxonomies, developing individual instances of these concepts, and determining key inferences, for example for subsumption and classification, as in Brachman et al. (1991b) and Borgida and Brachman (2003). From a business architecture perspective, domain analysis may result in a model that provides wider context for process re-engineering, including the identification of core competencies, value streams, and critical challenges of an organization, resulting in a common view of its capabilities for various purposes (Ulrich and McWhorter, 2011). In library and information science (LIS), domain analysis involves studying a broad range of information related to the knowledge domain, with the aim of organizing that knowledge as appropriate for the discourse community (Hjørland and Albrechtsen, 1995). Domain analysis to support ontology development takes inspiration from all of the above, starting from the knowledge representation community viewpoint and leveraging aspects of each of the others, as well as of the terminology community (ISO 704:2009, 2009).
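To make the knowledge representation view more tangible, the sketch below organizes a few concepts into a taxonomy and computes two of the inferences mentioned above: subsumption (is one concept more general than another?) and classification (which concepts does an instance belong to?). It is a simplification under a stated assumption: it only follows explicitly asserted parent links, whereas a description logic reasoner would also derive subsumption from the concepts' definitions. The concept names and the `ancestors`, `subsumes`, and `classify` helpers are hypothetical.

```python
# Hypothetical taxonomy: each concept maps to its direct parent concepts.
taxonomy = {
    "FinancialProduct": set(),
    "Account": {"FinancialProduct"},
    "DepositAccount": {"Account"},
    "CheckingAccount": {"DepositAccount"},
    "Loan": {"FinancialProduct"},
}


def ancestors(concept):
    """All concepts that subsume `concept` (transitive closure of the parent links)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in taxonomy.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is the same concept as, or an ancestor of, `specific`."""
    return general == specific or general in ancestors(specific)


def classify(asserted_type):
    """All concepts an instance belongs to, given its most specific asserted concept."""
    return {asserted_type} | ancestors(asserted_type)


print(subsumes("Account", "CheckingAccount"))  # True
print(subsumes("Loan", "CheckingAccount"))     # False
print(sorted(classify("CheckingAccount")))     # ['Account', 'CheckingAccount', 'DepositAccount', 'FinancialProduct']
```

An instance asserted only as a `CheckingAccount` is thereby also recognized as a `DepositAccount`, an `Account`, and a `FinancialProduct`, without those memberships having to be stated separately.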
The fact that the techniques we use are cross-disciplinary makes it easier for people from any of these communities to recognize aspects of the work and dive in. At the same time, this cross-disciplinary nature may make the work more difficult to understand and master, because it involves unfamiliar and sometimes counterintuitive methods for practitioners coming from a specific perspective and experience base. Some of the most common disconnects occur when software or data engineers make assumptions about the representation of relationships between concepts, which are first-class citizens in ontologies but not in some other modeling paradigms, such as entity relationship diagramming (Chen, 1976) or the Unified Modeling Language, Version 2.5.1 (2017). Java programmers, for example, sometimes have difficulty understanding inheritance: some programmers take shortcuts, collecting attributes into a class and “inheriting from it” or reusing it when those attributes are needed, which may not result in a true is-a hierarchy. Subject matter experts and analysts who are not fluent in logic or the behavior of inference engines may initially make other mistakes in encoding. Typically, they discover that something isn’t quite right because the results obtained after querying or reasoning over some set of model constructs are not what they expected. Although there may be many reasons for this, at the end of the day, reasoners and query engines only act as instructed. Often the remedy involves modeling concepts and relationships more carefully from the domain or business perspective, rather than from a technical view that reflects a given set of technologies, databases, tagging systems, or software languages.
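The inheritance shortcut described above is easy to reproduce. The sketch below uses Python rather than Java for consistency with the other examples, but the same pitfall applies to Java's `extends`. All class names are hypothetical: `CustomerViaShortcut` inherits from `PostalAddress` only to reuse its fields, while `Customer` keeps a genuine is-a link to a hypothetical `Party` concept and treats the address as a relationship.

```python
from dataclasses import dataclass


@dataclass
class PostalAddress:
    street: str
    city: str


# Shortcut: inherit from PostalAddress just to reuse its attributes.
# This silently asserts that every such customer *is an* address.
@dataclass
class CustomerViaShortcut(PostalAddress):
    name: str


@dataclass
class Party:
    name: str


# Domain-driven alternative: a customer is-a party (a true is-a link)
# and *has* an address (a relationship between two concepts).
@dataclass
class Customer(Party):
    address: PostalAddress


shortcut = CustomerViaShortcut("12 Main St", "Springfield", "Acme")
modeled = Customer("Acme", PostalAddress("12 Main St", "Springfield"))

print(isinstance(shortcut, PostalAddress))  # True: the model claims a customer is an address
print(isinstance(modeled, PostalAddress))   # False
print(isinstance(modeled, Party))           # True
```

Any query or rule that collects “all addresses” would also return customers built with the shortcut, which is the kind of unexpected result that reasoners and query engines produce simply by acting as instructed.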
2.2 MODELING AND LEVELS OF ABSTRACTION
People who are new to knowledge representation sometimes find it helpful to see a high-level view of where ontologies typically “play” in a more traditional modeling strategy. Figure