Data Management: a gentle introduction. Bas van Gils

the terminology that is related to how data is stored in systems. Systems typically have one or more data stores: parts of the system that are concerned with storing data. Defining different areas for storing data can be useful for different reasons such as privacy and security (a data store with privacy-sensitive data requires more security measures) or performance (data stores that are critical to the performance of a process may have extra computing power assigned to them).

      Data is stored in systems in various ways. By far the most common way to structure data in systems is through tables, such that each row of the table maps to a record (see also section 4.5). Put more precisely: the column headings of the table match the names of the fields in the record, and the intersection of a row and a column (a “cell” of the table) contains an individual data point. Example 11 builds on the previous example and illustrates these definitions.

      The diagram uses a model fragment to show how tables in a data store are defined. Modeling is an important part of DM. Data models – as well as other types of models – are explained in more detail in chapter 11.

       Example 11. Storing data in tables

      The lower part of the diagram is taken from the previous example and shows three person-records. However, this time each record also has a unique ID. The top part of the diagram shows the definition of what a typical record looks like. It shows that each record has four fields and also shows the data type. Last but not least, it shows whether a field is automatically generated or not.
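As a minimal sketch of such a record definition (not a reproduction of the diagram), the snippet below uses Python's built-in sqlite3 module. The table name person, the four field names, their data types, and the three sample records are all assumptions made for illustration; only the idea of an automatically generated unique ID comes from the text above.

```python
import sqlite3

# In-memory data store; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to cells by column name

# Hypothetical definition of a person record: four fields, each with a
# data type. Only the id field is generated automatically.
conn.execute(
    """
    CREATE TABLE person (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        date_of_birth TEXT NOT NULL  -- ISO 8601 date string (assumption)
    )
    """
)

# Three illustrative records. No id is supplied: the data store assigns it.
people = [
    ("Ann", "Jansen", "1958-03-14"),
    ("Bob", "de Vries", "1972-11-02"),
    ("Cas", "Bakker", "1990-07-21"),
]
conn.executemany(
    "INSERT INTO person (first_name, last_name, date_of_birth) VALUES (?, ?, ?)",
    people,
)

rows = conn.execute("SELECT * FROM person ORDER BY id").fetchall()
field_names = list(rows[0].keys())     # column headings match field names
records = [dict(row) for row in rows]  # each row maps to one record

for record in records:
    print(record)
```

Each printed dictionary is one record; its keys are the field names and its values are the individual data points in the cells of that row.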

      The example has two tables that are related through a dependency. These links between tables make it possible to answer questions such as “show me all orders where the customer was born before 1960”.
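The kind of question quoted above can be sketched as a join between two related tables. The snippet below is an illustration under stated assumptions: the table names customer and customer_order, their columns, and the sample data are hypothetical, and the dependency between the tables is modeled here as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the link between the tables

# Hypothetical tables: every order refers to exactly one customer.
conn.executescript(
    """
    CREATE TABLE customer (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL  -- ISO 8601 date string (assumption)
    );
    CREATE TABLE customer_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        order_date  TEXT NOT NULL
    );
    INSERT INTO customer VALUES
        (1, 'Ann', '1958-03-14'),
        (2, 'Bob', '1972-11-02');
    INSERT INTO customer_order VALUES
        (10, 1, '2020-01-05'),
        (11, 2, '2020-01-06'),
        (12, 1, '2020-02-01');
    """
)

# "Show me all orders where the customer was born before 1960."
# ISO date strings sort chronologically, so a plain string comparison works.
orders_before_1960 = conn.execute(
    """
    SELECT o.id, c.name
    FROM customer_order AS o
    JOIN customer AS c ON c.id = o.customer_id
    WHERE c.date_of_birth < '1960-01-01'
    ORDER BY o.id
    """
).fetchall()

print(orders_before_1960)
```

With this sample data, only the two orders placed by the customer born in 1958 are returned. Without the link between the tables, the order records alone could not answer the question, because the date of birth is stored with the customer.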

       Illustration

      The previous section discussed data from an IT perspective. In this section, I will switch gears and discuss data from a business (process) perspective. This is a major shift to another level of abstraction: rather than considering exactly how data is structured and stored in systems, this perspective is all about understanding which type of data is required to make processes run.

      A key requirement for good data management is that these business concepts are clearly defined. This often leads to the creation of a (business) glossary. The glossary is discussed in further detail in chapters 10 and 28. By studying these definitions, it often becomes clear which business concepts are related. These relationships can be documented in a conceptual data model, which will be discussed in chapter 11 (see also section 4.4 on information/data analysis).

      Example 12 illustrates the main points from this discussion.

       Example 12. Data in processes

      The diagram shows a single invoicing process which has an order as input and an invoice as output. These business concepts are related to each other, as well as to other business concepts. The solid arrows indicate these relationships. The labels on these relationships give an indication of how to interpret them.

Illustration

      The questions that remain are: how are business concepts stored in systems? How are the business and IT perspectives connected? When database systems became popular in the 1970s, a technique was developed to analyze and “normalize” data structures in an effective manner: the relational model [Cod70, Cod79, Dat12] (see also section 4.5). Around the same time, various modeling approaches were developed to visualize what these data structures should look like. Chief among them was the Entity Relationship Model [Che76]. The main idea behind this type of modeling approach is to analyze how business concepts should be structured in such a way that they can efficiently be stored in database systems. This level of analysis straddles the business and IT perspectives. Models at this level of abstraction are often called logical data models, something which will be discussed in more detail in chapter 11.

      What is relevant for purposes of this chapter is that business concepts and their relationships are transformed into a logical structure of data elements, which can be either entities or attributes of these entities. As with business concepts, entities can also be connected through relationships (hence the name Entity Relationship Diagram (ERD) that is frequently used). Example 13 explains this further.

       Example 13. Data elements

      The diagram shows four entities, each with several attributes. Moreover, the entities are related and there is a verbalization attached to each relationship. Compare this diagram, which lists data elements, to the diagram in example 12, which lists business concepts. The diagram with business concepts lists the things that the business talks about. Apparently, order line is not something business stakeholders talk about, or else it would have shown up as a business concept. However, in order to store data in the system in an effective manner, the order line is needed, as it stores the combination of products and required quantity for a specific order.
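The role of the order line can be sketched in code. The example below is a hypothetical rendering, not the diagram itself: the choice of Customer, Product, Order, and OrderLine as the four entities, as well as all attribute names and sample values, are assumptions. Only the idea that an order line combines a product and a required quantity for a specific order comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Product:
    product_id: int
    description: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # relationship: an order is placed by one customer


@dataclass(frozen=True)
class OrderLine:
    order_id: int     # relationship: a line belongs to one order
    product_id: int   # relationship: a line refers to one product
    quantity: int     # required quantity of that product


def quantities_for(order_id, order_lines, catalog):
    """Return {product description: quantity} for a single order."""
    return {
        catalog[line.product_id].description: line.quantity
        for line in order_lines
        if line.order_id == order_id
    }


# Illustrative data (hypothetical).
customer = Customer(customer_id=7, name="Ann")
catalog = {1: Product(1, "Keyboard"), 2: Product(2, "Monitor")}
order = Order(order_id=100, customer_id=customer.customer_id)
order_lines = [
    OrderLine(order_id=100, product_id=1, quantity=2),
    OrderLine(order_id=100, product_id=2, quantity=1),
    OrderLine(order_id=101, product_id=1, quantity=5),  # belongs to another order
]

summary = quantities_for(order.order_id, order_lines, catalog)
print(summary)
```

In this sketch, neither Order nor Product has anywhere to hold the required quantity, since one order can contain many products and one product can appear in many orders. The OrderLine entity resolves this, even though it is not a concept that business stakeholders would necessarily name.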

Illustration

      The goal of this chapter was to discuss basic terminology in the field of data management. Important terms are business concept, data element, entity, attribute, table, column, field, and record. In addition to introducing this terminology, the chapter expanded on the definitions with examples and created links to other chapters. By doing so, it provides a basis for a consistent and complete framework for data management that can be used in practice.

Illustration