Data Management: a gentle introduction. Bas van Gils
of its sub-codes.
a. The issue of data quality dimensions, such as validity (is a value allowed according to some criteria?) versus correctness (is it a true representation of the real world?), is part of the discussion in chapter 16.
Reference data may seem like a simple and straightforward concept, yet in practice this is hardly the case. In chapter 14, I will discuss the relevant theory in more detail. Also note that reference data tends to be static. Using reference data in real-world situations will be discussed in more detail in the examples in part II of this book.
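The difference between validity and correctness can be made concrete with a minimal Python sketch. The country codes, the customer record, and the function name are illustrative assumptions and are not taken from any particular system.

```python
# Hypothetical reference data set: allowed country codes and their meaning.
COUNTRY_CODES = {
    "NL": "Netherlands",
    "BE": "Belgium",
    "DE": "Germany",
}


def is_valid_country(code: str) -> bool:
    """Validity: is the value allowed according to the reference data?"""
    return code in COUNTRY_CODES


# Hypothetical customer record; assume this customer actually lives in Belgium.
customer = {"name": "J. Jansen", "country": "NL"}
actual_country = "BE"  # the real-world fact, known only from outside the system

valid = is_valid_country(customer["country"])    # is the value allowed?
correct = customer["country"] == actual_country  # does it match the real world?

print(f"valid={valid}, correct={correct}")
```

In this sketch the recorded value passes the reference data check and still misrepresents reality. A lookup against reference data can establish validity, whereas correctness requires a comparison with the real world.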
■ 8.7 METADATA
The fifth and last type of data that I will discuss is metadata. Loosely defined, metadata is “data about data”. Anything you can know about your data is metadata. Through metadata you can answer questions such as: what is the definition of “customer”? In which processes do we create customer data? How does customer data flow through our information systems? The list goes on and on. As an organization you can (and perhaps should) collect metadata about all other types of data. Having a good set of metadata available is foundational for managing and governing your data. Metadata is discussed in more detail in chapter 10.
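As a small illustration, metadata about the concept "customer" could be captured in a simple record. The Python sketch below is only an assumption of what such a record might look like: the class name, the field names, and the example values are hypothetical and do not come from a specific metadata tool or standard.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMetadata:
    """Hypothetical metadata record for one business concept."""
    name: str
    definition: str
    created_in_processes: list[str] = field(default_factory=list)
    # Each flow is a (source system, target system) pair.
    flows: list[tuple[str, str]] = field(default_factory=list)


customer_metadata = ConceptMetadata(
    name="customer",
    definition="A party that has bought at least one product from us.",
    created_in_processes=["Onboarding", "Web shop registration"],
    flows=[("CRM", "Billing"), ("CRM", "Data warehouse")],
)


def systems_receiving(meta: ConceptMetadata, source: str) -> list[str]:
    """Answer: to which systems does this data flow from a given source?"""
    return [target for src, target in meta.flows if src == source]


print(customer_metadata.definition)
print(systems_receiving(customer_metadata, "CRM"))
```

Each field corresponds to one of the questions listed above: the definition of the concept, the processes in which the data is created, and the way the data flows between systems.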
■ 8.8 VISUAL SUMMARY
Synopsis - In this chapter, I introduce the topic of data governance. Data governance is the capability that deals with accountability for data. I will first position data governance in relation to (other) data management (activities). Then I will provide an overview of key data governance themes based on the Data Management Body of Knowledge (DMBOK) [Hen17]. Last but not least, I will give an overview of a modern approach to data governance based on three key roles: data owners, data users, and data stewards.
■ 9.1 INTRODUCTION
The word governance, or its associated verb to govern, has many definitions and interpretations, depending on the context in which it is used. Many people seem to associate this word with (the use of) power; with laying down and enforcing the law. This view is indeed close to the Merriam-Webster Dictionary definition which uses phrases such as “to exercise continuous sovereign authority over” and “to control, direct, or strongly influence the actions and conduct of”. The DMBOK defines data governance as follows [Hen17]:
Data governance is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.
This definition screams a command-and-control, top-down approach to governance: make plans, define rules, implement, enforce, and punish when the rules are not followed. This isn’t the only way to implement data governance, though. In this chapter, I will first show how to position data governance in relation to data management. I will then follow up with a discussion of the data governance activities as listed in the DMBOK and a discussion of a modern approach to data governance through data stewards, data owners, and data users. I will end the chapter with a brief discussion of the relationship between data governance and other governance processes that may be followed in the organization.
■ 9.2 DATA GOVERNANCE AND DATA MANAGEMENT
Looking closely at the definition of data governance from the DMBOK, it becomes clear that there is a relationship between data governance (DG) and data management (DM). This relationship, also pointed out by John Ladley in [Lad12], is highlighted in figure 9.1, which was taken from the DMBOK. The idea is straightforward and not unlike the separation of powers in modern-day (western) politics¹: separate decision-making and oversight (DG) from the actual execution of DM activities. In my view, this has several implications.
Figure 9.1 Data Governance & Data Management (Taken from [Hen17])
First of all, DG is not so much about governing data (which are inanimate) but more about governing the people who handle data. In other words, it is about deciding what people can and can’t do with data, as well as ensuring that there are guard rails in place to make that happen. Whether this happens in a top-down fashion (define the policy, analyze implications, implement the policy) or in a bottom-up fashion (capture good practices from across the organization in a policy and arrange for sign-off) is a whole different matter.
A second implication deals with the type of decisions to be made: strategic, tactical, and operational. Example 20 illustrates different types of DG decisions that organizations deal with.
Example 20. Data governance decisions
Strategic decisions: Setting up a data strategy is a prime example of a strategic decision. This entails questions such as: how and where do we want to create value with data? How does our business model evolve when we leverage data as a key asset? Are we going to let business units control their own data, or are we trying to achieve synergies between business units? Another example is the development of a data management strategy to complement the data strategy. Relevant questions here are: how good should our data management capability be? Are we going to centralize or decentralize certain data management functions?
Tactical decisions: Setting up governance structures, appointing people in DM/DG roles, and approving policies are good examples of tactical decisions. These types of decisions bridge the gap between the strategic and operational levels.
Operational decisions: Approval of definitions of business concepts, dealing with conflicting definitions or data quality requirements, and sign-off on data quality improvement initiatives are good examples of operational decisions. The focus here is on decision-making about the operational data management activities.
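One way to make the three decision levels tangible is to record decisions in a simple decision log. The Python sketch below is only an illustration: the enum, the class, the helper function, and the log entries are hypothetical and merely restate some of the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionLevel(Enum):
    STRATEGIC = "strategic"
    TACTICAL = "tactical"
    OPERATIONAL = "operational"


@dataclass(frozen=True)
class GovernanceDecision:
    """Hypothetical entry in a data governance decision log."""
    subject: str
    level: DecisionLevel


decision_log = [
    GovernanceDecision("Set up the data strategy", DecisionLevel.STRATEGIC),
    GovernanceDecision("Appoint people in DM/DG roles", DecisionLevel.TACTICAL),
    GovernanceDecision("Approve a data policy", DecisionLevel.TACTICAL),
    GovernanceDecision("Approve the definition of 'customer'", DecisionLevel.OPERATIONAL),
]


def decisions_at(level: DecisionLevel) -> list[str]:
    """Return the subjects of all logged decisions at the given level."""
    return [d.subject for d in decision_log if d.level is level]


for level in DecisionLevel:
    print(level.value, "->", decisions_at(level))
```

Tagging each decision with its level makes it easier to see which governance body (for example a steering committee or a group of data stewards) would typically be asked to take it.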
Let’s examine these examples from the perspective of the DMBOK wheel as shown in figure 7.1. There is a reason that DG is in the center of the wheel: decision-making is something that is required for all capabilities in the wheel.
■ 9.3 DATA GOVERNANCE ACTIVITIES IN DMBOK
If DG is all about decision-making then the question is: what do we make decisions about? The previous example gave some suggestions. To give a more formal answer I will briefly discuss several governance topics that are listed by the DMBOK. This is by no means a complete summary of the DMBOK, nor is it intended to be. Instead, I am trying to give a broad enough overview to provide you with an understanding of what DG is all about.
One of the key topics is to define the organizational structure for DG in the form of steering committees, boards, and different roles in the organization. This is closely related to the operating model type, which helps to decide which activities are carried out and where. The main models that are listed are: centralized DG, replicating the DG structure across