The Informed Company. Dave Fowler

complexity. In addition, we offer ways to establish an architecture with data integrity in mind. We provide modeling tool suggestions and an example SQL style guide. Finally, we give our recommendations for team structure, such as a lead to oversee this process and warehouse maintenance.

      Warehouse

      This stage is right if:

       More than a few people are going to be working with data.

       A clean source of truth would eliminate integrity issues.

       There's a need to adopt a consistent structure on top of the data lake.

       There's a need to adopt DRY principles.

      It's time for the next stage if:

       The democratization of data would help others explore and understand data without help.

       It's time to teach and enable business users to be more effective.

       Projects exist that require different formats than what currently exists in the data lake.

       Having truly informed employees is vital to your company's competitive success.

      Stage 4. Marts

      But given enough time, hundreds of tables accumulate in a warehouse. Users become overwhelmed when trying to find relevant data. It's also possible that, depending on the team, department, or use case, different people want to use the same data structured in different ways. So while the meanings of individual fields are unified, the abstractions used by different departments have diverged.

      To sort through these challenges, we progress to the data mart stage. These are smaller, more specific sources of truth for a team or topic of investigation. For example, the sales team may only need 12 or so tables from the central warehouse, while the marketing team may need 20 tables—some of them the same, but some different.
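      As a rough illustration of the idea above, a mart can be sketched as a named set of views over a subset of warehouse tables. This is a minimal sketch, not a prescribed implementation: it uses an in-memory SQLite database as a stand-in for the warehouse, and the table names (accounts, opportunities, campaigns), the mart names, and the build_marts helper are all hypothetical. SQLite has no schemas, so each mart is emulated here with a name prefix; a production warehouse would more likely use one schema per mart.

```python
import sqlite3


def build_marts(conn, marts):
    """Create one view per (mart, table) pair, named <mart>__<table>.

    `marts` maps a hypothetical mart name to the warehouse tables it exposes.
    """
    created = []
    for mart, tables in marts.items():
        for table in tables:
            view = f"{mart}__{table}"
            # Each view is a thin window onto the shared warehouse table,
            # so every mart reads from the same source of truth.
            conn.execute(f'CREATE VIEW "{view}" AS SELECT * FROM "{table}"')
            created.append(view)
    return created


# Stand-in "warehouse" with three illustrative tables.
conn = sqlite3.connect(":memory:")
for table in ("accounts", "opportunities", "campaigns"):
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO accounts (name) VALUES ('Acme')")

# Two marts that overlap on one table (accounts) and differ on the rest.
marts = {
    "sales": ["accounts", "opportunities"],
    "marketing": ["accounts", "campaigns"],
}
views = build_marts(conn, marts)
sales_accounts = conn.execute('SELECT name FROM "sales__accounts"').fetchall()
```

      Because both marts read the same underlying accounts table, a correction made in the warehouse shows up in every mart without duplicated logic.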

      Just as a warehouse lead manages data warehouses, data marts benefit from being facilitated by mart leads. The mart lead helps educate and communicate subject matter expertise within the domain of each respective mart while supporting everyday maintenance tasks. Not only will further simplification of data into local marts improve usability, but the integrity of data also becomes easier to maintain. After all, the responsibility of maintenance distributes to mart leads rather than to a single person. The organization that leverages data marts effectively is an example of intra‐company data literacy in action.

      Mart

      This stage is right if:

       The democratization of data would help others explore and understand data without help.

       It's time to teach and enable business users to be more effective.

       Projects exist that require different formats than what currently exists in the data lake.

       Having truly informed employees is vital to your company's competitive success.

      It's time for the next stage if:

       The data mart stage is the final stage. There can be any number of marts, and there can be multiple levels of marts if needed. After implementing this stage, data arrives in a complete, well‐architected, and governed stack that continually evolves to support an informed and competitive company.

      Source Stage Overview

      Over time, as sources accumulate more data, the number of data channels grows as well. It becomes even more challenging to manage data across separate sources.

      This section is about helping you work with sources. We talk about what sources are and what we can do with them. From there, we survey the tools commonly used to connect and investigate sources; for each example, we offer quick but tested thoughts on how these tools can be used by your team. We complete our discussion on sources by encouraging best practices on how to work with data during the source stage. We advocate for source replicas and streamlined data intelligence tools.

      This stage is ideal for new companies or teams with minimal data needs. It is inexpensive and relatively easy to tool, implement, and maintain. While it is exciting to build out a sophisticated data stack, it is not necessary before circumstances require it. Over-engineering is a costly mistake. However, the methods discussed in Chapters 1, 2, and 3 set the stage for your future data lake and data warehouse that arise as the scale and diversity of your source data proliferate.

      When starting from scratch, keep in mind the potential for the data to grow and the usability needs of users in the future. In the beginning, sources are their own islands separated from each other. Data streams remain in their own “silo.” When a data team is small, a collection of sources is easy to maintain and monitor. For example, to support new data teams, many data sources have their own built‐in dashboards and reporting capabilities (see Salesforce, Heap, and so on).

      While single‐source data is limited in what it can show, it is far from useless. Some everyday use cases of solutions built from single‐source data include:

       Database queries that generate customer acquisition metrics.

       A dashboard that displays monthly sales featuring a downloadable spreadsheet.

       A custom web application that allows searching of referral traffic.
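      The first use case in the list above can be sketched in a few lines. This is only an illustrative example under stated assumptions: the customers table, its signup_date and channel columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a single production source (ideally a read replica of it).

```python
import sqlite3

# Stand-in for a single source database; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, signup_date TEXT, channel TEXT)"
)
conn.executemany(
    "INSERT INTO customers (signup_date, channel) VALUES (?, ?)",
    [
        ("2021-01-05", "organic"),
        ("2021-01-20", "referral"),
        ("2021-01-22", "organic"),
        ("2021-02-03", "referral"),
    ],
)

# New customers per month and acquisition channel.
ACQUISITION_SQL = """
SELECT strftime('%Y-%m', signup_date) AS month,
       channel,
       COUNT(*) AS new_customers
FROM customers
GROUP BY month, channel
ORDER BY month, channel
"""
rows = conn.execute(ACQUISITION_SQL).fetchall()
for month, channel, new_customers in rows:
    print(month, channel, new_customers)
```

      A query like this can feed a dashboard or a spreadsheet export directly, which is often enough at the source stage.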

      As we explore data sources, remember that analysts can do much more for their teams than work with business intelligence tools, Excel, and simple queries. In time, they can build abstractions on top of the data that make data accessible to other colleagues and self‐servable (more on this in Stages 2 and 3).

