The Informed Company. Dave Fowler
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993, or fax (317) 572‐4002.
Wiley publishes in a variety of print and electronic formats and by print‐on‐demand. Some material included with standard print versions of this book may not be included in e‐books or in print‐on‐demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Cataloging‐in‐Publication Data
Names: Fowler, Dave (Computer scientist), author. | Matt David, author.
Title: The informed company : how to build modern agile data stacks that drive winning insights / Dave Fowler, Matt David.
Description: Hoboken, New Jersey : Wiley, [2022] | Includes index.
Identifiers: LCCN 2021028324 (print) | LCCN 2021028325 (ebook) | ISBN 9781119748007 (paperback) | ISBN 9781119748021 (adobe pdf) | ISBN 9781119748014 (epub)
Subjects: LCSH: Data structures (Computer science) | Big data. | Cloud computing.
Classification: LCC QA76.9.D35 F69 2022 (print) | LCC QA76.9.D35 (ebook) | DDC 005.7/3—dc23
LC record available at https://lccn.loc.gov/2021028324
LC ebook record available at https://lccn.loc.gov/2021028325
Cover image: © Neo Geometric/Shutterstock
Cover design: Wiley
To my mother, who continues to be my most supportive and patient teacher. As a software engineer, you taught me to code for my sixth‐grade science project. Today, as a data analyst, you helped a 38‐year‐old me through discussions and edits of this book. Thank you for always supporting and encouraging my curiosities and for all your love.
— Dave Fowler
I dedicate this book to my Mom, an educator who is fueled by helping others learn. Thank you for always believing in me and being an example of how much you can affect other people’s lives.
— Matt David
About This Book
Why Write This Book
Most comprehensive books on analytics architecture that we've found are over a decade old, most of them pre‐cloud. Because there really isn't a modern equivalent to Kimball's seminal The Data Warehouse Toolkit, today's data teams have to reinvent the principles of building a data stack. Too often, they do this without guidance. To solve this problem, we have created a best‐practices guide for bootstrapping and nurturing a technologically current data warehouse.
Who This Book Is For
We wrote this book for anyone who values data and believes that informed companies are competitive. It's a book for the working professional who is creating a practical, modern data stack. It's for the lone analyst or the professional embedded in a team. It's for anyone interested in the design practices that underlie robust data architecture, the kind that equips entire companies with business intelligence insights. At its heart, this book is written with collaboration in mind (Figure A.1).
Figure A.1 Data management is a collaborative process.
Who This Book Is Not For
This book is not written for "big data" professionals. To be clear, even large corporations like DoorDash, Discord, and the owners of The Financial Times and The New York Times (all previous customers of ours) do not qualify as big data companies. As a rule of thumb, the big data label applies to data architectures with raw input that exceeds 100 GB per day.
No doubt, many elements of this text map onto the big data workflow, especially since warehouses support all sorts of tables, not just, say, event streams. However, our aim is to focus on the central pillars of a modern data stack, so that the widest set of readers can readily benefit from the information herein. In this spirit, we forgo recommendations for mega‐scale architectures.
This book is not for AI‐enabled teams and does not cover AI workflows, machine learning models, or real‐time operational use cases. Instead, its goal is to provide best practices for building and maintaining a robust data analytics stack (i.e., the analytics foundation on which an AI workflow can be built).
If you are a small business that can run everything with QuickBooks and Excel, that's great. Data is important for all companies, but if these tools are already serving you well, this book may not offer helpful guidance. If you start exceeding the data capacity of Excel or bring in a data source that needs to be in a database to be analyzed, then keep reading.
Who Wrote the Book
This book was written by Dave Fowler and Matt David.
Dave Fowler has worked in BI for over a decade and has always looked for ways to JOIN teams ON data. He wants to enable any working professional (not just data analysts) to explore and understand their data. As the founder and CEO of Chartio, Dave has spent the last 11 years leading the development of a self‐service BI product that aims to do just that. Chartio's suite of tools makes it easy for anyone at a data‐driven business to browse their schemas, merge various data sources, and produce beautiful dashboards. In March 2021, Atlassian acquired Chartio and is integrating it into its platform.
Matt David has worked in product management and education for eight years. As data becomes a necessary skill for more and more jobs, he passionately advocates for data literacy among the workforce. As the current head of The Data School, he oversees the production of free online resources focused on leveraging data within companies. Recent book topics include SQL optimization, data governance, and common analysis biases. Dave started The Data School, and together he and Matt have grown it into an important free resource for the data community. Matt previously worked at Udacity and General Assembly teaching analytics.
Dave and Matt decided to co‐write this book after seeing how many people struggle when constructing data stacks and then trying to use them. This book was created with the support of many employees at Chartio. They graciously provided insights into how customers model their data and collected frequently asked data‐infrastructure questions. Their contributions guided the production of this text.
Who Edited the Book
This book was reviewed and edited by Emilie Schario, Mila Page, and David Yerrington. Emilie is the head of data at Netlify and previously helped build GitLab's entire data organization. She regularly writes and speaks on all things related to modern