Data Lakes For Dummies. Alan R. Simon
Follow the Leader
Guiding Principles of a Data Lake Reference Architecture
A Reference Architecture for Your Data Lake Reference Architecture
Incoming! Filling Your Data Lake
Supporting the Fleet Sailing on Your Data Lake
The Old Meets the New at the Data Lake
Bringing Outside Water into Your Data Lake
Playing at the Edge of the Lake
Chapter 5: Anybody Hungry? Ingesting and Storing Raw Data in Your Bronze Zone
Ingesting Data with the Best of Both Worlds
Joining the Data Ingestion Fraternity
Storing Data in Your Bronze Zone
Just Passing Through: The Cross-Zone Express Lane
Taking Inventory at the Data Lake
Bringing Analytics to Your Bronze Zone
Chapter 6: Your Data Lake’s Water Treatment Plant: The Silver Zone
Funneling Data further into the Data Lake
Bringing Master Data into Your Data Lake
Impacting the Bronze Zone
Getting Clever with Your Storage Options
Working Hand-in-Hand with Your Gold Zone
Chapter 7: Bottling Your Data Lake Water in the Gold Zone
Laser-Focusing on the Purpose of the Gold Zone
Looking Inside the Gold Zone
Deciding What Data to Curate in Your Gold Zone
Seeing What Happens When Your Curated Data Becomes Less Useful
Chapter 8: Playing in the Sandbox
Developing New Analytical Models in Your Sandbox
Comparing Different Data Lake Architectural Options
Experimenting and Playing Around with Data
Chapter 9: Fishing in the Data Lake
Starting with the Latest Guidebook
Taking It Easy at the Data Lake
Staying in Your Lane
Doing a Little Bit of Exploring
Putting on Your Gear and Diving Underwater
Chapter 10: Rowing End-to-End across the Data Lake
Keeping versus Discarding Data Components
Getting Started with Your Data Lake
Shifting Your Focus to Data Ingestion
Finishing Up with the Sandbox
Part 3: Evaporating the Data Lake into the Cloud
Chapter 11: A Cloudy Day at the Data Lake
Rushing to the Cloud
Running through Some Cloud Computing Basics
The Big Guys in the Cloud Computing Game
Chapter 12: Building Data Lakes in Amazon Web Services
The Elite Eight: Identifying the Essential Amazon Services
Looking at the Rest of the Amazon Data Lake Lineup
Building Data Pipelines in AWS
Chapter 13: Building Data Lakes in Microsoft Azure
Setting Up the Big Picture in Azure
The Magnificent Seven, Azure Style
Filling Out the Azure Data Lake Lineup
Assembling the Building Blocks
Part 4: Cleaning Up the Polluted Data Lake
Chapter 14: Figuring Out If You Have a Data Swamp Instead of a Data Lake
Designing Your Report Card and Grading System
Looking at the Raw Data Lockbox
Knowing What to Do When Your Data Lake Is Out of Order
Too Fast, Too Slow, Just Right: Dealing with Data Lake Velocity and Latency
Dividing the Work in Your Component Architecture
Tallying Your Scores and Analyzing the Results
Chapter 15: Defining Your Data Lake Remediation Strategy
Setting Your Key Objectives
Doing Your Gap Analysis
Identifying Resolutions
Establishing Timelines
Defining Your Critical Success Factors
Chapter 16: Refilling Your Data Lake
The Three