Data Lakes For Dummies. Alan R. Simon
gateways, firewalls, servers, databases — pretty much any piece of hardware in your enterprise — into your data lake, as quickly as you can as traffic flows across your network and transactions hit your databases. Then, just as quickly, you and your coworkers can analyze the rapidly incoming data and take necessary actions to keep everything running smoothly.
At the same time, not everything needs to zoom into your data lake at lightning-fast speed. Think about a lake that not only has speedboats zipping all over but also has much larger ferry-type vessels that take hundreds of passengers at a time all around the lake. Some of those ferries also offer evening gourmet dinner cruises in addition to their daytime excursions.
You’re not going to have much success trying to water-ski behind a lake ferry, nor will you have much success trying to eat a six-course gourmet meal served on the finest china while you’re bouncing all over the place on a speedboat. You need to find the proper water vessel for what you’re trying to do out on the lake, right?
You should think of your data lake as a variable-speed transportation engine for your enterprise data. If you need certain data blasted into your data lake as quickly as possible because you need to do immediate analysis, no problem! On the other hand, other data can be batched up and periodically brought into the data lake in bulk, on sort of a time-delayed basis, because you don’t need to do real-time analysis. You can mix and match the data feeds in whatever combination makes sense for your organization’s data lake.
Managing Overall Analytical Costs
You like the overall idea of a data lake. But you’re talking about overhauling almost all your current analytical data environment. Over the past couple of decades, your organization has spent a ton of money on a couple of data warehouses, not to mention hundreds of data marts. And that was just the start!
Every budget planning cycle, your CFO groans at the price tag of keeping those data warehouses and data marts running. The servers, the database software, the staff to keep things up and running … sure, everyone in your organization would love to stop writing those huge checks every year to keep those old systems running. But wouldn’t a data lake mean starting all over from scratch with a gigantic price tag?
Not necessarily! In fact, your new data lake presents you with the opportunity to get a grip on your overall analytical costs, as well as to get started without having to write a seven- or even eight-figure check.
Too good to be true? Thanks to the financials of cloud computing, you can have your data lake and drink it, too. (Wow, that was a really bad metaphor, but you get the idea.)
Almost all data lakes are built and deployed on a cloud computing platform, such as Amazon Web Services (AWS) or Microsoft Azure. With cloud computing, you can tiptoe into new technology using a pay-by-the-drink model. (Now that metaphor works much better!)
Chances are, your organization’s data warehouses and data marts were built in — and are still hosted in — your company data center. Even if your IT organization uses an outside data center for the actual hosting, you still had to write some pretty big checks for every aspect of your current data warehouses and data marts, and you probably have some all-inclusive hosting contracts with your outside data center providers.
Most likely, your organization has already headed into the world of cloud computing. You may be using Salesforce for customer relationship management (CRM), or enterprise resource planning (ERP) software such as NetSuite or Workday for your finance and accounting, human resources, and other “back office” functions. If you’ve already dabbled in cloud computing for your operational applications, heading in the same direction for your analytics is natural.
You’ll build your data lake in a phased, iterative, and incremental manner (see Chapter 17). With cloud computing, you can pay as you go along each phase, with tighter controls over your financial outlays for your data lake than you had with your earlier data warehouses and data marts.
“Managing overall analytical costs” does not equate to “not spending a lot of money. You’ll still need to keep a close watch on the total cost of ownership (TCO) of your data lake. With cloud computing, the meter is always running as your business users run reports, produce visualizations, and build machine learning models.
But you’ll have the opportunity for a fresh start with new technology and new approaches to enterprise analytics, without the need to make a gigantic investment up front as you bravely enter this new world of data lakes.
Конец ознакомительного фрагмента.
Текст предоставлен ООО «ЛитРес».
Прочитайте эту книгу целиком, купив полную легальную версию на ЛитРес.
Безопасно оплатить книгу можно банковской картой Visa, MasterCard, Maestro, со счета мобильного телефона, с платежного терминала, в салоне МТС или Связной, через PayPal, WebMoney, Яндекс.Деньги, QIWI Кошелек, бонусными картами или другим удобным Вам способом.