Modern Big Data Architectures. Dominik Ryzko
On the other hand, modern big data environments offer unprecedented possibilities for performing large scale computations in both batch and streaming mode, which can greatly enhance the capabilities of MAS. Cloud resources supporting mobile and IoT devices might well be used to empower intelligent agents located in the environment. On the lower level, modern distributed programming libraries (e.g. Akka for Scala) can greatly improve the performance of MAS, which often run in less advanced environments that are not capable of efficient thread and resource management.
1.2 Assumptions
While establishing the scope and focus of this book, several assumptions and compromises had to be made. Firstly, when describing a field such as Big Data, where new concepts and projects emerge on a daily basis, it is difficult to resist the temptation to include every new finding so that the book is as up to date as possible at the time of publishing. On the other hand, it is difficult to predict the future of freshly proposed solutions before they become more mature and are hardened by real life applications.
Therefore, difficult choices have been made, and some might argue that a particular important architecture, project, or framework has been left out. In general, I have followed the rule of writing about topics that have some proven maturity, e.g. those that have become mainstream Apache projects, have been followed by highly cited publications, or have been applied by at least one of the large and recognized industry players.
Secondly, since the book title refers to big data architectures, the contents concentrate on large scale solutions capable of solving practical problems experienced in the industry. Therefore, specific tools applicable at particular points in the larger architectures are described only to the extent that they are relevant to the big picture they take part in, rather than in their internal and technical details. For example, Hadoop, which is often regarded as a technological synonym for big data, is described as a component for batch processing used in larger big data architectures. Map-Reduce, Hadoop's underlying algorithm, is presented as one of the generic computational models for processing extremely large data sets. Similarly, Spark serves as an example of stream processing and plays that role in larger big data setups.
In the field of MAS, things have been somewhat easier, since the field is more mature in general and several comprehensive textbooks summarizing the research and development efforts in this area have been published to date. Therefore, major agent models and architectures are described in line with the state of the art long established in the field. This is complemented with some more recent and more specific examples of applications of multi-agent paradigms in solving various big data problems.
1.3 For Whom Is This Book?
This book could be of interest to both researchers and practitioners from the fields of big data, analytics, machine learning, MAS, distributed computing, cloud computing, distributed artificial intelligence, as well as a number of other related fields.
The intention has been for anyone from the fields mentioned above to see the current state of the art in distributed, asynchronous processing of massive data sets. In addition, it will be shown how various fields and areas of research relate to each other by tackling similar issues and challenges from their respective perspectives.
For big data practitioners not familiar with MAS research it may come as a surprise how many relevant ideas and concepts have already been analyzed several years back. MAS researchers will find several big data environments, libraries, and tools very useful for taking their systems to the next level of efficiency.
In the end I hope that this book will initiate mutual discussion and exchange of ideas, which is to some extent already present but could become much more intense and fruitful.
1.4 Book Structure
The book is organized as follows. Chapter 2 discusses how major paradigms and concepts have changed over the last few decades, leading to the current landscape. Specifically, we will analyze how the evolution of IT architectures influenced the storage and analytics of data. We will also look at the shift of paradigms in database systems and the growing role of the cloud, the Internet, and the IoT. The concepts of an agent and an actor are also introduced. We conclude by discussing how all these trends led to the rise of big data.
In Chapter 3 we look at where the data comes from in big data setups. We start with the Internet as the most commonly available data source today. Then we iterate over various branches of science and industry, looking at how much data they generate and what is specific about each of them. Finally, the IoT is described as a fast-growing source of huge data streams.
Once we are familiar with the data sources, the book dives into specific tasks which need to be performed with the use of the data. Chapter 4 looks at the most important challenges that research and industry are working on in the big data area. This covers recommender systems, search, and real time bidding, as well as multiple other topics.
Cloud computing is discussed in Chapter 5. It deserves a separate chapter as a major trend shaping the creation of the next generation of information systems. We look at the advantages and challenges of utilizing cloud resources and how they enable the building of scalable, distributed big data systems. The means for efficient cloud management in both VM-based and container-based setups are described.
In Chapter 6 several big data architectures are presented. We start with fundamental computational models and move towards more complex setups. This includes among others Lambda and Kappa architectures, which have recently emerged as important design patterns for building scalable big data processing and analytics. A separate section is devoted to stream processing.
The means for data analytics and building machine learning models are the subject of Chapter 7. The role of SQL versus other forms of ad-hoc interaction with the data is analyzed, and tools and architectures for providing SQL capabilities in NoSQL environments are examined. We also look at frameworks and tools for efficient building, deploying, and testing of machine learning models.
Geographically distributed systems are the topic of Chapter 8. We will take a look at how the latest trends driven by mobile computing and the IoT have led to the emergence of edge and fog computing as new paradigms for extending the cloud towards the distributed elements of cyber-physical systems.
The work is closed by Chapter 9 with a summary and conclusions. References to the literature complete the volume.