Big Data. Seifedine Kadry
data.
Table 8.8 Database.
Table 8.9 Frequency of occurrence.
Table 8.10 Priority of the items.
Table 8.11 Itemset in a transaction.
Table 8.12 Maximal/closed frequent itemset.
Table 8.13 Transaction database.
Table 8.14 Frequent itemsets with minsup = 3.
Table 8.15 Frequent itemsets with tidset.
Table 8.16 Transaction database.
Table 8.17 Frequent Itemset with minsup = 3.
Table 8.18 Tidset of the frequent itemset.
Table 8.19 Comparison between Traditional data mining technique and mining da...
4 Chapter 10
Table 10.1 Tableau data types.
List of Illustrations
1 Chapter 1
Figure 1.1 Evolution of Big Data.
Figure 1.2 3 Vs of big data.
Figure 1.3 High‐velocity data sets generated online in 60 seconds.
Figure 1.4 Big data—data variety.
Figure 1.5 Sources of big data.
Figure 1.6 Human‐ and machine‐generated data.
Figure 1.7 Structured data—employee details of an organization.
Figure 1.8 Unstructured data—the result of a Google search.
Figure 1.9 XML file with employee details.
Figure 1.10 Big data life cycle.
Figure 1.11 Data integration.
Figure 1.12 Hadoop core components.
2 Chapter 2
Figure 2.1 Big data storage architecture.
Figure 2.2 Cluster computing.
Figure 2.3 Symmetric clusters.
Figure 2.4 Asymmetric cluster.
Figure 2.5 Distribution model.
Figure 2.6 (a) Sharding. (b) Sharding example.
Figure 2.7 Replication.
Figure 2.8 Data replication.
Figure 2.9 Master‐Slave model.
Figure 2.10 Peer‐to‐peer model.
Figure 2.11 Combination of sharding and replication.
Figure 2.12 Data divided across multiple related tables.
Figure 2.13 Scale‐up architecture.
Figure 2.14 Scale‐out architecture.
3 Chapter 3
Figure 3.1 Properties of a system following CAP theorem.
Figure 3.2 RDBMS life cycle.
Figure 3.3 RDBMS vs. NoSQL databases.
Figure 3.4 A key‐value store database.
Figure 3.5 General representation of graph database.
Figure 3.6 Neo4J Relationships with properties.
Figure 3.7 Relationship graph between course and employee.
4 Chapter 4
Figure 4.1 Data processing cycle.
Figure 4.2 Shared everything architecture.
Figure 4.3 Symmetric multiprocessing memory.
Figure 4.4 Distributed shared memory.
Figure 4.5 Shared‐nothing architecture.
Figure 4.6 Batch processing.
Figure 4.7 Real‐time processing.
Figure 4.8 Real‐time and batch computation systems example.
Figure 4.9 Parallel computing.
Figure 4.10 Distributed computing.
Figure 4.11 System architecture before and after virtualization.
Figure 4.12 Isolation.
Figure 4.13 Service‐oriented architecture.
Figure 4.14 Google File System architecture.
Figure 4.15 Read algorithm: (a) The first three steps. (b) The last three st...
Figure 4.16 Write algorithm: (a) The first three steps. (b) Steps 4 and 5. (...
Figure 4.17 Cloud architecture.
5 Chapter 5
Figure 5.1 Hadoop architecture.
Figure 5.2 Hadoop ecosystem.
Figure 5.3 Distributed file system vs. single machine.
Figure 5.4 HDFS architecture.
Figure 5.5 File write.
Figure 5.6 File read.
Figure 5.7 MapReduce model.
Figure 5.8 Combiner illustration.
Figure 5.9 JobTracker and TaskTracker.
Figure 5.10 Word count algorithm.
Figure 5.11 Hadoop 1.0 vs Hadoop 2.0.
Figure 5.12 Active NameNode and standby NameNode.
Figure 5.13 Hadoop 2.0.
Figure 5.14 ResourceManager.
Figure 5.15 NodeManager.
Figure 5.16 YARN architecture.
Figure 5.17 HBase architecture.
Figure 5.18 RegionServer architecture.
Figure 5.19 SQOOP import and export.
Figure 5.20 SQOOP 1.0 architecture.
Figure 5.21 Flume architecture.
Figure 5.22 Pig – internal process.
Figure 5.23 Oozie workflow.
Figure 5.24 Apache Hive architecture.
6 Chapter 6
Figure 6.1 Data analytics.
Figure 6.2 Analyzing a customer behavior.
Figure 6.3 Analytics life cycle.
Figure 6.4 Data integration with EmpId field.
Figure 6.5 Illustration of extraction without transformation.
Figure 6.6 (a) Positive correlation. (b) negative correlation. (c) No correl...
Figure 6.7 (a) Linear regression. (b) Nonlinear regression.
Figure 6.8 Data analysis cycle.
Figure 6.9 Big Data analytics processing.
Figure 6.10 Architecture of an integrated EDW with Big Data technologies.
7 Chapter 7
Figure 7.1 Types of machine learning algorithms.
Figure 7.2 Supervised machine learning.
Figure 7.3 Support vector machines.
Figure 7.4 (a) Support vectors with small margin. (b) Support vectors with a...
Figure 7.5 Non‐separable support vector machines.
Figure 7.6 Unsupervised machine learning.
8 Chapter 8
Fig. 8.1 Frequency plot with relative value.
Fig. 8.1 Frequency plot with absolute value.
Figure 8.1 Lattice structure of data set {a,b,c,d,e}.
Figure 8.2 Apriori algorithm—frequent itemsets.
Figure 8.3 Apriori algorithm—Every superset of an infrequent itemset is also...
Figure 8.4 Apriori algorithm–frequent itemsets.
Figure 8.5 Generation of the candidate itemsets and frequent itemsets with m...
Figure 8.6 Eclat algorithm illustration.
Figure 8.7 Intersection of two itemsets.
Figure 8.8 Eclat algorithm.
Figure 8.9 (a) FP tree for transaction 1. (b) FP tree for transaction 2. (c)...
Figure 8.10 Itemset and their corresponding support count.
Figure 8.11 Maximal and closed frequent itemset.
Figure 8.12 Maximal and closed frequent itemset – subset of frequent itemset...
Figure 8.13 Maximal and closed frequent itemset – subset of frequent itemset...
Figure 8.14 GenMax Algorithm implementation.
Figure 8.15 CHARM algorithm implementation.
Figure 8.16 Data mining methods.
Figure 8.17 K‐Nearest neighbor – classification.
Figure 8.18 k‐nearest neighbor – regression.
Figure 8.19 Decision tree diagram.
Figure 8.20 Decision tree – Weekend plan.
Figure 8.21 (a) DBSCAN with ε = 1.00 and MinPts = 4. (b) DBSCAN with ε = 1.0...
Figure 8.22 Biological neural network.
Figure 8.23 Time series forecasting.
9 Chapter 9
Figure 9.1 Clustering algorithm.
Figure 9.2 Clustering based on distance.
Figure 9.3 A vector in space.
Figure 9.4 Manhattan distance.
Figure 9.5 Hierarchical clustering.
Figure 9.6 Dendrogram graph.
Figure 9.7 Agglomerative and divisive clustering.
Figure 9.8 K‐means clustering flowchart.
Figure 9.9 (a) Initial clustered points with random centroids (b) Iteration ...
Figure 9.10 (a) Initial clustered points with random centroids (b) Final Ite...
Figure 9.11 Linearly separable clusters.
Figure 9.12 Arbitrarily shaped clusters.
Figure 9.13 (a) Original data set (b) K means (c) KK means.
Figure 9.14 Univariate Gaussian distribution.
Figure 9.15 Data points from two different models.
Figure 9.16 Gaussian distribution.
Figure 9.17 Gaussians placed in random positions.
Figure 9.18 Probability estimation for the randomly placed Gaussians.
Figure 9.19 Outliers.
Figure 9.20 Point outlier.
Figure 9.21 Contextual outlier.
Figure 9.22 Collective outlier.
Figure 9.23 Optimization algorithm.
Figure 9.24 Particle swarm algorithm.
Figure 9.25 Individual particle.
Figure 9.26 Particle swarm optimization algorithm flowchart.
Figure 9.27 Generating random numbers of clusters.
Figure 9.28 K‐means clustering.
Figure 9.29 Implementation of elbow method.
Figure 9.30 Types of fuzzy clustering.
10 Chapter