Fundamentals and Methods of Machine and Deep Learning. Pradeep Singh

Fundamentals and Methods of Machine and Deep Learning

they share information about sending or receiving the information about the training data [21, 22].

The BCC model the parameters which includes

hyperparameters

. Based on the values of the prior posterior probability distribution of random variables with observed label classes, the independence posterior density id computed as follows:

The inferences drawn are based on the unknown random variables, i.e., P, π, t, V, and α which are collected using Gibbs and rejection sampling methodology. A high-level representation of BCC is shown in Figure 2.4. First parameters of BCC model, hyperparameters, and posterior probabilities are summed to generate final prediction as output.

Some of the advantages offered by BCC in diagnosing the zonotic diseases are as follows: performs probabilistic prediction, isolates the outliers which causes noise, efficient handling of missing values, robust handling of irrelevant attributes, side effects caused by dependency relationships can be prevented, easier in terms of implementation, ease modeling of dependency relationships among the random variables, learns collectively from labeled and unlabeled input data samples, ease feature selection, lazy learning, training time is less, eliminates unstable estimation, high knowledge is attained in terms of systems variable dependencies, high accuracy achieved in interpretation of the results, confusion matrix–based processing of data, low level of computational complexity, easily operates with less computational resources, requires less amount of training data, capable enough to handle the uncertainty in the data parameters, can learn from both labeled and unlabeled data samples, precise selection of the attributes which yields maximum information gain, eliminates the redundant values, lower number of tunning parameters, less memory requirement, highly flexible classification of data, and so on [23].

Figure 2.4 A high-level representation of Bayesian classifier combination (BCC).

2.6 Bucket of Models

The bucket of models is one of the popular ensemble machine learning techniques used to choose the best algorithm for solving any computational intensive problems. The performance achieved by bucket of models is good compared to average of all ensemble machine learning models. One of the common strategies used to select the best model for prediction is through cross-validation. During cross-validation, all examples available in the training will be used to train the model and the best model which fits the problem will be chosen. One of the popular generalization approaches for cross-validation selection is gating. In order to implement the gating, the perceptron model will be used which assigns weight to the prediction product by each model available in the bucket. When the large number of models in the bucket is applied over a larger set of problems, the model for which the training time is more can be discarded. Landmark-based learning is a kind of bucket-based model which trains only fast algorithms present in the bucket and based on the prediction generated by fast algorithms will be used to determine the accuracy of slow algorithms in the bucket [24]. A high-level representation of bucket of models is shown in Figure 2.5. The data store maintains the repository of information, which is fed as input to each of the base learners. Each of the base learners generates their own prediction as output which is fed as input to the metalearner. Finally, the metalearner does summation of each of the predictions to generate final prediction as output.

Figure 2.5 A high-level representation of bucket of models.

One of the best suitable approaches for cross-validation among multiple models in ensemble learning is bake off contest, the pseudo-code of which is given below.

Pseudo-code: Bucket of models

For each of the ensemble model present in the bucket doRepeat constant number of timesDivide the training set into parts, i.e., training set and test set randomlyTrain the ensemble model with training setTest the ensemble model with test setChoose the ensemble model that yields maximum average score value

Some of the advantages offered by bucket of models in diagnosing the zonotic diseases are as follows: high quality prediction, provides unified view of the data, negotiation of local patterns, less sensitive to outliers, stability of the model is high, slower model gets benefited from faster models, parallelized automation of tasks, learning rate is good on large data samples, payload functionality will be hidden from end users, robustness of the model is high, error generation rate is less, able to handle the random fluctuations in the input data samples, length of the bucket is kept medium, easier extraction of features from large data samples, prediction happens by extracting the data from deep web, linear weighted average model is used, tendency of forming suboptimal solutions is blocked, and so on [25, 26].

2.7 Stacking

Stacking is also referred as super learning or stacked regression which trains the meta-learners by combining the results generated by multiple base learners. Stacking is one form of ensemble learning technique which is used to combine the predictions generated by multiple machine learning models. The stacking mechanism is used to solve regression or classification problems. The typical architecture of stacking involves two to three models which are often called as level-0 model and level-1 model. The level-0 model fit on the training data and the predictions generated by it gets compiled. The level-1 model learns how to combine the predictions generated by the predictions obtained from several other models. The simplest approach followed to prepare the training data is k-fold cross- validations of level-0 models. The implementation of stacking is easier and training and maintenance of the data is also easier. The super learner algorithm works in three steps first is to setup the ensemble, train the ensemble which is setup, and after sufficient training test for the new test data samples [27, 28].

The generalization approach in stacking splits the existing data into two parts one is training dataset and another is testing dataset. The base model is divided into K-NN base models, the base model will be fitted into K–1 parts which leads to the prediction of the K^th part. The base model will further fit into the whole training dataset to compute the performance over the testing samples. The process gets repeated on the other base models which include support vector machine, decision tree, and neural network to make predictions over the test models [29]. A high-level representation of stacking is shown in Figure 2.6. Multiple models are considered in parallel, and training data is fed as input to each of the model. Every model generated the predictions and summation of each of the predictions is fed as input to generalizer. Finally, generalizer generates final predictions based on the summation of the predictions generated by each of the model.

Some of the advantages offered by stacking in diagnosing the zonotic diseases are as follows: easily parallelized, easily solves regression problems, simple linear stack approach, lot more efficient, early detection of local patterns, lower execution time, produces high quality output, chances of misclassification is less, increased predictive accuracy, effect of outliers is zero, less memory usage, less computational complexity, capable of handling big data streams, works in an incremental manner, classification of new data samples is easy, used to solve multiple classification problems, approach is better than classical ensemble method, suitable to solve computation intensive applications, generalization of sentiment behind analysis is easy, able to solve nonlinear problems, robust toward large search space, training period is less, capable of handling noisy training data, collaborative filtering helps in removal of noisy elements from training data, suitable to solve multi-classification problems, less number of hyperparameters are involved in training, evolves naturally

Скачать книгу