Data Analytics in Bioinformatics. Group of authors
to find the information. Here clusters are made [17–21]. A block diagram of unsupervised learning is shown in Figure 1.5.
As the figure shows, the inputs in unsupervised learning are collected as a set of features denoted A1, A2, A3, A4, …, Ak, but no output features are available. The input parameters are passed to a learning algorithm module, which forms distinct groups called clusters [22–26].
Figure 1.5 Block diagram of unsupervised learning.
Unsupervised learning has a role in bioinformatics. Consider the heart disease scenario, where the inputs can be a large number of symptoms of heart disease, such as high cholesterol, chest pain, and blood pressure. These symptoms are passed to the learning algorithm as input, and the model groups them into clusters (variables or values of similar types fall into one cluster). The clusters can then help identify a disease that the patient may develop in the near future.
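The clustering step can be sketched with k-means, one common clustering algorithm. The text does not name a specific algorithm, so this choice is an assumption, and the patient values (cholesterol, systolic blood pressure) are hypothetical numbers invented only for illustration. Note that no output labels are supplied; the groups emerge from the input features alone.

```python
# Minimal k-means sketch in plain Python (assumed algorithm, hypothetical data).
# Each point is (cholesterol in mg/dL, systolic blood pressure in mmHg).
PATIENTS = [
    (180, 115), (190, 120), (175, 110),   # lower readings
    (260, 150), (270, 160), (250, 155),   # higher readings
]

def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(PATIENTS, k=2)
print(labels)     # cluster index assigned to each patient
print(centroids)  # mean feature values of each cluster
```

In practice the features would usually be scaled before clustering so that no single measurement dominates the distance.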
1.2.3 Reinforcement Learning
In the field of ML, reinforcement learning was developed by John Andreae in 1963, when he invented a system called Stella [27]. It is a dynamic approach based on the concept of feedback [28–31]. For a machine, reinforcement is the reward it receives for acting in its environment. When the machine acts on its environment, it receives an evaluation of its actions, which is called reinforcement, but it is not told which action is the correct one for achieving the goal. The machine’s utility is defined by the feedback function [32], and the objective is to maximize the expected feedback. The block diagram of reinforcement learning is shown in Figure 1.6.
Figure 1.6 shows that a machine first performs some actions in the environment. Once the actions are performed, the machine starts to receive feedback, which may be positive or negative. Positive feedback is retained by the machine as knowledge. The machine also tries to learn from negative feedback so that the same mistake is not repeated in the future. Another important aspect of reinforcement learning is the state, which provides situation-dependent input to the machine for learning purposes.
Figure 1.6 Block diagram of reinforcement learning.
A few points of reinforcement learning are as follows:
Input of the reinforcement learning process: the initial state.
Output of the reinforcement learning process: diverse solutions are possible, depending on the feedback obtained.
The training process is based purely on input.
The reinforcement learning model is a continuous process.
The best solution in reinforcement learning is the one that yields the maximum positive feedback.
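These points can be illustrated with a small sketch in which a learner repeatedly acts, receives a numeric feedback value, and updates its estimate of each action's worth. The action names and feedback values below are hypothetical, and the technique shown (greedy selection over running-average value estimates with optimistic initial values) is only one simple way to maximize expected feedback, not necessarily the scheme behind Figure 1.6.

```python
# Hypothetical actions and feedback values, invented for illustration only.
ACTIONS = ["no_change", "diet_only", "diet_and_medication"]
FEEDBACK = {"no_change": -1.0, "diet_only": 0.2, "diet_and_medication": 1.0}

def feedback(action):
    """Environment's evaluation of an action.

    The learner only sees the returned number; it is never told
    which action is the correct one.
    """
    return FEEDBACK[action]

def learn(steps=20, optimistic_start=2.0):
    """Greedy value learning; returns (value estimates, action history)."""
    # Optimistic initial estimates force the greedy learner to try
    # every action at least once before settling.
    value = {a: optimistic_start for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    history = []
    for _ in range(steps):
        # Act: choose the action with the highest current estimate.
        action = max(ACTIONS, key=lambda a: value[a])
        # Receive feedback (positive or negative) from the environment.
        reward = feedback(action)
        # Learn: update the running average of feedback for this action.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
        history.append((action, reward))
    return value, history

value, history = learn()
best_action = max(value, key=value.get)
print(best_action)                    # action with the highest estimated feedback
print(sum(r for _, r in history))     # total feedback collected while learning
```

Because the feedback here is deterministic, one trial per action is enough. With noisy feedback, the learner would need continued exploration (for example, occasional random actions) to keep its estimates reliable.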
An example of reinforcement learning is a person who suffers from high cholesterol and high blood pressure. He visits his family doctor and requests medication. After analyzing the symptoms, the doctor prescribes a diet chart and a set of medicines to lower the cholesterol level and blood pressure. The patient takes the medicines and feels better. Here, the patient receives positive feedback in the form of the results of the medication prescribed by the doctor. The patient is now motivated and will keep to a low-fat, low-sodium diet to hold down the levels of blood pressure and cholesterol. If the levels do not go down, the patient will consult the doctor again, and more tests will be considered in order to lower the parameters used to evaluate the patient’s heart.
1.3 Classification and its Types
Classification is an ML task that deals with the systematic assignment of a class label to an observation from the problem domain. It is a subgroup of supervised ML. The traditional classification scheme was devised by the Swedish botanist Carl Linnaeus, as described in Ref. [33]. When calculating the desired output in supervised learning, classification is more effective when the input attribute is discrete. The classification approach helps the user make decisions by providing classified conclusions from the observed data and values, as discussed in Refs. [34–36]. Figure 1.7 presents a classification graph built from data on persons who do or do not suffer from heart disease.
In Figure 1.7, patients suffering from heart disease are represented by triangle symbols, and those who are not are represented by rectangle symbols. The hyperplane (partition) line marks the separation between these two classes. In general, there are four types of classification techniques. They are:
Figure 1.7 Concept of classification.
Binary Classification: It covers classification tasks with two class labels, where one class represents the normal state and the other the abnormal state [37].
Imbalanced Classification: It covers classification tasks where the examples are unequally distributed across the classes [38].
Multi-label Classification: It covers classification tasks with two or more class labels, where one or more class labels may be predicted for each example [39].
Multi-Class Classification: It covers classification tasks with more than two class labels [40].
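The binary case, with a partition line separating two classes as in Figure 1.7, can be sketched with a perceptron, a simple linear classifier. The perceptron is an assumed choice here, since the text does not say how the hyperplane is obtained, and the data points (scaled cholesterol, scaled blood pressure) are hypothetical values chosen to be linearly separable.

```python
# Hypothetical, linearly separable toy data.
# Label 1 = heart disease (triangles), label 0 = no heart disease (rectangles).
DATA = [
    ((2.5, 2.0), 1), ((3.0, 2.8), 1), ((2.2, 3.1), 1), ((3.2, 2.1), 1),
    ((0.5, 1.0), 0), ((1.0, 0.4), 0), ((0.8, 1.3), 0), ((1.3, 0.9), 0),
]

def predict(weights, bias, x):
    """Return 1 if x lies on the positive side of the hyperplane, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def train_perceptron(data, learning_rate=0.1, max_epochs=1000):
    """Learn weights and bias of a separating line with the perceptron rule."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in data:
            error = label - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Shift the hyperplane toward the misclassified point.
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training example is on the correct side
    return weights, bias

weights, bias = train_perceptron(DATA)
# The learned hyperplane is: weights[0]*x1 + weights[1]*x2 + bias = 0
print(weights, bias)
print([predict(weights, bias, x) for x, _ in DATA])  # predicted class labels
```

The perceptron rule is only guaranteed to stop when the two classes can be separated by a line. For overlapping classes, a method such as logistic regression or a support vector machine would be a more suitable choice.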
Figure 1.8 Classification based on gender.
To illustrate the classification approach more precisely, a heart disease dataset [41] has been used that comprises a total of 1,025 people, of whom 312 are female and 713 are male. This dataset was chosen because heart disease remains widespread, which is attributed to excessive alcohol consumption, a diet of oily and fast food, and the inhalation of dangerous gases due to pollution. The classification by gender is shown in Figure 1.8.
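The gender split quoted above can be checked with a few lines of arithmetic. The counts come from the text; the function name and the idea of reporting an imbalance ratio are additions made here for illustration.

```python
def gender_shares(counts):
    """Return (total, percentage share per group, largest-to-smallest ratio)."""
    total = sum(counts.values())
    shares = {group: 100 * n / total for group, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return total, shares, ratio

# Counts as stated for the heart disease dataset [41].
total, shares, ratio = gender_shares({"female": 312, "male": 713})
print(total)                       # 1025
print(round(shares["female"], 2))  # 30.44
print(round(shares["male"], 2))    # 69.56
print(round(ratio, 2))             # 2.29
```

Males outnumber females by roughly 2.3 to 1, so any model that uses gender as a class label on this dataset is working with unequally distributed examples.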
1.4 Regression
Regression is a very powerful type of statistical analysis. It is used to determine the strength and character of the relationship between one dependent variable and a series of independent variables [42–44]. This analysis indicates whether any future update to the product under study is possible. Regression also enables a researcher to identify the parameters of a topic that are best suited for analysis, as well as the parameters that should not be used.
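The "strength and character" of a relationship can be made concrete with a least-squares line fit for a single independent variable. In this sketch the slope describes the character (direction and size) of the relationship, and the Pearson correlation coefficient r describes its strength. The data values and the function and variable names are hypothetical, and the notation may differ from the equations used elsewhere in the chapter.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                      # spread of x
    syy = sum((y - mean_y) ** 2 for y in ys)                      # spread of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical toy data: one independent variable, one dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3), round(r, 3))  # 0.6 2.2 0.775
```

A value of r close to 1 or -1 indicates a strong linear relationship, while a value near 0 suggests that the independent variable contributes little and may be a candidate for exclusion from the analysis.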
In the field of ML, linear regression is the most common type of regression analysis used for prediction [45]. In this statistical analysis, equations are formed to identify the useful and not-useful parameters. This is done by linear regression as well as multiple linear regression [46–49]. The representation of Linear Regression is presented in Equation (1.1) and the representation of Multiple