Data Analytics in Bioinformatics. Group of authors
1.1 Introduction
In today’s world, businesses are moving towards automated intelligence for decision making. This is made possible by Artificial Intelligence (AI). AI also plays a vital role in research, where decisions often have to be taken instantly. AI is divided into sub-domains such as Machine Learning (ML) and Artificial Neural Networks (ANN) [1]. ML is also referred to as augmented analytics [2] and describes the improvement of a machine’s performance through the experience it has previously acquired. Traditional learning (i.e. the intelligence used in the mid-1800s) is less efficient than ML [3]. In traditional learning, the user supplies data and a program as inputs and obtains the output or results. In ML, the user supplies the data together with the output or desired results as inputs, and the machine produces the program or rules as its output. This means that the data matters more than the program, because the business world depends on the accuracy of the program used for decision making. The block diagram of traditional learning is shown in Figure 1.1.
Traditional learning is a manual process, whereas ML is automated. ML increases the accuracy of analytics across diverse domains, including data preparation (raw facts and figures), automatic outlier detection, Natural Language Interfaces (NLI), and recommendations [4]. As a result, bias in decisions on business problems is reduced.
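The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cholesterol values, the labels, the hand-picked cut-off of 240 and the function names are all invented for the example and do not come from the chapter.

```python
# Hypothetical toy data: (cholesterol in mg/dL, at_risk) pairs, invented for illustration.
training = [(180, False), (195, False), (210, False),
            (250, True), (265, True), (290, True)]


def traditional_program(cholesterol):
    """Traditional approach: data + hand-written rule in, output out.

    The cut-off of 240 is an assumed, manually chosen value.
    """
    return cholesterol >= 240


def learn_threshold(samples):
    """ML approach: data + desired outputs in, rule (a threshold) out.

    Tries every midpoint between neighbouring sorted values and keeps the
    one that classifies the most training samples correctly.
    """
    values = sorted(x for x, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct(threshold):
        return sum((x >= threshold) == label for x, label in samples)

    return max(candidates, key=correct)


threshold = learn_threshold(training)


def learned_program(cholesterol):
    """The 'program' produced by learning: a rule built from the data."""
    return cholesterol >= threshold


print("learned threshold:", threshold)
print("traditional:", traditional_program(255), "learned:", learned_program(255))
```

In the first function a person decides the rule; in the second the rule is derived from the labeled data, so changing the data changes the resulting program.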
Figure 1.1 Traditional learning.
Figure 1.2 Machine learning.
ML is a subset of AI whose primary task is to let systems learn automatically from data or observations obtained from the environment through different devices [5]. The block diagram of ML is shown in Figure 1.2.
ML-based algorithms make predictions and decisions by using mathematical models built from training data [6–8]. A few popular applications of ML are e-mail filtering [9], medical diagnosis [10], classification [11], and extraction [12]. ML aims to improve the accuracy of computer programs by accessing data from the surroundings, learning from that data automatically, and enhancing the capacity for decision making. The main objective of ML is to minimize human intervention and assistance while performing any task. The next section of this chapter describes the learning process and its different methodologies.
1.2 Learning Process & its Methodologies
In AI, learning is the process of training a machine so that it can take decisions instantly. The machine’s performance improves as its accuracy improves. When a machine acts in its working environment, it may either succeed or fail. From these successes and failures the machine gains experience. The newly gained experience improves the machine’s actions and leads to an optimal policy for the working environment. This process is known as learning from experience, and it is possible even in an unknown working environment. A general block diagram of the learning architecture for such a method is presented in Figure 1.3, which shows the mechanism by which a machine learns from a new experience. The stepwise sequence of learning behavior is given below.
Step 1. The IoT-based sensors receive input from the environment.
Step 2. The sensors send this input to the critic for performance evaluation against the previously stored performance standards. Simultaneously, the sensors send the same input to the performance element, which checks its effectiveness; if it is found acceptable, it is immediately returned to the environment through the effectors.
Step 3. The critic provides feedback to the learning element. Any new feedback is used to update the performance element. The updated knowledge then comes back to the learning element, which sends it to the problem generator as a learning goal to be evaluated through experiments. The updates are sent to the performance element for future reference.
Figure 1.3 Learning behavior of a machine.
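The three steps above can be sketched as a small simulation. This is a minimal sketch, not the architecture of Figure 1.3 itself: the thermostat-style environment, the three actions, the scoring scheme and all function names are assumptions made for illustration.

```python
# Hypothetical setting: an agent must learn to keep a reading at a target value.
ACTIONS = ["cool", "hold", "heat"]
EFFECT = {"cool": -1, "hold": 0, "heat": 1}   # assumed effect of each action


def sensor(reading, target):
    """Step 1: turn the raw input from the environment into a percept."""
    if reading < target:
        return "too_low"
    if reading > target:
        return "too_high"
    return "ok"


def critic(before, after, target):
    """Step 2: judge the outcome against the performance standard.

    Returns +1 (success) if the reading moved closer to the target or
    stayed on it, otherwise -1 (failure).
    """
    improved = abs(after - target) < abs(before - target)
    held_on_target = before == target and after == target
    return 1 if improved or held_on_target else -1


def problem_generator(percept, knowledge):
    """Step 3: propose an untried action for this percept as an experiment."""
    for action in ACTIONS:
        if (percept, action) not in knowledge:
            return action
    return None


def performance_element(percept, knowledge):
    """Choose the action with the best score learned so far."""
    return max(ACTIONS, key=lambda a: knowledge.get((percept, a), 0))


def learning_element(knowledge, percept, action, feedback):
    """Step 3: fold the critic's feedback into the stored knowledge."""
    knowledge[(percept, action)] = knowledge.get((percept, action), 0) + feedback


def run(reading=15, target=20, steps=30):
    knowledge = {}
    for _ in range(steps):
        percept = sensor(reading, target)
        action = (problem_generator(percept, knowledge)
                  or performance_element(percept, knowledge))
        new_reading = reading + EFFECT[action]   # effector acts on the environment
        feedback = critic(reading, new_reading, target)
        learning_element(knowledge, percept, action, feedback)
        reading = new_reading
    return reading, knowledge


final_reading, knowledge = run()
policy = {p: performance_element(p, knowledge)
          for p in ("too_low", "ok", "too_high")}
print(final_reading, policy)
```

The agent starts with no knowledge, experiments with untried actions, and settles on a policy purely from the successes and failures reported by the critic.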
The learning process of ML takes three different forms: supervised learning, unsupervised learning, and reinforcement learning. Each is important in different fields of bioinformatics research, so each is explained with suitable examples in the following sections.
1.2.1 Supervised Learning
Supervised learning is a very common learning mechanism in ML and is used by most researchers who are new to their respective fields. It trains the machine on a labeled dataset in the form of compressed input–output pairs, as described in Refs. [13–15]. These datasets may be continuous or discrete. The important point is that learning needs supervision with an appropriate training model. Because supervised learning predicts accurate results [16], it is mostly used for regression analysis and classification. Figure 1.4 shows the execution model of supervised learning.
The figure shows that in supervised learning, a given set of input attributes (i.e. A1, A2, A3, A4, …, Ak) along with their output attributes (i.e. B1, B2, B3, B4, …, Bk) is kept in a knowledge dataset. The learning algorithm takes an input Ai, executes its model, and produces the result Bi as the desired output. Supervised learning is important in bioinformatics. Consider a heart disease scenario, where the inputs can be symptoms such as high cholesterol, chest pain, and blood pressure, and the output is whether or not a person is suffering from heart disease. All these inputs are passed to the learning algorithm, which is trained on them; when a new input is passed through the model, the machine gives an expected output. If the accuracy of the expected output is not up to the mark, the model needs to be modified or upgraded.
Figure 1.4 Block diagram of supervised learning.
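The heart disease scenario can be illustrated with a small classifier. The sketch below uses a k-nearest-neighbour vote, which is one possible learning algorithm and not necessarily the one the authors have in mind; the patient records, the feature order (cholesterol, chest pain as 0 or 1, systolic blood pressure) and the function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical knowledge dataset of (Ai, Bi) pairs; all values are invented.
# Ai = (cholesterol mg/dL, chest pain 0/1, systolic blood pressure mmHg)
# Bi = 1 if the person has heart disease, else 0
knowledge_dataset = [
    ((180, 0, 115), 0), ((190, 0, 120), 0),
    ((200, 0, 125), 0), ((205, 1, 118), 0),
    ((250, 1, 150), 1), ((265, 1, 160), 1),
    ((280, 0, 155), 1), ((300, 1, 170), 1),
]


def fit_scaler(rows):
    """Record the (min, max) of every input attribute."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Rescale each attribute to [0, 1] so no attribute dominates the distance."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def train(samples):
    """For k-NN, training amounts to storing the scaled labeled examples."""
    bounds = fit_scaler([a for a, _ in samples])
    return bounds, [(scale(a, bounds), b) for a, b in samples]


def predict(model, a_new, k=3):
    """Label a new input by majority vote among its k nearest stored examples."""
    bounds, stored = model
    x = scale(a_new, bounds)
    nearest = sorted(stored, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(b for _, b in nearest).most_common(1)[0][0]


model = train(knowledge_dataset)
print(predict(model, (270, 1, 158)))   # new patient with high readings
print(predict(model, (185, 0, 118)))   # new patient with low readings
```

If predictions on held-out labeled records turned out to be poor, the model would be revised, for example by changing k or by adding attributes, which corresponds to the modification step described above.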
As an example of supervised learning, consider a person who feels that he has a high cholesterol level and chest pain and goes to the doctor for a check-up. The doctor feeds the inputs given by the patient to the machine. The machine predicts that the patient is suffering from a cardiac issue and reports this to the doctor. This is analogous to supervised learning: the inputs given by the patient are the independent variables, and the corresponding output from the machine is the dependent attribute. The machine acts as a model that gives a relevant output because it has been trained on similar inputs. Supervised learning is itself a huge subfield of ML and underlies a variety of techniques used in research work, including Regression Analysis, Artificial Neural Networks (ANN), and Support Vector Machines (SVM).
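Regression analysis covers the case where the output attribute is continuous rather than a yes/no label. As a minimal sketch, the code below fits a straight line by ordinary least squares; the dose and response numbers are invented, and the function name is a placeholder chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical labeled pairs: input (dose) and continuous output (response).
doses = [1.0, 2.0, 3.0, 4.0]
responses = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(doses, responses)
predicted = intercept + slope * 5.0   # prediction for an unseen input
print(intercept, slope, predicted)
```

The independent variable plays the role of the input attribute and the fitted line is the model; the same labeled input–output structure is used, only the output is a number instead of a class.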
1.2.2 Unsupervised Learning
In Unsupervised Learning, the user doesn’t have to supervise the model. Here, the model