Data Analytics in Bioinformatics. Group of Authors
Library of Congress Cataloging-in-Publication Data
ISBN 978-1-119-78553-8
Cover image: Pixabay.Com
Cover design by Russell Richardson
Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines
Printed in the USA
Preface
Machine learning has become increasingly popular in recent decades because its well-defined algorithms and techniques enable computers to learn and to solve real-life problems that are difficult, time-consuming, and tedious to solve by traditional means. Regarded as a subdomain of artificial intelligence, it has a wide range of applications in healthcare, medical diagnosis, bioinformatics, natural language processing, stock market analysis and many other fields. Recently, there has been an explosion of heterogeneous biological data that requires analysis, retrieval of useful patterns, management and proper storage. There is the additional challenge of developing automated tools and techniques that can handle these different kinds of very large data sets, so that computational models of biological systems and their associated data can be used for classification, clustering, prediction and decision-making.
Machine learning has demonstrated its potential by extracting relevant information in biological domains such as bioinformatics, and it has been successful in finding efficient solutions to complex biomedical problems. Before machine learning was applied, traditional mathematical and statistical models were used together with the knowledge of domain experts to carry out investigations and experiments manually, using instruments, hands and eyes. Such conventional methods alone cannot cope with large volumes of different types of biological data. Hence, machine learning techniques have become essential in research on complex bioinformatics problems, which span the disciplines of both computer science and biology. With this in mind, this book brings together chapters from eminent researchers who explain machine learning techniques and their application to various bioinformatics problems such as classification and prediction of disease, feature selection, dimensionality reduction and gene selection. Since the chapters are based on ongoing collaborative research on a broad range of topics and implementations, the book will be of interest to both students and researchers from computer science as well as the biological sciences.
This edited book is organized into four sections. The first section motivates the application of machine learning techniques in bioinformatics with introductory chapters. The chapters in the second section cover machine learning applications for dimensionality reduction, feature and gene selection, plant disease analysis and prediction, and cluster analysis. The third section brings together a variety of machine learning research applications in the healthcare domain. The book concludes with machine learning applications to stock market behavioural analysis and prediction.
The Editors
December 2020
Acknowledgement
The editors would like to acknowledge and thank everyone who assisted with this book. Our sincere thanks go to each of the chapter authors for their contributions, without whose support this book would not have become a reality. Our heartfelt gratitude also goes to the subject matter experts who found the time to review the chapters and return them on schedule, improving the quality, prominence and uniform arrangement of the chapters in the book. Finally, many thanks to all the team members of Scrivener Publishing for their dedicated support and help in publishing this edited book.
1
Introduction to Supervised Learning
Rajat Verma, Vishal Nagar and Satyasundara Mahapatra*
PSIT, Kanpur, Uttar Pradesh, India
Abstract
Artificial Intelligence (AI) has grown in importance through its use in machines across today's business landscape. AI denotes the intelligence exhibited by machines, in contrast to the natural intelligence displayed by living beings. Today, AI is popular because of its Machine Learning (ML) techniques. In ML, the performance of a machine depends on how well that machine learns. Hence, improvement in a machine's performance is proportional to its learning behavior. These learning behaviors are derived from knowledge of the intelligence of living beings. This chapter introduces AI through a detailed account of ML. Data is the one essential requirement for ML to succeed. ML is known for its diverse learning approaches: supervised, unsupervised, and reinforcement learning. All of them operate on data, which is their quintessential element. Supervised learning attempts to find the relationship between the independent variables and the dependent variables. The independent variables are the input attributes, whereas the dependent variables are the target attributes. Unsupervised learning works differently: it forms groups or clusters, whereas supervised learning models the relationship between the input and the target attributes. The third approach, reinforcement learning, works through feedback or reward. This chapter focuses on the importance of ML and its learning techniques in day-to-day life with the help of a case study on a heart disease dataset. The numerical interpretation of the learning techniques is explained with graphs and tabular data for easy understanding.
Keywords: Artificial intelligence, machine learning, supervised, unsupervised, reinforcement, knowledge, intelligence
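The contrast the abstract draws between supervised and unsupervised learning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy numbers are hypothetical and are not taken from the chapter's heart disease dataset, and the two methods shown (a nearest-class-mean classifier and a one-dimensional two-cluster k-means) are simple stand-ins rather than the techniques used by the authors. Reinforcement learning is not covered here.

```python
from statistics import mean

# Hypothetical toy data, NOT the chapter's heart disease dataset.
# Each row is (independent variable x, dependent variable y).
labeled = [(1.0, 0), (1.5, 0), (2.0, 0), (8.0, 1), (8.5, 1), (9.0, 1)]


def fit_class_means(rows):
    """Supervised: learn the mean input value for each target label."""
    labels = sorted({y for _, y in rows})
    return {label: mean(x for x, y in rows if y == label) for label in labels}


def predict(class_means, x):
    """Predict the label whose learned mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means).

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high


# Supervised: uses both the inputs and the target labels.
class_means = fit_class_means(labeled)
print(predict(class_means, 3.0), predict(class_means, 7.0))

# Unsupervised: sees only the inputs and discovers the groups itself.
clusters = two_means([x for x, _ in labeled])
print(clusters)
```

The supervised function needs the target column to learn anything, while the clustering function receives only the input values and groups them by proximity, which mirrors the distinction described in the abstract.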