Handbook on Intelligent Healthcare Analytics. Group of authors
3.4 Big Data Knowledge System in Healthcare
Knowledge means the facts, information, and skills acquired through experience; in short, it is a better understanding of a subject. Healthcare practice is a knowledge-based process, so supplying the right knowledge during decision-making is very important. In healthcare, knowledge is used for diagnosing disease, prescribing medicine, and clinical management.
Table 3.2 Big data technologies [12, 14, 26].
Big data capability | Primary technologies | Developer | Description |
Big data collection | Sqoop | Apache Software Foundation | Apache Sqoop transfers data between Apache Hadoop and structured data stores such as the relational databases Oracle and MySQL. It imports data from relational databases into the Hadoop file system and exports data from Hadoop back to them. |
Big data collection | Flume | Apache Software Foundation | Apache Flume is a service for collecting, aggregating, and moving huge volumes of log data to centralized data stores. Log files are collected from distributed sources such as web servers and aggregated in the Hadoop Distributed File System for analysis. |
Big data collection | Kafka | Apache Software Foundation | Apache Kafka is a distributed streaming platform used to collect data and to support real-time analysis. Kafka is used by companies such as Uber, Netflix, and PayPal. |
Big data storage | Hadoop Distributed File System (HDFS) | Apache Software Foundation | HDFS is the distributed storage system for Hadoop applications and a major component of Apache Hadoop. It stores files across the machines of a distributed environment and accommodates large data sets. |
Big data storage | Oracle NoSQL | Oracle Corporation | A NoSQL database, also known as a non-SQL or non-relational database, stores and retrieves data using models other than the tabular relations of a relational database. NoSQL is mainly used for storing big data, and it manages unstructured data using dynamic schemas. |
Big data storage | Apache HBase | Apache Software Foundation | Apache HBase is a distributed, non-relational database in the Hadoop framework. HBase provides a way to store sparse data sets. It automatically splits and distributes a table across the cluster when the data set grows too big to manage. |
Big data storage | Apache Cassandra | Apache Software Foundation | Apache Cassandra is a distributed database that manages huge amounts of data across many commodity servers. Cassandra is a scalable, high-performance NoSQL database. |
Big data processing | MapReduce | Apache Software Foundation | Apache MapReduce is used for the distributed processing of large amounts of data in a reliable, fault-tolerant way. Hadoop MapReduce is a programming model, not a programming language. The framework is used to write applications that manage and process huge data sets. |
Big data processing | Apache Hadoop | Apache Software Foundation | The Apache Hadoop framework stores and processes big data on a cluster of computers in a distributed environment instead of on a single computer, and it uses the cluster to analyze large data sets in parallel. Hadoop has two main layers: a processing (computational) layer, MapReduce, and a storage layer, HDFS. The important modules of the framework are Hadoop MapReduce, Yet Another Resource Negotiator (YARN), Hadoop Common, and HDFS. YARN is used for resource management and job scheduling, and it allows different data processing engines to process HDFS data. Hadoop Common consists of the common libraries and utilities needed to support the other Hadoop modules. |
Data integration capability | Oracle Big Data Connectors, Oracle Data Integrator | Oracle Corporation | Oracle Big Data Connectors is software that integrates Apache Hadoop with Oracle Database. Apache Hadoop is used for data accumulation and pre-processing, and the connectors integrate data in Oracle Database with Apache Hadoop for analysis. |
Statistical analysis capability | R, Oracle R Enterprise | Oracle Corporation | R is an open-source programming language and environment for statistical analysis. Oracle R Enterprise combines the R software with Oracle Database features for advanced analysis, so that R users can access data in the database without writing Structured Query Language (SQL). |
Big data analytics | Apache Hive | Originally developed by Facebook; now Apache-licensed software | Apache Hive is open-source data warehouse software built on Apache Hadoop. It is used for storing, querying, summarizing, and analyzing large data sets. Hive is written in Java. |
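The MapReduce programming model listed in Table 3.2 can be illustrated with a minimal sketch. The Python code below is not Hadoop code: it runs in a single process and only imitates the three phases (map, shuffle, reduce) that the framework would distribute across a cluster. The visit records, the field names, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one record per patient visit (made-up data).
VISITS = [
    {"patient_id": "p1", "diagnosis": "diabetes"},
    {"patient_id": "p2", "diagnosis": "hypertension"},
    {"patient_id": "p3", "diagnosis": "diabetes"},
    {"patient_id": "p1", "diagnosis": "asthma"},
]


def map_phase(record):
    """Map: emit one (key, value) pair per record, here (diagnosis, 1)."""
    yield record["diagnosis"], 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key before reduction."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Counts how many visits carry each diagnosis label.
    print(run_job(VISITS))
```

In a real Hadoop job, the map and reduce functions would run on many machines over data stored in HDFS, and the framework itself would perform the shuffle and recover from node failures. The sketch only shows the shape of the computation.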
The major problem with big data is its complexity, because the data go beyond the capacity of traditional data processing systems. Healthcare big data is the massive amount of patient information generated as a result of innovations in digital technologies. Big data plays a role in healthcare that directly affects human lives. Healthcare is a major sector that produces an enormous quantity of information, such as administrative data, laboratory test data, medication data, sensor data, smart wearable device data, pharmacy data, electronic health records (EHR), doctors’ appointment details, and health insurance data.
These data need to be analyzed to improve the treatment process, to deliver better healthcare services at lower cost, and to increase patient satisfaction. Healthcare professionals, hospitals, and many researchers analyze medical data to understand the clinical perspective and to prevent health issues. Meaningful information is extracted from the massive data using data analytics tools. Big data analytics helps to examine patient information quickly and to reach better conclusions about a patient’s treatment. Disease prediction and detection, clinical and drug research, hospital administration, and personalized patient healthcare services are a few of the major uses of big data analytics in healthcare.
Figure 3.2 represents the process of creating knowledge from raw facts. The objective of a knowledge system for healthcare big data is to transform information into a knowledge asset.
Big data sets are analyzed to extract valuable information. This information is transformed into actionable insights, and analytical tools turn the actionable insights into knowledge. This knowledge can be used for