Big Data. Seifedine Kadry


strategy and increase the sales to increase revenue.

1.11.3 Financial Services

Financial services use big data technology in credit risk, wealth management, banking, and foreign exchange, to name a few. Risk management is a high priority for a finance organization, and big data is used to manage the various types of risk associated with the financial sector. These include liquidity risk, operational risk, interest rate risk, the impact of natural calamities, the risk of losing valuable customers to competitors, and uncertain financial markets. Big data technologies derive solutions in real time, resulting in better risk management.

Issuing loans to organizations and individuals is the major line of business for a financial institution. Loans are issued primarily on the basis of the creditworthiness of the organization or individual. Big data technology is now being used to assess creditworthiness based on an organization's latest business deals, partner organizations, and new products that are to be launched. In the case of individuals, creditworthiness is determined based on their social activity, interests, and purchasing behavior.

Financial institutions are exposed to fraudulent activities by consumers, which cause heavy losses. Predictive analytics tools of big data are used to identify new patterns of fraud and prevent them. To detect and prevent credit card fraud, data from multiple sources, such as shopping patterns and previous transactions, are correlated, and in‐memory technology is used to analyze terabytes of streaming data in real time.
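The idea of checking each incoming transaction against a cardholder's earlier transactions, held in memory, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any production fraud system: the names `RunningStats` and `flag_suspicious`, the z‐score rule, the threshold of 3.0, and the sample amounts are all hypothetical, and real systems correlate many more signals than the transaction amount.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass
class RunningStats:
    """Per-card running mean and variance (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def flag_suspicious(transactions, z_threshold=3.0, min_history=5):
    """Yield (card_id, amount) pairs that deviate strongly from the card's history.

    The threshold and minimum history length are illustrative assumptions.
    """
    history = defaultdict(RunningStats)  # held in memory, keyed by card id
    for card_id, amount in transactions:
        stats = history[card_id]
        std = stats.std()
        if stats.count >= min_history and std > 0:
            if abs(amount - stats.mean) / std > z_threshold:
                yield card_id, amount
        # Every transaction, flagged or not, is added to the history.
        stats.update(amount)


# Hypothetical stream of (card id, amount) events.
stream = [("card-1", a) for a in (20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0)]
stream.append(("card-2", 40.0))
flagged = list(flag_suspicious(stream))
print(flagged)
```

Because the statistics are updated one transaction at a time, the check needs only a small, fixed amount of state per card, which is what makes this style of analysis feasible on streaming data.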

Big data solutions are used in financial institutions' call center operations to predict and resolve customer issues before they affect the customer; customers can also resolve issues via self‐service, which gives them more control. The aim is to go beyond customer expectations and provide better financial services. Investment guidance is also provided to consumers through wealth management advisors, who help them make investments. With big data solutions, these advisors are now armed with insights from data gathered from multiple sources.

Customer retention is becoming important in competitive markets, where financial institutions might cut interest rates or offer better products to attract customers. Big data solutions help financial institutions retain customers by monitoring customer activity and identifying a loss of interest in the institution's personalized offers, or whether customers liked any of the competitors' products on social media.

Chapter 1 Refresher

1 Big Data is _________.
a. Structured
b. Semi‐structured
c. Unstructured
d. All of the above
Answer: d
Explanation: Big data is a blanket term for data that are too large in size and complex in nature, that may be structured, unstructured, or semi‐structured, and that arrive at high velocity.

2 The hardware used in big data is _________.
a. High‐performance PCs
b. Low‐cost commodity hardware
c. Dumb terminal
d. None of the above
Answer: b
Explanation: Big data uses low‐cost commodity hardware to make cost‐effective solutions.

3 What does commodity hardware in the big data world mean?
a. Very cheap hardware
b. Industry‐standard hardware
c. Discarded hardware
d. Low specifications industry‐grade hardware
Answer: d
Explanation: Commodity hardware is low‐cost, low‐performance, and low‐specification functional hardware with no distinctive features.

4 What does the term “velocity” in big data mean?
a. Speed of input data generation
b. Speed of individual machine processors
c. Speed of ONLY storing data
d. Speed of storing and processing data
Answer: d

5 What are the data types of big data?
a. Structured data
b. Unstructured data
c. Semi‐structured data
d. All of the above
Answer: d
Explanation: Machine‐generated and human‐generated data can be represented by the following primitive types of big data: structured data, unstructured data, and semi‐structured data.

6 JSON and XML are examples of _________.
a. Structured data
b. Unstructured data
c. Semi‐structured data
d. None of the above
Answer: c
Explanation: Semi‐structured data have a structure but do not fit into a relational database. Because they are organized, they are easier to analyze than unstructured data. JSON and XML are examples of semi‐structured data.

7 _________ is the process that corrects the errors and inconsistencies.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: a
Explanation: The data‐cleaning process fills in the missing values, corrects the errors and inconsistencies, and removes redundancy in the data to improve the data quality.

8 __________ is the process of transforming data into an appropriate format that is acceptable by the big data database.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: c
Explanation: Data transformation refers to transforming or consolidating the data into an appropriate format that is acceptable by the big data database and converting them into logical and meaningful information for data management and analysis.

9 __________ is the process of combining data from different sources to give the end users a unified data view.
a. Data cleaning
b. Data integration
c. Data transformation
d. Data reduction
Answer: b

10 __________ is the process of collecting the raw data, transmitting the data to a storage platform, and preprocessing them.
a. Data cleaning
b. Data integration
c. Data aggregation
d. Data reduction
Answer: c
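Questions 7 to 9 distinguish data cleaning, data integration, and data transformation. The difference may be easier to see in a small Python sketch. Everything here is an assumption made for illustration: the two sources (`crm_rows`, `order_rows`), their field names, the choice of `"unknown"` as the fill value, and the per‐customer total as the target format are hypothetical and not taken from the book.

```python
# Hypothetical records from two sources; all field names are illustrative.
crm_rows = [
    {"customer_id": 1, "name": " Alice ", "country": "us"},
    {"customer_id": 2, "name": "Bob", "country": None},
    {"customer_id": 2, "name": "Bob", "country": None},  # redundant duplicate
]
order_rows = [
    {"customer_id": 1, "amount": "19.99"},
    {"customer_id": 2, "amount": "5.00"},
    {"customer_id": 1, "amount": "30.01"},
]


def clean(rows):
    """Data cleaning: fix inconsistencies, fill missing values, drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        fixed = {
            "customer_id": row["customer_id"],
            "name": row["name"].strip(),                       # inconsistent spacing
            "country": (row["country"] or "unknown").upper(),  # missing value
        }
        key = tuple(fixed.items())
        if key not in seen:  # skip rows already seen
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def integrate(customers, orders):
    """Data integration: combine both sources into one unified view."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], "amount": o["amount"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


def transform(rows):
    """Data transformation: convert amounts to numbers and total them per customer."""
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + float(row["amount"])
    return {name: round(total, 2) for name, total in totals.items()}


unified = integrate(clean(crm_rows), order_rows)
totals = transform(unified)
print(totals)
```

Each function maps to one term: `clean` improves the quality of a single source, `integrate` joins sources into one view, and `transform` reshapes the joined data into the form wanted for analysis.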

Conceptual Short Questions with Answers

1 What is big data? Big data is a blanket term for data that are too large in size and complex in nature, that may be structured or unstructured, and that arrive at high velocity.

2 What are the drawbacks of traditional databases that led to the evolution of big data? The following limitations of traditional databases led to the emergence of big data. The exponential increase in data volume, which scales to terabytes and petabytes, became a challenge for the RDBMS. To address this issue, the RDBMS increased the number of processors and added more memory units, which in turn increased the cost. Almost 80% of the data fetched were in semi‐structured and unstructured formats, which the RDBMS could not deal with. The RDBMS could not capture data coming in at high velocity.

3 What are the factors that explain the tremendous increase in the data volume? Multiple disparate data sources are responsible for the tremendous increase in the volume of big data. Much of the growth can be attributed to the digitization of almost everything around the globe. Examples include paying e‐bills, online shopping, communication through social media, e‐mail transactions in various organizations, and the digital representation of organizational data.

4 What are the different data types of big data? Machine‐generated and human‐generated data can be represented by the following primitive types of big data: structured data, unstructured data, and semi‐structured data.

5 What is semi‐structured data? Semi‐structured data have a structure but do not fit into a relational database. Because they are organized, they are easier to analyze than unstructured data. JSON and XML are examples of semi‐structured data.

6 What do the three Vs of big data mean?
Volume: Size of the data
Velocity: Rate at which the data is generated and is being processed
Variety: Heterogeneity of data: structured, unstructured, and semi‐structured

7 What is commodity hardware? Commodity hardware is low‐cost, low‐performance, and low‐specification functional hardware with no distinctive features. Hadoop can run on commodity hardware and does not require high‐end hardware or supercomputers to execute its jobs.

8 What is data aggregation? The data aggregation phase of the big data life cycle involves collecting the raw data, transmitting the data to a storage platform, and preprocessing them. Data acquisition in the big data world means acquiring high‐volume data arriving at an ever‐increasing pace.

9 What is data preprocessing? Data preprocessing is an important process performed on raw data to transform it into an understandable format and provide access to a consistent
