Smarter Data Science. Cole Stryker
necessity for being able to use data for input into machine learning algorithms.
There will be many situations when an AI system needs to process or analyze a corpus of data with far less structure than the type of organized data typically found in a financial or transactional system. Fortunately, learning algorithms can be used to extract meaning from ambiguous queries and seek to make sense of unstructured data inputs.
Learning and reasoning go hand in hand, and the number of learning techniques can become quite extensive. The following is a list of some learning techniques that may be leveraged when using machine learning and data science:
Active learning
Deductive inference
Ensemble learning
Inductive learning
Multi-instance learning
Multitask learning
Online learning
Reinforcement learning
Self-supervised learning
Semi-supervised learning
Supervised learning
Transduction
Transfer learning
Unsupervised learning
Some learning types are more complex than others. Supervised learning, for example, comprises many different types of algorithms, and transfer learning can be leveraged to accelerate solving other problems. All model learning for data science requires an information architecture that can cater to the needs of training models. Additionally, the information architecture must provide you with a means to reason through a series of hypotheses to determine an appropriate model or ensemble for use either standalone or infused into an application.
Models are frequently divided along the lines of supervised and unsupervised learning. The division can become less clear with the inclusion of hybrid learning techniques such as semi-supervised, self-supervised, and multi-instance learning. In addition to supervised and unsupervised learning, reinforcement learning represents a third primary learning method that you can explore.
Supervised learning algorithms are so named because they learn by making predictions from your input training data and comparing those predictions against the expected target output included in your training dataset. Examples of supervised machine learning models include decision trees and support vector machines.
Two specific techniques used with supervised learning are classification and regression.
Classification is used for predicting a class label that is computed from attribute values.
Regression is used for predicting a numerical label; the trained model predicts a numerical value for each new observation.
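The difference between the two techniques can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the study-hours data is invented, the classifier is a simple 1-nearest-neighbor rule, and the regressor is an ordinary least-squares line. The names `classify_1nn` and `fit_line` are hypothetical helpers, not part of any library.

```python
# Toy training data (hypothetical): hours studied and the observed outcomes.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
passed = ["fail", "fail", "pass", "pass", "pass"]   # class labels
scores = [52.0, 54.0, 56.0, 58.0, 60.0]             # numerical labels


def classify_1nn(x, xs, labels):
    """Classification: return the label of the nearest training example."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[nearest]


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)
predicted_label = classify_1nn(4.4, hours, passed)   # output is a class label
predicted_score = slope * 6.0 + intercept            # output is a number
```

Both functions learn from inputs paired with known targets; the only difference is whether the target is a category or a number.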
An unsupervised learning model operates on input data without any specified output or target variables. As such, unsupervised learning does not use a teacher to help correct the model. Two problems often addressed with unsupervised learning are clustering and density estimation. Clustering attempts to find groups in the data, and density estimation helps to summarize the distribution of data.
K-means is one type of clustering algorithm, in which each data point is assigned to the cluster whose mean (centroid) is nearest. Kernel density estimation is a density estimation algorithm that uses small groups of closely related data to estimate a distribution.
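Both algorithms are small enough to sketch directly. The following is a minimal one-dimensional illustration, assuming caller-supplied starting centroids for k-means and a Gaussian kernel with a hand-picked bandwidth for the density estimate. The function names and the sample data are invented for the example.

```python
import math


def kmeans_1d(points, centroids, iterations=10):
    """K-means (Lloyd's algorithm) in one dimension."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def gaussian_kde(points, bandwidth):
    """Return a density function built from one Gaussian bump per data point."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = kmeans_1d(data, centroids=[0.0, 10.0])
density = gaussian_kde(data, bandwidth=0.5)
```

Neither function is given a target variable. K-means returns the groups it found, and `density(x)` is higher near the observed points than in the gap between the two groups.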
In the book Artificial Intelligence: A Modern Approach, 3rd edition (Pearson Education India, 2015), Stuart Russell and Peter Norvig describe how an unsupervised model can learn patterns in the input without any explicit feedback:
The most common unsupervised learning task is clustering: detecting potentially useful clusters of input examples. For example, a taxi agent might gradually develop a concept of “good traffic days” and “bad traffic days” without ever being given labeled examples of each by a teacher.
Reinforcement learning uses feedback as an aid in determining what to do next. In the example of the taxi ride, receiving or not receiving a tip along with the fare at the completion of a ride serves to imply goodness or badness.
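The taxi example can be sketched as a very small reinforcement learning loop. This is an illustrative sketch only: it assumes an epsilon-greedy agent choosing between two invented routes, with made-up tip probabilities standing in for the environment. The function `run_taxi_agent` and its parameters are hypothetical.

```python
import random


def run_taxi_agent(tip_probability, rides=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns which route tends to earn a tip."""
    rng = random.Random(seed)
    value = {route: 0.0 for route in tip_probability}  # estimated tip rate
    count = {route: 0 for route in tip_probability}    # rides taken per route
    for _ in range(rides):
        if rng.random() < epsilon:
            route = rng.choice(list(value))            # explore a random route
        else:
            route = max(value, key=value.get)          # exploit the best estimate
        # Feedback arrives only after the ride: 1.0 for a tip, 0.0 otherwise.
        reward = 1.0 if rng.random() < tip_probability[route] else 0.0
        count[route] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        value[route] += (reward - value[route]) / count[route]
    return value, count


# Hypothetical environment: the chance of receiving a tip on each route.
value, count = run_taxi_agent({"highway": 0.8, "downtown": 0.3})
```

The agent is never told which route is better. It only sees whether a tip followed each ride, and it uses that feedback to decide what to do next.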
The main statistical inference techniques for model learning are inductive learning, deductive inference, and transduction. Inductive learning, a common approach in machine learning, reasons bottom-up: it uses data as evidence to help determine an outcome. Deductive inference, in contrast, reasons top-down and requires that each premise be met before the conclusion is determined. Transduction refers to predicting specific examples given specific examples from a domain.
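One way to picture the three techniques is with a toy labeling task. The sketch below is an interpretation, not a formal definition: induction is shown as deriving a general threshold rule from examples, deduction as applying that rule to a case, and transduction as labeling a case directly from its nearest labeled example without forming a rule. The data and function names are invented.

```python
# Hypothetical labeled examples: a measurement and its label.
labeled = [(1.0, "slow"), (2.0, "slow"), (8.0, "fast"), (9.0, "fast")]


def induce_threshold(examples):
    """Induction: generalize from specific examples to a rule (a threshold)."""
    slow = [x for x, label in examples if label == "slow"]
    fast = [x for x, label in examples if label == "fast"]
    return (max(slow) + min(fast)) / 2.0


def deduce(x, threshold):
    """Deduction: apply the general rule to a specific case."""
    return "fast" if x > threshold else "slow"


def transduce(x, examples):
    """Transduction: label a specific case directly from specific examples."""
    nearest_x, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest_label


threshold = induce_threshold(labeled)       # bottom-up: data to rule
rule_based = deduce(7.0, threshold)         # top-down: rule to conclusion
example_based = transduce(7.0, labeled)     # specific to specific, no rule
```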
Other learning techniques include multitask learning, active learning, online learning, transfer learning, and ensemble learning. Multitask learning aims “to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks” (arxiv.org/pdf/1707.08114.pdf). With active learning, the learning process aims “to ease the data collection process by automatically deciding which instances an annotator should label to train an algorithm as quickly and effectively as possible” (papers.nips.cc/paper/7010-learning-active-learning-from-data.pdf). Online learning “is helpful when the data may be changing rapidly over time. It is also useful for applications that involve a large collection of data that is constantly growing, even if changes are gradual” (Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Pearson Education India, 2015).
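Online learning in particular is easy to illustrate, because the model is updated one observation at a time instead of being retrained on the full dataset. The following is a minimal sketch, assuming a one-variable linear model trained by stochastic gradient descent on an invented data stream. `OnlineLinearModel`, `learn_one`, and `stream` are hypothetical names chosen for the example.

```python
class OnlineLinearModel:
    """Fits y = weight * x + bias one observation at a time."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def learn_one(self, x, y):
        """Update the parameters from a single (x, y) pair, then discard it."""
        error = self.predict(x) - y
        # Gradient step on the squared error 0.5 * error ** 2.
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def stream(n):
    """Hypothetical data stream that follows y = 2x + 1."""
    for i in range(n):
        x = float(i % 4)
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in stream(4000):
    model.learn_one(x, y)
```

Because each update needs only the current observation, the same loop keeps working as the collection of data grows, and the parameters can drift to follow data that changes over time.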
LEARNING
The variety of opportunities to apply machine learning is extensive. The sheer variety helps explain why so many different modes of learning are necessary:
Advertisement serving
Business analytics
Call centers
Computer vision
Companionship
Creating prose
Cybersecurity
Ecommerce
Education
Finance, algorithmic trading
Finance, asset allocation
First responder rescue operations
Fraud detection
Law
Housekeeping
Elderly care
Manufacturing
Mathematical theorems
Medicine/surgery
Military
Music composition
National security
Natural language understanding
Personalization
Policing
Political
Recommendation engines
Robotics, consumer
Robotics, industry
Robotics, military
Robotics, outer space
Route planning
Scientific discovery
Search