Types of Machine Learning Problems
Machine learning is one of the many subfields of artificial intelligence.
This page lists the main categories of problems it addresses.
Supervised learning
The goal of supervised learning is to learn to make predictions from a
collection of labeled examples (i.e. examples that are accompanied by the
value to be predicted). The labels act as "teachers": they supervise the
algorithm's learning by providing feedback.
Definition of Supervised Learning
Supervised learning is the area of machine learning concerned with problems
that can be formalized as follows: given n observations described in a space
X and their labels described in a space Y, we assume that an unknown function
maps the data in X to the labels in Y.
The goal is to estimate this function from the data. The starting point is a
dataset, usually a large one, from which the algorithm learns. In supervised
learning, the expected answers are already known for this dataset: the
algorithm works with labeled data.
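The definition above can be illustrated with a minimal sketch. It assumes one particular way of estimating the function, the nearest-neighbor rule, which is not prescribed by the text; the function name fit_nearest_neighbor and the toy data are hypothetical.

```python
def fit_nearest_neighbor(observations, labels):
    """Return an estimate f of the unknown function from X to Y."""
    def predict(x):
        # Squared Euclidean distance from x to every training observation.
        distances = [sum((a - b) ** 2 for a, b in zip(x, obs)) for obs in observations]
        nearest = min(range(len(observations)), key=distances.__getitem__)
        # Predict the label of the closest labeled observation.
        return labels[nearest]
    return predict

# n = 4 labeled observations (invented): X is the plane, Y = {"small", "large"}.
X_train = [(1.0, 1.0), (1.5, 2.0), (8.0, 9.0), (9.0, 8.5)]
y_train = ["small", "small", "large", "large"]

f = fit_nearest_neighbor(X_train, y_train)
print(f((2.0, 1.0)))   # closest training point is (1.0, 1.0)
print(f((8.5, 8.0)))   # closest training point is (9.0, 8.5)
```

Here the learned function f simply reuses the label of the closest known observation; other supervised methods estimate the function differently but follow the same pattern of fitting on labeled pairs and then predicting.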
Take the example of an application designed to recognize spam
automatically. To train it, it is presented with e-mails labeled as
"desirable" or "spam". Using techniques derived from statistics and
probability, the algorithm then learns the characteristics that allow it to
assign e-mails to each of these categories.
When it is presented with new e-mails, it can classify them by assigning a
probability score, for example: "This email has a 95% chance of being spam".
Its first answers are corrected by hand so that it improves as it goes along.
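The spam example can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a naive Bayes classifier with add-one smoothing, which is one possible probabilistic technique and not necessarily the one a real spam filter uses, and the four training e-mails are invented.

```python
import math
from collections import Counter

def train_naive_bayes(emails):
    """emails: list of (text, label) pairs, label in {"spam", "desirable"}."""
    word_counts = {"spam": Counter(), "desirable": Counter()}
    label_counts = Counter()
    for text, label in emails:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["desirable"])
    return word_counts, label_counts, vocabulary

def spam_probability(model, text):
    """Return the estimated probability that `text` is spam."""
    word_counts, label_counts, vocabulary = model
    log_scores = {}
    for label in ("spam", "desirable"):
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training e-mails carrying this label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Normalize the two scores into a probability.
    highest = max(log_scores.values())
    weights = {label: math.exp(s - highest) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())

# Invented labeled e-mails used for training.
training_emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "desirable"),
    ("lunch on monday", "desirable"),
]
model = train_naive_bayes(training_emails)
score = spam_probability(model, "claim your free money")
print(f"This email has a {score:.0%} chance of being spam")
```

Correcting the first answers by hand would correspond to adding the corrected e-mails to training_emails and training again.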
Binary classification
When the labels are binary, they indicate whether an observation belongs to
a class. This is called binary classification.
A binary classification problem is a supervised learning problem in which
the label space is binary, Y = {0, 1}.
Example
An example of a binary classification problem is the following:
- Determining whether a financial transaction is fraudulent.
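As a rough sketch of a binary classifier with Y = {0, 1}, the code below learns a single threshold on one feature. The assumption that fraud can be detected from the transaction amount alone is purely illustrative, the data are invented, and fit_threshold is a hypothetical helper.

```python
def fit_threshold(values, labels):
    """Pick the threshold t for which the rule `value > t -> 1` errs least."""
    best_threshold, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        # Count the training examples that the rule would misclassify.
        errors = sum((v > t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

# Invented data: transaction amounts and labels, where 1 means "fraudulent".
amounts = [12.0, 25.0, 40.0, 61.0, 950.0, 1200.0, 3000.0]
is_fraud = [0, 0, 0, 0, 1, 1, 1]

threshold = fit_threshold(amounts, is_fraud)

def classify(amount):
    """Return a label in {0, 1} for a new transaction."""
    return int(amount > threshold)

print(threshold, classify(30.0), classify(2000.0))
```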
Regression
When the labels are real-valued, we speak of regression.
A regression problem is a supervised learning problem in which the
label space is Y = R.
Example
Examples of regression problems include:
- Predicting how many times a link will be clicked;
- Predicting how many people will be using an online service
at a certain moment;
- Predicting the price of a stock on the stock market;
- Predicting the binding affinity between two molecules;
- Predicting the yield of a corn plant.
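A minimal regression sketch, assuming the simplest possible model (a straight line fitted by ordinary least squares) and invented data relating the time a link has been online to its number of clicks:

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: hours a link has been online and clicks observed so far.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
clicks = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(hours, clicks)
predicted = slope * 6.0 + intercept   # real-valued prediction for hour 6
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Unlike a classifier, the fitted function returns a real number rather than a class.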
Structured regression
When the label space is more complex than those described above, we speak
of structured regression. For instance, the labels to predict may be vectors,
images, graphs, or sequences. Speech recognition and machine translation are
just two of the many problems that can be formalized as structured
regression.
Unsupervised learning
Unsupervised learning differs from supervised learning in that the answers we are trying to predict are not present in the dataset: the algorithm works with unlabeled data. The goal is to model the observations in order to understand them better.
The machine must then come up with its own answers, which it proposes based
on analyzing and grouping the data. A few tasks that can be tackled with this
approach are described later in this section.
Unsupervised learning is the area of machine learning that deals with
problems that can be formulated as follows: given n observations
{x_i}, i = 1, ..., n, described in a space X, the aim is to learn a function
on X that satisfies certain properties.
Clustering
The computer is asked to divide the data into groups that are as
homogeneous as possible. This may look similar to classification in
supervised learning, but here the computer "invents" its own classes,
sometimes with a subtlety that is not obvious to a person.
Clustering, or partitioning, consists of identifying groups within the data.
This makes it possible to understand the general characteristics of the data
and possibly to infer the properties of an observation from the group to
which it belongs.
Clustering is thus the unsupervised learning problem of finding a partition
of the data. This partition must be relevant with respect to one or more
criteria that have to be specified.
Example
Examples of clustering problems include:
- Market segmentation, which consists of identifying groups of users or
customers with similar behavior. This gives a better understanding of their
profiles and makes it possible to target a specific group with an advertising
campaign, content, or action.
- Identifying groups of documents that share a common subject without first labeling them by subject. This makes it possible to organize large collections of text.
- Image compression, which can be formulated as a clustering problem in
which similar pixels are grouped together so that they can be represented
more efficiently.
- Image segmentation, which consists of identifying the pixels of an image
that belong to the same region.
- Identifying groups of patients who share the same symptoms, which makes it
possible to find subtypes of a disease that can then be treated differently.
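To make the idea of finding a partition concrete, here is a minimal sketch that assumes one specific algorithm, k-means (Lloyd's algorithm), with the criterion "points should be close to the center of their group". The algorithm, the number of groups, and the toy points are choices made for this illustration only.

```python
def nearest_center(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)

def k_means(points, centers, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    # Final partition: the group index of each point under the final centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels

# Invented unlabeled observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = k_means(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

No labels are given to the algorithm: the group indices it returns are classes it has "invented" from the data.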
Dimension reduction
Dimension reduction is another important family of unsupervised learning problems. It consists of finding a representation of the data in a space of lower dimension than the one in which they were originally represented.
This not only reduces computation time and the memory needed to store the
data, but also often improves the performance of a supervised learning
algorithm that is subsequently trained on these data.
Dimension reduction is the unsupervised learning problem of finding a space
Z of lower dimension than the space X in which n observations are
represented. The projections of the data onto Z must satisfy certain
properties.
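As an illustration, the sketch below reduces observations from X = R^2 to Z = R. It assumes one particular method, principal component analysis, where the property required of the projections is that they keep as much variance as possible; the leading direction is found by power iteration, and the data are invented.

```python
import math

def first_principal_direction(points, iterations=100):
    """Leading eigenvector of the 2x2 covariance matrix, by power iteration."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]
    # Entries of the covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    v = (1.0, 0.0)  # arbitrary starting vector
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return mean, v

def project(points, mean, direction):
    """Coordinate of each point along `direction`: its representation in Z."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

# Invented observations in R^2 that lie close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mean, direction = first_principal_direction(data)
z = project(data, mean, direction)
print(direction)
print(z)
```

Each observation is now described by one number instead of two, with little information lost because the points were almost aligned.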
Note:
Some dimension reduction techniques are supervised: their goal is to find
the representation that is most relevant for predicting a given label.
Density estimation
Finally, a large family of unsupervised learning problems is in fact a
classic problem in statistics: estimating a probability distribution, assuming that the dataset is a random sample drawn from it.
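A minimal sketch of density estimation, under the strong assumption that the unknown distribution is Gaussian, so that estimating it reduces to estimating a mean and a standard deviation. The sample is invented; NormalDist comes from Python's standard statistics module.

```python
from statistics import NormalDist

# Invented sample, assumed to be drawn at random from an unknown distribution.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]

# Assumption: the distribution is Gaussian. from_samples estimates its
# mean and standard deviation from the data.
estimated = NormalDist.from_samples(sample)

print(estimated.mean, estimated.stdev)
print(estimated.pdf(5.0))   # estimated density near the center of the sample
print(estimated.pdf(7.0))   # estimated density far from the sample
```

When no parametric family can reasonably be assumed, non-parametric estimators such as histograms or kernel density estimates play the same role.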
Semi-supervised learning
As one might expect, semi-supervised learning consists of learning to
predict labels from a dataset that is only partially labeled. The first
benefit of this approach is that it avoids having to label every training
example, which matters when collecting data is easy but labeling them
requires manual work.
Consider the example of image classification: obtaining a database with hundreds of thousands of images is easy, but assigning each image the label of interest can be very time-consuming.
Moreover, labels provided by people are likely to reflect their own biases,
which a fully supervised algorithm will reproduce. Semi-supervised learning
can sometimes avoid this pitfall. This is a more advanced topic that we will
not cover in this book.
Reinforcement learning
In reinforcement learning, the learning system can interact with its environment and take actions; in return, it receives a reward, which may be positive if the action was a good choice, or negative if it was not.
The reward may sometimes come only after a long sequence of actions, as in
the case of a system learning to play chess. Learning then consists of
defining a policy, i.e. a strategy for consistently obtaining rewards.
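The idea of a delayed reward and a learned policy can be sketched with tabular Q-learning, which is one reinforcement learning algorithm among many and is an assumption of this example. The environment is hypothetical: a corridor of five states in which the only reward is obtained on reaching the last state.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
LEFT, RIGHT = 0, 1

def step(state, action):
    """Hypothetical environment: reward 1 only when the last state is reached."""
    next_state = min(state + 1, N_STATES - 1) if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice((LEFT, RIGHT))   # explore uniformly at random
            next_state, reward = step(state, action)
            # Move q[state][action] toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# The learned policy picks, in each non-terminal state, the highest-valued action.
policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Although the reward only arrives at the end of the corridor, the value estimates propagate backward, so the policy learns which action to take in every earlier state.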
In short, the right approach depends on the dataset the user wants the
system to work on and on the problem to be solved. Supervised learning is
appropriate if the data are labeled and the user knows the categories into
which the data should be classified. Unsupervised learning is the better
choice if the data are not labeled and labeling them would be too expensive.
Reinforcement learning makes it possible to build autonomous devices.