What is a convolutional neural network (CNN)?
Defining convolutional neural networks (or
CNNs) is essential to understanding their role in computer vision. They are
revolutionizing the way we process images and visual features.
Traditional image classification methods
Traditionally, to classify images, image
processing techniques were used which required the features of each image to be
extracted from the dataset. A classifier was then trained on these features.
The performance of this process depended largely on the quality of the
extracted features. Several feature extraction and description methods were
used, such as the SIFT algorithm, which was popular for its ability to detect
and describe relevant features.
However, it's important to note that, in practice, perfect classification is rarely achieved, as classification error is never zero. To improve results, we could either develop new feature extraction methods specific to the images under study or opt for a "better" classifier.
The Emergence of AlexNet: A Revolution in 2012
However, in 2012, a major revolution occurred at the annual ILSVRC computer vision competition: a new Deep Learning algorithm called AlexNet set new records. Convolutional neural networks take a similar approach to traditional supervised learning methods for image classification. They take images as input, automatically detect their features, and train a classifier on these features. The crucial difference is that CNNs automatically learn these features during the training phase.
The Fundamentals of Convolutional Neural Networks
During training, the aim is to minimize the
classification error by adjusting both the classifier parameters and the
features extracted from the images. In addition, the specific architecture of
CNNs enables the extraction of features of varying complexity, from the
simplest to the most sophisticated. This capacity for automatic, hierarchical
feature extraction, adapted to the given problem, is one of the major strengths
of convolutional neural networks. No need to manually implement extraction
algorithms such as SIFT or Harris-Stephens.
Unlike traditional supervised learning
methods, where features are extracted manually, CNNs learn features directly
from images, which is their distinctive advantage. Convolutional neural
networks, also known as CNNs or ConvNet, remain today's most powerful models
for image classification.
CNN Architecture: Automatic Feature Extraction and the CNN Classification Process
The distinctive feature of CNNs is their
multi-block architecture. The first block functions as a feature extractor,
using convolution filtering operations to perform template matching on the
image. The extracted features are then normalized and resized.
The second block, although not specific to
CNNs, is generally found at the end of all neural networks used for
classification. The values of the input vector are transformed using linear
combinations and activation functions to produce a new output vector. This
latter vector represents the probability of the image belonging to each class.
Each element of this vector is a probability between 0 and 1, and the sum of
all elements equals 1. These probabilities are calculated by the last layer of
the network, which uses a logistic function for binary classification or a
softmax function for multi-class classification as the activation function.
As with conventional neural networks, layer
parameters are adjusted by gradient backpropagation during the training phase.
However, in the case of CNNs, these parameters are mainly linked to image
characteristics.
Today, convolutional neural networks, also
known as CNNs or ConvNet, remain the models of choice for image classification,
thanks to their ability to automatically learn features from raw data. We'll be
coming back to this topic shortly for further clarification and details.
See also: