What is Big Data?
The history of big data
Big Data is a phenomenon that started when there was an abundance of data that could not be processed with conventional methods. Search engines like Google and Yahoo were responsible for the initial Big Data projects.
These players had to deal with the issues of
scalability and user query response times. Soon after, other businesses like
Amazon and Facebook did the same. Due to the advantages it provides in terms of
data storage, processing, and analysis, big data has become a trend that cannot
be avoided by many industrial actors.
Definitions
Big Data has been defined in a number of ways, but they all share many
of the same ideas. The key ones are as follows:
According to Gartner, big data refers to information assets with a high
volume, high velocity, and/or high variety that call for novel processing
techniques in order to improve decision-making, uncover new insights, and
streamline processes.
The Library of Congress says that the term "big data" is a
moving target: it depends both on what can be handled with standard
methods and on what a single institution can manage and steward, so it is
specific to each organization. A data set that one researcher or
institution considers large may be small to another.
The three Vs—Volume, Variety, and Velocity—define big data. Other Vs,
including Value, have been introduced by some authors.
Volume: the amount of data that has been gathered (in gigabytes,
terabytes, etc.);
Variety: the different origins and formats of the data, which can be
either structured or unstructured (e.g., photos, emails, tweets,
geo-location data, etc.);
Velocity: the rate at which data is generated and processed.
These qualities are also called dimensions. Some experts consider that we are in a Big
Data situation as soon as one of these dimensions is present.
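The "one dimension is enough" view can be sketched in a few lines of Python. This is only an illustration: the `Workload` class and the three numeric thresholds are hypothetical and chosen arbitrarily, since the definitions above give no numeric limits for any dimension.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, for illustration only: the text defines no
# numeric limit for any of the three dimensions.
VOLUME_TB_LIMIT = 10.0          # terabytes stored
VELOCITY_EVENTS_LIMIT = 10_000  # events arriving per second
VARIETY_FORMATS_LIMIT = 3       # distinct formats (tables, photos, emails, ...)


@dataclass
class Workload:
    volume_tb: float
    events_per_second: float
    data_formats: int


def present_dimensions(w: Workload) -> List[str]:
    """Return the names of the Vs that exceed their (hypothetical) threshold."""
    checks = {
        "Volume": w.volume_tb > VOLUME_TB_LIMIT,
        "Velocity": w.events_per_second > VELOCITY_EVENTS_LIMIT,
        "Variety": w.data_formats > VARIETY_FORMATS_LIMIT,
    }
    return [name for name, exceeded in checks.items() if exceeded]


def is_big_data(w: Workload) -> bool:
    """A Big Data situation exists as soon as one dimension is present."""
    return len(present_dimensions(w)) >= 1


# A small but very fast stream: only the Velocity dimension is present.
stream = Workload(volume_tb=0.5, events_per_second=50_000, data_formats=2)
print(present_dimensions(stream))  # → ['Velocity']
print(is_big_data(stream))         # → True
```

Under the stricter reading, where all three dimensions must be present, the test in `is_big_data` would compare the count against 3 instead of 1.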
Use cases for big data:
Big Data has numerous applications in the business, retail, banking,
insurance, transportation, leisure, and telecommunications sectors. Here are a
few instances:
Transport:
- Traffic control: regulating traffic flow and precisely estimating
travel time from one location to another using all sorts of data (GPS,
radar, probes, etc.);
- Travel planning: giving the public access to data previously only
available to administrations (saving time and money);
- Intelligent transport systems (ITS): applications of new information and
communication technologies (NTIC) to the transportation industry.
Autonomous vehicles, cooperative vehicles, and satellite positioning systems
were among the topics highlighted at the 20th World Congress on
Intelligent Transport Systems.
One illustration is the use of big data to show real-time transportation
information in the city of London, combining data from buses, vehicles, trains,
bicycles, and planes.
Financial institutions and insurers:
Banks and insurance companies turned to big data to identify the causes of customer dissatisfaction with the services they provided.
The key finding was the importance of mobile services and of the degree
of customization: both had a significant impact on how customers rated the
quality of service.
Most of the information analyzed was already held by these banks and
insurance companies, and it was used to take measures aimed at building a
long-lasting and appropriate customer relationship.
As a result, they were able to build their mobile offering through the
appropriate channels and to recognize that innovation and client expectations
go hand in hand.
E-commerce platforms and businesses generally:
Faced with competition in the e-commerce sector and the volatility of
consumers (the average browsing time on an e-commerce site has decreased to
less than 5 minutes), it became evident that mass messaging and overly broad
segmentation no longer fit the current market.
Personalized navigation was quickly identified as the best way to capture
the target customer's attention, thanks in particular to the individualized
product recommendations that Big Data makes possible.
As a result, a number of e-commerce sites now provide fluid navigation
that is tailored to their users.
For instance, Amazon customizes its home page based on customers' preferences,
interests, past searches, and data mining.
Similarly, Netflix creates over 33 million unique home pages to
provide its users with appealing content.
Health:
- The use of data for epidemiological research. Consider the
"Openhealth.fr" website, which provides real-time data on the health of the
French population together with associated maps (epidemics, allergies, etc.);
- The exploitation of "legacy data", that is, data that has been stored
for a long time but never used, to discover causal relationships;
- Follow-up with patients (patient medical records).
Economy:
- Increased customer satisfaction, personalized and targeted actions, and
better knowledge of the customer;
- A quicker review of consumer data to spot unusual activity;
- Marketing segmentation (for instance, microsegmentation);
- Predictive analysis of consumer behavior.
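As a rough illustration of how unusual activity might be spotted in consumer data, the sketch below flags purchase amounts that lie far from the mean, measured in standard deviations (a z-score). The function name, the sample amounts, and the threshold of 2.0 are assumptions made for this example. Real systems typically use more robust methods.

```python
from statistics import mean, stdev


def unusual_amounts(amounts, threshold=2.0):
    """Return (index, amount) pairs whose z-score exceeds the threshold.

    Requires at least two amounts, because the sample standard
    deviation is undefined for a single value.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all amounts are identical, nothing stands out
    return [
        (i, amount)
        for i, amount in enumerate(amounts)
        if abs(amount - mu) / sigma > threshold
    ]


# Hypothetical purchase history for one customer.
purchases = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 500.0]
print(unusual_amounts(purchases))  # → [(7, 500.0)]
```

Because the outlier itself inflates the mean and the standard deviation, a z-score on a small sample can never be very large. On larger or noisier data, a median-based score is often a safer choice.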
Research:
Voice-to-text technologies (automated transcription of speech) and
machine translation technologies (automatic translation of written text) are
two techniques that coexist in natural language processing (NLP). Automatic
indexing of picture and video streams, as well as facial and object
recognition, are the two areas of image processing that are currently gaining
ground.
Using data analysis methods:
There are three primary categories of data analysis techniques for big
data:
- The goal of descriptive approaches is to draw attention to information
that is present but is obscured by the amount of data. In descriptive analysis,
the following methods and techniques are employed:
+ Factor analysis (PCA and MCA)
+ Moving centers method (k-means)
+ Hierarchical clustering
+ Neural clustering
+ Association search
- Predictive techniques try to infer new information from data already
available. The key artificial intelligence techniques used in this category
include Bayesian classification, decision trees, neural networks, support
vector machines (SVM), and k-nearest neighbors (KNN).
- Prescriptive approaches seek to pinpoint and foresee the best course
of action or decision to make in order to reach the intended outcome.
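To make the first two categories concrete, here is a minimal sketch in plain Python of one descriptive technique (the moving centers method, better known as k-means) and one predictive technique (k-nearest neighbors). The customer data, the labels, and the initial centers are invented for illustration. In practice a library such as scikit-learn would normally be used.

```python
import math
from collections import Counter


def k_means(points, initial_centers, iterations=10):
    """Descriptive: group points around centers that move at each step."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centers.append(center)  # empty cluster: center stays put
        if new_centers == centers:
            break  # centers stopped moving: converged
        centers = new_centers
    return centers, clusters


def knn_predict(labeled_points, query, k=3):
    """Predictive: label a new point by majority vote of its k nearest neighbors."""
    neighbors = sorted(labeled_points,
                       key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (visits per month, average basket size).
customers = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]

# Descriptive: reveal two groups hidden in the data.
centers, clusters = k_means(customers, initial_centers=[(0, 0), (10, 10)])
print(centers)  # → [(1.5, 1.5), (8.5, 8.5)]

# Predictive: infer the label of a customer not seen before.
labeled = [(p, "occasional") for p in customers[:3]] + \
          [(p, "frequent") for p in customers[3:]]
print(knn_predict(labeled, (2, 2)))  # → occasional
```

The descriptive step only summarizes data that is already present, while the predictive step produces a label for a new observation. A prescriptive step would go further and choose an action, for example which offer to send, based on such predictions.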
The necessity to manage enormous amounts of data gave rise to the phenomenon known as "Big Data," which has completely altered how information is processed today. It has its roots in the original web-based information retrieval initiatives like Google and Yahoo, which struggled with scalability and responsiveness issues.
Big Data became an unavoidable trend for
many businesses as other significant players, including Amazon and
Facebook, gradually did the same. Volume, Variety, and Velocity are the three
Vs that are highlighted in definitions of big data; some authors also include
Value as a crucial component.
A problem is generally considered "Big Data" if it cannot be solved with conventional techniques and involves a sizeable volume, on the order of terabytes, petabytes, or exabytes. Numerous industries, including transportation, healthcare, economics, and research, use big data applications.
Additionally, Big Data
analysis uses three types of methodologies: descriptive, predictive, and
prescriptive. These methods aim, respectively, to expose hidden information,
infer new information, and identify the best course of action to accomplish
particular objectives.