The Big Data Phenomenon – Everything You Should Know! The quantitative explosion of digital data has forced researchers to find new ways of seeing and analyzing the world. It is about discovering new orders of magnitude for capturing, searching, sharing, storing, analyzing, and presenting data. Thus was born "Big Data". Read this comprehensive article on Big Data from Coding Compiler.
The Big Data Phenomenon
This is a concept for storing a vast amount of information on a digital basis. According to the archives of the Association for Computing Machinery (ACM) digital library, the name first appeared in October 1997, in scientific articles on the technological challenges of visualizing "large data sets".
What is Big Data?
Literally, the term means large or massive data. It refers to a set of data so large that no conventional database management or information management tool can really handle it. In fact, we generate about 2.5 quintillion bytes of data every day.
This information comes from everywhere: messages we send, videos we publish, weather reports, GPS signals, online shopping transaction records, and more. This data is called Big Data, or massive volumes of data. The giants of the Web, first and foremost Yahoo (but also Facebook and Google), were the first to deploy this type of technology.
However, no precise or universal definition can be given to Big Data. As a complex, polymorphic object, its definition varies according to the communities interested in it, whether as users or as service providers.
A transdisciplinary approach makes it possible to understand the behavior of the different actors: the designers and suppliers of tools (computer scientists), the categories of users (managers, company executives, political decision-makers, researchers), health-sector actors, and end users.
Big data is no exception to the rule that applies to every technology: it is a dual technical system. It brings benefits but can also generate disadvantages. For example, it serves speculators in the financial markets, autonomously, with the formation of speculative bubbles as a possible outcome.
The advent of Big Data is now seen by many observers as a new industrial revolution, comparable to the harnessing of steam (early 19th century), electricity (late 19th century), and IT (late 20th century). Others, a little more measured, describe the phenomenon as the last stage of the third industrial revolution, that of "information". In any case, Big Data is considered a source of profound upheaval in society.
Big Data – Mass Data Analysis
Invented by the giants of the web, Big Data is a solution designed to give everyone real-time access to giant databases. It aims to offer an alternative to traditional database and analysis solutions (Business Intelligence platforms on SQL Server, etc.).
According to Gartner, this concept brings together a family of tools that respond to a triple challenge known as the 3V rule: a considerable Volume of data to process, a wide Variety of information (from various sources, unstructured, organized, open, etc.), and a certain level of Velocity to achieve, i.e., the frequency at which this data is created, collected, and shared.
Technological Developments Behind Big Data
The technological developments that facilitated the advent and growth of Big Data can be broadly categorized into two families: on the one hand, storage technologies, driven particularly by the deployment of cloud computing.
On the other hand, the arrival of adapted processing technologies, especially new databases suited to unstructured data (Hadoop) and high-performance computing models (MapReduce).
Several solutions can come into play to optimize processing times on giant databases, namely NoSQL databases (such as MongoDB, Cassandra, or Redis), server infrastructures that distribute processing across nodes, and in-memory data storage:
The first solution makes it possible to implement storage systems considered more efficient than traditional SQL for mass data analysis (key/value, document, column, or graph oriented; see the sketch after this list).
The second is also called massively parallel processing. The Hadoop framework is an example: it combines the HDFS distributed file system, the HBase NoSQL database, and the MapReduce algorithm.
As for the last solution, it speeds up the processing time of queries.
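To make the first of these approaches concrete, here is a minimal key/value sketch in Python. It assumes a local Redis server on the default port and the redis-py client; the keys and values are purely illustrative:

```python
# Minimal key/value storage with Redis (assumes a local Redis server
# on the default port and the redis-py client installed).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store a user profile as simple key/value pairs -- no schema, no joins.
r.set("user:1001:name", "Alice")
r.set("user:1001:last_login", "2015-03-21T10:04:00Z")

# Reads are direct lookups by key.
print(r.get("user:1001:name"))        # -> "Alice"
print(r.get("user:1001:last_login"))  # -> "2015-03-21T10:04:00Z"
```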
Evolution of Big Data: the development of Spark and the end of MapReduce
Each technology in the Big Data ecosystem has its utility, its strengths, and its disadvantages. As a constantly evolving environment, Big Data always seeks to optimize the performance of its tools.
Thus, its technological landscape moves very quickly, and new solutions are born frequently, with the aim of further optimizing existing technologies. MapReduce and Spark are concrete illustrations of this evolution.
Described by Google in 2004, MapReduce is a pattern later implemented in Yahoo's Nutch project, which became the Apache Hadoop project in 2008. The pattern scales to very large volumes of data; the only problem is that it is rather slow.
This slowness is particularly visible on modest volumes, and solutions seeking near-instantaneous processing of such volumes have begun to move away from MapReduce. In 2014, Google announced that it would replace it with a SaaS solution called Google Cloud Dataflow.
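To illustrate the pattern itself, here is a toy word count in plain Python showing the three phases of MapReduce (map, shuffle, reduce). Real frameworks such as Hadoop distribute these same phases across a cluster; the two sample documents are purely illustrative:

```python
# Toy MapReduce: map emits (key, value) pairs, shuffle groups them by
# key, reduce aggregates each group.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)  # emit one pair per word

def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)    # group values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["Big Data is big", "data is everywhere"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```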
Spark, too, is an iconic solution for writing distributed applications simply, with a set of classic processing libraries. Its performance is remarkable: it can work on data on disk or on data loaded in RAM. It is younger, but it has a huge community.
It is also one of the fastest-developing Apache projects. In short, it is proving to be the successor of MapReduce, especially since it has the advantage of merging many of the tools needed in a Hadoop cluster.
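For comparison, here is a minimal sketch of the same word count using Spark's Python API (PySpark), assuming a local Spark installation. Spark's habit of keeping intermediate results in memory is the main source of its speed advantage over MapReduce:

```python
# Minimal PySpark word count (assumes a local Spark installation).
from pyspark import SparkContext

sc = SparkContext("local[*]", "WordCount")

counts = (
    sc.parallelize(["Big Data is big", "data is everywhere"])  # illustrative input
      .flatMap(lambda line: line.lower().split())  # one word per record
      .map(lambda word: (word, 1))                 # emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)             # sum counts per word, in memory
)

print(sorted(counts.collect()))  # [('big', 2), ('data', 2), ('everywhere', 1), ('is', 2)]
sc.stop()
```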
The main players in the Big Data market
The Big Data industry has attracted many players, who positioned themselves quickly in various sectors. In the IT sector, there are the historical suppliers of IT solutions such as Oracle, HP, SAP, and IBM. There are also Web players, including Google, Facebook, and Twitter.
Data and Big Data specialists include MapR, Teradata, EMC, and Hortonworks. Integrators such as Capgemini, Sopra, Accenture, and Atos are also major players in big data. In the area of analytics, among BI editors, we can mention SAS, Microstrategy, and Qliktech.
This sector also includes suppliers specialized in analytics, such as Datameer and Zettaset. Alongside these main participants, many SMEs specializing in Big Data have appeared along the entire value chain of the sector.
Big Data: disruptive innovations that change the game
Big Data and analytics are used in almost every field and have carved out an important place in society. They take many forms: the use of statistics in high-level sport, the NSA's PRISM surveillance program, analytical medicine, and Amazon's recommendation algorithms, to name only a few.
In business especially, the use of Big Data and analytics tools generally serves several objectives: improving the customer experience, optimizing processes and operational performance, and reinforcing or diversifying the business model.
The ability to manage and analyze large volumes of data generates new opportunities and significant competitive differentiation. For organizations, there are several reasons to consider this new approach to data administration: cost-effective data management, optimized information storage, the ability to run programmable analyses, and ease of data manipulation.
Big Data, exclusively for marketing and sales functions?
This technology is a prime business issue for everyone, because of its capacity to profoundly affect commerce in the integrated global economy.
Indeed, companies, regardless of size, are among the first to benefit from the advantages derived from the proper handling of massive data.
However, big data also plays a key role in transforming processes and the supply chain, and in machine-to-machine exchanges, in order to develop a better "information ecosystem".
It also makes decision-making faster and more reliable, taking into account information both internal and external to the organization. At the same time, it can serve as support for risk and fraud management.
Faced with so much information, how do you separate the wheat from the chaff?
As the old saying goes, "too much information kills information". This is actually the main problem with big data: the sheer amount of information is one obstacle. The other obstacle comes from the level of certainty we can have about any given piece of data.
Indeed, data coming from digital marketing can be considered "uncertain" information, insofar as we cannot be sure who is clicking on an offer included in a URL. The volume of data, combined with its questionable reliability, makes its exploitation more convoluted.
However, thanks to statistical algorithms, solutions exist. Before even asking whether it is possible to collect and store big data, one should always start by questioning one's ability to analyze it, and its usefulness.
With an appropriately defined purpose and data of sufficient quality, statistical algorithms and methods now make it possible to create value in ways that were not feasible just a few years ago.
In this regard, we can distinguish two schools in the predictive field: artificial intelligence, or "machine learning", and statistics. These two sectors, although distinct, increasingly converge. Moreover, they can be used simultaneously, in a virtuous and intelligent way, to carry out a project.
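As a minimal sketch of how the two schools meet in practice, here is a simple predictive model built with scikit-learn. The synthetic dataset stands in for real business data (click records, for example), and the parameter choices are purely illustrative:

```python
# A simple predictive model: logistic regression sits at the intersection
# of statistics and machine learning.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real business data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)              # estimate the model on training data

# Evaluate predictive quality on data the model has never seen.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```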
When the use of big data in management becomes a vital issue for companies
Among the most enthusiastic users of Big Data are managers and economists, who define the phenomenon by the 5V rule (Volume, Velocity, Variety, Veracity, Value).
Volume
Volume is the mass of information produced every second. To get an idea of the exponential increase in this mass, studies estimate that 90% of the data was generated during the years in which use of the Internet and social networks grew most strongly.
All the data produced from the beginning of time until the end of 2008 would now be equivalent to the amount generated every minute. In the business world, the quantity of data collected each day is of vital importance.
Velocity
Velocity refers to the speed at which new data is produced and propagated. For example, messages posted on social networks can go "viral" and spread in no time. The point is to analyze data on the fly (sometimes called in-memory analytics), without it being essential to store that information in a database first.
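As a toy sketch of this kind of on-the-fly analysis, the following Python generator maintains a running average over a stream of values as they arrive, without ever storing the stream in a database; the sample values are purely illustrative:

```python
# A running aggregate computed as records arrive, not after they are stored.
def running_average(stream):
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count  # updated average after each new record

# Simulated stream of measurements arriving one by one.
for avg in running_average([12.0, 15.0, 11.0, 14.0]):
    print(f"current average: {avg:.2f}")
```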
Variety
Only 20% of data is structured and stored in relational database tables, similar to those used in management accounting. The remaining 80% is unstructured.
It can be images, videos, texts, voice recordings, and much more. Big Data technology makes it possible to analyze, compare, recognize, and categorize data of different types, such as conversations or messages on social networks and pictures on different sites. These different elements make up the variety that Big Data handles.
Veracity
Veracity concerns the reliability and credibility of the information collected. Since Big Data gathers an indefinite number and many forms of data, it is difficult to verify the authenticity of the content; consider a Twitter post full of abbreviations, colloquial language, hashtags, typos, etc. However, new techniques that facilitate the management of this type of data are being developed, including by organizations such as the W3C.
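As a toy sketch of one small part of this veracity problem, the following Python snippet normalizes noisy social-media text before analysis; the abbreviation table and the sample tweet are purely illustrative:

```python
# Normalize noisy social-media text: strip URLs, unwrap hashtags and
# mentions, and expand a few common abbreviations.
import re

ABBREVIATIONS = {"u": "you", "gr8": "great", "thx": "thanks"}  # illustrative only

def normalize(tweet: str) -> str:
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "", text)   # strip URLs
    text = re.sub(r"[#@](\w+)", r"\1", text)   # keep hashtag/mention words
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)

print(normalize("Thx @BigDataCo, gr8 post on #BigData! https://t.co/xyz"))
# -> "thanks bigdataco, great post on bigdata!"
```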
Value
The notion of value refers to the profit that can be derived from the use of Big Data. It is usually companies that are starting to reap remarkable benefits from their Big Data.
According to managers and economists, companies that do not take Big Data seriously risk being penalized and left behind. Since the tool exists, not using it would mean forfeiting a competitive advantage.
The rise of big data in medicine
Medicine is an art that uses science. A medical practitioner is simultaneously a scientist, who has acquired knowledge in biophysics, medical and surgical semiology, anatomy, biochemistry, physiology, biology, and more, and an artist, who masters the skills needed to perform appropriate therapeutic gestures.
Traditional knowledge alone is no longer sufficient, however. To amplify his power of investigation and care, the physician has also learned to master increasingly sophisticated technologies across the medical specialties.
We are witnessing the rise of biomedical engineering (GBM, from the French "génie biomédical"). This field offers physicians new diagnostic possibilities, namely imaging devices: scintigraphy, ultrasound, magnetic resonance imaging (MRI), etc.
Biological analyzers, signal-analysis devices such as the electrocardiogram (ECG) and the electroencephalogram (EEG), as well as devices for treating pathologies (dialysis, lasers, respiratory assistance, nuclear medicine, etc.) are also among the fruits of the technology/medicine alliance.
Mostly driven by specialized computers connected directly or indirectly to a network, these devices can collect a wide range of information about patients.
They offer new means of investigation, data acquisition and storage, and comparison of information that treating physicians can use to increase their responsiveness at the various clinical steps essential to the care and management of their patients. They can also use this data to conduct epidemiological studies of diseases in the population.
The Future of Big Data
Big Data is a major trend, not a passing fashion. In terms of usage, it satisfies a need to work data more deeply and to create value, together with technological capabilities that did not exist in the past. However, given how fast the technologies keep evolving, we cannot yet speak of a real standard, or standards, in the field of Big Data.
Many applications of Big Data are only in their infancy, and we can expect to see uses that are not anticipated today. In a way, Big Data is a turning point for organizations at least as important as the Internet was in its day.
Every company has to start now; otherwise, it risks realizing within a few years that it has been overtaken by the competition. Governments and public bodies are also addressing the issue, through open data.
Massive data: a booming global market
Within a few years, the big data market will be measured in hundreds of billions of dollars. It is a new Eldorado for business. According to studies, it is even a groundswell combining BI (business intelligence), analytics, and the Internet of Things.
IDC says it should grow beyond $125 billion before the end of 2015. Several studies converge on this point, and all confirm that the budgets companies devote to Big Data will only increase sharply.
The market for visual data-discovery solutions related to massive data management alone will grow 2.5 times faster than the market for BI solutions through 2018.
According to calculations by Vanson Bourne, worldwide Big Data spending in the IT budgets of large companies should represent a quarter of the total IT budget by 2018, compared with 18% currently. Capgemini also commissioned a study in March 2015, which showed that 61% of companies recognize Big Data as a "growth engine in its own right".
As a result, they attach more importance to it than to their existing products and services. The same study also indicated that 43% of them have already reorganized, or are currently restructuring, to exploit the potential of Big Data.