Big Data is one of the hottest trends right now in enterprise data management and analytics. If you spend any time on the internet or read tech magazines, I am sure you have come across the term Big Data. In this post I am going to explain Big Data technology in very simple terms, so that it is easy for you to understand.
What is Big Data?
To get you started, let's say that any data source with the following three characteristics can be defined as Big Data:
- Extremely large Volumes of data
- Extremely high Velocity of data
- Extremely wide Variety of data
Now that you have the basic picture of Big Data, before going into more detail, let me tell you how Big Data evolved.
In the late 1960s, data was stored in flat files with no structure, so getting detailed information about customer habits and behavior was a very tedious programming task. Later, with the introduction of the relational data model and the Relational Database Management System (RDBMS), it became possible to maintain structured data. It was now easier for programmers to satisfy business demands by extracting meaningful information from data. With these advances in data management, companies started to store huge amounts of data.
When the volume of data that organizations needed to manage grew out of control, the data warehouse provided a solution. As long as the stored data was structured, it was easy to get insight from it with the help of the available data mining and reporting tools.
Now let's talk about the current situation. We have come a very long way with the advancement of the internet, and we are producing data in staggering amounts; by commonly cited estimates, around 80% of it is unstructured. By unstructured data I mean digitized documents, photographs, videos, audio files, tweets, social networking posts, e-mails, text messages, phone records, search engine queries and lots of other things you do online. Extracting meaningful information from this unstructured data with conventional data mining tools was practically impossible. This specific requirement is where Big Data technology comes into the picture.
Big Data is a combination of old and new technologies that helps businesses gain actionable insight. We can say Big Data is the capability to manage a huge volume of disparate data, at the right speed, and within the right time frame to allow real-time analysis and reaction. As mentioned earlier, its characteristics can be represented using the three V’s:
- Volume : How much data
- Variety : The various types of data
- Velocity : How fast that data is processed
A data set can be categorized as Big Data only if all three of the above characteristics apply to it. To get meaningful information out of any business process, all the structured and unstructured data is first captured, then organised, and finally integrated into one set to give a single view. After this phase, the data can be analyzed based on the problem being addressed.
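To make the capture, organise and integrate steps a little more concrete, here is a minimal Python sketch. The two sources (`orders` and `support_emails`), their fields and the `build_single_view` function are all made up for illustration; a real pipeline would read from actual systems and handle far more data.

```python
from collections import defaultdict

# Hypothetical captured data: one structured source and one unstructured source.
orders = [
    {"customer_id": 1, "item": "laptop", "amount": 900.0},
    {"customer_id": 2, "item": "phone", "amount": 500.0},
]
support_emails = [
    {"customer_id": 1, "text": "My laptop battery drains too fast"},
    {"customer_id": 1, "text": "Thanks, the replacement battery works"},
]


def build_single_view(orders, emails):
    """Organise both sources by customer and merge them into one record each."""
    view = defaultdict(lambda: {"total_spent": 0.0, "items": [], "messages": []})
    for order in orders:
        record = view[order["customer_id"]]
        record["total_spent"] += order["amount"]
        record["items"].append(order["item"])
    for email in emails:
        view[email["customer_id"]]["messages"].append(email["text"])
    return dict(view)


single_view = build_single_view(orders, support_emails)
```

Each entry in `single_view` now holds everything known about one customer, which is the "single view" that analysis can then work on.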
Let me give you a simple example. Suppose you visit a shopping portal like Amazon or Best Buy, browse some products, and then leave the site without buying anything. The next time you visit, you may see related products and discount coupons based on your last visit, and maybe because of this you will buy the product. Let me explain what happened here. When you visited the first time, your browsing behavior and interests were captured and stored. On your second visit, Amazon already had information about your interests and used it to show you related products and discount coupons. This whole decision was made in a fraction of a second, and it was possible with the help of Big Data technology.
The real advancement in Big Data happened as companies like Yahoo!, Google, and Facebook came to the realization that they needed help in monetizing the massive amounts of data their offerings were creating. In particular, the innovations MapReduce, Hadoop, and Big Table proved to be the sparks that led to a new generation of data management. Using these, companies like Facebook and Google now earn billions of dollars from advertising by capturing user data and then targeting ads based on interest and behavior.
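Here is a toy Python sketch of that idea: store what a visitor browsed, then suggest unseen products from the category they looked at most. The catalogue, the `record_visit` and `recommend` functions and the "most viewed category" rule are my own assumptions for illustration; this is not how Amazon or any real retailer actually does it.

```python
from collections import Counter

# Hypothetical catalogue mapping each product to a category.
CATALOGUE = {
    "running shoes": "sports",
    "yoga mat": "sports",
    "water bottle": "sports",
    "novel": "books",
    "cookbook": "books",
}

browsing_history = {}  # user id -> list of products viewed


def record_visit(user, products_viewed):
    """First visit: capture and store what the user looked at."""
    browsing_history.setdefault(user, []).extend(products_viewed)


def recommend(user, limit=2):
    """Later visit: suggest unseen products from the user's most viewed category."""
    viewed = browsing_history.get(user, [])
    if not viewed:
        return []
    categories = Counter(CATALOGUE[product] for product in viewed)
    top_category = categories.most_common(1)[0][0]
    candidates = sorted(
        product
        for product, category in CATALOGUE.items()
        if category == top_category and product not in viewed
    )
    return candidates[:limit]


record_visit("alice", ["running shoes", "novel", "yoga mat"])
suggestions = recommend("alice")
```

The real systems do the same kind of lookup against vastly more data and many more signals, which is why they need Big Data technology to answer in a fraction of a second.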
What are MapReduce, Hadoop, and Big Table?
MapReduce was designed by Google to run a set of functions against a large amount of data in batch mode. The “map” function distributes the work as tasks across a large number of systems, and the tasks are managed in such a way that the load is balanced and failures can be recovered from. Once these tasks have run on all systems and the computation is complete, another function, “reduce”, aggregates all the partial results to give the complete result.
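The classic way to see this is a word count. The sketch below runs everything in a single Python process, so it only shows the shape of the computation: a map step that emits (word, 1) pairs, a grouping step, and a reduce step that adds the pairs up. The function names are mine, and on a real cluster the framework, not your code, would spread the map tasks over many machines and handle failures.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one piece of input."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Group the intermediate values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all partial counts for one word into a total."""
    return key, sum(values)


def word_count(documents):
    mapped = []
    for document in documents:  # on a cluster, each piece would go to a different machine
        mapped.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data is big", "data is everywhere"])
```

Because each map call only looks at its own document and each reduce call only looks at one word, both steps can be run in parallel on as many machines as you have.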
Big Table was also developed by Google, as a distributed storage system intended to manage highly scalable structured data. Data is organized into tables with rows and columns. Unlike a traditional relational database model, Big Table is a sparse, distributed, persistent multidimensional sorted map. It is intended to store huge volumes of data across commodity servers.
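If "sparse multidimensional sorted map" sounds abstract, this small in-memory Python sketch may help. It keeps cells under a (row, column, timestamp) key in sorted order and stores only the cells that actually exist. The class name `SparseSortedTable` and its methods are hypothetical, and the distributed and persistent parts are left out entirely, so treat it as a picture of the data model only.

```python
import bisect


class SparseSortedTable:
    """Toy sorted map from (row, column, timestamp) to a value."""

    def __init__(self):
        self._keys = []   # (row, column, timestamp) tuples kept in sorted order
        self._cells = {}  # only cells that were written are stored (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row, column):
        """Return the value with the highest timestamp, or None if the cell is empty."""
        low = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        high = bisect.bisect_right(self._keys, (row, column, float("inf")))
        if low == high:
            return None
        return self._cells[self._keys[high - 1]]

    def scan_rows(self, start_row, end_row):
        """Return all cells whose row key is in [start_row, end_row), in sorted order."""
        low = bisect.bisect_left(self._keys, (start_row,))
        high = bisect.bisect_left(self._keys, (end_row,))
        return [(key, self._cells[key]) for key in self._keys[low:high]]


table = SparseSortedTable()
table.put("com.example", "contents", 1, "v1")
table.put("com.example", "contents", 2, "v2")
table.put("org.sample", "anchor", 1, "link")
latest = table.get_latest("com.example", "contents")
```

Keeping the keys sorted is what makes range scans over neighbouring rows cheap, and leaving empty cells out is what makes the table sparse.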
Hadoop is an open-source project of the Apache Software Foundation that can be installed on a set of standard machines, so that these machines can communicate and work together to store and process large datasets. Its design is derived from Google's published work on MapReduce and distributed storage. Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware. Much of Yahoo!’s business architecture is based on Hadoop. Hadoop is designed to parallelize data processing across computing nodes to speed up computations and hide latency.
Two major components of Hadoop are:
- HDFS : A massively scalable distributed file system that can support petabytes of data
- MapReduce : A massively scalable engine that computes results in batch.
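To give a rough feel for how a distributed file system can spread one file over many machines, here is a toy Python sketch that cuts data into fixed-size blocks and assigns each block to more than one node. The tiny block size, the replication count of 2, the node names and the round-robin placement are all assumptions chosen to keep the example readable; real HDFS uses much larger blocks and a smarter placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replication):
    """Assign each block to `replication` different nodes, round-robin (illustrative only)."""
    placement = {}
    for index in range(len(blocks)):
        placement[index] = [
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


data = b"0123456789"
blocks = split_into_blocks(data, block_size=4)
placement = place_blocks(blocks, nodes=["node-a", "node-b", "node-c"], replication=2)
```

Since every block lives on more than one node, losing a single machine does not lose data, and a batch engine like MapReduce can process different blocks on different nodes at the same time.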
Now you have a basic idea of what Big Data is, how Big Data came into the picture, and the main Big Data technologies. In the coming days I will write a detailed post about Hadoop, which is the most popular framework for Big Data.
You can watch the video below for a visual summary of what I have explained above.
How Big Is Big Data?
OK, now let's come to the second question: how big is Big Data? To give you a clear picture, I will show some Big Data visualization examples.
1. Visual Graph showing popularity of Facebook
This great visualization was created by Paul Butler, a Facebook intern. He had access to Facebook’s Apache Hive database and its 500 million records, and with that much data the result is this amazing visualization.
2. Earthquakes since 1898
Using data from NCEDC.org and the USGS, John Nelson from UXBlog created an amazing visualization of earthquakes since 1898.
3. NASA Perpetual Ocean
An amazing video visualization of a high-resolution 3D model of the Earth’s oceans.
These are some examples that will give you an idea of how big we are talking. Watch this informative video for more detail.