Big Data Post #1 – What is Big Data? (′ʘ⌄ʘ‵)

The term Big Data is being used more and more, almost everywhere, online and offline. And it is not only about computers. It falls under the broader field of Information Technology, which is now part of almost every other technology, field of study and business. Big Data is not a big deal. But the hype surrounding it is conveniently big enough to confuse you.


So what exactly is Big Data?

The data lying in the servers of your SNS (social networking service) was just data until yesterday, sorted and filed. Suddenly, the buzzword Big Data got popular and now the data in your SNS is Big Data. The term covers each and every piece of data your account has stored till now. It includes data stored in clouds and even the URLs that you bookmarked. Your domain server might not have digitized all the data. You may not have structured all the data already. But all the data within your SNS, digital or on paper, structured or unstructured, is now Big Data.

In short, all the data present on the servers that host the sites we commonly refer to as SNS, whether or not it is categorized, is collectively called BIG DATA. All this data can be used to get different results using different types of analysis. Not every analysis needs to use all the data. Different analyses use different parts of the BIG DATA to produce the results and predictions needed.
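To make the "different analyses use different parts of the data" idea concrete, here is a tiny Python sketch. The records, field names and function names are all made up for illustration; they do not come from any real SNS.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
records = [
    {"user": "ana", "country": "IN", "posts": 12, "bookmarks": 3},
    {"user": "ben", "country": "US", "posts": 5, "bookmarks": 9},
    {"user": "chi", "country": "IN", "posts": 7, "bookmarks": 0},
]

def users_per_country(rows):
    """Analysis 1: looks only at the 'country' field."""
    return Counter(row["country"] for row in rows)

def average_posts(rows):
    """Analysis 2: looks only at the 'posts' field."""
    return sum(row["posts"] for row in rows) / len(rows)

print(users_per_country(records))
print(average_posts(records))
```

Both functions read the same pile of records, but each one touches only the field it cares about.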

(From now on, let’s think on a larger scale: assume that we are in a company and that our organization hosts a pretty big archive of semi-structured data!)

Big Data is essentially the data that you analyze for results that you can use for predictions and for other purposes. When you use the term Big Data, you are bound to be working with high-end information technology to deduce different types of results from the same data, which was stored intentionally or unintentionally over the years.

How big is Big Data?

Essentially, all the data combined is Big Data, but many researchers agree that Big Data, as such, cannot be handled with normal spreadsheets and regular database management tools. It needs special analysis tools like Hadoop (we’ll study this in a separate post) so that all the data can be analyzed in one go (or over several iterations of analysis).

Contrary to the above, though I am not an expert on the subject, I would say that the data held by any organization, big or small, organized or unorganized, is Big Data for that organization, and that the organization may choose its own tools to analyze the data.

Normally, for analyzing data, people create different data sets based on one or more common fields so that analysis becomes easy. In the case of Big Data, there is no need to create subsets for analyzing it. We now have tools that can analyze data irrespective of how huge it is. Probably, these tools themselves categorize the data even as they are analyzing it.
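As a rough picture of "categorizing while analyzing", here is a small Python sketch that groups and summarizes records in a single pass, without building subsets first. It is only a toy under my own assumptions: the records and the function name are hypothetical, and this is not how Hadoop or any particular tool is actually implemented.

```python
from collections import defaultdict

# Hypothetical mixed records; not all of them share the same fields.
events = [
    {"type": "post", "length": 120},
    {"type": "bookmark", "url": "https://example.com"},
    {"type": "post", "length": 80},
    {"type": "photo"},
]

def summarize_in_one_pass(rows):
    """Count rows per 'type' and total the post lengths while reading the data once."""
    counts = defaultdict(int)
    total_post_length = 0
    for row in rows:
        counts[row["type"]] += 1  # categorize on the fly
        if row["type"] == "post":
            total_post_length += row["length"]  # analyze in the same pass
    return dict(counts), total_post_length

summary, total_length = summarize_in_one_pass(events)
print(summary)
print(total_length)
```

The point is only that the grouping happens inside the same loop that does the analysis, so nobody has to split the data into separate sets beforehand.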

Big Data Concepts

This is another point where most people don’t agree. Some experts say that the Big Data concepts are the three Vs:

  • Volume
  • Velocity
  • Variety

Some others add a few more Vs to the concept:

  • Veracity (Reliability)
  • Visibility
  • Value

I will cover the concepts of Big Data in a separate article as this post is already getting big (or so I am told... *sigh*). In my opinion, the first three Vs are enough to explain the concept of Big Data.

The above summarizes what Big Data is in non-technical language (right?). You can call it a very basic introduction. As usual, I plan to write a few more articles on associated topics such as Concepts, Analysis, Tools and uses of Big Data, Big Data 3 Vs, yadda yadda yadda…


