Big Data – Big Hype yet Big Opportunity February 14th, 2012

Vinod Kumar

“Big Data” seems to be the buzzword everywhere, and the number of blogs on this very topic has been growing exponentially. Let me take a step back to understand what to expect. Even at India TechEd 2012 we plan to cover this very topic under the Architect track. Personally, I am really excited to see this topic discussed from multiple angles. As budding Architects, there is a ton to look out for. Refer to my previous post on what is coming your way on Architecture. At TechEd India we will have speakers discuss the problem statements and the possible solutions, with recommendations on architecture. In this blog post I will touch on some of them, but I am not going to steal the awesome content they are lining up :).

Where does Big Data fit? It fits wherever datasets exceed the size and boundaries of normal processing capabilities, forcing you to take non-traditional approaches.

Fundamental Problem

I had been wanting to write about this topic for a while and, strangely enough, found out that the SQL Community is running T-SQL Tuesday on this very topic anyway. Now, with the announcements at SQL PASS and Microsoft’s investments in this space, this is a huge deal.

When we talk about Big Data we are fundamentally looking at 3 basic dimensions:

  1. Large Data (in the range of petabytes to exabytes and more)
  2. Complex Data (Write once – read many times, Dynamic Schema data)
  3. Unstructured data (Text mining, Images, Videos, Logs)

And these are the same problems the industry currently has when it comes to database / data store systems. Systems dealing with RFID tags, Web logs, sensors, medical images, telecom, public sector databases, etc. are all grappling with this problem.

Where to start?

Hadoop started as a way to quickly process Web log files. Web 2.0 sites were finding that they were accumulating logs that contained valuable click information and user behavior data. As an alternative to parsing log data and storing it in a relational database, Hadoop emerged as a way to keep the log files in their original format and allow processing and analysis.
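To make the idea of processing log files in their original format a little more concrete, here is a minimal sketch in plain Python of the map and reduce steps applied to raw log lines. This is not Hadoop code. The log format, the sample lines and the function names (`map_line`, `shuffle`, `reduce_counts`, `count_hits`) are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical raw web log lines (assumed format: ip, method, path, status).
LOG_LINES = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /products 200",
    "10.0.0.1 GET /products 200",
    "10.0.0.3 GET /home 404",
    "malformed line",
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) != 4:
        return []  # skip lines that do not match the assumed format
    _ip, _method, path, _status = parts
    return [(path, 1)]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_hits(lines):
    """Run map, shuffle and reduce in sequence over the raw lines."""
    mapped = [pair for line in lines for pair in map_line(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


hits = count_hits(LOG_LINES)
```

The point of the sketch is that the log lines are never loaded into a table first: each line is parsed only when it is read, and lines that do not fit the expected shape are simply skipped. In a real cluster the map and reduce steps would run on many machines at once.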

Though the basic concept is simple and powerful, let me link to the post Pinal Dave wrote today for a basic explanation. He takes a stab at demystifying the basics of Hadoop, Pig, Hive and MapReduce. Feel free to read more on them:

  1. Pig – A high-level language that lets non-programmers use Hadoop
  2. Hive – An SQL query implementation for Hadoop
  3. HBase – A key/value store for Hadoop

One other resource I would like to point to in this context is Cloudera, for its learning resources. Cloudera is a for-profit company that produces integrated, tested, and commercially supported Hadoop releases. Look at some of the other extensions they support – some of the new releases make for an interesting read.

  • Hue – Hadoop user interface
  • Sqoop – tool to import relational data
  • Flume – tool to import nonrelational data
  • Oozie – workflow engine and many more.

Relational or DW Database Obsolete?

Personally, I don’t think we are talking about a this-or-that Boolean approach here. There is something that makes these Hadoop concepts interesting and viable for organizations to start considering. Let me call out some of them (not an exhaustive list):

  1. Hadoop clusters can be on x86 commodity hardware
  2. No need to build cubes for predictive analysis of large data
  3. Relational databases have their own limits in scale-out and scale-up scenarios
  4. Adding scale-out capacity is easy with Hadoop
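A rough back-of-the-envelope calculation shows why the scale-out points above matter at petabyte sizes. The sketch below is idealized: the throughput figure of 100 MB/s per node is an assumption, not a measurement, and the function name `scan_seconds` is hypothetical. It also ignores network, coordination and failure overheads.

```python
PETABYTE = 10**15  # bytes, using the decimal definition

# Assumed sequential read throughput per node (100 MB/s); purely illustrative.
ASSUMED_BYTES_PER_SECOND = 100 * 10**6


def scan_seconds(total_bytes, nodes, bytes_per_second=ASSUMED_BYTES_PER_SECOND):
    """Idealized time to read total_bytes when split evenly across nodes."""
    return total_bytes / (nodes * bytes_per_second)


single_node_days = scan_seconds(PETABYTE, nodes=1) / 86_400
thousand_node_hours = scan_seconds(PETABYTE, nodes=1000) / 3_600

print(f"1 node: about {single_node_days:.0f} days")
print(f"1000 nodes: about {thousand_node_hours:.1f} hours")
```

Under these assumptions a single machine needs roughly 116 days just to read one petabyte, while a thousand commodity machines reading in parallel need under three hours.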

With this steady stream of data, is this what the industry is also looking for? Check the McKinsey Global Institute paper “Big data: The next frontier for innovation, competition, and productivity”; the numbers are mind-blowing.

  • 1.5 million more data-savvy managers needed in the US alone
  • 140,000-190,000 more deep analytical talent positions
  • €250 billion potential annual value to Europe’s public sector
  • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress

Read the whitepaper; there are many more statistics that make this Big Data really Big. Now take big data patterns at sites like Facebook or Twitter, with millions of data streams coming in every minute, and say you want some analytics. Does this Big Data architecture qualify here, or do you need different architectural choices? Well, don’t forget to tune into our India TechEd Architecture track for the details :).

Microsoft Integration Points

From Microsoft, you are going to see a lot of work happening in this space because, after all, it is data. Applications like Excel, PowerPivot, Power View, SQL Server Analysis Services and SQL Server Reporting Services are some of the integration points we have seen in the recent past at SQL PASS. More about this can be read on the MS Big Data home site.

Channel-9 Video: Lynn Langit and Dave Nielsen discuss "Big Data" in the Cloud

MSR Research Paper on Big Data – gives a nice read

Another Research Paper: Big Data and Cloud Computing: New Wine or just New Bottles?

What we can see is that, as we get to know this more recent phenomenon of Big Data, even the cloud seems to embrace it with both hands. You are going to see some serious integration across the platform, and it is a great sign for us –

  1. Connectors for Hadoop, integrating it with SQL Server and SQL Server Parallel Data Warehouse.
  2. An ODBC driver for Hive, permitting any Windows application to access and run queries against the Hive data warehouse.
  3. For developers, the addition of a JavaScript layer to the Hadoop ecosystem is very compelling.
  4. An Excel Hive Add-in, which enables the movement of data directly from Hive into Excel or PowerPivot.

Where to start

I highly recommend the Apache Hadoop on Windows wiki – please bookmark it. For those in the Microsoft ecosystem, there are 3 other interesting reference pages you don’t want to miss.

On-Premise Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Hadoop on the Elastic Map Reduce (EMR) Portal

This forms a great ecosystem from on-premise to the Cloud. As part of this whole bundle of links, I couldn’t resist linking to Rob Farley, who has been kind enough to point out that Big Data now features in 24 Hours of PASS too. Nice timing to talk more and more about Big Data.


Personally, I see tons of learning with Big Data coming our way, and 2012 will start the same kind of conversation that we had about BI around 2005. So get prepared for some Big Hype, Big Challenges, Big Insights and a Big Year of Big Data coming your way.

This entry was posted on Tuesday, February 14th, 2012 at 22:14 and is filed under Personal, Technology.

11 Responses to “Big Data – Big Hype yet Big Opportunity”

  1. Hello sir
    Thanks for this awesome article on Big Data; it saved me time and reduced my confusion about the topic.
    In my college I participated in an IMRB internship on behalf of Microsoft. They gave me a market research job: understanding the problems of small business organisations and collecting data from 20 organisations, which I am using in my project. I am working on a supermarket project, and some of my faculty members suggested that I use big data (such as Walmart’s data) for better productivity, along with a business intelligence tool. How can I get this data? Is it free for me, or are there privacy issues? Kindly suggest. Thanks.

    • Vinod Kumar says:

      Unless the data is sold publicly, there is no way to get this data. You might want to look at open datasets available in the market or the Marketplace to do such analysis.

  2. […] keep in mind as they embark their journey of Big Data. I did write some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make […]

  4. Samith C Valsalan says:


    Got an Idea on “Big Data”
    Really helpful links also …

    thank u

  5. […] to take a stab at a technical concept and see if this makes sense. Sometime back I wrote about Big Data – Big Hype yet Big Opportunity and one title kept on fascinating me – “Data Scientist”. I have no problem when it comes to […]

  6. Oscar Zamora says:

    One of the few blog posts that clearly explains Big Data and what Microsoft is doing to leverage HDFS.


  7. […] Kumar has a nice summary of what the challenges of Big Data in general can be. […]

  8. Pinal Dave says:

    Excellent and welcome to very first T-SQL Tuesday Party!

    Wow, super write up.
