Posts Tagged ‘Cloud’

Big Data – Big Hype yet Big Opportunity February 14th, 2012

Vinod Kumar

“Big Data” seems to be the buzzword everywhere, and the number of blogs on this topic has been growing exponentially. Let me take a step back and look at what to expect. Even at India TechEd 2012 we plan to cover this very topic under the Architect track, and personally I am really excited to see it discussed from multiple angles. As budding architects there is a ton to look out for – and watch out for my post on Architecture coming your way. At TechEd India, speakers will discuss the problem statements and possible solutions, with recommendations on architecture. In this blog post I will touch on some of them, but I am not going to steal the awesome content they are lining up :).

Where does Big Data fit? Datasets that exceed the boundaries and size of normal processing capabilities, forcing you to take non-traditional approaches.

Fundamental Problem

I had wanted to take up this topic earlier, and strangely enough the SQL community is already running a T-SQL Tuesday on this very topic. Now, with the announcements at SQL PASS and Microsoft's investments in this space – this is a huge deal.

When we talk about Big Data we are fundamentally looking at 3 basic dimensions:

  1. Large data (in the range of petabytes to exabytes and more)
  2. Complex data (write once, read many times; dynamic-schema data)
  3. Unstructured data (text mining, images, videos, logs)

And these are the same problems we currently face in the industry when it comes to database / data store systems. Look at systems today – RFID tags, web logs, sensors, medical images, telecom, public sector databases, etc. – all are grappling with this problem.

Where to start?

Hadoop started as a way to quickly process Web log files. Web 2.0 sites were finding that they were accumulating logs that contained valuable click information and user behavior data. As an alternative to parsing log data and storing it in a relational database, Hadoop emerged as a way to keep the log files in their original format and allow processing and analysis.

Though the basic concept is simple and powerful, let me link to the post Pinal Dave wrote today for some basic explanation. He takes a stab at demystifying the basics of Hadoop, Pig, Hive, and MapReduce. Feel free to read more on them:

  1. Pig – A high-level language that lets non-programmers use Hadoop
  2. Hive – A SQL-like query implementation for Hadoop
  3. HBase – A key/value store for Hadoop
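
Since MapReduce is the engine underneath all three, a tiny illustration may help. The following is a minimal sketch in plain Python (not actual Hadoop code; the log lines are made up) showing how a map phase, a shuffle, and a reduce phase count page hits:

```python
from collections import defaultdict

# Toy web-log records; a real Hadoop job would read these from HDFS
# and run the map and reduce functions in parallel across the cluster.
log_lines = [
    "10.0.0.1 GET /home",
    "10.0.0.2 GET /products",
    "10.0.0.1 GET /home",
]

# Map phase: emit a (key, 1) pair for every record.
mapped = [(line.split()[2], 1) for line in log_lines]

# Shuffle/sort: group the emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group into a final count.
hit_counts = {page: sum(values) for page, values in groups.items()}
print(hit_counts)  # {'/home': 2, '/products': 1}
```

The same shape scales because the map and reduce steps have no shared state – exactly what lets Hadoop spread them over commodity machines.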

One other resource I would like to point to in this context, for learning material, is Cloudera. Cloudera is a for-profit company that produces integrated, tested, and commercially supported Hadoop releases. Look at some of the other extensions they support – some of the new releases make an interesting read.

  • Hue – Hadoop user interface
  • Sqoop – tool to import relational data
  • Flume – tool to collect and import streaming log data
  • Oozie – workflow engine and many more.

Relational or DW Database Obsolete?

Personally, I don’t think we are talking about a this-or-that Boolean approach here. There is something that makes these Hadoop concepts interesting and viable for organizations to start considering. Let me call out some of them (not an exhaustive list):

  1. Hadoop clusters can run on x86 commodity hardware
  2. No need to build cubes for predictive analysis of large data
  3. Relational databases have their own limits in scale-out and scale-up scenarios
  4. Adding scale-out capacity is easy with Hadoop

With this steady stream of data, is this what the industry is also looking for? Check the McKinsey Global Institute paper – Big Data: The next frontier for innovation, competition, and productivity – and the numbers are mind blowing.

  • 1.5 million more data-savvy managers needed in the US alone
  • 140,000–190,000 deep analytical talent positions
  • €250 billion potential annual value to Europe’s public sector
  • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress

Read the whitepaper and you will find many more statistics that make this Big Data really Big. Now take big data patterns and sites like Facebook or Twitter, with millions of data streams coming in every minute, on which you want some analytics. Does this Big Data architecture qualify here, or do you need different architectural choices? Well, don’t forget to tune into our India TechEd Architecture track for the details :).

Microsoft Integration Points

From Microsoft, you are going to see a lot of work happening here, as this is all about data. Applications like Excel, PowerPivot, Power View, SQL Server Analysis Services, and SQL Server Reporting Services are some of the integrations we have seen in the recent past at SQL PASS. More about this can be read on the MS Big Data home site.

Channel-9 Video: Lynn Langit and Dave Nielsen discuss "Big Data" in the Cloud

MSR research paper on Big Data – makes a nice read

Another Research Paper: Big Data and Cloud Computing: New Wine or just New Bottles?

What we can see is that, as we get to know this recent phenomenon of Big Data, even the cloud seems to embrace it with both hands. You are going to see some serious integration across the platform, and that is a great sign for us –

  1. Connectors for Hadoop, integrating it with SQL Server and SQL Server Parallel Data Warehouse.
  2. An ODBC driver for Hive, permitting any Windows application to access and run queries against the Hive data warehouse.
  3. For developers, the addition of a JavaScript layer to the Hadoop ecosystem is very compelling.
  4. An Excel Hive Add-in, which enables the movement of data directly from Hive into Excel or PowerPivot.
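
As a hedged sketch of what point 2 enables (the DSN and table names here are invented for illustration, not taken from the announcement), any Windows application that speaks ODBC could run HiveQL like this:

```python
# Build a HiveQL query; Hive's SQL dialect makes Hadoop data queryable
# from ordinary ODBC clients once the driver and a DSN are set up.
hiveql = (
    "SELECT page, COUNT(*) AS hits "
    "FROM web_logs "
    "GROUP BY page"
)

# With the ODBC driver installed and a DSN named "Hive" configured,
# the query could be executed from Python via pyodbc, for example:
#   import pyodbc
#   conn = pyodbc.connect("DSN=Hive", autocommit=True)
#   rows = conn.cursor().execute(hiveql).fetchall()
print(hiveql)
```

The same connection string approach is what lets Excel and PowerPivot pull Hive results in without any Hadoop-specific code.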

Where to start

I highly recommend the Apache Hadoop on Windows wiki – please bookmark it. Within the Microsoft ecosystem, there are three other interesting pages for reference you don’t want to miss.

On-Premise Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Hadoop on the Elastic Map Reduce (EMR) Portal

This forms a great ecosystem from on-premises to the cloud. As part of the whole bundle of links here, I couldn’t resist linking to Rob Farley, who has been kind enough to point out that Big Data now features in 24 Hours of PASS too. Nice timing to talk more and more about Big Data.


Personally, I see tons of learning on Big Data coming our way, and 2012 will start the same conversation that we started about BI in the 2005 timeframe. So get prepared for some Big Hype, Big Challenges, Big Insights, and a Big Year of Big Data coming your way.



Cloud Computing – Trends watch December 26th, 2011

Vinod Kumar

Cloud Computing is surely a buzzword and is catching on in the IT industry slowly but surely. As part of my work, I meet a lot of Indian customers asking for more when it comes to consuming and evaluating various cloud implementation strategies. Yes, cost is one of the dimensions on which this is evaluated, but the crux is – have you designed for the cloud? I will spare that thought of designing for the cloud for a different discussion some other day. But let me call out some of the trends I am seeing that are worth investing our time in when it comes to learning the cloud phenomenon in the coming year !!!

Image source – Dilbert.com

Cloud as Disaster Site

This is quite a viable option to consider, as most companies are looking at options to get their online backups to the cloud, since storage costs are cheaper and come with redundancy in the cloud. The traditional DR site was used only for business continuity, and needed dedicated processes and replication infrastructure to be maintained at a different site. Not to mention the human resources, electricity, A/C, infrastructure, and other costs of maintaining a dead weight. So this trend will become something to look out for in the future, especially since ingress traffic is free with most cloud providers :).

A word of caution here – look at the SLA the cloud providers give you, even for storage. Given that storage is cheaper, in case of disaster look at the recovery time requirements you have committed to your customers. Restoring TBs of backups will take considerable time, so don’t make assumptions in your recovery strategy !!!
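
A quick back-of-the-envelope calculation shows why. Here is a rough sketch (the backup size and link speed are purely illustrative assumptions, and real transfers add protocol overhead on top):

```python
def restore_hours(backup_tb: float, link_mbps: float) -> float:
    """Rough time to pull a cloud-hosted backup down over a given link."""
    megabits = backup_tb * 1024 * 1024 * 8  # TB -> MB -> megabits
    seconds = megabits / link_mbps
    return seconds / 3600

# Restoring 5 TB of backups over a 200 Mbps link, ignoring overhead:
print(round(restore_hours(5, 200), 1))  # roughly 58.3 hours
```

Two and a half days just to move the bits – which is why the recovery-time SLA, not the storage price, should drive the decision.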

Enterprise moves with Hybrid-Private Cloud

The more I get an opportunity to talk with enterprise CxOs, the more I hear these terms coming up. Yes, the investments are already out there within the enterprises, and these cannot go anywhere. So the need to use the existing storage and extend future storage needs to the cloud is a viable option that execs always want to contemplate and discuss. The whole concept of limitless (cost-permitting) storage that hardly requires upgrades or replacements, with no additional capital investment, means quite a lot to these CxOs – especially the CFOs, who love to see the ROI here.

Once these decisions are made, the next need is to seamlessly integrate your on-premises environments with the cloud infrastructure. This will become a critical part of any application design moving forward. We cannot live in a monolithic model – hybrid is here to stay for a long time.

With computing easy to get on the cloud, it is sometimes the storage that needs to transform – from local storage to SAN to private cloud storage to public cloud storage. These are challenges to keep in mind, but not far from implementation – get ready.

Bootstrap to Cloud

As more and more applications get designed to move to the cloud, there are many more administrative tasks that organizations are contemplating moving to the cloud to reduce maintenance overheads. The tools and steps required to migrate or move these applications are something we need to be aware of and understand holistically. As IT admins, the need to maintain VMs on the public cloud and keep them up to date and running is a trend not to miss. Yes, pure-play cloud-enabled apps are taking shape for newer applications, but the legacy applications will stay; hence VMs might be something that cannot be avoided.

As applications migrate from on-premises to the cloud, keep in mind the tools needed to bring them back to your environment at any time (if required).

Big Data – trends not to forget

As more and more people get stung by these not-very-well-understood industry phenomena – the NoSQL world and non-relational databases – this is a trend that will hit the market sooner rather than later. Though I have been a big relational DB fan for a long time, I can see clearly that the Big Data story is something I will need to bite into in the coming year, without any doubt. Do we have a choice? Nope !!!


As 2012 nears, just as explained in “Crossing the Chasm” by Geoffrey Moore, the concepts of adoption remain the same. There will be industry early adopters, others on their way to moving in the coming year, and a vast majority still contemplating a move – first in a sandwich mode with a private cloud, and then slowly into the new era of completely public cloud infrastructures.

If you really ask me, there is more here than meets the eye, and we will get a lot to learn as companies make these moves. There is anyway an opportunity in everything new that comes to the market, and the phenomenon of the cloud is here to stay as we move into 2012 !!!



Database Consolidation Considerations June 3rd, 2011

Vinod Kumar

Following the post around multi-tenancy, interesting comments came my way asking me to write more on such topics. Well, thanks for reading and dropping a line. In this post, I thought I would write about another interesting yet very common topic which I get a chance to discuss with customers – consolidation of servers. Though these moves can be driven by very specific business reasons, they do need some thought before implementing.

There are a significant number of manageability, availability, performance, scalability, and political considerations when deciding between dedicated (physical/virtual), instance level, database level or schema level consolidation. Fortunately, most of these are covered in the SQL Server 2008 Consolidation Guidance technical article. The article comes complete with excellent explanations and decision trees, and though it’s primarily focused on the decision between virtualization vs. instance level vs. database level – it is noted in the article that “Other possibilities include further optimizations on an existing approach such as schema-level consolidation. The key decision factors are similar to the higher-level consolidation options mentioned previously, so this paper will focus only on those.”

As mentioned before, business might see Consolidation from (not in any specific order):

  1. High availability – Instead of giving multiple databases their own redundant HA, sometimes consolidating gives all the applications an advantage on a better server with a common HA option (like clustering).
  2. Centralized management – Customers look at this as an opportunity to consolidate all the departmental applications, and more so to consolidate the DBAs inside their organization.
  3. Cost saving – This is most likely the first thing the business sees as an opportunity. Ultimately, they want to maximize utilization of the hardware they have bought, or get one beefy server to manage tens of application databases away from outdated hardware.
  4. Risk management – As discussed above, centralization also standardizes the way DB code is developed, managed, deployed, and maintained. It also becomes easier for servicing, implementing processes, and automating system administration.

Other Consolidation Considerations

While all the above reasons are valid, there are more that are often missed between the lines.

  1. Operational cost – Consolidation on newer hardware means fewer servers, and power savings can also be achieved. Moving to virtual machines also means you save on hosting costs.
  2. Increased uptime – Server consolidation makes it more economical to provide high-availability configurations and dedicated support staff. Organizations also benefit from implementing better storage management and service continuity solutions.
  3. Predictable performance – Moving to a more standardized system means we can assure more predictable performance, implement per-application resource isolation behind the scenes, and let a DBA apply techniques like compression without any change in application code.
  4. Integration benefits – A consolidated platform provides for easier, more consistent, and cheaper systems integration, which improves data consistency and reduces the complexity of tasks such as extract, transform, and load (ETL) operations across systems.

Technical Considerations

For code already written, migrating to a solution other than the one it was originally designed for may require rework by developers – i.e., if they have already written code in database A that accesses database B objects with three-part name identifiers, they would have to make accommodating changes. Additionally, any database users with a database-wide role (db_datareader, db_ddladmin, etc.) in consolidation-candidate databases might have to be changed.
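
As a hypothetical illustration of that rework (all names here are invented), a query that used a three-part name to reach a sibling database would need rewriting once that database becomes a schema inside the consolidated one:

```python
# Invented names for illustration: database "SalesDB" is consolidated
# into a shared database as schema "Sales", so the three-part name
# SalesDB.dbo.Orders must become the two-part name Sales.Orders.
before = "SELECT o.OrderID FROM SalesDB.dbo.Orders AS o"
after = before.replace("SalesDB.dbo.", "Sales.")
print(after)  # SELECT o.OrderID FROM Sales.Orders AS o
```

Multiply that by every stored procedure, view, and ad-hoc query string in the application and the rework cost becomes a real line item in the consolidation decision.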

Though schema consolidation within a single database is nice for logical groupings of data that benefit from being kept in sync [especially during a recovery] or must maintain referential integrity, the thought of combining unrelated entities raises all kinds of management and political questions.

Unrelated applications might have differing availability, maintenance, and isolation requirements – and combining them at a database level complicates this. And if they contain financially sensitive data, then the process of auditing or tracking becomes even harder.

Finally, it is great to collapse multiple databases from applications – but only if they are in-house applications. If they are third-party applications, then application compatibility needs to be checked. Certifying the application against database versions, x86 vs. x64 architecture, OS versions, etc. are all extra activities to be handled – not to mention dependent services.

Misc Considerations

Well, planning is the name of the game. I loved seeing this hidden somewhere in the documentation – a checklist. It is worth a read and consideration, though it is not complete – you will need to create one for your own infrastructure needs.

But there are tons of other things that also come to mind:

  • Special security context of databases
  • Limitations or dependencies that prevent consolidation (Agent jobs, maintenance plans, SQLMail, ETL, or others)
  • Third-party add-on dependencies of the application
  • OLTP vs. OLAP features and frequency of use
  • Dependencies on server / instance names (hard-coded inside the application)
  • How many databases a single app uses, and the proximity of all the dependent DBs
  • Data growth rate and data retention policies, followed by archival
  • Backup windows and special backup technologies
  • Peak-usage / low-usage time windows of each server
  • Replication frequency, duration, and volume
  • Specific connectivity requirements (protocols, SSL, and others)
  • Internet / public access vs. internal only
  • SLAs to business units for uptime

In the 60-page whitepaper on Consolidation Using SQL Server 2008, the option to collapse multiple databases into a single database [via multiple-schema management] is essentially written off, citing challenges with security, object naming conflicts, performance issues, etc. It leaves off with an ominous warning that schema-level consolidation should be “used carefully” – but I would just avoid it altogether for unrelated applications. The points above explain why I say so.

HomePage for SQL Server Consolidation and Virtualization

If you did read this far, feel free to drop a comment with suggestions, if any. Obviously, your experiences are also unique and valuable.



Multi-Tenant – Marketing Buzz ? May 30th, 2011

Vinod Kumar

There has been a lot of talk around various architectures – moving to the cloud, SaaS, PaaS, etc. And many ISVs we interact with are exploring these to achieve increased revenues and expand their customer base by riding this marketing hype. Today, I wanted to blog about the things that come to the mind of ISVs building a strategy for moving their existing product to a SaaS model – and this is exactly when we hear:

“How can we design for multi-tenancy?"

What does this mean?
Well, maybe an online search can give you multiple definitions. From a product-based ISV's point of view, they design a software application to be consumed and used within a single company's domain. Now what if you wanted to move that same application to the web for multiple users across multiple companies, while maintaining a single or simpler infrastructure?

A multi-tenant architecture in the SaaS world basically means one instance of your product serving multiple companies, and multiple users within each company. There is a difference between users and tenants in SaaS which is important to understand – think of a tenant as a company or a team, basically a collection of users. A user, on the other hand, is an individual person or system that interacts with the product. There are already well-written whitepapers on this very topic, and repeating them here would be a waste of your time. I highly recommend reading the papers below for reference:

  1. Hosting a Multi-Tenant Application on Windows Azure
  2. Multi-Tenant Data Architecture – The article is quite dated but still covers the concepts well. I was directed to this paper years ago when talking about SODA (Service Oriented Data Architecture).

Being a database person, I often get thrown the question – “If we add a tenant_ID column to every table, are we multi-tenant?” For me, that is a good start. Whenever I get a chance to talk on this topic, I bring out some more basic considerations:

  1. Data isolation – This describes how each tenant wants their data kept, and secondly, whether the application mandates that some tenants can take their data out as part of backup/restore or maintenance. This can also be a regulatory or compliance need. Thought through properly, the single-database (big-bang) approach or the shared-schema approach fails here. Worst case, if that single DB gets corrupt, every tenant gets affected.
  2. Data scoping and access control – You can think of this as part of the isolation technique above. The important thing here is who gets access to which part of the data. Complications arise when you need to show different *master* data to different tenants.
  3. Horizontal scaling – To get near-infinite scale while keeping manageability in mind, partitioning row-wise, i.e. tenant-wise, gives isolation and scales well. This is itself an implementation of tenant-wise isolation. On-premises this can translate into SQL Server partitioning; in the Azure world we talk about the SQL sharding capability, and I recommend reading the whitepaper.
  4. Tenant-wise customizability and configurability – In the SaaS world, general profile customization is fine and easily implemented. At the business layer, though, the application often needs to be extended and customized for different regions based on regulatory, compliance, and other needs of a region or country. This is a very important factor – standardization leads to ease of maintenance, a single codebase, reduced opex, etc. Be careful while taking a decision here; you don’t want to open a can of worms – “evaluate flexibility, but at what cost?”. There are standard techniques for allowing customization, as discussed in the Data Architecture whitepaper.
  5. Hosting on a hybrid but elastic infrastructure – When talking about SaaS or cloud architecture, people say Azure, AWS, etc. Step back; when I say hybrid or elastic, it can also mean your own data centers offered to customers, a hoster's DC, or standard cloud offering vendors. Your application needs to be built for such deployments – stateless, able to run independently on physical or virtual boxes, in dedicated and/or scale-out scenarios, seamlessly and without problems. Though the specifics of this topic can be a discussion for some other day, keeping it in mind is an important factor.
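
To ground the tenant_ID question above, here is a minimal shared-schema sketch using SQLite (the table and tenant names are illustrative, not taken from any whitepaper). The point is that every query must be scoped by the tenant key:

```python
import sqlite3

# Shared-schema model: one table holds rows for every tenant,
# distinguished only by a tenant_id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("acme", 100.0), ("acme", 50.0), ("globex", 75.0)],
)

def total_for_tenant(tenant_id: str) -> float:
    # Every query must filter on tenant_id; forgetting this predicate
    # is the classic cross-tenant data leak in shared-schema designs.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]

print(total_for_tenant("acme"))    # 150.0
print(total_for_tenant("globex"))  # 75.0
```

Adding the column is the easy part; enforcing that no code path ever skips the filter, and that backup/restore can be done per tenant, is where the considerations above come in.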

Finally, all of these, when considered wisely, could indeed give enterprises cost savings and benefits. If that is not achieved, there is no point in moving in the first place. And if you don’t architect it right the first time and create a chatty application – remember, you pay for data-in and data-out in the cloud world, and that can dent your credit-card bill at the end of the month :).

There is more to this topic than what has been covered here, but at least I got you a start and some thinking pointers if you have not considered it in the past. Do pass around your thoughts and comments.



Patterns & practices- Windows Phone 7 Developer Guide December 15th, 2010

Vinod Kumar

Windows® Phone 7 provides an exciting new opportunity for companies and developers to build applications that travel with users, are interactive and attractive, and are available whenever and wherever users want to work with them. By combining Windows Phone 7 applications with on-premises services and applications, or remote services and applications that run in the cloud (such as those using the Windows Azure™ technology platform), developers can create highly scalable, reliable, and powerful applications that extend the functionality beyond the traditional desktop or laptop and into a truly portable, much more accessible environment.

This guide describes a scenario around a fictitious company named Tailspin that has decided to include Windows Phone 7 as a client device for their existing cloud-based application. Their Windows Azure-based application, Surveys, is described in detail in a previous book in this series, Developing Applications for the Cloud on the Microsoft Windows Azure Platform. For more information about that book, see the MSDN® page.

In addition to describing the client application, its integration with the remote services, and the decisions made during its design and implementation, this book discusses related factors, such as the design patterns used, the capabilities and use of Windows Phone 7, and the ways that the application could be extended or modified for other scenarios.

After reading this book, you will be familiar with how to design and implement applications for Windows Phone 7 that take advantage of remote services to obtain and upload data while providing a great user experience. The guide includes:


"Introducing Windows Phone 7" provides an overview of the platform to help you understand the requirements and advantages of creating applications for Windows Phone 7. It provides a high-level description of the possibilities, features, and requirements for building applications for Windows Phone, and it includes references to useful information about designing and developing these types of applications. It also includes a glossary of terms commonly used in mobile application development. It’s probably a good idea to read this chapter before moving on to the scenarios.

"Designing Windows Phone 7 Applications" discusses planning and designing applications for Windows Phone 7. It covers the run-time environment and life cycle events for your application, how to maximize performance on the phone, and considerations for the user interface, resource management, storage, connectivity, and more.

"The Tailspin Scenario" introduces you to the Tailspin company and the Surveys application. It describes the decisions that the developers at Tailspin made when designing their application, and it discusses how the Windows Phone 7 client interacts with their existing Windows Azure-based services.

"Building the Mobile Client" describes the steps that Tailspin took when building the mobile client application for Windows Phone 7 that enables users to register for and download surveys, complete the surveys, and upload the results to the cloud-based service. It includes details of the overall structure of the application, the way that the Model-View-ViewModel (MVVM) pattern is implemented, and the way that the application displays data and manages commands and navigation between the pages. The following chapters describe the individual features of the application development in more detail.

"Using Services on the Phone" discusses the way that the Windows Phone 7 client application stores and manipulates data, manages activation and deactivation, synchronizes data with the server application, and captures picture and sound data.

"Connecting with Services" describes how the client application running on Windows Phone 7 uses the services exposed by the Windows Azure platform. This includes user authentication, how the client application accesses services and downloads data, the data formats that the application uses, filtering data on the server, and the push notification capabilities.

"Interacting with Windows Marketplace" describes how you can distribute and sell your applications through Windows Marketplace, and the restrictions and conditions Windows Marketplace places on your applications and content.

The appendices include additional useful information related to the topics described in the rest of the chapters. The appendices cover getting started with the Windows Phone developer tools; testing your applications; information about the development environments (Silverlight and XNA® development platform); a reference section for programming device capabilities, such as location services, messaging features, and the camera; information about the Prism Library for Windows Phone 7; and an overview of data and file synchronization using emerging technologies such as Microsoft Sync Framework.

Click here to download this release.


Walking along the same lines, here is some great free online training on Making Windows Phone 7 Apps & Games. Don’t forget to check out all the goodness from Channel 9 :).

Also, another must-have for Windows Phone 7 development is the free ebook Programming Windows Phone 7, by Charles Petzold. Feel free to download the book along with the code. I assure you it will be worth it !!!
