Posts Tagged ‘Cloud’

Getting started with PowerShell and Azure August 21st, 2014

Vinod Kumar

I am a big time supporter of PowerShell and automation when it comes to working with Azure. In the recent past, I have seen SQL Developer ask me – how I can start playing with PowerShell for Azure. Though the concepts of powershell are awesome and I use them inside SQL Server, it is always difficult to get them ready on their machines. Often, the question is – where do I start? This blog will get you started with installing and configuring your machine with PS and Azure.

We can install Azure PowerShell modules by running the Microsoft Web Platform Installer. I have initiated the same and go through the initial wizard.

We can see the installation progress and once done, we will be presented with information to various components that get installed as part of this.

Now to use PowerShell with out Azure subscription, we need to link our account. If you are using an organizational account, then use the Add-AzureAccount command from the Windows Azure PowerShell command. This is useful ONLY when we are using Azure AD for our account. If you are not using Azure AD, then move to the next step.

In most cases, we are going to have standard Live Account associated with our Subscription. In this case, we need to download the certificate. To do this,  invoke the command Get-AzurePublishSettingsFile command from the PowerShell command prompt.

We will be prompted with a save dialog and save this certificate file in a secure location. The next step is to Import the same so that we can start using PowerShell from our local machine to work with our subscription remotely.

For this we will invoke the – Import-AzurePublishSettingsFile “Path of certificate file”

Once this is done, we are connected and we will get an confirmation of our current subscription. being used.

We can quickly get details about our current subscription using the command Get-AzureSubscription as it outlines some of the information.

Now we are all set with are getting started with PowerShell and Azure. In future posts we will use this configuration as starting point to play around creating new VMs, Stopping VMs, Deleting VMs, Creating other services and much more. Stay tuned !!!

Continue reading...


Moving data to Cloud–Considerations and thoughts August 20th, 2014

Vinod Kumar

Being a data person and meeting customers almost every single day on considerations of movement to Azure (Cloud), I take the opportunity to demystify a lot basic concepts with customers. Post the session, it is about thinking in the right direction while taking data to the cloud. This blog reflects the lessons learned best practices, approaches and understanding from real world experiences. There are not written on stone sort of recommendations, but these approaches I highly recommend customers to set their minds at while taking data to cloud.

Know your workload

The basic conversation here is to identify if the application is a “Green Field” project or an “Existing Application”. In a “Green Field” project, we will be exposing data via a service for the first time. Like a literal green field, we are dealing with an undeveloped environment and don’t have any expectations, limitations, dependencies or other constraints that would result from having an existing service.

For existing applications, services may reside on our on-premise servers, partner hosted environment or third-party cloud environments. In such scenario, there may be a need to move, scale, or provide a redundant environment for these services. When such applications come, I look at asking if the end-user open on opening up just a subset of of existing data as a new service on cloud.

When working movement to Cloud, I generally ask if the customer wants to migrate as-is their current deployment or migrate data / services as an opportunity to enhance it providing a super set of functionality found in the original application.

If you already have an existing application, but it is reaching the limits of scale and is difficult or impossible due to technology or financial based constraints to scale – one option is to use the cloud as a “scale layer”. In this scenario, cloud infrastructure can be put in front of your existing data services to provide significant scale. This is yet another workload to look at.

Exposing your data

There can be different levels of motivations when exposing your data via API’s. I generally quiz customers around this – is it compliance requirement, is it self-motivated for others to integrate or is it a requirement from consumers. Based on this, the discussion will decide if it is need to have, want to have or nice to have.

Irrespective how the data gets exposed by applications, there are few more analysis we do, these are some typical queries we ask:

  1. Data Location: Are there constraints in location of where data can be stored? Is geography a contraint?
  2. Data type and Format: Are there any standards which mandate data to be transmitted in XML, JASON, Binary, Text, OData, DOM, Zip, Images format etc?
  3. Sharing: Is there a specific API to be implemented like WebAPI, WCF, Web Services, FTP, SAPI etc.
  4. Consumption: Are there requirements to retrieve the data in its entirety or be able to query the data and retrieve only a subset? Is queryability a required and do we need to certify the same?
  5. Commercial Use: It is critical to understand if it is internal systems consuming the data or available for commercial use? Is there a need to monetize the access of data and services delivered?
  6. Data discoverability: Is there a need to expose data and advertised somehow? Should it integrate with gateways, govt services or third party data services like Azure DataMarket?

Know your Data

There are multiple technologies which can be used to host data, each with their own pros, cons, and pricing. Data size is relevant as there can be key factors when determining which technology should be used to host the data. In addition to understanding the current data size, it’s important to also understand the data growth rate. Evaluating the data growth rate can be used to extrapolate storage needs in future years and help scope the technology used.

Know how often the data gets updated. If customer wants to expose data to public, then there maybe sync or out-of-sync updates to source data and data exposed via services.

Putting a copy of your data into the cloud requires the actual transfer of data from your location to the datacenter of cloud hosting provider. These transfers require time and typically have associated bandwidth charges that should be considered – both for the initial transfer of the current state of your data set and the periodic refreshes of data.

Sharding is something that most cloud storage mechanisms support. When looking to move large data sets to the cloud, consider sharding and think about the optimal ways to partition data for optimized for query performance. Examples of different types of sharding include sharding by region, state, by time period (month, year etc.), postal code, customer’s name etc.

Query pattern is yet another dimension we need to keep in mind. Since you pay based on bandwidth utilization, it is critical to understand the query patterns and how queries come to access data. Since we are paying for the data usage, it is important to understand the amount of data we transfer in every request, the number of columns queried etc.

As we build our applications, it is important to understand if there are any Exporting requirements for other systems to consume. In an hybrid scenario, we will need to run queries that result in the export of files to the local file system and then those must be transported and imported into a cloud store. Depending on the cloud technology being used for storage, the import may be trivial or it may require the creation of custom code or scripts to export the data from the source and import it into the cloud.

Are you into data selling business?

There have been very less customers who are in a B2C scenario, but for those who are on this business – need to understand there is new costs to handle while servicing data as a service. It is no more a freebie. Even though cloud computing can greatly reduce costs, there are still costs involved to host, load and support a data service.

As these costs add up, companies look at a minimum of Cost Recovery rather than Profit in the first run. They look at optimizing it later to make profit by volumes rather than profits out of single transactions. I have seen customers adopt different strategies for pricing in a Data as a Service or Software as a Service strategy. There is no one right way or the other -

  1. Tiered Pricing – This is like mobile plans, you charge by standard transactions rates per month, time, data etc.
  2. Pay as you go pricing – this is something to borrow from any cloud vendor charging.
  3. Peak Pricing and Non-Peak Pricing – We can have different pricing for different times of the day.
  4. One time usage Subscription – It is something like a trial costing.

There are many of these mechanisms customers adopt in their strategy, but these are just representative methods one might use.


I seriously hope I have outlined some thoughts in your mind as you migrate your workload to cloud or enable your data on cloud. Cloud is no different than what you do on-premise – just that we need to be double careful and take few extra steps as we enable our customers. In future posts, there will be more thoughts that I will try to bring out.

Continue reading...


Creating Excel Interactive View July 24th, 2014

Vinod Kumar

I have been wanting to write on this topic for ages but seem to have missed out for one reason or the other. How many times in your life seen a web page with a bunch of tables and it is so boring to read them? The numbers or tables sometimes might have sorting capability but lacks a striking visualization to say the least. So I am going to borrow a table from a Wikipedia page about Indian Population. There are a number of tables and the table of interest to be in Literacy rate. So the rough table looks like:

State/UT Code India/State/UT Literate Persons (%) Males (%) Females (%)
01 Jammu and Kashmir 86.61 87.26 85.23+-
02 Himachal Pradesh 83.78 90.83 76.60
03 Punjab 76.6 81.48 71.34
04 Chandigarh 86.43 90.54 81.38

Well, this is as boring as it can ever get even when pasted as-is on this blog. Now here is the trick we are going to do called as Excel Interactive View. As the name suggests, we are going to use the power of Excel to make this mundane table into some fancy charts for analysis. This includes a couple of scripts that needs to be added as part of the HTML Table and we are done. It is really as simple as that. So let me add the complete table with the script added. Just click on the button provided above to see the magic:

State/UT Code India/State/UT Literate Persons (%) Males (%) Females (%)
01 Jammu and Kashmir 86.61 87.26 85.23+-
02 Himachal Pradesh 83.78 90.83 76.60
03 Punjab 76.6 81.48 71.34
04 Chandigarh 86.43 90.54 81.38
05 Uttarakhand 79.63 88.33 70.70
06 Haryana 76.64 85.38 66.77
07 Delhi 86.34 91.03 80.93
08 Rajasthan 67.06 80.51 52.66
09 Uttar Pradesh 69.72 79.24 59.26
10 Bihar 63.82 73.39 53.33
11 Sikkim 82.20 87.29 76.43
12 Arunachal Pradesh 66.95 73.69 59.57
13 Nagaland 80.11 83.29 76.69
14 Manipur 79.85 86.49 73.17
15 Mizoram 91.58 93.72 89.40
16 Tripura 87.75 92.18 83.15
17 Meghalaya 75.48 77.17 73.78
18 Assam 73.18 78.81 67.27
19 West Bengal 77.08 82.67 71.16
20 Jharkhand 67.63 78.45 56.21
21 Odisha 72.9 82.40 64.36
22 Chhattisgarh 71.04 81.45 60.59
23 Madhya Pradesh 70.63 80.53 60.02
24 Gujarat 79.31 87.23 70.73
25 Daman and Diu 87.07 91.48 79.59
26 Dadra and Nagar Haveli 77.65 86.46 65.93
27 Maharashtra 83.2 89.82 75.48
28 Andhra Pradesh 67.66 75.56 59.74
29 Karnataka 75.60 82.85 68.13
30 Goa 87.40 92.81 81.84
31 Lakshadweep 92.28 96.11 88.25
32 Kerala 93.91 96.02 91.98
33 Tamil Nadu 80.33 86.81 73.86
34 Puducherry 86.55 92.12 81.22
35 Andaman and Nicobar Islands 86.27 90.11 81.84

So how cool is this Excel visualisation? I am sure you will want to build or use this capability in your webpages or internal sites in your organizations too. I hope you learnt something really interesting.

If you want to learn more about using this feature in your dataset and web pages, well read the documentation from Excel Interactive View.

PS: the data comes from Wikipedia and I have just used a snapshot to show the same. So please dont read too much into the data etc, look at the Excel view capabilities.

Continue reading...


Managed Databases on Cloud July 18th, 2014

Vinod Kumar

Recently my good friend and colleague Govind wrote about this topic on what are customers looking forward to when it comes to Cloud and working with Azure. The fundamental tenants that customers look at for cloud be it PaaS, SaaS or IaaS has been around:

  1. Reduced Maintenance headaches
  2. SLA backed for HA/DR
  3. Performance
  4. Synchronization with on-prem
  5. Security
  6. Backups
  7. No worry about hardware

and a few more. But for most parts the above fits the quizzing we get into. In a recent conversation, I had to outline some of the options when it comes to backup requirements with the customer which I thought is worth a share here. I am looking at this from an Azure standpoint:

For IaaS:

  1. You will need to use SQL Server Agent and build your maintenance plans that can automated. This can be scripted (powershell, TSQL or others) and done for all workloads.
  2. For SQL Server 2008 R2 CU2 onwards, we can use Backup to URL option wherein backups from Azure VM – SQL box we can point backups to a blob storage. I wrote about this a while back and you can try the same –
  3. SQL Server 2014 also supports Encrypted backups to Blob and the same article shows the same.
  4. Also from SQL Server 2014 we have option to use Managed Automated backups configured. This will take backups automatically to  blob on a predefined time or based on workload pattern. Documentation for this can be found at:

For PaaS:

  1. Since we already make sure of consistency in the Azure world, we dont have to worry on this.
  2. For Basic, Standard and Premium editions there are SLA for Point-In-Time recovery which is 7, 14 and 35 days respectively. You can read more about this at: . I highly recommend to use Powershell scripts to automate this, if you plan to use the them.
  3. In the past, I have also seen customers use Database Copy functionality to keep a copy of their database in a ready to use state every couple of days. This gives them an opportunity to go back to that version immediately without any problems. This is also an viable option if you like to use. – Since point-in-time restores are available, I am more inclined to use that for cold standby and restores. Having said that, we can still use that feature for creating a copy for Dev, Test environments from our prod servers for testing.

These are my customer notes and I plan to start publishing these customer notes from time to time here in my blog. Since we are talking about Azure, I am sure some more additional capabilities and SLA’s can change over a period of time. So please keep an eye on the documentation for the latest values.

Continue reading...


Big Data – Big Hype yet Big Opportunity February 14th, 2012

Vinod Kumar

“Big Data” seems to be the buzz word everywhere and the number of blogs on this very topic has been exponentially growing. Let me take a step back to understand what to expect. Even at India TechEd 2012 we plan to cover this very topic under the Architect track. Personally, I am really excited to see this session discussed from multiple angles. As budding Architects there are tons to look out for. Refer my previous post coming your way on Architecture. So at TechEd India we will have speakers discuss the problem statements and the possible solutions with recommendation on architecture. In this blog post, I am surely talking about some of them – I am not going to steal the awesome content they are lining up :).

Where does Big Data fit? Datasets that exceed the boundaries and size of normal processing capabilities forcing you to take non-traditional approaches.

Fundamental Problem

I was wanting to drop this topic before and strangely figured out that the SQL Community are anyways running the TSQL Tuesday on this very topic. Now with announcements at SQL PASS and investments of Microsoft also in this space – this is huge deal.

When we talk about Big Data we are fundamentally looking at 3 basic dimensions:

  1. Large Data (In ranges of Peta to exabytes and more)
  2. Complex Data (Write once – read many times, Dynamic Schema data)
  3. Unstructured data (Text mining, Images, Videos, Logs)

And these are the same problems we currently have in the industry when it comes to database / data store systems. Look at systems now with RFID tags, Web logs, sensors, medical images, telecom, public sector databases etc all are grappling with this problem.

Where to start?

Hadoop started as a way to quickly process Web log files. Web 2.0 sites were finding that they were accumulating logs that contained valuable click information and user behavior data. As an alternative to parsing log data and storing it in a relational database, Hadoop emerged as a way to keep the log files in their original format and allow processing and analysis.

Though the basic concept is simple and powerful, let me link to some basic explanation to the post Pinal Dave wrote today. He takes a stab at simply demystifying the basics on Hadoop, Pigs, Hives, MapReduce. Feel free to read more on them:

  1. Pig – A high-level language that lets non-programmers use Hadoop
  2. Hive – An SQL query implementation for Hadoop
  3. HBase – A key/value store for Hadoop

One other resource I would like to point in this context is Cloudera from learning resources. Cloudera is a for-profit company that produces integrated, tested, and commercially supported Hadoop releases. Look at some of the other extensions they support as extensions – some new releases make an interesting read.

  • Hue – Hadoop user interface
  • Sqoop – tool to import relational data
  • Flume – tool to import nonrelational data
  • Oozie – workflow engine and many more.

Relational or DW Database Obsolete?

Personally, I don’t think we are talking about this-or-that Boolean approach here. There is something that makes these concepts of Hadoop interesting and viable for organizations to start considering. Let me call out some of them (not exhaustive though)-

  1. Hadoop clusters can be on x86 commodity hardware
  2. No need build cubes for predictive analysis of large data
  3. Relational DB have their own limits on scale-out and scale-up scenarios
  4. Addition of scale-out options easy with Hadoop

With this steady stream of data, is this what the industry is also looking for? Check the McKinsey Global Institute – Big Data: The next frontier for innovation competition and productivity paper and the numbers are bind blowing.

  • 1.5 million more data saavy managers in the US alone
  • 140,000-190,000 deep analytical talent positions
  • €250 billion Potential annual value to Europe’s public sector
  • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress

Read the whitepaper and there are many more statistics that seem to make this Big Data really Big. Now take examples of big data patterns and sites like facebook or twitter with millions of data stream coming every minute and you want some analytics. Does this Big data architecture qualify here? or do you need a different architectural choices? Well, don’t forget to tune into our India TechEd Architecture track for the details :).

Microsoft Integration Points

From Microsoft, you are going to see lot of work to happen as it is data. Applications like Excel, PowerPivot, Power View, SQL Server Analysis Services, SQL Server Reporting Services are some of the integration we have seen in the recent past at SQL PASS. More about this can be read from the MS Big data home site.

Channel-9 Video: Lynn Langit and Dave Nielsen discuss "Big Data" in the Cloud

MSR Research Paper on Big Data – gives a nice read

Another Research Paper: Big Data and Cloud Computing: New Wine or just New Bottles?

What we can see is, as we get to know this more recent phenomenon of Big Data even the cloud seems to embrace it with two hands. You are going to see some serious integration across the platform and it is a great sign for us -

  1. Connectors for Hadoop, integrating it with SQL Server and SQL Sever Parallel Data Warehouse.
  2. An ODBC driver for Hive, permitting any Windows application to access and run queries against the Hive data warehouse.
  3. For developers, well now addition of JavaScript Layer to the Hadoop ecosystem is very compelling.
  4. An Excel Hive Add-in, which enables the movement of data directly from Hive into Excel or PowerPivot.

Where to start

I highly recommend using Apache Hadoop on Windows WIKI – please bookmark it. Now as a Microsoft ecosystem, there are 3 other interesting pages for reference you don’t want to miss.

On-Premise Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Apache Hadoop for Windows

Windows Azure Deployment of Hadoop on the Elastic Map Reduce (EMR) Portal

This forms a great ecosystem from on-premise to the Cloud. As part of the whole bundle of links here, couldn’t resist from linking Rob Farley who has been kind enough to point out that Big Data now features in 24 hours of PASS too. Nice timing to talk more and more about Big data.


Personally, I see there is tons of learning with Big Data coming our way and 2012 will start the same conversation that we started about BI in Year 2005 timeframe. So get prepared for some Big Hype, Big Challenges, Big Insights and a Big Year of Big Data coming your way.

Continue reading...