Eight scenarios with Apache Spark on Azure that will transform any business

Source: https://blogs.technet.microsoft.com/dataplatforminsider/2016/08/29/eight-scenarios-with-apache-spark-on-azure-that-will-transform-any-business/

This post was authored by Rimma Nehme, Technical Assistant, Data Group.


Since its birth in 2009 and its open sourcing in 2010, Apache Spark has grown to become one of the largest open source communities in big data, with over 400 contributors from 100 companies. Spark stands out for its ability to process large volumes of data up to 100x faster than disk-based engines such as Hadoop MapReduce, because data is persisted in memory. The Azure cloud makes Apache Spark easy and cost effective to deploy, with no hardware to buy, no software to configure, a full notebook experience to author compelling narratives, and integration with partner business intelligence tools. In this blog post, I am going to review some of the truly game-changing usage scenarios with Apache Spark on Azure that companies can employ in their context.

Scenario #1: Streaming data, IoT and real-time analytics

Apache Spark’s key use case is its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for companies to be able to stream and analyze it all in real time. Spark Streaming handles this type of workload exceptionally well. As shown in the image below, a user can create an Azure Event Hub (or an Azure IoT Hub) to ingest rapidly arriving data into the cloud; both Event Hubs and IoT Hubs can ingest millions of events and sensor updates per second, which can then be processed in real time by Spark.

Scenario 1_Spark Streaming

Businesses can use this scenario today for:

  • Streaming ETL: In traditional ETL (extract, transform, load) scenarios, the tools are used for batch processing, and data must first be read in its entirety, converted to a database-compatible format, and then written to the target database. With Streaming ETL, data is continually cleaned and aggregated before it is pushed into data stores or passed on for further analysis.
  • Data enrichment: Streaming capability can be used to enrich live data by combining it with static or ‘stationary’ data, thus allowing businesses to conduct more complete real-time data analysis. Online advertisers use data enrichment to combine historical customer data with live customer behavior data and deliver more personalized and targeted ads in real time and in the context of what customers are doing. Since advertising is so time-sensitive, companies have to move fast if they want to capture mindshare. Spark on Azure is one way to help achieve that.
  • Trigger event detection: Spark Streaming can allow companies to detect and respond quickly to rare or unusual behaviors (“trigger events”) that could indicate a potentially serious problem within the system. For instance, financial institutions can use triggers to detect fraudulent transactions and stop fraud in its tracks. Hospitals can also use triggers to detect potentially dangerous health changes while monitoring patient vital signs, and send automatic alerts to the right caregivers, who can then take immediate and appropriate action.
  • Complex session analysis: Using Spark Streaming, events relating to live sessions, such as user activity after logging into a website or application, can be grouped together and quickly analyzed. Session information can also be used to continuously update machine learning models. Companies can then use this functionality to gain immediate insights into how users are engaging on their site and provide more personalized experiences in real time.
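
To make the data enrichment and trigger event detection ideas above more concrete, here is a minimal sketch in plain Python. It does not use Spark or Event Hubs; it only shows the per-event logic that a streaming job would apply at scale. The `Transaction` type, the `CUSTOMER_PROFILES` lookup, and the "10x the typical amount" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Transaction:
    customer_id: str
    amount: float


# Static ("stationary") reference data; the values are made up for illustration.
CUSTOMER_PROFILES: Dict[str, dict] = {
    "c1": {"segment": "retail", "typical_amount": 40.0},
    "c2": {"segment": "business", "typical_amount": 900.0},
}


def enrich(stream: Iterable[Transaction]) -> Iterator[dict]:
    """Join each live event with the static profile of its customer."""
    for tx in stream:
        profile = CUSTOMER_PROFILES.get(tx.customer_id, {})
        yield {"customer_id": tx.customer_id, "amount": tx.amount, **profile}


def detect_triggers(events: Iterable[dict], factor: float = 10.0) -> Iterator[dict]:
    """Yield events whose amount exceeds `factor` times the customer's typical amount."""
    for event in events:
        typical = event.get("typical_amount")
        if typical is not None and event["amount"] > factor * typical:
            yield event


# A tiny in-memory list stands in for the live stream.
live = [Transaction("c1", 35.0), Transaction("c1", 2500.0), Transaction("c2", 1200.0)]
alerts = list(detect_triggers(enrich(live)))
print(alerts)
```

Only the second transaction is flagged, because 2500 is more than ten times customer c1's typical amount of 40. Both steps are generators, so each event is processed as it arrives rather than after the whole data set has been read, which is the property that separates streaming ETL from batch ETL.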

Scenario #2: Visual data exploration and interactive analysis

Using Spark SQL running against data stored in Azure, companies can use BI tools such as Power BI, PowerApps, Flow, SAP Lumira, QlikView and Tableau to analyze and visualize their big data. Spark’s interactive analytics capability is fast enough to perform exploratory queries without sampling. By combining Spark with visualization tools, complex data sets can be processed and visualized interactively. These easy-to-use interfaces then allow even non-technical users to visually explore data, create models and share results. Because a wider audience can analyze big data without preconceived notions, companies can test new ideas and visualize important findings in their data earlier than ever before. Companies can identify trends and relationships that were not apparent before, quickly drill down into them, ask new questions and find smarter ways to innovate.

Scenario 2_Spark visual data exploration and interactive analysis

This scenario is even more powerful when interactive data discovery is combined with predictive analytics (more on this later in this blog). Based on relationships and trends identified during discovery, companies can use logistic regression or decision tree techniques to predict the probability of certain events in the future (e.g., customer churn probability). Companies can then take specific, targeted actions to control or avert certain events.
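
As a rough illustration of the churn example, the sketch below fits a one-feature logistic regression with plain gradient descent in standard-library Python. It is not Spark MLlib or R Server code, and the training data (support tickets per customer versus whether the customer churned) is invented for the example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    data: List[Tuple[float, int]], lr: float = 0.1, epochs: int = 2000
) -> Tuple[float, float]:
    """Fit p(churn) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical history: (support tickets last quarter, 1 if the customer churned).
history = [
    (0, 0), (1, 0), (2, 0), (2, 1), (3, 0), (3, 1),
    (4, 0), (4, 1), (5, 1), (6, 1), (7, 1),
]
w, b = train_logistic(history)


def churn_probability(tickets: float) -> float:
    """Predicted probability that a customer with this many tickets churns."""
    return sigmoid(w * tickets + b)


for tickets in (0, 3, 7):
    print(tickets, round(churn_probability(tickets), 2))
```

The fitted model assigns a low churn probability to customers with no tickets and a high one to customers with many, so a business could, for example, target retention offers at customers above a chosen probability threshold.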

Scenario #3: Spark with NoSQL (HBase and Azure DocumentDB)

This scenario provides scalable and reliable Spark access to NoSQL data stored either in HBase or our blazing fast, planet-scale Azure DocumentDB, through “native” data access APIs. Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable. DocumentDB is a true schema-free managed NoSQL database service running in Azure designed for modern mobile, web, gaming, and IoT scenarios. DocumentDB ensures 99% of your reads are served under 10 milliseconds and 99% of your writes are served under 15 milliseconds. It also provides schema flexibility, and the ability to easily scale a database up and down on demand.

The Spark with NoSQL scenario enables ad-hoc, interactive queries on big data. NoSQL can be used for capturing data that is collected incrementally from various sources across the globe. This includes social analytics, time series, game or application telemetry, retail catalogs, up-to-date trends and counters, and audit log systems. Spark can then be used for running advanced analytics algorithms at scale on top of the data coming from NoSQL.

Scenario 3_Spark NoSQL

Companies can employ this scenario in online shopping recommendations, spam classifiers for real-time communication applications, predictive analytics for personalization, and fraud detection models for mobile applications that need to make instant decisions to accept or reject a payment. I would also include in this category a broad group of applications that are really “next-gen” data warehousing, where large amounts of data need to be processed inexpensively and then served in an interactive form to many users globally. Finally, Internet of Things scenarios fit in here as well, with the obvious difference that the data represents the actions of machines instead of people.

Scenario #4: Spark with Data Lake

Spark on Azure can be configured to use Azure Data Lake Store (ADLS) as additional storage. ADLS is an enterprise-class, hyper-scale repository for big data analytic workloads. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts in an enterprise environment to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. Because ADLS is a file system compatible with the Hadoop Distributed File System (HDFS), it is very easy to combine it with Spark for running computations at scale using pre-existing Spark queries.

Scenario 4_Spark with Data Lake

The data lake scenario arose because new types of data needed to be captured and exploited by companies, while still preserving all of the enterprise-level requirements like security, availability, compliance, failover, etc. The Spark with data lake scenario enables truly scalable advanced analytics on healthcare data, financial data, business-sensitive data, geo-location coordinates, clickstream data, server logs, social media, and machine and sensor data. If companies want an easy way of building data pipelines, and want to have unparalleled performance, ensure their data quality, manage access control, perform change data capture (CDC) processing, get enterprise-level security seamlessly and have world-class management and debugging tools, this is the scenario they need to implement.

Scenario #5: Spark with SQL Data Warehouse

While there is still a lot of confusion, Spark and big data analytics are not a replacement for traditional data warehousing. Instead, Spark on Azure can complement and enhance a company’s data warehousing efforts by modernizing the company’s approaches to analytics. A data warehouse can be viewed as an ‘information archive’ that supports business intelligence (BI) users and reporting tools for the mission-critical functions of a company. My definition of mission-critical is any system that supports revenue generation or cost control. If such a system fails, companies would have to manually perform these tasks to prevent loss of revenue or increased cost. Big data analytics systems like Spark help augment such systems by running more sophisticated computations and smarter analytics, and by delivering deeper insights using larger and more diverse datasets.

Azure SQL Data Warehouse (SQLDW) is a cloud-based, scale-out database capable of processing massive volumes of data, both relational and non-relational. Built on our massively parallel processing (MPP) architecture, SQLDW combines the power of the SQL Server relational database with Azure cloud scale-out capabilities. You can increase, decrease, pause, or resume a data warehouse in seconds with SQLDW. Furthermore, you save costs by scaling out compute when you need it and cutting back usage during non-peak times. SQLDW is the manifestation of the elastic future of data warehousing in the cloud.

Scenario 5_Spark with SQLDW

Use cases for the Spark with SQLDW scenario include using the data warehouse to get a better understanding of customers across product groups, then using Spark for predictive analytics on top of that data. Another is running advanced analytics with Spark on top of an enterprise data warehouse containing sales, marketing, store management, point of sale, customer loyalty, and supply chain data, to drive more informed business decisions at the corporate, regional, and store levels. Using Spark with data warehouse data, companies can do anything from risk modeling, to parallel processing of large graphs, to advanced analytics and text processing, all on top of their elastic data warehouse.

Scenario #6: Machine Learning using R Server, MLlib

Machine learning is probably one of the most prominent Spark use cases in Azure. By storing datasets in memory during a job, Spark performs well on the iterative queries common in machine learning workloads. Common machine learning tasks that can be run with Spark in Azure include (but are not limited to) classification, regression, clustering, topic modeling, singular value decomposition (SVD), principal component analysis (PCA), hypothesis testing, and calculating sample statistics.

Typically, if you want to train a statistical model on very large amounts of data, you need three things:

  • Storage platform capable of holding all of the training data
  • Computational platform capable of efficiently performing the heavy-duty mathematical computations required
  • Statistical computing language with algorithms that can take advantage of the storage and computation power

Microsoft R Server, running on HDInsight with Apache Spark, provides all three things above. Microsoft R Server runs within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. With multi-threaded math libraries and transparent parallelization in R Server, customers can handle up to 1000x more data at up to 50x faster speeds than open source R. And if your data grows or you just need more power, you can dynamically add nodes to the Spark cluster using the Azure portal. Spark in Azure also includes MLlib for a variety of scalable machine learning algorithms, or you can use your own libraries. Some of the common applications of the machine learning scenario with Spark on Azure are listed in the table below.

| Vertical | Sales and Marketing | Finance and Risk | Customer and Channel | Operations and Workforce |
|---|---|---|---|---|
| Retail | Demand forecasting; Loyalty programs; Cross-sell and upsell; Customer acquisition | Fraud detection; Pricing strategy | Personalization; Lifetime customer value; Product segmentation | Store location demographics; Supply chain management; Inventory management |
| Financial Services | Customer churn; Loyalty programs; Cross-sell and upsell; Customer acquisition | Fraud detection; Risk and compliance; Loan defaults | Personalization; Lifetime customer value | Call center optimization; Pay for performance |
| Healthcare | Marketing mix optimization; Patient acquisition | Fraud detection; Bill collection | Population health; Patient demographics | Operational efficiency; Pay for performance |
| Manufacturing | Demand forecasting; Marketing mix optimization | Pricing strategy; Perf risk management | Supply chain optimization; Personalization | Remote monitoring; Predictive maintenance; Asset management |


Scenario 6_Spark Machine Learning

Examples with just a few lines of code that you can try out right now:

Scenario #7: Putting it all together in a notebook experience

For data scientists, we provide out-of-the-box integration with Jupyter (iPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks.  As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.

Scenario 7_Spark with Notebook

You can also use Zeppelin notebooks on Spark clusters in Azure to run Spark jobs. The Zeppelin notebook for HDInsight Spark clusters is offered only to showcase how Zeppelin can be used in an Azure HDInsight Spark environment. If you want to use notebooks to work with HDInsight Spark, I recommend that you use Jupyter notebooks. To make development on Spark easier, we support IntelliJ Spark Tooling, which introduces native authoring support for Scala and Java, local testing, remote debugging, and the ability to submit Spark applications to the Azure cloud.

Scenario #8: Using Excel with Spark

As a final example, I wanted to describe the ability to connect Excel to a Spark cluster running in Azure using the Microsoft Open Database Connectivity (ODBC) Spark Driver. Download it here.

Scenario 8_Spark with Excel

Excel is one of the most popular clients for data analytics on Microsoft platforms. In Excel, our primary BI tools such as PowerPivot, data-modeling tools, Power View, and other data-visualization tools are built right into the software, with no additional downloads required. This enables users of all levels to do self-service BI using the familiar interface of Excel. Through a Spark Add-in for Excel, users can easily analyze massive amounts of structured or unstructured data with a very familiar tool.



How to select a Machine Learning Algorithm? Use this Cheat Sheet

The following machine learning algorithm selection cheat sheet was put together by the Azure ML team.

You can follow the flow chart to select a suitable algorithm for your purpose. You might need to do additional analysis to ensure that the recommended algorithm is indeed the best suited to your specific use case.

You can download the PDF from this link: Microsoft Azure Machine Learning Algorithm Selection Flowchart


Machine Learning Algorithm Selection Flowchart

Microsoft Authenticator – Coming August 15th!

Microsoft just announced that the new Authenticator app will be available in all mobile app stores by August 15th.

The coolest part seems to be the support for wearables such as the Apple Watch. Users will be able to approve multi-factor authentication (MFA) challenges right from their watch.

Source: https://blogs.technet.microsoft.com/enterprisemobility/2016/07/25/microsoft-authenticator-coming-august-15th/

Here are some of the new benefits you will see in the app updates:

  • User experience refresh. We’ve made the app experience incredibly simple while maintaining the highest level of security.
  • Best in breed MFA experience through one-click push notifications. You only need to click the “approve” button in the notification to complete your login. (And in most cases, you won’t even need to open the app to complete the approval.)
  • Support for wearables. You can use an Apple Watch or Samsung Gear device to approve MFA challenges.
  • Fingerprints instead of passcodes. We’ve added support for fingerprint-based approvals on both iPhone and Android.
  • Certificate based authentication. Support for enterprise customers to sign in through certificates instead of passwords.

Azure Diagnostic Logs can now be streamed to Event Hubs

Source: https://azure.microsoft.com/en-us/blog/diagnostic-logs-streaming-to-event-hubs/

Author: John Kemnetz


Microsoft has announced an improved and expanded way to consume Azure Diagnostic Logs: streaming via Event Hubs. Azure Diagnostic Logs are logs emitted by a resource and provide insight into the operation of that resource. The content of these logs varies by resource type; for example, Windows event system logs are one category of Diagnostic Log for VMs, and blob, table, and queue logs are categories of Diagnostic Logs for storage accounts. These differ from Audit Logs, which provide insight into the operations that were performed on resources in your subscription. Previously, you could only store these logs for audit in a Storage account. Now, you can easily set up streaming Diagnostic Logs to Event Hubs using the Portal or the Azure PowerShell Cmdlets.

What Can I Do with Diagnostics Logs and Event Hubs?

Here are just a few ways you might use the streaming capability for Diagnostic Logs:


Azure ML adds support for multiple R & Python Versions, Including Microsoft R Open & Python 3

Source: https://blogs.technet.microsoft.com/machinelearning/2016/07/19/azure-ml-now-supports-multiple-r-python-versions-including-microsoft-r-open-python-3/

This post is authored by Roope Astala, Senior Program Manager at Microsoft.

We are thrilled to announce the support for multiple versions of R and Python in Azure ML.

As a data scientist building new models in R or Python, you often want to use the latest runtime and package versions which have the newest features and bug fixes. Meanwhile, you might have existing production models that you have to maintain and that rely on older versions.

Until recently, Azure ML solved the latter part of this problem: it provided a stable, backwards-compatible runtime for your R and Python models, both for experimentation and operationalization as a web service. However, you were limited to a single version of each runtime and packages that were compatible with that runtime.

Now Azure ML provides support for multiple R and Python versions. You can choose a newer version when building a new experiment, or you can update existing scripts to run under a newer version. Or you can keep using an old version if your legacy model depends on it. The supported versions include:

Always Encrypted in Azure SQL Database is Generally Available

Source: https://blogs.msdn.microsoft.com/sqlsecurity/2016/07/11/always-encrypted-in-azure-sql-database-is-generally-available/

Authored by 


Always Encrypted in Azure SQL Database is now generally available!

Always Encrypted is a feature designed to ensure sensitive data and its corresponding encryption keys are never revealed in plaintext to the database system. With Always Encrypted enabled, a SQL client driver encrypts and decrypts sensitive data inside client applications or application servers, by using keys stored in a trusted key store, such as Azure Key Vault or Windows Certificate Store on a client machine. As a result, even database administrators, other high privilege users, or attackers gaining illegal access to Azure SQL Database, cannot access the data.
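
The core idea, that the key never leaves the client and the database only ever stores ciphertext, can be sketched in a few lines of standard-library Python. This is a conceptual toy only: the hash-based XOR cipher below is an assumption made so the example runs without third-party packages. It is not the algorithm Always Encrypted uses, it is not authenticated, and it should not be used to protect real data. The `database` dict and the sample value are likewise invented for illustration.

```python
import hashlib
import secrets
from typing import Dict


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key + nonce + counter (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Client-side encryption: returns a random nonce followed by the ciphertext."""
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    """Client-side decryption of a blob produced by encrypt()."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


# The key lives only on the client (standing in for a trusted key store).
client_key = secrets.token_bytes(32)

# The "database" is just a dict here; it only ever receives ciphertext.
database: Dict[int, bytes] = {}
database[1] = encrypt(client_key, "123-45-6789")

stored = database[1]                      # what a DBA or attacker would see
recovered = decrypt(client_key, stored)   # only possible with the client key
print(stored.hex())
print(recovered)
```

Anyone who can read `database` but does not hold `client_key` sees only opaque bytes, which is the separation of duties described above.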

Based on the feedback we received during the preview of the feature, many of you have found Always Encrypted useful to achieve one or both of the following two goals:

  • To ensure the data is protected from malicious cloud database admins, co-tenants, and/or malware in Azure SQL Database. Always Encrypted can guarantee full isolation of data from the cloud provider when client applications or middle-tier services are hosted on premises, but even for all-Azure apps, Always Encrypted substantially reduces the attack surface area by removing the database from it.
  • To prevent the disclosure of sensitive data within the customer’s organization. With Always Encrypted, DBAs who do not have access to the keys can administer the database without having access to sensitive data in plaintext.
To get started using Always Encrypted in Azure SQL Database, see the Always Encrypted – Protect sensitive data in SQL Database tutorial that shows how to configure Always Encrypted and develop a .NET application querying encrypted columns.
For more information:

Announcing the general availability of Azure SQL Data Warehouse, an elastic, parallel, columnar data warehouse as a service

Source: https://blogs.technet.microsoft.com/dataplatforminsider/2016/07/12/the-elastic-future-of-data-warehousing/

This post was authored by Joseph Sirosh, Corporate Vice President, Data Group.

A defining characteristic of cloud computing is elasticity – the ability to rapidly provision and release resources to match what a workload requires – so that a user pays no more and no less than what they need to for the task at hand. Such just-in-time provisioning can save customers enormous amounts of money when their workloads are intermittent and heavily spiked. And in the modern enterprise, there are few workloads that have a desperate need for such elastic capabilities as data warehousing. Traditionally built on-premises with very expensive hardware and software, most enterprise Data Warehouse (DW) systems have very low utilization except during peak periods of data loading, transformation and report generation.

With the general availability of the Azure SQL Data Warehouse, we are delivering the true promise of cloud elasticity to data warehousing. It is a fully managed DW as a Service that you can provision in minutes and scale up to 60 times larger in seconds. With a few clicks in the Azure Portal, you can launch a data warehouse, and start analyzing or querying data at the scale of hundreds of terabytes. Our architecture separates compute and storage so that you can independently scale them, and use just the right amount of each at any given time. A unique pause feature allows you to suspend compute in seconds and resume when needed while your data remains intact in Azure storage. And SQL Data Warehouse offers an availability SLA of 99.9% – the only public cloud data warehouse service that offers an availability SLA to customers.
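
To put the 99.9% figure in perspective, a quick calculation shows how much unavailability such a level still permits. The 30-day month is an assumption made for the arithmetic; the exact terms of the SLA are defined by the service and are not reproduced here.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of unavailability permitted by an availability level over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


downtime = allowed_downtime_minutes(99.9)
print(round(downtime, 1))
```

Over a 30-day month there are 43,200 minutes, so 99.9% availability corresponds to roughly 43 minutes of permitted downtime.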

According to Gartner, “For years, many data warehousing vendors have been operating from a playbook of tightly balanced storage and compute configuration units. Cloud architectures are forcing a shift in this approach, with vendors starting to decouple storage and compute, and allowing them to independently scale. We believe this approach to be the correct one, and that other vendors in the space will need to adopt this methodology if they are to stay competitive.”1

Azure SQL Data Warehouse uses an elastic massively parallel processing (MPP) architecture built on top of the industry-leading SQL Server 2016 database engine. It allows you to interactively query and analyze data using the broad set of existing SQL-based tools and business intelligence applications that you use today. It uses column stores for high performance analytics and storage compression, a rich collection of aggregation capabilities of SQL Server, and state of the art query optimization capabilities. With built-in capabilities such as Polybase, it allows you to query Hadoop systems directly, enabling a single SQL-based query surface for all your data.

Azure SQL Data Warehouse is also part of the Cortana Intelligence Suite, which is a fully managed big data and advanced analytics suite to transform your data into intelligent action. It easily integrates with components of the suite such as Azure Data Factory for data integration pipelines, Azure Machine Learning for predictive analytics, Power BI for business intelligence, HDInsight for big data insights, and R and Spark for big data analytics. For an example of such integration, see the airline industry sample on PowerBI.com. This shows a Power BI report based on a real-world predictive maintenance solution for a major airline. The data for this report comes from a variety of sources including IoT streams from aircraft engines, air traffic control information, route restrictions and fuel usage data. All this is integrated and landed into an Azure SQL DW and processed with Azure Machine Learning to detect operational anomalies and trends. The report is “live” and you can interact with it and experience Power BI in conjunction with Azure SQL DW and Azure ML.

The distinct capabilities of Azure SQL Data Warehouse include:

Data warehousing as a service

Gone are the pains associated with administering, managing, patching and manually tuning data warehouses. There are no knobs to turn and no physical or virtual infrastructure to manage, and the service is simple, resilient and secure with reliable storage. This lets you focus on driving the analytics and getting value from your data rather than on managing your data warehousing software and hardware; Azure SQL Data Warehouse handles it all for you.

Unmatched security and access control

With malicious and even insider attacks becoming a key concern for enterprises, an alarm system over your critical enterprise data is a must-have to avoid serious damage to your business and reputation. Only Azure SQL Data Warehouse delivers auditing and threat detection built into the service, with advanced machine learning to detect abnormal query patterns and alert you to potential security issues before it is too late. Data at rest is protected by Transparent Data Encryption.

Additionally, SQL Data Warehouse is the only cloud data warehouse service that works seamlessly with Azure Active Directory which currently supports 1.3 billion daily authentications across 600 million user accounts. This enables Single Sign-On (SSO) and role-based access control. You can even have finer-granularity permissions that let you control which operations a user can do on individual columns, tables, views, procedures, and other objects in the database. These features further protect data by ensuring just the right users have access to the right data—a critical capability when centralizing vast amounts of proprietary and sensitive data for analytics in an enterprise.

Multidimensional elasticity

Currently the majority of cloud database and data warehouse services are provisioned with fixed storage and compute resources. Resizing of resources typically compromises availability and/or performance. This means that service users typically end up with over-provisioned and expensive underutilized resources to accommodate possible peak demand or in the worst case, under-provisioned resources unable to handle sudden work overloads.

Unlike existing cloud services which can take anywhere from a couple of hours to a couple of days to do the data warehouse resizing, SQL Data Warehouse’s unique elastic technology decouples storage and compute, enabling each layer to become independently scalable almost instantaneously. This makes it possible to provision one or more data warehouses in minutes, and then independently scale users, data, and workloads in seconds to optimally match the demand. Further, elastic scaling also makes it possible to simultaneously load and query data, because every user and workload can have exactly the resources needed, without contention, and with minimal impact to production queries.

“Getting featured in the iOS App Store was a big deal for a small company like ours as our users increased from 3,000 to 300,000 in 48 hours. To keep up with this 100x increase in workload, we simply added data warehouse compute capacity by moving a slider and our services just scaled in minutes. We didn’t miss an insight,” notes Paul Ohanian, CTO, PoundSand.

Save as you go, with fast pause and resume

Starting up and shutting down data warehouse clusters may take a considerable amount of time. Leaving the data warehouse running continuously incurs potentially high and unnecessary costs, especially if you are running your jobs periodically and the data warehouse is sitting idle in between for extended periods of time. Now you can pause your data warehouse for the required time, saving compute costs, and quickly resume it later when needed. You can even write a PowerShell script, then schedule it with Azure Automation to automatically pause or resume the cluster based on the specific needs of your business.
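
The saving is easy to estimate with back-of-the-envelope arithmetic. The sketch below assumes compute is billed only while the warehouse is running; the hourly rate and the job schedule are hypothetical numbers chosen for illustration, not Azure list prices, and storage charges are not modelled.

```python
HOURS_PER_MONTH = 30 * 24  # 720 hours, assuming a 30-day month


def monthly_compute_cost(active_hours: float, rate_per_hour: float) -> float:
    """Compute cost when compute is billed only while the warehouse is running."""
    return active_hours * rate_per_hour


RATE = 10.0  # hypothetical price per compute hour

always_on = monthly_compute_cost(HOURS_PER_MONTH, RATE)
# Hypothetical schedule: two load-and-report runs per month, 12 hours each.
paused_between_jobs = monthly_compute_cost(2 * 12, RATE)
savings = always_on - paused_between_jobs

print(always_on, paused_between_jobs, savings)
```

With these assumed numbers, running continuously costs 7200.0 per month and running only during the two jobs costs 240.0, a saving of 6960.0, or about 97% of the compute bill. The less often jobs run, the larger the benefit of pausing.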

“When we learned about the pause and resume capabilities of SQL Data Warehouse and integrated services like Azure Machine Learning and Data Factory, we switched from AWS Redshift, migrating over 7TB of uncompressed data over a week for the simple reasons of saving money and enabling a more straightforward implementation for advanced analytics. To meet our business intelligence requirements, we load data once or twice a month and then build reports for our customers. Not having the data warehouse service running all the time is key for our business and our bottom line,” said Bill Sabo, managing director of information technology at Integral Analytics.

Seamless querying of structured and unstructured data

An increasing amount of data in today’s rapidly digitizing world is unstructured data such as clickstreams, sensor data, location data, customer support emails and chat transcripts, much of which is harnessed for analysis in big data systems. The ability to integrate and join such data with your core relational enterprise data is often a highly desired capability. With built-in PolyBase technology, SQL Data Warehouse allows you to access and combine both non-relational and relational data. You can run queries on external data in Hadoop or Azure Blob Storage using familiar SQL, often without making any changes to the existing queries. Underneath, the queries are optimized for efficient execution, with no tuning burden on the user. Furthermore, you can quickly import and export data back and forth between relational tables in SQL Data Warehouse and non-relational data in Hadoop or Azure Blob Storage using simple T-SQL statements. The rich SQL programmability support (stored procedures, functions and PolyBase) empowers users to query the data however they want.
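
The idea of a single SQL query surface over relational and non-relational data can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` module rather than SQL Data Warehouse or PolyBase, and it copies the semi-structured records into a table, whereas the external-data queries described above run against data that stays in Hadoop or Blob Storage. The table names, customer names, and clickstream records are all invented for the example.

```python
import json
import sqlite3

# Semi-structured clickstream records, as they might sit in files in a big data store.
raw_clicks = """
{"customer_id": 1, "page": "/pricing"}
{"customer_id": 2, "page": "/home"}
{"customer_id": 1, "page": "/checkout"}
"""

conn = sqlite3.connect(":memory:")

# Core relational data.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
)

# Expose the external records to SQL (copied into a table in this stand-in).
conn.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
rows = [json.loads(line) for line in raw_clicks.strip().splitlines()]
conn.executemany("INSERT INTO clicks VALUES (:customer_id, :page)", rows)

# One familiar SQL query joins both kinds of data.
result = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM customers AS c
    JOIN clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(result)
```

The query returns one row per customer with its click count, here `[('Contoso', 2), ('Fabrikam', 1)]`. The analyst writes ordinary SQL and does not need to know which side of the join originated outside the relational store.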

Christoph Leinemann, senior director data engineering at Jet.com says, “with Azure SQL Data Warehouse, we use PolyBase to ingest data from HDInsight then run thousands of analytical queries per day over tens of billions of records—about 20TB of data. This enables us to monitor price history and market dynamics to adjust pricing and ensure we’re offering our customers the best price.”

Integration with the SQL Server tool ecosystem you already use and love

Azure SQL Data Warehouse fits into the tool ecosystem you already use, with native JDBC and ODBC connectors, and with a broad set of independent software vendors and partners who already support SQL Server, such as Alteryx, Attunity, Informatica, Redgate and SnapLogic. For BI capabilities, it integrates with the industry-leading Power BI service in Azure, and even with Microsoft Excel. For a beautifully visualized walkthrough of Microsoft Power BI and SQL Server 2016 Reporting Services including Mobile BI, please watch this demo. Microsoft also works with a set of popular BI partners to ensure the tools your teams use work great with SQL Data Warehouse, including Looker Data Sciences, Tableau Software and Qlik Technologies.

Experience modern data warehousing in the cloud for yourself

Today we have thousands of customers who are already using Azure SQL Data Warehouse. Many of these customers are experiencing significant performance gains over existing multi-million dollar data warehouses on-premises. With SQL Data Warehouse, some multi-hour queries in our customer environments now finish in under an hour, and some queries that took five to ten minutes now complete in seconds. Get started with SQL Data Warehouse today and experience the speed, scale, elasticity, security and ease of use of a true modern data warehouse as a service for yourself.


1 Source: Gartner, The Data Warehouse and DMSA Market: Current and Future States, 201, June 16, 2016.