Azure Synapse Analysis

Scale Out with Azure Synapse Analytics

April 22, 2020

Azure Synapse Analytics is the next phase of Azure SQL Data Warehouse and can handle structured, streaming, and unstructured data. Microsoft's vision is to create a seamless experience across all data and analytics needs in the cloud. Synapse will basically unify the core of SQL Server with Spark-enabled big data and streaming data processing.

There are some very interesting features coming with Azure Synapse Analytics.

·         SQL Analytics Massively Parallel Processing Architecture

·         Scaling Compute

·         Query both relational and non-relational data

·         SQL query on-demand (Preview)

·         SQL Pool (GA)

·         Workload Isolation (Preview)

·         Integrated Apache Spark (Preview)

·         Data Integration (Preview)

·         Studio (Preview)

Some of the new features are still in preview. In this blog we will have a quick look at the first two; SQL Analytics Massively Parallel Processing Architecture and Scaling Compute.

 

SQL Analytics Massively Parallel Processing Architecture

A Scale-out architecture which utilizes distributed computational processing of data across multiple nodes.

Compute is separate from storage, which allows us to independently scale out the compute power. Decoupled storage and compute means:

·         Pausing and resuming Compute => cost of running saving

·         Scaling without any change to storage level

All data is store in relational tables with columnar storage. This dramatically improves performance and reduce cost of storage.

 

Scaling Compute

You can scale out or scale back compute by adjusting the data warehouse unit settings for your SQL pool. Loading and query performance can increase linearly as you add more data warehouse units.

For a scale operation, the system detaches the storage layer from the compute nodes, adds compute nodes, and then reattaches the storage layer to the Compute layer. Each SQL pool is stored as 60 distributions, which are evenly distributed to the compute nodes. Adding more compute nodes adds more compute power. As the number of compute nodes increases, the number of distributions per compute node decreases, providing more compute power for your queries. Likewise, decreasing data warehouse units reduces the number of compute nodes, which reduces the compute resources for queries.

Scaling duration depends on whether there is a huge or small data volume to be rolled-back

Flexibly match the increase load demands of data loading or data querying anytime

Scale-out only benefits when the volume of data in DW exceeds 1-TB

 

Scale-out impacts

·         Linearly improves performance of the system for scans, aggregations, and CTAS statements.

·         Increases the number of readers and writers for loading data.

·         Maximum number of concurrent queries and concurrency slots.

 Scale-out might not help when

·         Data is skewed across the distributions

·         The huge volume of data moved around is required by queries execution plan

 

Contact one of our sales representitives today if you would like more information on leveraging the newest features of Microsoft Azure in your organisation!