
Build real-time analytics with Microsoft Azure Synapse Analytics and Cosmos DB

16 August 2023

In today’s fast-paced world, the ability to quickly access, analyze, and visualize data in real time is essential for businesses that want to remain competitive. Microsoft Azure Synapse Analytics is a cloud-based analytics service that allows you to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. It brings big data processing and enterprise data warehousing together in a single service.

With Synapse Analytics, you can process data at scale and generate insights from data held in data lakes, data warehouses, and operational stores. A Synapse Analytics workspace brings together capabilities for real-time analytics, advanced analytics, and reporting with Power BI. Understanding these capabilities is crucial when deciding whether the platform fits a particular organization.

Here is a high-level overview of the process to build real-time analytics with Microsoft Azure Synapse Analytics.

Step 1: Data ingestion

The first step toward building real-time analytics with Microsoft Azure Synapse Analytics is data ingestion. For streaming data, Azure Stream Analytics can read from sources such as Event Hubs, IoT Hub, and Kafka and write the results into Synapse Analytics in near real time. For batch data, Synapse Analytics can load from sources such as Azure Blob Storage, Azure Data Lake Storage, and SQL Server.
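Stream Analytics jobs are written in a SQL-like query language, and a common real-time pattern is counting events per device in fixed, non-overlapping (tumbling) time windows. The Python sketch below only illustrates those semantics under stated assumptions: events are hypothetical `(timestamp_seconds, device_id)` pairs, windows are half-open `[start, start + size)`, and none of this is the Stream Analytics API. Check the Stream Analytics documentation for the service's exact window-boundary rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, device id).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, device_id) using fixed, non-overlapping windows.

    Illustrative only: windows here are half-open [start, start + window_seconds),
    which may not match the boundary convention Azure Stream Analytics uses.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, device_id in events:
        # Each event falls into exactly one window: round its timestamp down
        # to the nearest multiple of the window size.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)
```

Because each event belongs to exactly one window, tumbling windows suit periodic totals such as per-minute counts; overlapping windows (Stream Analytics also offers hopping and sliding windows) would assign a single event to several windows.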

Step 2: Data preparation

Once ingested, the data must be prepared for analysis. Synapse pipelines, which are built on the same technology as Azure Data Factory, let you create data pipelines that transform and prepare data, and Apache Spark pools in Synapse can clean and reshape it at scale. Separate services such as Azure Databricks and Azure HDInsight can also handle this step.
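Whichever tool runs it, a preparation step usually applies rules such as dropping incomplete records, normalizing values, and removing duplicates. The following is a minimal plain-Python sketch of such rules. The record fields (`id`, `region`, `amount`, `ts`) and the keep-the-newest-version rule are hypothetical; in Synapse you would more likely express the same logic in a pipeline data flow or a Spark notebook.

```python
from typing import Any, Dict, List


def prepare(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Clean raw records: drop incomplete rows, normalize fields, keep the latest per id.

    The field names (id, region, amount, ts) are hypothetical examples.
    """
    latest: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        # Drop records that lack the fields downstream analysis depends on.
        if not rec.get("id") or rec.get("amount") in (None, ""):
            continue
        cleaned = {
            "id": str(rec["id"]).strip(),
            # Normalize region names; fall back to "unknown" when missing.
            "region": str(rec.get("region") or "unknown").strip().lower(),
            # Cast amounts that may arrive as strings to floats.
            "amount": float(rec["amount"]),
            "ts": int(rec.get("ts", 0)),
        }
        # De-duplicate by id, keeping the record with the newest timestamp.
        previous = latest.get(cleaned["id"])
        if previous is None or cleaned["ts"] >= previous["ts"]:
            latest[cleaned["id"]] = cleaned
    return sorted(latest.values(), key=lambda r: r["id"])
```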

Step 3: Data warehousing

After the data is prepared, it is typically stored in a data warehouse for further analysis. Microsoft Azure Synapse Analytics provides dedicated SQL pools (formerly Azure SQL Data Warehouse), which can store petabytes of data and let you analyze it at scale. Data kept in Azure Data Lake Storage can also be processed with Azure Databricks or Azure HDInsight, so you can choose the engine that best fits your needs.
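Dedicated SQL pools scale out by spreading each table's rows across 60 distributions; for a hash-distributed table, the value of a chosen distribution column decides where each row lands. The sketch below models only that idea. CRC32 is a stand-in for Synapse's internal hash function, and the `customer_id` column used in the example is hypothetical.

```python
import zlib
from typing import Any, Dict, List

# Dedicated SQL pools divide each table into 60 distributions.
NUM_DISTRIBUTIONS = 60


def distribution_for(value: Any, num_distributions: int = NUM_DISTRIBUTIONS) -> int:
    """Map a distribution-column value to a distribution index.

    CRC32 is only a stand-in; Synapse uses its own internal hash function.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_distributions


def distribute(
    rows: List[Dict[str, Any]], key_column: str
) -> Dict[int, List[Dict[str, Any]]]:
    """Group rows by the distribution their key_column value hashes to."""
    buckets: Dict[int, List[Dict[str, Any]]] = {}
    for row in rows:
        buckets.setdefault(distribution_for(row[key_column]), []).append(row)
    return buckets
```

Because rows with the same key always land in the same distribution, joins and aggregations between tables distributed on the same column can avoid moving data between distributions; a column with many distinct, evenly spread values helps avoid skew.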

Step 4: Analytics and visualization

With the data in the warehouse, you can run analytics and build visualizations to derive insights. Microsoft Azure Synapse Analytics integrates with tools such as Power BI and Azure Analysis Services, which you can use to create custom dashboards, reports, and visualizations that support real-time, data-driven decisions.

Azure Cosmos DB

Azure Cosmos DB supports multi-region writes (previously called multi-master), which lets a globally distributed database accept writes in multiple Azure regions and serve data from the region closest to each user, reducing latency. Because data is replicated between regions asynchronously, replicas can differ for a short time, often milliseconds, and this affects consistency: whether every replica reflects the same state at any given moment. Cosmos DB offers five consistency levels (strong, bounded staleness, session, consistent prefix, and eventual), each with a different balance of consistency, performance, and availability. Cosmos DB also provides automatic indexing, elastic scaling, and high availability.
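To make replication lag and consistency concrete, the sketch below models one item replicated from a write region to a remote region that applies writes after a fixed delay. It is a conceptual toy, not the Cosmos DB SDK or its replication protocol: the fixed `lag_ms`, the integer versions, and the `session_read` helper are assumptions. A plain read from the remote replica behaves like eventual consistency and can return stale data, while `session_read` mimics session consistency's read-your-own-writes guarantee by rejecting a replica that has not caught up to the version this client wrote.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Replica:
    """One region's copy of the data; replicated writes become visible after lag_ms."""

    lag_ms: int
    # Each entry: (write_time_ms, version, key, value).
    log: List[Tuple[int, int, str, str]] = field(default_factory=list)

    def receive(self, write_time_ms: int, version: int, key: str, value: str) -> None:
        self.log.append((write_time_ms, version, key, value))

    def read(self, key: str, now_ms: int) -> Tuple[Optional[str], int]:
        """Return the newest (value, version) for key that is visible at now_ms."""
        value, version = None, 0
        for write_time_ms, v, k, val in self.log:
            visible = write_time_ms + self.lag_ms <= now_ms
            if k == key and visible and v > version:
                value, version = val, v
        return value, version


def session_read(
    replica: Replica, key: str, now_ms: int, session_version: int
) -> Optional[str]:
    """Hypothetical read-your-writes check: refuse a replica older than the client's last write."""
    value, version = replica.read(key, now_ms)
    if version < session_version:
        return None  # the caller would wait or route the read to a fresher replica
    return value
```

Stronger consistency levels narrow or remove this stale window at the cost of latency or availability, which is the trade-off the paragraph above describes.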

As businesses collect an ever-increasing amount of information, a robust data management system becomes essential to operations. With traditional methods, however, transactional and analytical workloads can compete for the same resources, leading to inefficiencies and increased costs. Azure Synapse Link addresses this by letting you run near-real-time analytics over operational data in Azure Cosmos DB without sacrificing transactional performance. This speeds up analytics, BI, and machine learning, and reduces the cost of maintaining ETL pipelines between the transactional and analytical stores.

Traditional approach before Azure Synapse Link


1. An Azure Cosmos DB container stores application data in the indexed, row-based “transactional store.”

2. Azure Data Factory periodically extracts the data from Cosmos DB and loads it into the data lake (Azure Data Lake Storage Gen2).

3. The data is transformed and aggregated, and the results are pushed back to Cosmos DB for consumption.
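The three steps above amount to a periodic batch job. The toy Python model below, in which plain dictionaries and lists stand in for the Cosmos DB container, the data lake, and the serving container (all hypothetical, not Azure APIs), shows why insights lag: a change written after a run stays invisible to consumers until the next run.

```python
from typing import Any, Dict, List


def etl_run(
    transactional: Dict[str, Dict[str, Any]],
    lake: List[Dict[str, Any]],
    serving: Dict[str, float],
) -> None:
    """One scheduled batch: extract, land in the lake, aggregate, push results back."""
    # Step 2: copy a full snapshot of the container into the lake.
    # In the real pattern this read consumes throughput on the transactional store.
    snapshot = [dict(doc) for doc in transactional.values()]
    lake.extend(snapshot)
    # Step 3: aggregate the snapshot (total amount per region) ...
    totals: Dict[str, float] = {}
    for doc in snapshot:
        totals[doc["region"]] = totals.get(doc["region"], 0.0) + doc["amount"]
    # ... and replace the previously served results.
    serving.clear()
    serving.update(totals)
```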

Limitations of the traditional approach

1. Extracting data consumes throughput on the transactional store, which affects the performance and cost of transactional workloads.

2. Pulling data from the transactional store at batch intervals delays insights.

3. Managing separate data formats and storage layers for analytics adds complexity.

How does Synapse Link overcome these limitations?

Azure Synapse Link seamlessly integrates Azure Cosmos DB and Microsoft Azure Synapse Analytics.

1. All data in an Azure Cosmos DB container is stored in a row-based “transactional store,” optimized for fast transactional reads, writes, and operational queries, with response times measured in milliseconds. This format keeps your application data fully indexed and optimized for operational performance.

2. Operational data is automatically synced to an analytical store, also known as a column store, typically within about two minutes. The analytical store reflects all inserts, updates, and deletes made in the transactional store. Its data is stored in the Parquet format, which is designed for efficient compression, encoding, and bulk storage and retrieval. While most columnar files are append-only, the analytical store handles updates and deletes, so analysis reflects the current state of the data, subject to the sync delay.

3. Synapse Link is a hybrid transactional and analytical processing (HTAP) capability that lets you query the analytical store from Synapse Analytics, using Apache Spark pools or serverless SQL pools, without consuming the provisioned throughput of the transactional store.
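The difference between the two stores is easiest to see in miniature. The toy model below keeps one Python list per property (a columnar layout) and applies a hypothetical feed of inserts, updates, and deletes coming from the row-oriented transactional store. It is not how Cosmos DB implements its analytical store, which the service manages internally; it only shows why an aggregate over one property touches just that column, and how updates and deletes can be applied in place rather than appended.

```python
from typing import Any, Dict, List, Tuple


class ColumnStore:
    """Toy columnar copy of a container: one list per property, one slot per document."""

    def __init__(self, columns: List[str]) -> None:
        self.ids: List[str] = []
        self.columns: Dict[str, List[Any]] = {name: [] for name in columns}

    def upsert(self, doc: Dict[str, Any]) -> None:
        if doc["id"] in self.ids:
            # Update: overwrite the existing slot in every column.
            slot = self.ids.index(doc["id"])
            for name, values in self.columns.items():
                values[slot] = doc.get(name)
        else:
            # Insert: append a new slot to every column.
            self.ids.append(doc["id"])
            for name, values in self.columns.items():
                values.append(doc.get(name))

    def delete(self, doc_id: str) -> None:
        # Delete: remove the document's slot from every column.
        slot = self.ids.index(doc_id)
        del self.ids[slot]
        for values in self.columns.values():
            del values[slot]


def apply_changes(store: ColumnStore, changes: List[Tuple[str, Any]]) -> None:
    """Apply ("upsert", document) and ("delete", id) operations in order."""
    for op, payload in changes:
        if op == "upsert":
            store.upsert(payload)
        elif op == "delete":
            store.delete(payload)
        else:
            raise ValueError(f"unknown operation: {op}")
```

An analytical query such as `sum(store.columns["amount"])` reads only the `amount` list, which is the access pattern columnar formats are built for.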

Below are some of the common Synapse Link use case patterns for Cosmos DB:

1. Run ad hoc T-SQL analytical queries: query Cosmos DB data in seconds using the serverless SQL pool and the full expressiveness of the T-SQL language.

2. Build real-time dashboards: connect Power BI to Synapse Analytics and query Cosmos DB data exposed through Synapse Link.

3. Build a cloud data warehouse: analyze a unified view of data across Azure Data Lake Storage, Azure Cosmos DB, and Azure Blob Storage.

4. Prepare and train predictive models: with Synapse Spark pools and Azure Machine Learning integration in Microsoft Azure Synapse Analytics, you can quickly generate insights over operational data. Spark ML algorithms let you build machine learning models rapidly without complex data engineering, and you can write inference results back into Azure Cosmos DB for near-real-time scoring.
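The last pattern is a train, score, and write-back loop. The sketch below stands in for it with ordinary least squares on a single feature, written in plain Python rather than Spark ML, and a dictionary standing in for the Cosmos DB container; the `visits` feature and the `predicted_spend` field are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


def score_and_write_back(
    container: Dict[str, Dict[str, Any]], slope: float, intercept: float
) -> None:
    """Score each document and store the prediction on the document itself."""
    for doc in container.values():
        doc["predicted_spend"] = slope * doc["visits"] + intercept
```

In the pattern described above, Spark ML would do the training at scale against the analytical store, and writing scores back to the transactional store makes them available to the application with low latency.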

PreludeSys’s expertise with Microsoft Azure Synapse Analytics implementation

Synapse Analytics is a modern Microsoft analytics platform that can unlock the full potential of your data. Building real-time analytics with Microsoft Azure Synapse Analytics is an easy and cost-effective way to bring real-time business intelligence to your organization. From understanding customer trends to optimizing campaigns, Azure Synapse Analytics helps businesses make informed decisions in an ever-changing environment. An expert service provider such as PreludeSys can support your organization end to end, from setting strategy through development and ongoing operations. With that expertise, you can build a comprehensive real-time analytics environment on Microsoft Azure Synapse Analytics that opens up new possibilities for digital transformation.

Get expert assistance today so that you can start discovering valuable insights.
