Wednesday, November 12, 2025

CES - Change Event Streaming in SQL Server 2025

Microsoft's data team has released a preview of SQL Server 2025, which includes many great features, such as Database Mirroring to Microsoft Fabric, Flexible AI Model Management, and more. One of the most interesting of these new features, Change Event Streaming, ships with SQL Server 2025.


SQL Server already offers mature change-tracking features, Change Data Capture (CDC) and Change Tracking (CT); SQL Server 2025 adds a new mechanism for capturing and streaming data changes in real time or near real time.

Change Event Streaming (CES) is a new native engine capability in SQL Server 2025 that lets you stream database changes (inserts, updates, and deletes) as real-time events to external systems (e.g., Azure Event Hubs, Kafka) instead of relying purely on older batch/change-capture approaches.

Current Limitations and Opportunities

SQL Server currently offers two primary mechanisms for tracking data changes: Change Data Capture (CDC) and Change Tracking (CT). While both are valuable, they have limitations:

  • Change Data Capture (CDC): Provides detailed historical changes, but can be complex to configure and manage. It also introduces overhead due to the asynchronous capture process and the need for cleanup jobs. CDC is also not available in all SQL Server editions.

  • Change Tracking (CT): Simpler to configure than CDC, but only provides information about which rows have changed, not the actual changes themselves. This requires additional queries to retrieve the changed data, potentially impacting performance.
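To make the CT limitation concrete, here is a sketch of a typical sync query (the `dbo.Orders` table and its columns are hypothetical): `CHANGETABLE(CHANGES ...)` returns only keys and operation codes, so the actual values must be joined back from the base table.

```sql
-- Hypothetical dbo.Orders table with Change Tracking enabled.
-- CHANGETABLE(CHANGES ...) returns only the primary key and the
-- operation code; the changed values require an extra join to the
-- base table, which may already hold newer data than the change.
DECLARE @last_sync_version bigint = 42; -- version saved from the previous sync

SELECT ct.OrderID,
       ct.SYS_CHANGE_OPERATION,        -- 'I', 'U', or 'D'
       o.Status,
       o.Total                         -- fetched by the extra join
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct
LEFT JOIN dbo.Orders AS o
       ON o.OrderID = ct.OrderID;      -- deleted rows yield NULLs here
```

It is this extra round trip to the base table, and the race with concurrent updates, that the "Change Tracking with Values" enhancement below would eliminate.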

SQL Server 2025 has the opportunity to address these limitations and provide a more robust and versatile change event streaming solution. Key areas for improvement include:

  • Real-time or Near Real-time Streaming: Reducing latency between data changes and their availability to downstream consumers.

  • Simplified Configuration and Management: Making it easier to set up and maintain change event streams.

  • Improved Performance and Scalability: Minimizing the impact on the source database and supporting high-volume change rates.

  • Enhanced Data Transformation and Enrichment: Providing mechanisms to transform and enrich change events before they are streamed.

  • Integration with Modern Data Architectures: Seamlessly integrating with cloud-based data lakes, message queues, and stream processing platforms.

  • Support for a wider range of data types: Expanding support for data types like JSON, XML, and spatial data.

Potential Enhancements in SQL Server 2025

Several enhancements could be implemented in SQL Server 2025 to improve change event streaming:

1. Enhanced Change Data Capture (CDC)

  • Real-time CDC: Implement a more real-time CDC mechanism, potentially using transaction log streaming or other techniques to capture changes with minimal latency. This could involve a new CDC agent that directly streams changes to a message queue or other destination.

  • Simplified Configuration: Introduce a more user-friendly interface for configuring CDC, potentially using a wizard-driven approach or declarative configuration files.

  • Automated Cleanup: Improve the automated cleanup process for CDC change tables, reducing the need for manual intervention.

  • Selective CDC: Allow users to specify which columns to capture changes for, reducing storage overhead and improving performance.

  • CDC for Temporal Tables: Extend CDC support to temporal tables, allowing users to track changes to historical data.
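For context, here is what enabling CDC looks like today, which the proposals above aim to simplify (the table name is illustrative):

```sql
-- Current CDC setup. Requires SQL Server Agent for the capture
-- and cleanup jobs, and must be enabled at two levels.
EXEC sys.sp_cdc_enable_db;            -- 1. enable CDC for the database

EXEC sys.sp_cdc_enable_table          -- 2. then enable it per table
     @source_schema = N'dbo',
     @source_name   = N'Orders',      -- hypothetical table
     @role_name     = NULL;           -- no gating role in this sketch
```

A wizard-driven or declarative alternative, as suggested above, would replace this multi-step procedure-based workflow.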

2. Improved Change Tracking (CT)

  • Change Tracking with Values: Enhance CT to optionally capture the actual changed values, eliminating the need for separate queries to retrieve the data. This could be implemented as a new option when enabling CT on a table.

  • Change Tracking with History: Provide an option to retain a history of changes within the change tracking system, allowing users to query past changes without relying on CDC.

  • Improved Performance: Optimize the performance of CT queries, especially for large tables with high change rates.

3. New Change Event Streaming Mechanism

  • Transaction Log Streaming: Introduce a new mechanism for streaming transaction log records directly to downstream consumers. This would provide a highly scalable and low-latency solution for capturing all data changes. This would require careful consideration of security and data privacy.

  • Database Triggers with Streaming: Allow users to define database triggers that automatically stream change events to a message queue or other destination. This would provide a flexible and customizable solution for capturing specific types of changes.

  • Change Event Queue: Introduce a built-in change event queue within SQL Server, allowing users to publish and subscribe to change events. This would provide a centralized and reliable mechanism for managing change event streams.

4. Data Transformation and Enrichment

  • Integration with SQL Server Integration Services (SSIS): Provide seamless integration with SSIS to allow users to transform and enrich change events before they are streamed.

  • Support for User-Defined Functions (UDFs): Allow users to define UDFs that can be applied to change events to perform custom transformations and enrichments.

  • Built-in Transformation Functions: Provide a library of built-in transformation functions for common tasks such as data masking, data type conversion, and data aggregation.

5. Integration with Modern Data Architectures

  • Native Support for Message Queues: Provide native support for popular message queues such as Apache Kafka, Azure Event Hubs, and RabbitMQ.

  • Integration with Cloud-Based Data Lakes: Provide seamless integration with cloud-based data lakes such as Azure Data Lake Storage and Amazon S3.

  • Support for Stream Processing Platforms: Provide support for stream processing platforms such as Apache Spark Streaming and Azure Stream Analytics.

  • REST API for Change Events: Expose a REST API for accessing change events, allowing applications to easily consume change data over HTTP.

6. Security and Governance

  • Role-Based Access Control (RBAC): Implement RBAC for change event streaming, allowing administrators to control which users and applications have access to change data.

  • Data Masking and Encryption: Provide mechanisms for masking and encrypting sensitive data in change events.

  • Auditing: Audit all change event streaming activities, providing a record of who accessed what data and when.

  • Data Retention Policies: Allow administrators to define data retention policies for change events, ensuring that data is not retained longer than necessary.

Considerations for Implementation

Implementing these enhancements will require careful consideration of several factors:

  • Performance Impact: Minimizing the impact on the source database is crucial. Any new change event streaming mechanism should be designed to be highly performant and scalable.

  • Backward Compatibility: Maintaining backward compatibility with existing CDC and CT implementations is important to avoid breaking existing applications.

  • Complexity: Simplifying the configuration and management of change event streams is essential to make them accessible to a wider range of users.

  • Security: Ensuring the security and privacy of change data is paramount. Robust security measures should be implemented to protect sensitive data.

  • Cost: The cost of implementing and operating change event streaming should be considered. The solution should be cost-effective for both small and large deployments. 

How It Works

CES operates by scanning the SQL Server transaction log for DML changes on tracked tables within defined "event stream groups." Each change is emitted as an individual event containing:

  • Row-level details: Previous and new values, schema metadata (e.g., column names/types), and operation type.
  • Transaction context: Grouping for atomicity (all changes from a single transaction are bundled).
  • CloudEvents metadata: Standardized attributes like event ID, timestamp, source (table name), and type (e.g., com.microsoft.sqlserver.dml.insert).
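Putting those three layers together, an insert event in CloudEvents style might look roughly like the following. This is an illustrative sketch only: the `type` value comes from the text above, but the field names inside `data` are assumptions, not the documented payload schema.

```json
{
  "id": "a1b2c3d4-0000-0000-0000-000000000000",
  "source": "/dbo/Orders",
  "type": "com.microsoft.sqlserver.dml.insert",
  "time": "2025-11-12T10:15:00Z",
  "specversion": "1.0",
  "data": {
    "operation": "insert",
    "newvalues": { "OrderID": 1001, "Status": "Placed" }
  }
}
```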

Events are batched and sent asynchronously to Azure Event Hubs via AMQP (default) or Apache Kafka protocols. Large messages (e.g., those exceeding ~25% of the configured max size) are automatically split into multiple events. Delivery is guaranteed at least once, with retries on transient failures; the transaction log is protected from truncation until events are successfully processed, ensuring no data loss.

Key T-SQL components:

  • Event Stream Group: Defines the destination (e.g., Event Hubs endpoint), authentication, partitioning scheme (e.g., by table or hash), and max message size (up to 1 MB).
  • Tracked Tables: Added to groups; changes from these tables trigger events.

CES ignores DDL operations (e.g., ALTER TABLE), but subsequent DML events reflect the updated schema. It supports BLOB columns (configurable inclusion) and does not require primary keys on tables.
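Based on the preview documentation, wiring up these components looks roughly like the following; the procedure and parameter names are taken from the preview and may change before general availability, and the group name, endpoint, and credential are placeholders.

```sql
-- Sketch of CES setup per the SQL Server 2025 preview; names may change.
-- 1. Enable change event streaming for the database.
EXEC sys.sp_enable_event_stream;

-- 2. Create an event stream group pointing at an Event Hubs destination
--    (endpoint, credential, and group name below are placeholders).
EXEC sys.sp_create_event_stream_group
     @stream_group_name      = N'OrdersStreamGroup',
     @destination_type       = N'AzureEventHubsAmqp',
     @destination_location   = N'mynamespace.servicebus.windows.net/myhub',
     @destination_credential = MyEventHubCredential;

-- 3. Add the tables whose DML changes should be streamed.
EXEC sys.sp_add_object_to_event_stream_group
     @stream_group_name = N'OrdersStreamGroup',
     @object_name       = N'dbo.Orders';
```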

Benefits and How It Helps

CES addresses key pain points in data integration by providing a lightweight, scalable alternative to polling-based or CDC-heavy approaches. Here's how it helps:

  • Low Overhead and High Performance: Reads directly from the transaction log without writing changes back to the database (unlike CDC), reducing I/O and storage needs. It supports high-throughput scenarios (e.g., millions of events/hour) with sub-second latency, as Event Hubs handles scaling independently.
  • Real-Time Data Flow: Enables immediate downstream processing, preventing delays in analytics or notifications. For instance, it protects the log from truncation until delivery succeeds, ensuring consistency without manual intervention.
  • Decoupling and Flexibility: Producers (SQL Server) and consumers (e.g., microservices, analytics pipelines) operate independently. A single stream can fan out to multiple subscribers, supporting diverse formats (JSON/Avro) and tools like Azure Stream Analytics or Kafka Connect.
  • Cost Efficiency: Minimizes database resource usage and leverages Azure's pay-per-throughput model for Event Hubs, avoiding the need for custom ETL jobs or third-party connectors.
  • Reliability: At-least-once semantics with built-in retries and error monitoring via DMVs/Extended Events. Transactional grouping ensures atomic updates in consumers.

In real-time scenarios, CES helps by streaming changes for instant synchronization (e.g., updating a cache) or triggering actions (e.g., fraud alerts), all while maintaining SQL Server's ACID guarantees.

Benefit           | How It Helps                                   | Comparison to CDC
Low Overhead      | Direct log scan; no extra tables/indexes       | CDC writes to change tables
Scalability       | Event Hubs auto-scales; up to 40K tables/group | CDC limited by DB resources
Real-Time Latency | Near-real-time emission (~ms)                  | CDC requires periodic jobs
Decoupling        | Multi-consumer streams                         | Tightly coupled to DB

Use Cases

  • Event-Driven Architectures: Stream customer orders from an e-commerce DB to trigger inventory updates, notifications, or personalization engines in real-time.
  • Real-Time Analytics: Pipe changes to Azure Synapse or Power BI for live dashboards, e.g., monitoring sales metrics without batch ETL.
  • Data Synchronization: Replicate changes across hybrid/multi-cloud systems, like syncing on-premises SQL data to Azure Cosmos DB for global apps.
  • Auditing and Compliance: Capture sensitive data modifications (e.g., financial transactions) for immutable logging in Event Hubs, integrated with SIEM tools.
  • Microservices Integration: Enable loose coupling in distributed systems, where services subscribe to relevant table streams for reactive processing.

Limitations and Considerations

  • Preview Status: Subject to changes; not supported in production yet. Incompatible with CDC, transactional replication, Fabric Mirrored Databases, or Always On AG secondaries (streams only from primary).
  • Unsupported Elements: DDL events, certain data types (e.g., json, vector, spatial), features (e.g., temporal tables, In-Memory OLTP), and operations (e.g., TRUNCATE, partition switches). No multi-destination per table.
  • Performance: Large BLOBs or high churn can increase log growth; monitor via sys.dm_change_feed_log_scan_sessions. Message splitting may add complexity for consumers.
  • Security: SAS tokens for auth (no Entra ID yet); events bypass row-level security/masking.
  • Troubleshooting: Failures cause log bloat; requires manual restart with potential data loss for interim changes.
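For troubleshooting, log-scan health can be inspected through the DMV named above; a minimal check might look like this (selecting all columns avoids guessing names, which may vary across preview builds, and the errors DMV exists in related change-feed features, so its applicability to CES is an assumption here):

```sql
-- Quick health check on the CES/change-feed log reader.
SELECT *
FROM sys.dm_change_feed_log_scan_sessions;

-- Companion errors DMV from related change-feed features
-- (assumed to apply to CES as well).
SELECT *
FROM sys.dm_change_feed_errors;
```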

Summary

The introduction of Change Event Streaming in SQL Server 2025 marks a significant evolution: making the relational database not just a storage engine for transactions, but also a real-time source of change events for modern application scenarios. With it, you can build event-driven architectures, real-time analytics, and hybrid data solutions more naturally, with less custom plumbing.

Enhancing change event streaming capabilities in SQL Server 2025 is a critical step in enabling real-time data integration and analytics. By addressing the limitations of existing mechanisms and introducing new features, SQL Server can become a more powerful and versatile platform for modern data architectures. The proposed enhancements, including real-time CDC, improved CT, a new transaction log streaming mechanism, and seamless integration with cloud-based data lakes and message queues, will empower users to build more responsive and data-driven applications. Careful consideration of performance, security, and cost will be essential to ensure the success of these enhancements.

References: Microsoft

 
