Microservices Communication: Change Data Capture (CDC) and Debezium

Microservices Communications

Microservices are an architectural style in which a single application is developed as a set of small services. Each service runs in its own process. The services communicate with clients, and often with each other, using lightweight protocols, typically over messaging or HTTP.

A microservice architecture is all about communication. How should services communicate in any given business scenario? Should they call each other synchronously? Or should they communicate via asynchronous messaging? As always, this is not a black-or-white decision. In this article we take a deep dive into asynchronous communication. In this model, services push messages to a message broker that other services subscribe to. This removes much of the complexity associated with direct HTTP communication.

Asynchronous messaging doesn’t require services to know how to talk to one another, and it removes the need for services to call each other directly. Instead, every service knows about the message broker and pushes its messages to it. Other services can choose to subscribe to the messages in the broker that they care about.
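The publish/subscribe idea described above can be sketched in a few lines of Python. This is only a toy, in-process stand-in for a real message broker: the class name `InMemoryBroker`, the topic names, and the message shapes are all made up for illustration, and delivery here is synchronous, whereas a real broker would deliver messages asynchronously.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic name -> handlers that asked to receive messages on that topic
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The publisher does not know who is listening, or whether anyone is.
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = InMemoryBroker()

# Two hypothetical services that both care about order events.
shipping_inbox: List[dict] = []
billing_inbox: List[dict] = []
broker.subscribe("orders", shipping_inbox.append)
broker.subscribe("orders", billing_inbox.append)

# The order service only talks to the broker, never to shipping or billing.
broker.publish("orders", {"order_id": 1, "status": "created"})
broker.publish("payments", {"order_id": 1, "amount": 10})  # nobody subscribed

print(shipping_inbox)
print(billing_inbox)
```

Note that the publisher never references the shipping or billing service; adding a third subscriber would not require changing the publishing code.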

Change Data Capture

Change Data Capture is a method used to detect changes in data and apply those changes to a different data store. When a change occurs in a source system, such as a database, a process or another system takes action to store or replicate that change. Change Data Capture can be used to replicate data from one system to another, such as a big data platform, or to keep a source system and a target system in sync.

There are two main approaches to implementing Change Data Capture, namely, pull-based and push-based. Both approaches have benefits and drawbacks which are unique to each solution.

In the pull-based approach to Change Data Capture, the target (destination) system is responsible for periodically pulling updated data from the source system. This is essentially polling: the target queries the source at set intervals, regardless of whether any data has actually changed. Changes are only noticed at the next poll, which makes this approach less efficient and less suitable for real-time CDC.
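A minimal sketch of the pull-based approach, using two in-memory SQLite databases as source and target. The `customers` table, its `last_updated` column, and the `poll_changes` function are assumptions made for this example; a real setup would call the function on a timer against real databases.

```python
import sqlite3


def poll_changes(source: sqlite3.Connection,
                 target: sqlite3.Connection,
                 last_seen: int) -> int:
    """Copy rows changed after `last_seen` to the target; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, last_updated FROM customers "
        "WHERE last_updated > ? ORDER BY last_updated",
        (last_seen,),
    ).fetchall()
    for row_id, name, updated in rows:
        target.execute(
            "INSERT OR REPLACE INTO customers (id, name, last_updated) "
            "VALUES (?, ?, ?)",
            (row_id, name, updated),
        )
        last_seen = max(last_seen, updated)
    target.commit()
    return last_seen


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, last_updated INTEGER)"
    )

source.execute("INSERT INTO customers VALUES (1, 'Ada', 100)")
source.execute("INSERT INTO customers VALUES (2, 'Bob', 105)")
watermark = poll_changes(source, target, 0)  # first poll copies both rows

# Between polls: one row is updated, another is deleted.
source.execute(
    "UPDATE customers SET name = 'Ada L.', last_updated = 110 WHERE id = 1"
)
source.execute("DELETE FROM customers WHERE id = 2")
watermark = poll_changes(source, target, watermark)

target_rows = target.execute(
    "SELECT id, name FROM customers ORDER BY id"
).fetchall()
print(target_rows)  # [(1, 'Ada L.'), (2, 'Bob')]
```

The update is picked up on the second poll, but the deleted row is still present in the target: a query on `last_updated` cannot see rows that no longer exist. The sketch also relies on the source table having such a timestamp column in the first place.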

In a push-based system, the source system broadcasts any changes to its data, and each target system subscribed to those broadcasts updates its own copy of the data. The push-based approach to Change Data Capture (CDC) experiences less latency because the target system learns of changes as soon as they occur. It also works well when the data has to be pushed to multiple destinations. This is especially relevant today, as organizations tend to use multiple platforms for storage, analytics, and insights.
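The push-based model can be sketched as a source that notifies every subscribed target on each write. The names below (`PushingSource`, `make_replica`, the `warehouse` and `cache` replicas) are hypothetical, and everything runs in a single process; in a real deployment the notifications would typically travel through a message broker.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (operation, primary key, new row or None for deletes)
ChangeListener = Callable[[str, int, Optional[dict]], None]


class PushingSource:
    """Toy source system that broadcasts each change as it happens."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self._listeners: List[ChangeListener] = []

    def subscribe(self, listener: ChangeListener) -> None:
        self._listeners.append(listener)

    def upsert(self, key: int, row: dict) -> None:
        self.rows[key] = row
        self._notify("upsert", key, row)

    def delete(self, key: int) -> None:
        self.rows.pop(key, None)
        self._notify("delete", key, None)

    def _notify(self, op: str, key: int, row: Optional[dict]) -> None:
        for listener in self._listeners:
            listener(op, key, row)


def make_replica() -> Tuple[Dict[int, dict], ChangeListener]:
    """Return an empty replica and a listener that keeps it in sync."""
    replica: Dict[int, dict] = {}

    def apply(op: str, key: int, row: Optional[dict]) -> None:
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = dict(row or {})

    return replica, apply


source = PushingSource()
warehouse, warehouse_listener = make_replica()
cache, cache_listener = make_replica()
source.subscribe(warehouse_listener)
source.subscribe(cache_listener)

source.upsert(1, {"name": "Ada"})
source.upsert(2, {"name": "Bob"})
source.delete(2)

print(warehouse)  # {1: {'name': 'Ada'}}
print(cache)      # {1: {'name': 'Ada'}}
```

Both targets are updated as part of the write itself, with no polling interval, and the delete reaches them as well.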

Why You Should Be Using Change Data Capture

There are many benefits CDC can provide to businesses. The following are the top seven reasons why you should begin using Change Data Capture today.

1. Provides Real-time Data Loading Into a Data Warehouse and Connects Different Database Systems

One significant benefit of CDC is that it allows companies to load data into a data warehouse in real time. CDC also allows otherwise incompatible database systems to be connected in near real time. These features make CDC especially beneficial for mid-sized to large organizations that work with multiple database systems when handling their data.

2. Minimizes Disruptions to Production Workloads

Another benefit of CDC is that its efficiency helps to minimize disruptions to production workloads. Because CDC can push data to multiple lines of business in near real time, organizations can use it to continuously update data marts with everything from sales to canceled orders to comprehensive customer data. Processing data quickly and efficiently in this way, without disrupting production workloads, benefits companies of all sizes.

3. Helps to Improve a Company’s Master Data Management System

If a company wants to get a better handle on its master data management system, or even significantly improve it, CDC can prove quite beneficial. CDC allows IT teams to quickly draw data from multiple sources while continuously updating the organization’s master data management system, or master record of critical data. Keeping that critical data up to date is a clear benefit.

4. Helps to Integrate Apps With Otherwise Incompatible Databases

Another benefit CDC offers to organizations is its ability to integrate software tools that are generally incompatible with in-house database systems. This gives organizations more flexibility in choosing business applications and access to more app options overall. It also means that more time can be spent finding apps that help the company reach its business goals, rather than worrying about database compatibility.

5. Accelerates Reporting and Business Intelligence Capabilities

One of the top benefits of CDC is that it accelerates reporting and business intelligence capabilities. Because CDC enables data to move quickly between different databases, data can be collected faster. This in turn makes more timely reporting and significantly improved business intelligence possible.

6. Allows for the Integration of Apps with Otherwise Incompatible Databases

Another reason change data capture is beneficial is that it allows certain software tools to easily integrate with otherwise incompatible database systems. This also means that companies gain more flexibility when it comes to deciding which apps to use and deploy for the business.

7. Helps to Reduce Pressure on Operational Databases

By implementing CDC, companies can create a copy of their operational databases. Users can then access this copy, which reduces the overall stress on production systems because excess traffic can be sent to the secondary database instead.

Debezium

Debezium is a distributed platform that turns your existing databases into event streams, so applications can see and respond immediately to each row-level change in the databases. Debezium is built on top of Apache Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, from where your application consumes them. This makes it possible for your application to easily consume all of the events correctly and completely. Even if your application stops (or crashes), upon restart it will resume consuming events where it left off, so it misses nothing. In my next article, I will discuss in more detail how Debezium handles this situation.
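The "resume where it left off" behaviour can be illustrated with a toy model. This is not Debezium or Kafka code: `ChangeLog` is a plain Python list standing in for a Kafka log, `offset_store` is a dictionary standing in for durable offset storage, and the event names are placeholders. It only shows the idea that a consumer which records its position can restart without missing events.

```python
from typing import Dict, List, Optional, Tuple


class ChangeLog:
    """Append-only list standing in for a Kafka log (illustration only)."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def append(self, event: str) -> None:
        self._events.append(event)

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        # Return (offset, event) pairs starting at the requested position.
        return list(enumerate(self._events))[offset:]


class Consumer:
    """Consumer that records its position in an external offset store."""

    def __init__(self, log: ChangeLog, offset_store: Dict[str, int]) -> None:
        self.log = log
        self.offset_store = offset_store
        self.processed: List[str] = []

    def poll(self, max_events: Optional[int] = None) -> None:
        start = self.offset_store.get("offset", 0)
        batch = self.log.read_from(start)
        if max_events is not None:
            batch = batch[:max_events]
        for offset, event in batch:
            self.processed.append(event)              # handle the event first,
            self.offset_store["offset"] = offset + 1  # then record progress


log = ChangeLog()
for name in ["e0", "e1", "e2", "e3", "e4"]:
    log.append(name)

offset_store: Dict[str, int] = {}  # assumed to survive a restart

first = Consumer(log, offset_store)
first.poll(max_events=2)           # processes e0 and e1, then "crashes"

second = Consumer(log, offset_store)  # restarted application instance
second.poll()

print(first.processed)   # ['e0', 'e1']
print(second.processed)  # ['e2', 'e3', 'e4']
```

Because the log keeps the full history and the position is stored outside the consumer, the restarted instance picks up at offset 2 rather than starting over or skipping ahead.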

Unlike other approaches such as polling or dual writes, log-based CDC as implemented by Debezium:

  • Makes sure that all data changes are captured.
  • Produces change events with a very low delay (e.g. ms range for MySQL or Postgres) while avoiding increased CPU usage of frequent polling.
  • Requires no changes to your data model (such as adding a “last_updated” column).
  • Can capture deletes.
  • Can capture the old record state and further metadata such as the transaction ID and the causing query (depending on the database’s capabilities and configuration).
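To make the points about deletes and old record state concrete, here is a small sketch of how a consumer might apply Debezium-style change events to a local copy of a table. Debezium change events carry `before`, `after`, `op`, and `source` fields, with `op` being `c` (create), `u` (update), `d` (delete), or `r` (snapshot read). The sample events below are hand-written and simplified, the `customers` table and `apply_change_event` function are hypothetical, and the sketch assumes the event value has already been deserialized into a plain dictionary; depending on converter configuration, real messages may be wrapped in an additional schema/payload envelope.

```python
from typing import Dict


def apply_change_event(replica: Dict[int, dict], event: dict) -> None:
    """Apply one simplified Debezium-style change event to an in-memory replica."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads and updates carry the new row state in "after".
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the old row state in "before"; "after" is None.
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unexpected op code: {op!r}")


# Hand-written sample events (not captured from a real connector).
events = [
    {"op": "c", "before": None,
     "after": {"id": 1, "email": "ada@old.example"},
     "source": {"table": "customers"}},
    {"op": "u", "before": {"id": 1, "email": "ada@old.example"},
     "after": {"id": 1, "email": "ada@new.example"},
     "source": {"table": "customers"}},
    {"op": "c", "before": None,
     "after": {"id": 2, "email": "bob@old.example"},
     "source": {"table": "customers"}},
    {"op": "d", "before": {"id": 2, "email": "bob@old.example"},
     "after": None,
     "source": {"table": "customers"}},
]

replica: Dict[int, dict] = {}
for event in events:
    apply_change_event(replica, event)

print(replica)  # {1: {'id': 1, 'email': 'ada@new.example'}}
```

The delete event is what a timestamp-based polling query cannot produce, and the `before` field of the update event is the old record state mentioned in the list above.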