Change Data Capture

Change data capture (CDC) is a data integration technique that allows you to track all the changes made to one or more tables in a relational database.

My primary use case for CDC has been syncing data between a relational database and a Kafka cluster. Debezium is the de facto standard for change data capture with Kafka, and it plugs into the ecosystem via Kafka Connect.

Debezium embedded

Recently (Nov 2023), I found out that you can embed Debezium directly in your application and couldn't help but try it out. I took some notes along the way:

  • The example in the official docs is bare-bones but functional.
  • Predictably, I ran into the "Postgres server wal_level property must be 'logical' but is: 'replica'" error. I found a cool way of changing this on my dev Postgres server:
    1. ALTER SYSTEM SET wal_level = logical;
    2. Restart the server. Nice!
  • I got a sample app running in no time:
    • The API is really nice. In fact, you can change the format in which you receive the data with one line of code.
    • I wish there was a simple way to get the schema on its own (you can get it from inside the data values themselves, but that's a little cumbersome).

The code looks like this:

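import io.debezium.engine.DebeziumEngine
import io.debezium.engine.format.Json
import io.debezium.engine.format.KeyValueChangeEventFormat
import java.util.UUID
import java.util.concurrent.Executors

// Keys and values are both delivered as JSON.
// props(...) is a helper (not shown) that builds the connector Properties.
val engine = DebeziumEngine.create(KeyValueChangeEventFormat.of(Json::class.java, Json::class.java))
    .using(props("engine-${UUID.randomUUID()}", "foo"))
    .notifying { record ->
        println(record)
    }.build()

// The engine is a Runnable, so run it on its own thread.
val executor = Executors.newSingleThreadExecutor()
executor.execute(engine)

// Stop the engine cleanly when the JVM exits.
Runtime.getRuntime().addShutdownHook(Thread {
    engine.close()
    executor.shutdown()
})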
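
The props(...) helper used in the snippet above isn't shown, so here is a minimal sketch of what such a helper might look like. Everything in it is an assumption for illustration: I'm guessing the second argument ("foo") is the database name, and the host, port, credentials, and offset file path are placeholder values for a local dev Postgres. The property keys themselves are standard Debezium engine and Postgres connector settings.

```kotlin
import java.util.Properties

// Hypothetical stand-in for the props(...) helper; all values below are assumptions.
fun props(engineName: String, database: String): Properties = Properties().apply {
    setProperty("name", engineName)
    setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
    // Keep offsets in a local file so the engine can resume after a restart.
    setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
    setProperty("offset.storage.file.filename", "/tmp/$engineName-offsets.dat")
    setProperty("offset.flush.interval.ms", "60000")
    // Placeholder connection details for a local dev server.
    setProperty("database.hostname", "localhost")
    setProperty("database.port", "5432")
    setProperty("database.user", "postgres")
    setProperty("database.password", "postgres")
    setProperty("database.dbname", database)
    // Debezium 2.x key; 1.x releases used database.server.name instead.
    setProperty("topic.prefix", database)
    // pgoutput ships with Postgres 10+, so no extra decoding plugin is needed.
    setProperty("plugin.name", "pgoutput")
}

fun main() {
    val p = props("engine-demo", "foo")
    println(p.getProperty("name"))
    println(p.getProperty("database.dbname"))
}
```

Because the engine name in the snippet above is random on every run, the offset file in this sketch would also change on every run, which means each start would take a fresh snapshot. A fixed engine name is probably what you want if the goal is to resume where you left off.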

I have an example repo in Kotlin if you want to play around with this.