If You Have Data, You Need CDC for ETL

This is a writing sample from Scripted writer Jane Haynie

Most companies have data. A lot of it. And most companies that have data also have multiple databases, data warehouses, applications, and tools for storing, processing, and delivering that data. And, of course, most companies that have multiple repositories also have a data integration strategy to make sure the right data gets to the right place at the right time. And…do you see where this is going yet…most companies that fit that scenario also need extract, transform, load (ETL) processes to move that data between the source databases and all those other little tools, apps, and data lakes that want access to it, too.
 
ETL is at the heart of most data integration strategies because without it - or at least, without ETL tools to automate it - data either sits idle and siloed, or lucky IT administrators get to manually move it around with those pretty Excel spreadsheets - a technique that is ineffective and more than a little annoying.
 
But ETL alone has its own challenges. Even with automated tools, it is still typically executed in batches - often overnight during off-hours - which means some systems and applications are delivering reports that are up to 24 hours old. This might not seem like a big deal to anyone born before 1990, but with the current pace of commerce, data that is even one hour old can be too old to be useful. We want our data delivered consistently and accurately so it is accessible how and when we need it.
 
Enter change data capture.
 
Change data capture (CDC) simplifies ETL by transforming and loading only the data that the source system has identified as ‘updated’. In other words, it scans the source system for changed data, logs those changes, and then sends the updated data to the target system, rather than extracting, transforming, and loading every byte of data in the source database. It also does this continuously, so changes flow to the targets as they happen instead of waiting for the next batch window.
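To make that concrete, here is a minimal sketch in Python of the "send only what changed" idea. It is an illustration, not how any particular CDC tool works: the `ChangeEvent` shape, the `apply_changes` helper, and the sample customer rows are all hypothetical, and the target is just an in-memory dictionary standing in for a real database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One captured change from the source (hypothetical shape)."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the changed row
    row: Optional[dict] = None   # new row values; None for deletes


def apply_changes(target: dict, events: list) -> int:
    """Apply captured changes to the target in order; return how many were applied."""
    applied = 0
    for event in events:
        if event.op in ("insert", "update"):
            target[event.key] = event.row
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        applied += 1
    return applied


# The target already holds three customers from an earlier load.
target = {
    1: {"name": "Ada", "plan": "free"},
    2: {"name": "Grace", "plan": "pro"},
    3: {"name": "Linus", "plan": "free"},
}

# Only the rows that changed at the source are shipped, not the whole table.
events = [
    ChangeEvent("update", 1, {"name": "Ada", "plan": "pro"}),
    ChangeEvent("delete", 3),
    ChangeEvent("insert", 4, {"name": "Margaret", "plan": "free"}),
]

applied = apply_changes(target, events)
print(applied, sorted(target))
```

Customer 2 never changed, so it never travels over the wire. That is the whole trick: the work scales with the number of changes, not with the size of the source.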
 
Why is this important? Well, real-time data is the new black, and it gives companies some serious advantages, such as the ability to:
  • Access real-time dashboards
  • Accurately assess the frequency of transactions
  • Share data via APIs
  • Make time-sensitive decisions
  • Integrate data from mergers and acquisitions
  • Quickly respond to customer demands
As streaming databases and operational analytics increase in popularity and as their technology becomes more sophisticated, CDC is going to become critical to core IT. It can be implemented using a homegrown, scripted, or coded solution, or you can use any number of tools available on the market to get the job done. Regardless, there are a few techniques to be aware of:
  • Timestamp-based CDC - in this case, the source table needs a timestamp column that updates every time a row changes. The CDC process looks for rows whose timestamp is newer than its last run, logs them, and loads them into the target database.
  • Trigger-based CDC - for this scenario, database triggers are defined on the source tables. Each insert, update, or delete fires a trigger that records the change, typically in a separate change table, and CDC reads those records to update the target.
  • Log-based CDC - this is often the ideal technique for ETL. Most transactional databases keep transaction logs that record every change to the data so the database can recover after a crash or failure. CDC can read these logs and use them to update the data in the target system.
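For a feel of what the trigger-based technique looks like in practice, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The table names (`customers`, `customers_changes`), the trigger names, and the `read_changes` helper are made up for illustration, and a production setup would likely look different depending on the database and tooling involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, plan TEXT);

-- Change table that the triggers write to (hypothetical name and layout).
CREATE TABLE customers_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    op          TEXT NOT NULL,
    customer_id INTEGER NOT NULL,
    name        TEXT,
    plan        TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id, name, plan)
    VALUES ('insert', NEW.id, NEW.name, NEW.plan);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id, name, plan)
    VALUES ('update', NEW.id, NEW.name, NEW.plan);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changes (op, customer_id)
    VALUES ('delete', OLD.id);
END;
""")


def read_changes(conn, after_id):
    """Return change rows with change_id greater than after_id, oldest first."""
    return conn.execute(
        "SELECT change_id, op, customer_id, name, plan "
        "FROM customers_changes WHERE change_id > ? ORDER BY change_id",
        (after_id,),
    ).fetchall()


# Initial activity at the source: each insert fires the insert trigger.
conn.execute("INSERT INTO customers (id, name, plan) VALUES (1, 'Ada', 'free')")
conn.execute("INSERT INTO customers (id, name, plan) VALUES (2, 'Grace', 'pro')")
first_batch = read_changes(conn, after_id=0)
last_seen = first_batch[-1][0]  # remember how far we have read

# Later activity: only changes after last_seen are picked up.
conn.execute("UPDATE customers SET plan = 'pro' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
second_batch = read_changes(conn, after_id=last_seen)
ops = [(op, customer_id) for _, op, customer_id, _, _ in second_batch]
print(ops)
```

Remembering `last_seen` between reads is what keeps each pass incremental. The trade-off is that every write to the source table now also writes to the change table, which is one reason log-based capture is often preferred when it is available.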
Whatever technique you use, start working change data capture into your data integration strategy. It will not only speed up ETL and free up time for your data management team, but also set your company on a path to adopt the cutting-edge technologies that will propel your organization into the future.
 

Written by:

Jane Haynie
I have been a professional writer for tech and SaaS companies for over ten years. My specialty is taking complex subjects and making them compelling and simple to digest. I like to infuse creativity and humor into my work, when appropriate, and excel at staying focused on the mindset and lifestyle of the target audience. I also own a local gym and can write about fitness, diet, lifestyle, and related topics.