An Introduction to Change Data Capture (CDC)

Category : Articles Published : July 29, 2014 User Rating : 4 Stars      Views : 3 Stars
One Click Feedback
Please help us to improve the site by rating the quality of this article by clicking a button below.




Change data capture (CDC) is a method for tracking changes to data values in tables. This is useful if, for instance, you want to identify which rows of data have changed in order to load only changed rows into a data warehouse. Prior to the introduction of CDC you would have typically had to do this by adding a trigger or maybe a timestamp (rowversion) column to identify changed rows.

In this short article I'll show how to set up CDC to track data changes to the Production.Product table on the AdventureWorks2008R2 example database. This takes just two lines of code ! You can download the AdventureWorks2008R2 database from Codeplex if you don't already have it.

Steps to set up CDC

First of all we need to enable CDC on the current database (AdventureWorks2008R2) :
EXEC sys.sp_cdc_enable_db
Then we need to set up the CDC on the table we are interested in tracking changes on :
EXEC sys.sp_cdc_enable_table @source_schema='Production'
                            ,@source_name='Product'
                            ,@role_name=NULL
                            ,@capture_instance='SQLMattersDemo'
                            ,@captured_column_list='Name,Color,ProductID'
In this case I've chosen to track changes on the Production.Product table, on the Name, Color and ProductId columns. If I had left the @captured_column_list blank then all columns would have been tracked. Note that you do need to include the column (or columns) that make up the primary key so that the CDC mechanism can uniquely identify a row (though instead of a primary key it's also possible to specify a separate unique index for this using the @index_name argument). The @capture_instance is a name that identifies this cdc instance (more on this later).

That is all that is required to set up CDC !

Tracking Changed Data and CDC Metadata

Before we look at where the captured data changes are stored, I'll briefly show you how to review the current CDC settings on the database. The following commands will list which databases, tables and columns have CDC enabled :
-- List all databases and whether they have CDC enabled :
SELECT is_cdc_enabled, name FROM sys.databases
-- List metadata about cdc config on the current database :
EXEC sys.sp_cdc_help_change_data_capture
-- Can also see if cdc tracking is occuring on a table by checking sys.tables
SELECT name, is_tracked_by_cdc FROM sys.tables
-- Get information about the columns that are being captured for this capture instance
EXEC sys.sp_cdc_get_captured_columns @capture_instance='SQLMattersDemo'
Now that CDC is set up I'll show you what changes it has made to the database. If you look at the SQL Server agent jobs in Object Explorer there will be two new jobs as shown below. These are named cdc.<dbname>_capture and cdc.<dbname>_cleanup (replace <dbname> with the name of the database). Similarly if you open up 'System Tables' under 'Tables' then you'll see a number of new tables as shown below :
Change Data Capture (CDC) agent jobs Change Data Capture (CDC) system tables

However Microsoft recommend that these tables are not accessed directly and that the functions they provide are used instead, which I'll demonstrate below. First I'll make some changes to the table so we can see how they are picked up by CDC.

The code below updates three columns, only two of which are in the list of columns to be detected by CDC. I've run a select before and after so that we can see the changes.
-- Look at the current values
SELECT ProductID,Name,Color,ProductNumber FROM Production.Product WHERE ProductID=1
-- Update the values
UPDATE Production.Product SET Name='Fixed Race',Color='Red',ProductNumber='XX-1111' WHERE ProductID=1
-- Look at the updated values
SELECT ProductID,Name,Color,ProductNumber FROM Production.Product WHERE ProductID=1
The result, showing the data before and after the update, is below :
Change Data Capture (CDC) agent jobs

We can now move on to see how CDC has detected these changes by running the following code (note the use of the capture instance name from above) :
DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('SQLMattersDemo')
SET @to_lsn = sys.fn_cdc_get_max_lsn()

-- Get net changes
SELECT * FROM cdc.fn_cdc_get_net_changes_SQLMattersDemo(@from_lsn, @to_lsn, 'all')
-- Get all changes
SELECT * FROM cdc.fn_cdc_get_all_changes_SQLMattersDemo(@from_lsn, @to_lsn, 'all')
This gives the following results :
Change Data Capture (CDC) agent jobs

The first result gives the net result (i.e. if there are intermediate updates then only the final result is shown) and all changes (i.e. any intermediate updates are included). In this case there is only one update, which updates the Name, Color and ProductNumber columns. However only Name and Color are included in the columns for change detection, so the update to ProductNumber is not included in the results.

Possible Issues

1. The database owner is important for CDC, if the database owner does not exist you may get the following error :
Msg 22830, Level 16, State 1, Procedure sp_cdc_enable_db_internal, Line 186
Could not update the metadata that indicates database AdventureWorks2008R2 is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15517: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db_internal, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 3998, Level 16, State 1, Line 1
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
The solution is to change the database owner to an existing login (using sp_changedbowner or ALTER AUTHORIZATION for instance).

2. If SQL Server agent is not running then you'll probably get the following message when you try and set up cdc on a table :
SQLServerAgent is not currently running so it cannot be notified of this action.
3. You must include the primary key column(s) in the columns to be captured, or alternatively specify a unique key in the @index_name parameter and include the columns that comprise the unique key in the columns to be captured. This is because CDC needs a way to uniquely identify each row in the table. If it can't uniquely identify each row you'll get an error like the following when trying to enable CDC on a table (using sp_cdc_enable_table) :
Msg 22943, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 279 Columns used to uniquely identify a row for net change tracking must be included in the list of captured columns. Add either the primary key columns of the source table, or the columns defined for the index specified in the parameter @index_name to the list of captured columns and retry the operation.

Conclusion

I hope that gives a brief overview of CDC, though there is much more capability that I've not covered in this brief article. Although CDC will work well in many situations there are circumstances where it won't work. For instance it does need SQL Agent to be running and only works on the Enterprise, Developer or Evaluation editions (so not with Standard or Express). It also doesn't record who or when the changes were made (which will probably rule it out for auditing type requirements).

Related Articles

The following articles may also be of interest :

Link back to this article : http://www.sqlmatters.com/Articles/An Introduction to Change Data Capture (CDC).aspx

Keywords

SQL 2012, 2008, 2008R2,CDC,Change Data Capture,Data Warehouse,Delta


Comments

Post a comment   No login required !

 
Name :   Email :   Website :  
Will be displayed alongside your comment
Not displayed Optional, but displayed if entered