Change data capture (CDC) is a method for tracking changes to data values in tables. This is useful if, for instance, you want to identify
which rows of data have changed in order to load only changed rows into a data warehouse. Prior to the introduction of CDC you
would have typically had to do this by adding a trigger or maybe a timestamp (rowversion) column to identify changed rows.
In this short article I'll show how to set up CDC to track data changes to the Production.Product table on the AdventureWorks2008R2 example
database. This takes just two lines of code ! You can download the AdventureWorks2008R2 database from Codeplex if you don't already have it.
Steps to set up CDC
First of all we need to enable CDC on the current database (AdventureWorks2008R2) :
EXEC sys.sp_cdc_enable_db
Then we need to set up the CDC on the table we are interested in tracking changes on :
EXEC sys.sp_cdc_enable_table
@source_schema='Production'
,@source_name='Product'
,@role_name=NULL
,@capture_instance='SQLMattersDemo'
,@captured_column_list='Name,Color,ProductID'
In this case I've chosen to track changes on the Production.Product table, on the Name, Color and ProductId columns. If I had left the @captured_column_list
blank then all columns would have been tracked. Note that you do need to include the column (or columns) that make up the primary key so that the
CDC mechanism can uniquely identify a row (though instead of a primary key it's also possible to specify a separate unique index for this using
the @index_name argument). The @capture_instance is a name that identifies this cdc instance (more on this later).
That is all that is required to set up CDC !
Tracking Changed Data and CDC Metadata
Before we look at where the captured data changes are stored, I'll briefly show you how
to review the current CDC settings on the database. The following
commands will list which databases, tables and columns have CDC enabled :
-- List all databases and whether they have CDC enabled :
SELECT is_cdc_enabled,
name FROM sys.databases
-- List metadata about cdc config on the current database :
EXEC sys.sp_cdc_help_change_data_capture
-- Can also see if cdc tracking is occurring on a table by checking sys.tables
SELECT name,
is_tracked_by_cdc FROM sys.tables
-- Get information about the columns that are being captured for this capture instance
EXEC sys.sp_cdc_get_captured_columns
@capture_instance='SQLMattersDemo'
Now that CDC is set up I'll show you what changes it has made to the database. If you look at the SQL Server agent jobs in Object Explorer
there will be two new jobs as shown below. These are named cdc.<dbname>_capture and cdc.<dbname>_cleanup (replace <dbname> with the
name of the database). Similarly if you
open up 'System Tables' under 'Tables' then you'll see a number of new tables as shown below :
However Microsoft recommend that these tables are not accessed directly and that the functions they provide are used instead, which I'll demonstrate below.
First I'll make some changes to the table so we can see how they are picked up by CDC.
The code below updates three columns, only two of which are in the list of columns to be detected by CDC. I've
run a select before and after so that we can see the changes.
-- Look at the current values
SELECT ProductID,Name,Color,ProductNumber
FROM Production.Product WHERE ProductID=1
-- Update the values
UPDATE Production.Product SET Name='Fixed Race',Color='Red',ProductNumber='XX-1111' WHERE ProductID=1
-- Look at the updated values
SELECT ProductID,Name,Color,ProductNumber FROM Production.Product WHERE ProductID=1
The result, showing the data before and after the update, is below :
We can now move on to see how CDC has detected these changes by running the following code (note the use of the capture instance name from above) :
DECLARE @from_lsn binary(10),
@to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('SQLMattersDemo')
SET @to_lsn = sys.fn_cdc_get_max_lsn()
-- Get net changes
SELECT * FROM cdc.fn_cdc_get_net_changes_SQLMattersDemo(@from_lsn, @to_lsn, 'all')
-- Get all changes
SELECT * FROM cdc.fn_cdc_get_all_changes_SQLMattersDemo(@from_lsn, @to_lsn, 'all')
This gives the following results :
The first result gives the net result (i.e. if there are intermediate updates then only the final result is shown) and
all changes (i.e. any intermediate updates are included). In this case there is only one update, which updates the Name, Color and ProductNumber columns. However
only Name and Color are included in the columns for change detection, so the update to ProductNumber is not included in the results.
Possible Issues
1. The database owner is important for CDC, if the database owner does not exist you may get the following error :
Msg 22830, Level 16, State 1, Procedure sp_cdc_enable_db_internal, Line 186
Could not update the metadata that indicates database AdventureWorks2008R2 is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15517: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db_internal, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_cdc_enable_db, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 3998, Level 16, State 1, Line 1
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
The solution is to change the database owner to an existing login (using
sp_changedbowner or
ALTER AUTHORIZATION for instance).
2. If SQL Server agent is not running then you'll probably get the following message when you try and set up cdc on a table :
SQLServerAgent is not currently running so it cannot be notified of this action.
3. You must include the primary key column(s) in the columns to be captured, or alternatively specify a unique key in the @index_name parameter
and include the columns that comprise the unique key in the columns to be captured. This is because CDC needs a way to uniquely identify each row in the table.
If it can't uniquely identify each row you'll get an error like the following when trying to enable CDC on a table (using sp_cdc_enable_table) :
Msg 22943, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 279
Columns used to uniquely identify a row for net change tracking must be included in the list of captured columns. Add either the primary key columns of the source table, or the columns defined for the index specified in the parameter @index_name to the list of captured columns and retry the operation.
Conclusion
I hope that gives a brief overview of CDC, though there is much more capability that I've not covered in this brief article.
Although CDC will work well in many situations there are circumstances where it won't work. For instance it does need
SQL Agent to be running and only works on the Enterprise, Developer or Evaluation editions (so not with Standard or Express). It
also doesn't record who or when the changes were made (which will probably rule it out for auditing type requirements).