Friday, June 12, 2009

How to Identify and Delete Duplicate SQL Server Records

Recently, I was asked to help someone clean up their database after they had double loaded an import file. The problem they were having in identifying and deleting the duplicate information was the fact that a timestamp is applied to each row of data as it is inserted into the table. While the rest of the row of data was duplicated, the timestamp made the row unique. It was this uniqueness that caused the simple methods of determining and deleting duplicate data to fail. They needed a way to delete data from a table in which they determine the criteria of what made the data duplicate.

After helping them out with their problem, I decided to write a short article to show the simple solution I came up with to delete the duplicate data from a table, even if that data is considered unique by SQL Server. I know there are many ways to delete duplicate data, but bear with me as I explain my way. As always, if you have another way, great write it up and let us know about it. If not, look over these scripts and see if you can use them to create your own method.
Before I get into the example that actually deals with the described problem, I am going to start by showing a method to delete simple duplicate data for those who may be new to SQL Server and do not know how to clean up duplicate data from a table.
/**********************************************
Example of a simple duplicate data delete script.
**********************************************/
/**********************************************
Set up test environment
**********************************************/
See full detail:
http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx

No comments: