Remove Duplicate Records from Table in SQL Server
Introduction
SQL Server tables should never have duplicate rows, nor non-unique primary keys.Duplicate rows are a violation of entity integrity, and should be disallowed in a relational system. SQL Server has various techniques for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.
Despite this, under unusual circumstances duplicate rows may occur, and if so they must be eliminated. One way they can occur is if duplicate row exist in non-relational data outside SQL Server, and the data is imported while row uniqueness is not being enforced. Another way they can occur is through a wrong database design, such as not enforcing entity integrity on each table.
This article shows how to locate and remove duplicate rows from a table.
Problem
Suppose we have a tblCustomer table have some duplicate rows as:
Customer_id | Customer_name | Customer_gender | Customer_age |
c0023 | abc | Male | 30 |
c0023 | abc | Male | 30 |
c0023 | abc | Male | 30 |
c0050 | xyz | Male | 45 |
c0050 | xyz | Male | 45 |
c0050 | xyz | Male | 45 |
c0022 | sdf | FeMale | 20 |
c0022 | sdf | FeMale | 20 |
In the above table we see that there are many duplicate rows.
Solution (I)
You can use ROW_NUMBER() with PARTITION BY to delete duplicate.
Use following CTE Query to delete duplicate record from the table tblCustomer:
DELETE FROM tmp WHERE tmp.duplicateCount>1
Solution (II)
The first step is to identify which rows have duplicate values:
This will return three row for each set of duplicate rows in the table. The last column in this result is the number of duplicates for the particular row.
Customer_id | Customer_name | Customer_gender | Customer_age | duplicatecount |
c0023 | abc | Male | 30 | 3 |
c0050 | xyz | Male | 45 | 3 |
c0022 | sdf | FeMale | 20 | 2 |
If there are only a few sets of duplicate rows, the best procedure is to delete these manually on an individual basis. For example:
The rowcount value should be n-1 the number of duplicates for a given row. In this example, there are 3 duplicates so rowcount is set to 3-1=2
If there are many distinct sets of duplicate rows in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:
First, run the above GROUP BY query to determine how many sets of duplicate rows exist, and the count of duplicates for each set.
Select the distinct rows into a tmpCustomer table. For example:
At this point, the tmpCustomer table should have unique rows. Verify that each row in tmpCustomer is unique, and that you do not have duplicate rows. For example, the query:
should return a count of 1 for each row.
Delete the duplicate rows from the original table (tblCustomer). For example:
Put the unique rows back in the original table. For example:
Now the tblCustomer table looks as:
Customer_id | Customer_name | Customer_gender | Customer_age |
c0023 | abc | Male | 30 |
c0050 | xyz | Male | 45 |
c0022 | sdf | FeMale | 20 |
Conclusion
In this article we uses simple T-SQL to delete duplicate rows from a table by Solution (I) and Solution (II). Hopefully, this article will help you do that.