Friday, July 11, 2008

Deleting duplicate records from a table

One of my recent task has required me to get rid of multiple duplicate records from a table. While doing that I came across a useful feature introduced in SQL Server 2005 which I would like to share with everyone.
Example : A CustomerOrder table has duplicate records of the same customerid and orderid. If you got records for 1 customer, you can directly run the sql query as follows:

with temp as (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY CUSTOMERID, ORDERID ORDER BY ID ) AS RNUM FROM CUSTOMERORDER )
delete from temp where rnum > 1

If there are more than 1 customer, it's better to write a storedprocedure as follows:

CREATE PROCEDURE DeleteDuplicateRecords
-- Add the parameters for the stored procedure here
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

-- Create a temporary table to store all those customer which have duplicate records

create table #temp(customerid,orderid,totalcount)

insert into #temp(customerid,orderid,totalcount)
select customerid,orderid,count(*)
from CustomerOrder
group by customerid, orderid
having count(*) > 1
order by count(*) desc

declare @cid int,
@orderid int

declare delete_cursor cursor for
select customerid,orderid from #temp

open delete_cursor
fetch next
from delete_cursor
into @cid, @orderid
while (@@fetch_status = 0 )
begin

with temp as (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY CUSTOMERID, ORDERID ORDER BY ID ) AS RNUM FROM CUSTOMERORDER WHERE CustomerId = @cid and OrderId = @orderid )
delete from temp where rnum > 1

fetch next
from delete_cursor
into @cid,@orderid
end

close delete_cursor
deallocate delete_cursor
END
GO

ROWNUMBER() Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
http://msdn.microsoft.com/en-us/library/ms186734.aspx
 
Google