This post will show how you can write a select query to select only unique rows from a table that has many duplicate rows. In other words, this post will show the query to select distinct rows from a table that has duplicate rows.
Let’s say we have a table duplicated_table that has the following rows.
|and so on|
Now, we can see that all the rows are repeated at least once and we need the output as shown below:
|and so on — Only unique rows will appear here — no duplicates|
First thing to observe here is that there is no unique column in the table. So we need something to uniquely identify the column in the duplicated table. For this, we can use the ROW_NUMBER() function of SQL server. ROW_NUMBER() returns a unique row number for the current row.
So now, the logic that we can use for our purpose is:
- Create a data source that will select all the required data that is grouped together by a column, along with a row number to each row
- From this dynamically created data source, select those columns that are required but with a row number of “1″
It is important to understand point number 2 above. The output of point number 1 will be as shown below:
|and so on|
The above table will be the output of the first point mentioned above. That is, create a data source that selects all the data along with the row number. Observe the “Row Number” column in the above table. Though the table has duplicate rows, each duplicate row has a unique row number.
Now, the only thing left to do is to select those rows from the above table that have row number = 1. And you will end up with the desired output.
Now let’s see the queries for both the steps.
For step – 1: Create a data source that selects all the data along with row number.
;with dataSrc as ( select ROW_NUMBER() over(partition by Name order by Name) as RowNum, ID, Name, email, age from duplicated_table )
For step – 2: Selecting those rows with row number = 1;
select Id, Name, email, age from dataSrc where RowNum = 1;
Final Complete Query :
;with dataSrc as ( select ROW_NUMBER() over(partition by Name order by Name) as RowNum, ID, Name, email, age from duplicated_table ) select Id, Name, email, age from dataSrc where RowNum = 1;
Hope this helps!