Tuesday, September 13, 2016

T-SQL: Tips and Tricks - How to pull most updated unique records

Sometimes, we have the requirement to pull the most recent updated record from our employee table or any customer tables. This is necessary because we try to take care of every single action which is performed through the business application. In this case, we have multiple records for any customer or employee but we need to show only most updated record of the employee/customer.
So, How can we do this in SQL? To understand the scenario, we are using customer table where created date is used to capture most updated records. In this case, we cannot use distinct keyword to pull the data because our data would be like as given below:
Use tempdb
Go

------ Design Customer table with the help of CTE
------ Where created date will used to pull most recent records
;with cte as
(
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-5 ,101)
union
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-2,101)
union
select FirstName='cde', LastName='mno',
emailId='cde@xyz.com', CreatedDate=convert(varchar(10),getdate()-5,101)
)
---- Pull all records
select FirstName, LastName,
emailId, CreatedDate
from cte

---- Output for all records
FirstName
LastName
emailId
CreatedDate
abc
yxz
abc@xyz.com
9/12/2016
abc
yxz
abc@xyz.com
9/9/2016
cde
mno
cde@xyz.com
9/9/2016
In the above table, you can see that there are duplicate records for abc customer due to having two created dates. Now, we want to pull the most recent record for abc customer. To do this, we will use the row_number() function with partition option of SQL to get the row id against each customer as given  below:
Use tempdb
Go

------ Design Customer table with the help of CTE
------ Where created date will used to pull most recent records
;with cte as
(
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-5 ,101)
union
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-2,101)
union
select FirstName='cde', LastName='mno',
emailId='cde@xyz.com', CreatedDate=convert(varchar(10),getdate()-5,101)
)
------ Pull all records and assign row id for each customer
select  FirstName, LastName, emailId, CreatedDate,
row_id from
(
select FirstName, LastName,
emailId, CreatedDate,
----- use order by CreatedDate desc
row_id=row_number() over (partition by emailId order by CreatedDate desc)
from cte
)tbl
---- Output for all records
FirstName
LastName
emailId
CreatedDate
row_id
abc
yxz
abc@xyz.com
9/12/2016
1
abc
yxz
abc@xyz.com
9/9/2016
2
cde
mno
cde@xyz.com
9/9/2016
1
In the above query, you can see that we have use cte table as derived data table and add new column as row_id to store data in created date as descending order because in this case we are forcing to generate the row id for most recent created date for every customer and partition by is used to start from 1 for every new customer.
Now, we need to pull the records for each customer where row_id=1 as given below:
Use tempdb
Go

------ Design Customer table with the help of CTE
------ Where created date will used to pull most recent records
;with cte as
(
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-5 ,101)
union
select FirstName='abc', LastName='yxz',
emailId='abc@xyz.com', CreatedDate=convert(varchar(10),getdate()-2,101)
union
select FirstName='cde', LastName='mno',
emailId='cde@xyz.com', CreatedDate=convert(varchar(10),getdate()-5,101)
)
------ Pull all records and assign row id for each customer
select  FirstName, LastName,
emailId, CreatedDate,
from
(
select FirstName, LastName,
emailId, CreatedDate,
----- use order by CreatedDate desc
row_id=row_number() over (partition by emailId order by CreatedDate desc)
from cte
)tbl
---- filter unique  data condition
---- and row_id=1 is the first record
where  row_id=1

---- Output for all records
FirstName
LastName
emailId
CreatedDate
abc
yxz
abc@xyz.com
9/12/2016
cde
mno
cde@xyz.com
9/9/2016
Conclusion
By using Row_Number() fuction with partition, we can assign them some number based on our requirement and filter all the unwanted data from the final output and store them in the persistent data table.

No comments:

Post a Comment