Friday, October 9, 2020

Python — Show unmatched rows from two dataframes

For an example, you have some users data in a dataframe-1 and you have to new users data in a dataframe-2, then you have to find out all the unmatched records from dataframe-2 by comparing with dataframe-1 and report to the business for the reason of these records. 

If you are working as a Python developer and you have to validate the existing data with new incoming datasets then it would not be an easy job for you.

So, we are here to show you the logic to get these unmatched records from two datasets/dataframes in Python.

# pandas library for data manipulation in python
import pandas as pd
#create NaN Values in Pandas DataFrame by numpy
import numpy as np

#creating dataframe-1
df1 = pd.DataFrame({
‘Name’: [‘Ryan’,’Rosy’,’Wills’,’Tom’,’Alice’,’Volter’,’Jay’,’John’,’Ronny’],
‘Age’: [25,26,14,19,22,28,30,32,28],
‘Height’: [189.0,193.0,200.0,155.0,165.0,170.0,172.0,156.0,165.0]})

#creating dataframe-2
df2 = pd.DataFrame({
‘Name’: [‘Ryan’,’Rosy’,’Wills’,’Tom’,’Alice’,np.nan,’Jay’,’John’,’Ronny’],
‘Age’: [25,26,14,0,22,28,30,32,28],
‘Height’: [189.0,np.nan,200.0,155.0,np.nan,170.0,172.0,156.0,165.0]})

Display Values from Dataframe -1 and Dataframe -2 Now, we have populated the both dataframes and these are the below values from dataframes -

Image for post

Unmatched rows from Dataframe-2 : Now, we have to find out all the unmatched rows from dataframe -2 by comparing with dataframe-1. For doing this, we can compare the Dataframes in an elementwise manner and get the indexes as given below:

# compare the Dataframes in an elementwise manner
indexes = (df1 != df2).any(axis=1)

and then check for those rows where any of the items differ from dataframe-2 as given below:

#looking unmatched indexes in dataframe-2
# and store unmatched rows in dataframe-3
df3 = df2.loc[indexes]

#displaying unmatched values from dataframe-2
df3

Image for post

Unmatched rows from Dataframe-1 : Now, we have to find out all the unmatched rows from dataframe -1 by comparing with dataframe-2.

#looking unmatched indexes in dataframe-1
# and store unmatched rows in dataframe-4
df4 = df1.loc[indexes]

#displaying unmatched values from dataframe-1
df4

Image for post

Unmatched rows from Dataframe-1 & Dataframe-2 if you want to display all the unmatched rows from the both dataframes, then you can also merge the unmatched rows from dataframe-1 and unmatched rows from dataframe-2 as given below :

#merge both unmatched dataframes by using outer join
df5=pd.merge(df3, df4,how=’outer’)
#display all unmatched rows
df5

Image for post

Now, you can see this is very easy task to find out the unmatched records from two dataframes by index comparing the dataframes in an elementwise. 

To Learn more, please visit our YouTube channel at - 
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima
To Learn more, please visit our Medium account at -
https://medium.com/@macxima

No comments:

Post a Comment