Thursday, October 22, 2020

Python - How to Merge Two Dictionaries

Python Dictionaries
If you are working as a Python developer, data analyst or data scientist for any organisation, it is very important to know how to merge two dictionaries.

In Python, dictionaries are written with curly brackets {} and are collections of key: value pairs. The basic use of a dictionary is to store data values like a map, unlike other data types that hold only a single value per element. A dictionary is a changeable collection indexed by its keys (and unordered prior to Python 3.7, which began preserving insertion order). If you know your keys, lookups are very fast, so you can retrieve values efficiently.

Suppose that you have to add two dictionaries x and y in Python as given below -
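For illustration, assume two small dictionaries (the keys and values here are made up):

```python
# two sample dictionaries with one overlapping key ('b')
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
```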

By creating a normal function — A very simple way is to define a function in Python that takes two dictionary variables, as given below -
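A minimal version of such a function might look like this (the name merge_two_dicts is our own choice):

```python
def merge_two_dicts(x, y):
    """Return a new dict with y's entries merged over x's."""
    z = x.copy()   # start from x's key/value pairs
    z.update(y)    # y's values win on duplicate keys
    return z
```

Note that this returns a new dictionary and leaves x and y untouched.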

Now, you have to call the above function to merge two dictionaries as given below -
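The call could look like this (the function and the sample dictionaries are repeated here so the snippet runs on its own):

```python
def merge_two_dicts(x, y):
    z = x.copy()   # start from x's key/value pairs
    z.update(y)    # y's values win on duplicate keys
    return z

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
z = merge_two_dicts(x, y)
print(z)  # {'a': 1, 'b': 3, 'c': 4} - 'b' takes y's value
```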

Efficient Way : Use lambda functionality — Some Python users think lambdas are evil because lambda expressions are an odd and unfamiliar syntax to many Python programmers. But a lambda is a closure, so it lets you wrap the same merge logic in a single inline expression without defining and naming a separate function.
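As a sketch, the same copy-and-update logic can be squeezed into a lambda (note that dict(x, **y) requires y's keys to be strings):

```python
# one-line merge as a lambda; y's values win on duplicate keys
merge_two = lambda x, y: dict(x, **y)

result = merge_two({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
print(result)  # {'a': 1, 'b': 3, 'c': 4}
```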

You can write the fast and memory-efficient solution with one expression:
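On Python 3.5+, dictionary unpacking does the merge in a single expression (sample dictionaries as above):

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

z = {**x, **y}   # later values win on duplicate keys
print(z)         # {'a': 1, 'b': 3, 'c': 4}

# Python 3.9+ also supports the union operator: z = x | y
```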

That said, as suggested above, using a couple of explicit lines of code or a named function is probably a more readable way to go.
To learn more, please follow us -
http://www.sql-datatools.com
To Learn more, please visit our YouTube channel at —
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima
To Learn more, please visit our Medium account at -
https://medium.com/@macxima

Tuesday, October 20, 2020

Python — Read multiple SQL database tables into csv

If you are working as a Python developer, data analyst or data scientist for any organisation, it is very important to know how to store and access your data, such as pulling some database tables into csv or txt files for further analytics.
Export all the database tables in the database into multiple csv files in python

If you want to dump each table in CSV format, it does call for a bit of code: create a Python loop to iterate through all the tables, then execute a SELECT query on each of them.

Relational databases are the most common storage used for web content, large business storage and, most relevantly, for data platforms. Each query against an RDBMS returns results with a fixed set of columns, so a single one-size-fits-all export doesn't really fit. What you can do instead is query the metadata table to get a list of all tables, then query each table and append the results to its respective file.
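In SQL Server, that metadata lives in INFORMATION_SCHEMA.TABLES. A minimal sketch of the lookup (running it needs the live database engine created below, so the actual call is left as a comment):

```python
# standard SQL Server metadata query for user (base) table names
table_list_sql = (
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE'"
)

# with a live connection you could then build the table list, e.g.:
# tables = pd.read_sql_query(table_list_sql, engine)['TABLE_NAME'].tolist()
```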

Before we begin, you should make sure you have the necessary tools installed. The most important thing is to have a local version of SQL Server installed on your computer.

Package and Libraries — You need a few Python packages and libraries to move data from your source (SQL Server database tables) to your destination (csv or txt files), as given below -
# pandas library for data manipulation in python
import pandas as pd
#pyodbc module that makes accessing ODBC databases
import pyodbc
#urllib module — used here to URL-encode the ODBC connection string
import urllib
#SQLAlchemy SQL toolkit and object-relational mapper
from sqlalchemy import create_engine

Connecting to SQL Server Database — If you wish to communicate with SQL Server, you need a client that speaks its database protocol. We'll use pyodbc, an open source library for accessing ODBC databases, together with SQLAlchemy, as given below -

#create a database connection function
def db_connections():
    quoted = urllib.parse.quote_plus("Driver={SQL Server}; Server=server; Database=dbname; uid=sqluserid; pwd=password")
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
    return engine
Define data table structures — Now, it is very useful to create a method that defines your data column structures for each table. For example, if you want to pull your Employee, Department and Project tables from your database, you can create a method to define the structures of those respective data tables as given below -

# function — define columns structures
# txt — table name
def df_cols(txt):
    #check table name
    if txt == 'TB_Employee':
        df = pd.DataFrame(columns=['Emp_NO', 'Date_Of_Joining', 'Emp_Name', 'DOB', 'Address', 'DeptId'])
    elif txt == 'TB_Account':
        df = pd.DataFrame(columns=['Account_Key', 'AccountNum', 'AccountName'])
    elif txt == 'TB_Category_Dim':
        df = pd.DataFrame(columns=['Cat1Id', 'Category1', 'Cat2Id', 'Category2', 'Cat3Id', 'Category3'])
    elif txt == 'TB_Department':
        df = pd.DataFrame(columns=['DeptId', 'Dept_Name', 'DeptHead'])
    elif txt == 'TB_Contract_DIM':
        df = pd.DataFrame(columns=['Contract_Key', 'Contract Number', 'Contract', 'Contract Limit', 'Contract End Date', 'Line Type'])
    elif txt == 'TB_CostCenter_DIM':
        df = pd.DataFrame(columns=['CostCenter_Key', 'CostCenter', 'CostCenterDesc'])
    elif txt == 'TB_DATE_MAP':
        df = pd.DataFrame(columns=['DayDate', 'DAYNO', 'WEEK_NO', 'YEAR_NO', 'FIN_WEEK_NO', 'FIN_PD_NO', 'FIN_YR_NO', 'FIN_PD_WEEK_NO', 'AH_WEEK', 'FIN_QTR_NO'])
    elif txt == 'TB_Item_DIM':
        df = pd.DataFrame(columns=['Item_Key', 'ItemNo', 'Category1'])
    elif txt == 'TB_Project_Dim':
        df = pd.DataFrame(columns=['Project_Key', 'Project Number', 'Project Name'])
    elif txt == 'TB_Requestor_Dim':
        df = pd.DataFrame(columns=['Requestor_Key', 'Requestor Name', 'Requestor Location'])
    elif txt == 'TB_Supplier_DIM':
        df = pd.DataFrame(columns=['Supplier_Key', 'Supplier Number', 'Supplier Name', 'Supp Address Line 1',
                                   'Supp Address Line 2', 'Supp Address Line 3', 'Supp City', 'Supp Post Code',
                                   'Supp Country', 'Supplier Terms'])
    else:
        df = pd.DataFrame()   #unknown table name: empty frame
    return df

Create List of Data tables — Now, we create a static list of the data tables as given below —

#table list
MyList = ['TB_Employee', 'TB_Account', 'TB_Category_Dim', 'TB_Department', 'TB_Contract_DIM', 'TB_CostCenter_DIM', 'TB_DATE_MAP',
          'TB_Item_DIM', 'TB_Project_Dim', 'TB_Requestor_Dim', 'TB_Supplier_DIM']

Define File Path — If you want to store these data files in a specific location, you can set a path for this as given below —

#file path to database tables
mypath = r'E:/MyPath/Database Files/'
#initiate database engine
engine = db_connections()
# set table variable
txt = ''

FOR Loop — Create a Python loop to iterate through all the tables. For each table, set its name in a local variable to pull the column structure, build the SELECT statement, execute it, and write the result out as given below —

#calling mylist items within for loop
for l in MyList:
    print(l)
    #set database table name into variable
    txt = l
    #set filepath with table name with csv extension
    filepath = mypath + l + '.csv'
    #set query for data pulling from db
    sqlQuery = 'SELECT * FROM DBO.' + l
    #pulling data into dataframe from database
    df = pd.read_sql_query(sqlQuery, engine)
    #enforce the expected column order defined in df_cols
    df = df[df_cols(txt).columns]
    #drop indexes from dataframe
    df = df.reset_index(drop=True)
    #write dataframe into csv files on the defined location
    df.to_csv(filepath, index=None, header=True)

Now, you can see it is a very easy task to pull database tables into csv or txt files by using Python.

Friday, October 16, 2020

Python — Retrieve matching rows from two Dataframes

Pulling the common records from two dataframes is a very common requirement if you are working as a Python developer, data analyst or data scientist for any organisation.

For example, suppose you have some users' data in dataframe-1 and new users' data in dataframe-2; then you have to find all the records in dataframe-2 that match dataframe-1 by using pandas, retrieve the matching rows, and report these records to the business.

So, we are here to show you the logic to get these matched records from two datasets/dataframes in Python.

# pandas library for data manipulation in python
import pandas as pd
#create NaN Values in Pandas DataFrame by numpy
import numpy as np
#creating dataframe-1
df1 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', 'Volter', 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 19, 22, 28, 30, 32, 28],
    'Height': [189.0, 193.0, 200.0, 155.0, 165.0, 170.0, 172.0, 156.0, 165.0]})
#creating dataframe-2
df2 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', np.nan, 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 0, 22, 28, 30, 32, 28],
    'Height': [189.0, np.nan, 200.0, 155.0, np.nan, 170.0, 172.0, 156.0, 165.0]})
Display Values from Dataframe-1 and Dataframe-2 — Now we have populated both dataframes with the values defined above.
Verify the datatypes for each column in both dataframes — You have to check the datatypes of your columns and ensure they are the same, as given below —

# align df1's column datatypes with df2's
df1 = df1.astype(df2.dtypes.to_dict())

How to pull the matched records? — Now, we find all the matched or common rows from both dataframes with an inner merge on all three columns, as given below -
#matched rows through an inner merge on all columns
commondf = pd.merge(df1, df2, on=['Name', 'Age', 'Height'])
#show common records
commondf
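Putting the whole flow together, a compact self-contained sketch (the sample data here is made up and smaller than above):

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Name': ['Ryan', 'Rosy', 'Tom'],
                    'Age': [25, 26, 19]})
df2 = pd.DataFrame({'Name': ['Ryan', 'Rosy', np.nan],
                    'Age': [25, 26, 0]})

# inner merge keeps only rows whose values appear in both frames
common = pd.merge(df1, df2, on=['Name', 'Age'])
print(common)  # Ryan and Rosy match; Tom's row differs
```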
Now, you can see it is a very easy task to find the matched records from two dataframes through an inner merge on the key columns.


Friday, October 9, 2020

Python — Show unmatched rows from two dataframes

For example, suppose you have some users' data in dataframe-1 and new users' data in dataframe-2; then you have to find all the unmatched records in dataframe-2 by comparing it with dataframe-1 and report these records to the business.

If you are working as a Python developer and you have to validate the existing data with new incoming datasets then it would not be an easy job for you.

So, we are here to show you the logic to get these unmatched records from two datasets/dataframes in Python.

# pandas library for data manipulation in python
import pandas as pd
#create NaN Values in Pandas DataFrame by numpy
import numpy as np

#creating dataframe-1
df1 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', 'Volter', 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 19, 22, 28, 30, 32, 28],
    'Height': [189.0, 193.0, 200.0, 155.0, 165.0, 170.0, 172.0, 156.0, 165.0]})

#creating dataframe-2
df2 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', np.nan, 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 0, 22, 28, 30, 32, 28],
    'Height': [189.0, np.nan, 200.0, 155.0, np.nan, 170.0, 172.0, 156.0, 165.0]})

Display Values from Dataframe-1 and Dataframe-2 — Now we have populated both dataframes with the values defined above.

Unmatched rows from Dataframe-2 : Now, we have to find all the unmatched rows in dataframe-2 by comparing it with dataframe-1. For this, we can compare the dataframes in an elementwise manner (note that NaN compares as unequal to everything, including another NaN) and get the indexes as given below:

# compare the Dataframes in an elementwise manner
indexes = (df1 != df2).any(axis=1)

and then check for those rows where any of the items differ from dataframe-2 as given below:

#looking unmatched indexes in dataframe-2
# and store unmatched rows in dataframe-3
df3 = df2.loc[indexes]

#displaying unmatched values from dataframe-2
df3


Unmatched rows from Dataframe-1 : Now, we have to find all the unmatched rows in dataframe-1 by comparing it with dataframe-2.

#looking unmatched indexes in dataframe-1
# and store unmatched rows in dataframe-4
df4 = df1.loc[indexes]

#displaying unmatched values from dataframe-1
df4


Unmatched rows from Dataframe-1 & Dataframe-2 : If you want to display all the unmatched rows from both dataframes, you can also merge the unmatched rows from dataframe-1 with the unmatched rows from dataframe-2 as given below :

#merge both unmatched dataframes by using outer join
df5 = pd.merge(df3, df4, how='outer')
#display all unmatched rows
df5

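For reference, the whole flow condensed into one self-contained sketch (the sample data here is made up and smaller than above):

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Name': ['Ryan', 'Rosy', 'Tom'],
                    'Age': [25, 26, 19]})
df2 = pd.DataFrame({'Name': ['Ryan', 'Rosy', np.nan],
                    'Age': [25, 26, 0]})

# elementwise comparison: True where any column differs
# (NaN compares unequal to everything, which is what we want here)
indexes = (df1 != df2).any(axis=1)

df3 = df2.loc[indexes]                  # unmatched rows from dataframe-2
df4 = df1.loc[indexes]                  # unmatched rows from dataframe-1
df5 = pd.merge(df3, df4, how='outer')   # both sides combined
print(df5)
```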

Now, you can see it is a very easy task to find the unmatched records from two dataframes by comparing them elementwise by index.

