Wednesday, November 18, 2020

Python — How to update multiple columns at once in dataframe?


Working as a Python developer, data analyst, or data scientist in any organisation, it is very important for you to know how to work with DataFrames.



We know that we can add a column to a dataframe and update its values from a function or from another dataframe column's values, as given below -
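A minimal sketch of that pattern; the dataframe contents and column names here are assumptions for illustration:

```python
import pandas as pd

# sample dataframe with a numeric column (assumed for illustration)
df = pd.DataFrame({'Num': [1, 2, 3, 4]})

# add a new column whose values come from a function of an existing column
df['Square of Num'] = df['Num'].apply(lambda n: n ** 2)
print(df)
```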

Problem — The actual function returns two items, and I want to put these two items into two different new columns.

How do you update multiple columns in a dataframe? If you want to update multiple columns, you should make sure those columns are present in your dataframe. If the columns to be updated are not in your dataframe, you have to create them first, as given below -
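One way to pre-create the target columns is sketched below; the column names are taken from the example later in this post, and the placeholder value is an assumption:

```python
import pandas as pd

df = pd.DataFrame({'Num': [1, 2, 3, 4]})

# create the new columns up front (with a placeholder value)
# so that later updates can target them
for col in ('Square of Num', 'Cube of Num'):
    if col not in df.columns:
        df[col] = None
```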

Before writing the actual code, we can sketch the update with pseudo code. Pseudo code is a term often used in programming and algorithm-based fields; it is a methodology that allows the programmer to represent the implementation of an algorithm in plain language.

Advantages of Pseudocode

  • Improves the readability of any approach. It's one of the best ways to start the implementation of an algorithm.
  • Acts as a bridge between the program and the algorithm or flowchart. It also works as rough documentation, so one developer's program can be understood easily when pseudo code is written out. In industry, documentation is essential, and that's where pseudo code proves vital.
  • The main goal of pseudo code is to explain what exactly each line of a program should do, making the code-construction phase easier for the programmer.

Now, we have to create a function that returns multiple values, as given below -
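A minimal sketch of such a function; the name square_cube is an assumption for illustration:

```python
def square_cube(n):
    # returning two values packs them into a tuple
    return n ** 2, n ** 3

print(square_cube(3))  # → (9, 27)
```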

Note: df.loc takes two parameters: a row indexer - in this case the slice :, which means all rows - and a list of columns - ['Square of Num', 'Cube of Num'].

We can call the function directly as given below —
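Putting it together, here is a sketch of the call; the dataframe contents and the helper-function name are assumptions for illustration:

```python
import pandas as pd

def square_cube(n):
    # return two derived values as a tuple
    return n ** 2, n ** 3

df = pd.DataFrame({'Num': [1, 2, 3, 4]})

# create the target columns first so .loc can assign into them
for col in ('Square of Num', 'Cube of Num'):
    df[col] = None

# assign both columns at once: each returned tuple fills one row
df.loc[:, ['Square of Num', 'Cube of Num']] = df['Num'].apply(square_cube).tolist()
print(df)
```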

This example shows you how to update multiple columns inside your dataframe. Keep in mind that if you're munging data, you should most probably be using pandas, because it has far more elegant tools than the pure Python workarounds.

To learn more, please follow us -
http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at —
http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -
https://twitter.com/macxima


Saturday, October 24, 2020

Python — Finding the index of an item in a list



If you are working as a Python developer, data analyst, or data scientist for any organisation, it is very important for you to know how to work with lists and get the requested info from them, such as matching indexes or items.

If you want to get all the occurrences and positions of one or more items in a list using Python, there are many ways, but you need an efficient way to get the matching items from the list.

Let’s see the below example —

Now, you can see here that we have created a list of fruits and want to get the occurrences of Apple from the list; Python will store this list as given below -
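The fruit list might look like the sketch below; the exact items are assumptions, with 'Apple' occurring more than once:

```python
# a list of fruits; 'Apple' appears at more than one position
fruits = ['Apple', 'Banana', 'Apple', 'Mango', 'Apple', 'Grapes']

# pairing each item with its index shows how Python stores positions
print(list(enumerate(fruits)))
```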
Note — Python is a zero-indexed language

In the above list, you can see that a Python list starts at index 0 and ends at (total items - 1). In our case, the last index value in the fruit list will be 6 - 1 = 5.

Now, the point is how to get the occurrences of an item in the list. You could use a list comprehension with enumerate, as given below -
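A sketch of that comprehension, using the same assumed fruit list:

```python
fruits = ['Apple', 'Banana', 'Apple', 'Mango', 'Apple', 'Grapes']

# list comprehension with enumerate: collect every index whose item matches
matches = [i for i, fruit in enumerate(fruits) if fruit == 'Apple']
print(matches)  # → [0, 2, 4]
```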

Better data munging with pandas — If you have pandas, you can easily get this information with a Series object:

Now we have converted the list into a Series using the pandas library, and we can run a comparison check, which will return a Series of Booleans:

If you pass that Series of Booleans back to the Series via subscript notation, you get just the matching members:

If you want the results in a list or tuple, just pass them to the constructor, as given below:
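The whole pandas flow can be sketched as follows; the fruit list is the same assumed example:

```python
import pandas as pd

fruits = ['Apple', 'Banana', 'Apple', 'Mango', 'Apple', 'Grapes']

# convert the list into a pandas Series
s = pd.Series(fruits)

# elementwise comparison returns a Series of Booleans
mask = s == 'Apple'

# subscript notation with the Boolean Series keeps only the matching members
matching = s[mask]

# the Series index holds the positions; pass it to the list() constructor
positions = list(matching.index)
print(positions)  # → [0, 2, 4]
```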

Through a for loop — This is another option to get the matching items from a given list; mainly for those coming from another language, a simple loop may be easier to understand and use:
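A sketch of the loop version, again over the assumed fruit list:

```python
fruits = ['Apple', 'Banana', 'Apple', 'Mango', 'Apple', 'Grapes']

# collect matching indexes with a plain loop
matches = []
for i in range(len(fruits)):
    if fruits[i] == 'Apple':
        matches.append(i)
print(matches)  # → [0, 2, 4]
```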

If you're munging data, you should most probably be using pandas, because it has far more elegant tools than the pure Python workarounds. There are many ways to list the matching indexes from a list, but you must be careful about the efficiency of your code.

Thursday, October 22, 2020

Python — How to Merge two Dictionaries

Python Dictionaries
If you are working as a Python developer, data analyst, or data scientist for any organisation, it is very important for you to know how to merge two dictionaries.

In Python, dictionaries are written with curly brackets {} and are combinations of keys and values (key: value). The basic use of a dictionary is to store data values like a map, unlike other data types that hold only a single value as an element. A dictionary is a collection that is changeable and indexed by keys (and, since Python 3.7, it preserves insertion order). If you know your keys, a dictionary is an optimized way to retrieve values.

Suppose that you have to merge two dictionaries x and y in Python, as given below -
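The two dictionaries might look like this sketch; the keys and values are assumptions, chosen so that one key overlaps:

```python
# two sample dictionaries; 'b' appears in both with different values
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
```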

By creating a normal function — There is a very simple way: define a function in Python that takes two dictionary variables, as given below -

Now, you have to call the above function to merge two dictionaries as given below -
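The function and the call above can be sketched as follows; the function name and sample values are assumptions for illustration:

```python
def merge_dicts(x, y):
    # copy x so the inputs are not modified, then overlay y
    z = x.copy()
    z.update(y)
    return z

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# call the function to merge the two dictionaries;
# y's value wins for duplicate keys
z = merge_dicts(x, y)
print(z)  # → {'a': 1, 'b': 3, 'c': 4}
```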

Efficient way: use a lambda — Some Python users think lambdas are evil because lambda expressions are an odd and unfamiliar syntax to many Python programmers. But lambdas are closures, and for a small one-off merge they save space: you don't have to define and name a separate function just to combine two dictionaries.

You can write the fast and memory-efficient solution with one expression:
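One sketch of a one-expression merge, using dict unpacking (Python 3.5+); the lambda form is included since this section mentions lambdas, and the sample values are assumptions:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# single expression using dict unpacking; y's values win on duplicate keys
z = {**x, **y}

# the same idea wrapped in a lambda
merge = lambda a, b: {**a, **b}
print(merge(x, y))  # → {'a': 1, 'b': 3, 'c': 4}
```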

As suggested above, using two lines of code or writing a function is probably a better way to go.

Tuesday, October 20, 2020

Python — Read multiple SQL database tables into csv

If you are working as a Python developer, data analyst, or data scientist for any organisation, it is very important for you to know how to store and access your data; for example, you may need to pull some database tables into csv or txt files for further analytics.
Export all the database tables in the database into multiple csv files in python

If you want to dump each table in CSV format, it does call for a bit of code: create a Python loop to iterate through all the tables and then execute a SELECT query on each of them.

Relational databases are the most common storage used for web content, large business storage and, most relevantly, for data platforms. Each query against an RDBMS returns rows with the same columns, so a single combined dump doesn't really fit; instead, query the metadata tables to get a list of all tables, then query each table and append the results to its respective file.

Before we begin, you should make sure you have the necessary tools installed. The most important thing is to have a local version of SQL Server installed on your computer.

Package and Libraries — You need to have some Python packages and libraries to communicate your Python code with your source (SQL Server database tables) to destination (csv or txt files) areas as given below-
# pandas library for data manipulation in python
import pandas as pd
# pyodbc module that makes accessing ODBC databases easy
import pyodbc
# urllib - used here to URL-quote the ODBC connection string
import urllib
# SQLAlchemy SQL toolkit and object-relational mapper
from sqlalchemy import create_engine

Connecting to the SQL Server database — If you wish to communicate with SQL Server, you need a client that speaks its database protocol. We'll use pyodbc, an open-source library that implements the ODBC standard, as given below -

# create a database connection function
def db_connections():
    quoted = urllib.parse.quote_plus("Driver={SQL Server}; Server=server; Database=dbname; uid=sqluserid; pwd=password")
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
    return engine
Define data table structures — Now, it would be very useful to create a method that defines your column structures based on your data tables. For example, if you want to pull your Employee, Department, and Project tables from your database, you can create a method that defines the structures of those respective tables, as given below -

# function - define column structures
# txt - table name
def df_cols(txt):
    # check table name
    if txt == 'TB_Employee':
        df = pd.DataFrame(columns=['Emp_NO', 'Date_Of_Joining', 'Emp_Name', 'DOB', 'Address', 'DeptId'])
    elif txt == 'TB_Account':
        df = pd.DataFrame(columns=['Account_Key', 'AccountNum', 'AccountName'])
    elif txt == 'TB_Category_Dim':
        df = pd.DataFrame(columns=['Cat1Id', 'Catgeory1', 'Cat2Id', 'Catgeory2', 'Cat3Id', 'Catgeory3'])
    elif txt == 'TB_Department':
        df = pd.DataFrame(columns=['DeptId', 'Dept_Name', 'DeptHead'])
    elif txt == 'TB_Contract_DIM':
        df = pd.DataFrame(columns=['Contract_Key', 'Contract Number', 'Contract', 'Contract Limit', 'Contract End Date', 'Line Type'])
    elif txt == 'TB_CostCenter_DIM':
        df = pd.DataFrame(columns=['CostCenter_Key', 'CostCenter', 'CostCenterDesc'])
    elif txt == 'TB_DATE_MAP':
        df = pd.DataFrame(columns=['DayDate', 'DAYNO', 'WEEK_NO', 'YEAR_NO', 'FIN_WEEK_NO', 'FIN_PD_NO', 'FIN_YR_NO', 'FIN_PD_WEEK_NO', 'AH_WEEK', 'FIN_QTR_NO'])
    elif txt == 'TB_Item_DIM':
        df = pd.DataFrame(columns=['Item_Key', 'ItemNo', 'Category1'])
    elif txt == 'TB_Project_Dim':
        df = pd.DataFrame(columns=['Project_Key', 'Project Number', 'Project Name'])
    elif txt == 'TB_Requestor_Dim':
        df = pd.DataFrame(columns=['Requestor_Key', 'Requestor Name', 'Requestor Location'])
    elif txt == 'TB_Supplier_DIM':
        df = pd.DataFrame(columns=['Supplier_Key', 'Supplier Number', 'Supplier Name', 'Supp Address Line 1',
                                   'Supp Address Line 2', 'Supp Address Line 3', 'Supp City', 'Supp Post Code',
                                   'Supp Country', 'Supplier Terms'])
    else:
        # unknown table name - return an empty frame
        df = pd.DataFrame()
    return df

Create a list of data tables — Now, we have to create a static list of your data tables, as given below —

# table list
MyList = ['TB_Employee', 'TB_Account', 'TB_Category_Dim', 'TB_Department', 'TB_Contract_DIM', 'TB_CostCenter_DIM', 'TB_DATE_MAP',
          'TB_Item_DIM', 'TB_Project_Dim', 'TB_Requestor_Dim', 'TB_Supplier_DIM']

Define the file path — If you want to store these data files in a specific location, you can set a path for this, as given below —

# file path to database tables
mypath = r'E:/MyPath/Database Files/'
# initiate database engine
engine = db_connections()
# set table variable
txt = ''

For loop — Create a Python loop to iterate through all the tables, execute a SELECT query on each of them, and set the table name in a local variable to pull the column structure. You already know the SQL statement you want to execute, as given below —

# calling MyList items within a for loop
for l in MyList:
    print(l)
    # set database table name into a variable
    txt = l
    # set file path with table name and csv extension
    filepath = mypath + l + '.csv'
    # set query for pulling data from the db
    sqlQuery = 'SELECT * FROM DBO.' + l
    # expected column structure for this table
    cols = df_cols(txt).columns
    # pull data into a dataframe from the database
    df = pd.read_sql_query(sqlQuery, engine)
    # keep the defined column order
    df = df.reindex(columns=cols)
    # drop indexes from the dataframe
    df = df.reset_index(drop=True)
    # write the dataframe to a csv file in the defined location
    df.to_csv(filepath, index=None, header=True)

Now, you can see that it is a very easy task to pull database tables into csv or txt files using Python.

Friday, October 16, 2020

Python — Retrieve matching rows from two Dataframes

Pulling the common records from two dataframes is a very common requirement if you are working as a Python developer, data analyst, or data scientist for any organisation.

For example, you have some users' data in dataframe-1 and new users' data in dataframe-2; then you have to find all the records that match between dataframe-2 and dataframe-1 using pandas, retrieve the matching rows, and report these records to the business.

So, we are here to show you the logic to get these matched records from two datasets/dataframes in Python.

# pandas library for data manipulation in python
import pandas as pd
# create NaN values in a pandas DataFrame via numpy
import numpy as np

# creating dataframe-1
df1 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', 'Volter', 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 19, 22, 28, 30, 32, 28],
    'Height': [189.0, 193.0, 200.0, 155.0, 165.0, 170.0, 172.0, 156.0, 165.0]})

# creating dataframe-2
df2 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', np.nan, 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 0, 22, 28, 30, 32, 28],
    'Height': [189.0, np.nan, 200.0, 155.0, np.nan, 170.0, 172.0, 156.0, 165.0]})
Display values from dataframe-1 and dataframe-2 — Now we have populated both dataframes; print df1 and df2 to see their values.
Verify the datatypes for each column in both dataframes — You have to check the datatypes of your columns and ensure they are the same, as given below —

# check datatypes for each column
df1 = df1.astype(df2.dtypes.to_dict())

How to pull the matched records? — Now, we have to find all the matched (common) rows from both dataframes with an inner merge on all columns, as given below -
# matched rows through an inner merge on all columns
commondf = pd.merge(df1, df2, on=['Name', 'Age', 'Height'])
# show common records
commondf
Now, you can see it is a very easy task to find the matched records from two dataframes through an inner merge.


Friday, October 9, 2020

Python — Show unmatched rows from two dataframes

For example, you have some users' data in dataframe-1 and new users' data in dataframe-2; then you have to find all the unmatched records in dataframe-2 by comparing with dataframe-1 and report these records to the business.

If you are working as a Python developer and you have to validate existing data against new incoming datasets, it is not an easy job.

So, we are here to show you the logic to get these unmatched records from two datasets/dataframes in Python.

# pandas library for data manipulation in python
import pandas as pd
# create NaN values in a pandas DataFrame via numpy
import numpy as np

# creating dataframe-1
df1 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', 'Volter', 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 19, 22, 28, 30, 32, 28],
    'Height': [189.0, 193.0, 200.0, 155.0, 165.0, 170.0, 172.0, 156.0, 165.0]})

# creating dataframe-2
df2 = pd.DataFrame({
    'Name': ['Ryan', 'Rosy', 'Wills', 'Tom', 'Alice', np.nan, 'Jay', 'John', 'Ronny'],
    'Age': [25, 26, 14, 0, 22, 28, 30, 32, 28],
    'Height': [189.0, np.nan, 200.0, 155.0, np.nan, 170.0, 172.0, 156.0, 165.0]})

Display values from dataframe-1 and dataframe-2 — Now we have populated both dataframes; print df1 and df2 to see their values.

Unmatched rows from dataframe-2: Now, we have to find all the unmatched rows in dataframe-2 by comparing with dataframe-1. To do this, we can compare the dataframes in an elementwise manner and get the indexes, as given below (note that NaN compares unequal to everything, including NaN, so rows containing NaN are always flagged as different):

# compare the Dataframes in an elementwise manner
indexes = (df1 != df2).any(axis=1)

and then select the rows of dataframe-2 where any of the items differ, as given below:

#looking unmatched indexes in dataframe-2
# and store unmatched rows in dataframe-3
df3 = df2.loc[indexes]

#displaying unmatched values from dataframe-2
df3


Unmatched rows from dataframe-1: Now, we have to find all the unmatched rows in dataframe-1 by comparing with dataframe-2.

#looking unmatched indexes in dataframe-1
# and store unmatched rows in dataframe-4
df4 = df1.loc[indexes]

#displaying unmatched values from dataframe-1
df4


Unmatched rows from dataframe-1 & dataframe-2: If you want to display all the unmatched rows from both dataframes, you can merge the unmatched rows from dataframe-1 with the unmatched rows from dataframe-2, as given below:

# merge both unmatched dataframes using an outer join
df5 = pd.merge(df3, df4, how='outer')
# display all unmatched rows
df5


Now, you can see it is a very easy task to find the unmatched records from two dataframes by comparing them elementwise and indexing with the result.

