Saturday, August 29, 2020

Python - Transpose Dataframe Columns into Rows

Today, I was working with Python where I have to transpose some columns into rows to avoid a lot of calculations. As we know that Python has a lot of libraries and very strong communities support. That means, you can solve any problems with your dataset.

Here, I’m using a small dataset to show you that how can we use pandas library to transpose your dataframe.

In this example, I’m using class student’s dataset where each student has their subject in the columns with their obtained marks.

Now, we have to transpose subject columns into rows in the ‘Subject’ column and marks will be display in Marks column next to Subject within dataset-2.

Pandas melt() function is used to change the DataFrame format from wide to long. It’s used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis and only two columns — variable and value.

Here, we can see that with the help of Pandas library, we can transpose our dataset into the desired results.

 

#import Libraries

import pandas as pd

# Creating DataFrame from dict of narray/lists. intialise data of lists

list={'Name':['Ryan','Arjun','john','Rosy'],

     'Class':['IV','III','III','V'],    

     'English':[90,85,90,95],

     'Math':[95,90,85,80],

     'Science':[95,90,90,90],

     'Computer':[98,95,90,85],

     'Year':[2020,2020,2020,2020]}

 

# Create DataFrame from list/narray

df=pd.DataFrame(list)

 

#show data in the dataframe

df

======================================================

Name | Class | Year| English | Math |Science |Computer

------------------------------------------------------

Ryan |IV              | 2020 |               90           | 95          | 95     |98

Arjun|III              | 2020 |               85           | 90          | 90     |95

John |III               | 2020 |               90           | 85          | 90     |90

Rosy |V                | 2020 |               95           | 80          | 90     |85

======================================================

 

 

# function to unpivot the dataframe

df3=df.melt(['Name','Class','Year'], var_name='Subject')

 

#show data in the dataframe

df3

=======================================

  |Name | Class | Year|Subject  |value

---------------------------------------

0 |Ryan |  IV   |2020 |Computer| 98

1 |Arjun|  III  |2020 |Computer| 95

2 |john |  III  |2020 |Computer| 90

3 |Rosy |  V    |2020 |Computer| 85

4 |Ryan |  IV   |2020 |English | 90

5 |Arjun|  III  |2020 |English | 85

6 |john |  III  |2020 |English | 90

7 |Rosy |  V    |2020 |English | 95

8 |Ryan |  IV   |2020 |Math    | 95

9 |Arjun|  III  |2020 |Math    | 90

10|john |  III  |2020 |Math    | 85

11|Rosy |  V    |2020 |Math    | 80

12|Ryan |  IV   |2020 |Science | 95

13|Arjun|  III  |2020 |Science | 90

14|john |  III  |2020 |Science | 90

15|Rosy |  V    |2020 |Science | 90

=======================================

 

 

#rename value columns to Marks

df3=df3.rename(columns = {'value': 'Marks'}, inplace = False)

 

#show data in the dataframe

df3

=======================================

  |Name | Class | Year|Subject  |Marks

---------------------------------------

0 |Ryan |  IV   |2020 |Computer| 98

1 |Arjun|  III  |2020 |Computer| 95

2 |john |  III  |2020 |Computer| 90

3 |Rosy |  V    |2020 |Computer| 85

4 |Ryan |  IV   |2020 |English | 90

5 |Arjun|  III  |2020 |English | 85

6 |john |  III  |2020 |English | 90

7 |Rosy |  V    |2020 |English | 95

8 |Ryan |  IV   |2020 |Math    | 95

9 |Arjun|  III  |2020 |Math    | 90

10|john |  III  |2020 |Math    | 85

11|Rosy |  V    |2020 |Math    | 80

12|Ryan |  IV   |2020 |Science | 95

13|Arjun|  III  |2020 |Science | 90

14|john |  III  |2020 |Science | 90

15|Rosy |  V    |2020 |Science | 90

=======================================

 To learn more, please follow us -

http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at - 

http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -

https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -

https://twitter.com/macxima

To Learn more, please visit our Medium account at -

https://medium.com/@macxima

Monday, August 24, 2020

Python — Filtering data with Pandas Dataframe

If you are working as Python developer where you have to accomplished a lot of data cleansing stuffs. One of the data cleansing stuff is to remove unwanted data from your dataframe. Pandas is one of the most important packages that makes importing and analyzing data much easier with the help of its strong library.

For analyzing data, a programmer requires a lot of filtering operations. Pandas provide many methods to filter a Data frame and Dataframe.query() is one of them.

To understand filtering feature of Pandas, we are creating some sample data by using list feature of Python.

In this example, dataframe has been filtered on multiple conditions.

# Import pandas library

import pandas as pd

 

# intialise data of lists.

data = {'Name':['Ryan Arjun', 'Kimmy Wang', 'Rose Gray', 'Will Smith'],

        'Age':[20, 21, 19, 18],

        'Country':['India','Taiwan','Canada','Greenland'],

        'Sex':['Male','Female','Female','Male']}

 

# Create DataFrame

df = pd.DataFrame(data)

 

#show data in the dataframe

df

=======================================

   Age |   Country |       Name   |  Sex

--------------------------------------- 

0   20 |     India | Ryan Arjun   |  Male

1   21 |    Taiwan | Kimmy Wang   |Female

2   19 |    Canada |  Rose Gray   |Female

3   18 | Greenland | Will Smith   | Male

=======================================

 

# filtering with query method

# Where sex must be male

# and Country must be India

# and age must be greater than 15

df.query('Sex =="Male" and Country =="India" and Age>15', inplace = True)

 

#show data in the dataframe

df

 

===================================

Age | Country   |     Name  | Sex

-----------------------------------

20  |India      |Ryan Arjun | Male

===================================

By using query feature of pandas in Python can save a lot of data processing time because we can use multiple filters conditions in a single go.

To learn more, please follow us -

http://www.sql-datatools.com

To Learn more, please visit our YouTube channel at - 

http://www.youtube.com/c/Sql-datatools

To Learn more, please visit our Instagram account at -

https://www.instagram.com/asp.mukesh/

To Learn more, please visit our twitter account at -

https://twitter.com/macxima

To Learn more, please visit our Medium account at -

https://medium.com/@macxima