In this tutorial, we will learn about some common functions for exploratory data analysis (EDA) in our data science processes using Python.
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
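To make this concrete, here is a minimal NumPy sketch that builds a 2-D array and applies a few of NumPy's vectorized functions to it. The array values and variable names are illustrative only and are not taken from the original tutorial.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vectorized functions operate on the whole array without a Python loop
col_means = m.mean(axis=0)   # mean of each column -> [2.5, 3.5, 4.5]
row_sums = m.sum(axis=1)     # sum of each row -> [6.0, 15.0]
scaled = np.sqrt(m)          # element-wise square root

print(m.shape)    # (2, 3)
print(col_means)
print(row_sums)
```

The axis argument selects the direction of the reduction: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).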
In data science, the goal in most cases is not only to explore the data but to analyze it in some way, often through a model.
A pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to an Excel spreadsheet, a SQL table, or a dict of Series objects, and it is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:
• Dict of 1D ndarrays, lists, dicts, or Series
• 2-D numpy.ndarray
• Structured or record ndarray
• A Series
• Another DataFrame
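As a rough illustration of the input types listed above, the sketch below builds one small DataFrame from each kind of input. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# 1. Dict of lists (a dict of 1-D ndarrays or Series works the same way)
df_from_dict = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

# 2. 2-D NumPy array; column labels are passed explicitly
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

# 3. Structured (record) ndarray; the field names become column names
rec = np.array([(1, 2.5), (2, 3.5)], dtype=[('id', 'i4'), ('score', 'f8')])
df_from_records = pd.DataFrame(rec)

# 4. A single named Series becomes a one-column DataFrame
df_from_series = pd.DataFrame(pd.Series([10, 20, 30], name='value'))

# 5. Another DataFrame; the result has the same data and labels
df_from_df = pd.DataFrame(df_from_dict)

for name, frame in [('dict', df_from_dict), ('array', df_from_array),
                    ('records', df_from_records), ('series', df_from_series),
                    ('dataframe', df_from_df)]:
    print(name, frame.shape, list(frame.columns))
```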
Along with the data, we can optionally pass index (row labels) and columns (column labels) arguments. If we pass an index and/or columns, we guarantee the index and/or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data that does not match the passed index.
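The following minimal sketch shows this alignment behaviour with two made-up Series whose row labels differ. Without an explicit index the labels are combined; with an explicit index or columns list, only the requested labels are kept.

```python
import pandas as pd

# Two Series with different row labels
d = {'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
     'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])}

# No index passed: the rows are the union of the labels ('a' to 'd');
# 'one' has no value for 'd', so that cell is NaN
df_all = pd.DataFrame(d)

# Explicit index: row 'c' is not requested, so its data is discarded
df_subset = pd.DataFrame(d, index=['d', 'b', 'a'])

# Explicit columns: 'one' is dropped, and 'three' has no data, so it is all NaN
df_cols = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

print(df_all)
print(df_subset)
print(df_cols)
```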
1. Describe function - pandas.DataFrame.describe() is a very informative function that generates descriptive statistics for the data in a pandas DataFrame or Series. It summarizes the central tendency and dispersion of the dataset, which gives a quick overview of the data.
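Here is a short sketch of describe() on a small DataFrame with made-up values (the Name/Age/Rating columns mirror the sample data used in this tutorial). By default only numeric columns are summarized; include='all' also reports count, unique, top and freq for non-numeric columns.

```python
import pandas as pd

# A small DataFrame with one text column and two numeric columns
df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky', 'Vin'],
                   'Age': [25, 26, 25, 23],
                   'Rating': [4.23, 3.24, 3.98, 2.56]})

# Numeric columns only: count, mean, std, min, 25%, 50% (median), 75%, max
summary = df.describe()
print(summary)

# Include the non-numeric 'Name' column as well
print(df.describe(include='all'))

# A single column (a Series) has its own describe()
print(df['Age'].describe())
```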
2. Head and tail functions - If you want to view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.
import pandas as pd
import numpy as np

# Create a Series of 400 random numbers drawn from a standard normal distribution
s = pd.Series(np.random.randn(400))

# The first two rows of the Series
print(s.head(2))

# The last two rows of the Series
print(s.tail(2))

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)

# The first two rows of the DataFrame
print(df.head(2))

# The last two rows of the DataFrame
print(df.tail(2))
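To show the default of five rows as well as a custom count, here is a small sketch on a made-up Series of ten values. It also uses a negative n, which pandas accepts to mean "all rows except the last |n|" for head() (and "all rows except the first |n|" for tail()).

```python
import pandas as pd

# A Series with the values 0, 1, ..., 9
s = pd.Series(range(10))

first_five = s.head()           # default n=5 -> 0..4
last_five = s.tail()            # default n=5 -> 5..9
first_three = s.head(3)         # custom n -> 0..2
all_but_last_two = s.head(-2)   # negative n -> every row except the last two

print(first_five.tolist())
print(last_five.tolist())
print(all_but_last_two.tolist())
```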
To learn more, please follow us at -
http://www.sql-datatools.com
To learn more, please visit our YouTube channel at -
http://www.youtube.com/c/Sql-datatools
To learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To learn more, please visit our Twitter account at -
https://twitter.com/macxima