Python Tutorials | Pandas Methods to Analyze your Data

Oct 26, 2020 | Python

Python Pandas

Master Panda is at His Best

Our post ‘Pandas Methods to Analyze your Data’ is based on the Data Analysis with Python online course at cognitiveclass.ai.

In this post we will learn various Pandas methods to analyze your data. I hope you have gone through the post ‘Importing and Exporting Data (CSV format) in Python - Pandas Way’; if you have not, please do, as this post is next in the series.

In this post we will use the famous Iris data set (author: R.A. Fisher, “UCI Machine Learning Repository: Iris Data Set”).

First, we will import the data into our Jupyter Notebook:

import pandas as pd

# Location of the raw Iris data file (it has no header row)
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

iris = pd.read_csv(path, header=None)

If you read the data description at the source, you will find the following:

Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

Next, we add column headers to the imported data.

headers = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class1"]

iris.columns = headers
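As an alternative, pd.read_csv accepts a names argument, so the headers can be attached while reading instead of afterwards. The sketch below is a minimal offline stand-in: it feeds the first three Iris rows shown in this post through io.StringIO instead of downloading from the UCI URL; iris_small is just an illustrative name.

```python
import io
import pandas as pd

# Three rows copied from the Iris data shown in this post; StringIO stands in for the UCI URL
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

headers = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class1"]

# header=None: the file has no header row; names=: use our list as the column labels
iris_small = pd.read_csv(io.StringIO(raw), header=None, names=headers)
print(iris_small)
```

With the real file you would pass path instead of io.StringIO(raw); the result is the same as assigning iris.columns afterwards.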

iris.head(6) will yield:

   sepal_length  sepal_width  petal_length  petal_width       class1
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa

and iris.tail(6) will result in:

     sepal_length  sepal_width  petal_length  petal_width          class1
144           6.7          3.3           5.7          2.5  Iris-virginica
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica
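To see how head and tail behave without downloading anything, here is a small sketch that rebuilds the six rows from the head(6) output above by hand. Called with no argument, both methods return 5 rows, and the original index labels are kept, which is why the tail output above starts at 144.

```python
import pandas as pd

# The six rows shown by iris.head(6) above, rebuilt by hand so this runs offline
rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [4.7, 3.2, 1.3, 0.2, "Iris-setosa"],
    [4.6, 3.1, 1.5, 0.2, "Iris-setosa"],
    [5.0, 3.6, 1.4, 0.2, "Iris-setosa"],
    [5.4, 3.9, 1.7, 0.4, "Iris-setosa"],
]
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class1"]
df = pd.DataFrame(rows, columns=columns)

first = df.head()       # no argument: the first 5 rows
last_two = df.tail(2)   # the last 2 rows, with their original index labels
print(first)
print(last_two)
```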

Basic Insights into the Dataset

After reading data into a Pandas DataFrame, it is time to explore the dataset a little. There are several ways to obtain essential insights into the dataset that help us understand it better.

Data Types in Pandas

Data comes in a variety of types. The main types stored in pandas objects are object, float, int, bool and datetime64.

Some important points:

  1. The main types stored in Pandas objects are object, float, int, bool and datetime.
  2. The datatype names are somewhat different from those in native Python.
  3. Some are very similar, such as the numeric datatypes “int” and “float”.
  4. The “object” pandas type functions similarly to “string” in Python, save for the change in name, while the “datetime” pandas type is very useful for handling time series data.
  5. There are two reasons to check data types in a dataset. First, Pandas automatically assigns types based on the encoding it detects from the original data table, and this assignment can be wrong: a numeric column that contains a non-numeric placeholder, for example, is read as object. Second, the data type determines which functions can be applied to a column.
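The first reason can be illustrated with a small sketch. The CSV text and its "?" placeholder are hypothetical, chosen only to show how one non-numeric entry makes Pandas read a numeric column as text, and how pd.to_numeric converts it back.

```python
import io
import pandas as pd

# Hypothetical CSV where "?" marks a missing measurement (an assumption for illustration)
csv_text = "length,label\n5.1,a\n?,b\n4.7,c\n"
df = pd.read_csv(io.StringIO(csv_text))

before = df["length"].dtype   # not numeric: "?" is not a number, so the column is read as text (object)
print(before)

# Convert to numbers; entries that cannot be parsed become NaN
df["length"] = pd.to_numeric(df["length"], errors="coerce")
print(df["length"].dtype)     # float64
```

Alternatively, pd.read_csv(..., na_values="?") treats the placeholder as missing while reading, so the column comes in as float64 directly.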

To learn more about each attribute, it is good to know the data type of each column. In Pandas:

dataframe.dtypes

returns a Series with the data type of each column.

iris.dtypes
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
class1           object
dtype: object
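The second reason, that the type decides which operations make sense, shows up when you want numeric statistics only. A minimal sketch, assuming a hand-built frame with a few values from the head(6) output above; select_dtypes picks columns by type so that mean() is only run on numeric data.

```python
import pandas as pd

# A few Iris-style rows built by hand (values from the head(6) output in this post)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "petal_width": [0.2, 0.2, 0.2],
    "class1": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

print(df.dtypes)

# Keep only the numeric columns before computing numeric statistics
numeric = df.select_dtypes(include="number")
print(numeric.mean())

# Everything that is not numeric (here, the class labels)
text_cols = df.select_dtypes(exclude="number").columns.tolist()
print(text_cols)
```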

Describe

To check a statistical summary of each column, such as the record count, mean and standard deviation, use:

dataframe.describe()

It generates various summary statistics for the numeric columns, excluding NaN (Not a Number) values.

iris.describe()
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
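Each row of the describe() table corresponds to an ordinary Series method. The sketch below uses the sepal_length values from the head(6) output plus one missing value added for illustration, and rebuilds the summary by hand. It shows that NaN is skipped (count is 6, not 7), that std is the sample standard deviation and that the percentile rows are quantiles.

```python
import pandas as pd

# sepal_length values from head(6) above, plus one NaN added for illustration
s = pd.Series([5.1, 4.9, None, 4.7, 4.6, 5.0, 5.4], name="sepal_length")

summary = s.describe()
print(summary)

# The same statistics computed one by one (all of them skip NaN by default)
manual = pd.Series({
    "count": s.count(),          # non-missing values only
    "mean": s.mean(),
    "std": s.std(),              # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),     # linear interpolation between data points
    "50%": s.median(),
    "75%": s.quantile(0.75),
    "max": s.max(),
})
print(manual)
```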

To include all the columns, including those of other types such as object, add the argument include="all" inside the parentheses:

iris.describe(include="all")

        sepal_length  sepal_width  petal_length  petal_width          class1
count     150.000000   150.000000    150.000000   150.000000             150
unique           NaN          NaN           NaN          NaN               3
top              NaN          NaN           NaN          NaN  Iris-virginica
freq             NaN          NaN           NaN          NaN              50
mean        5.843333     3.054000      3.758667     1.198667             NaN
std         0.828066     0.433594      1.764420     0.763161             NaN
min         4.300000     2.000000      1.000000     0.100000             NaN
25%         5.100000     2.800000      1.600000     0.300000             NaN
50%         5.800000     3.000000      4.350000     1.300000             NaN
75%         6.400000     3.300000      5.100000     1.800000             NaN
max         7.900000     4.400000      6.900000     2.500000             NaN

Now it provides a statistical summary of all the columns, including object-typed attributes. For object-type columns, a different set of statistics is evaluated: unique, top and freq.

“Unique” is the number of distinct objects in the column, “top” is the most frequently occurring object, and “freq” is the number of times the top object appears in the column. When several values tie for most frequent, “top” shows only one of them: each of the three Iris classes has 50 rows, so Iris-virginica is not actually more common than the other two.

Some values in the table above show as “NaN” because those statistics do not apply to that column type; for example, a text column has no mean, and the numeric columns get no unique, top or freq.
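The unique, top and freq rows can also be computed directly. In this sketch the labels are hypothetical and chosen so that one class clearly appears most often; value_counts sorts values from most to least frequent, so its first entry gives top and freq.

```python
import pandas as pd

# Hypothetical labels, chosen so one class clearly appears most often
labels = pd.Series(["Iris-setosa", "Iris-setosa", "Iris-virginica"], name="class1")

summary = labels.describe()   # count, unique, top, freq for a text column
print(summary)

# The same numbers computed directly
counts = labels.value_counts()  # frequency of each distinct value, most frequent first
n_unique = labels.nunique()     # "unique"
top = counts.index[0]           # "top"
freq = counts.iloc[0]           # "freq"
print(n_unique, top, freq)
```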

For a particular column, use iris[['sepal_length']].describe():

       sepal_length
count    150.000000
mean       5.843333
std        0.828066
min        4.300000
25%        5.100000
50%        5.800000
75%        6.400000
max        7.900000
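The double brackets matter: a list of labels returns a DataFrame, while a single label returns a Series. A minimal sketch on a small hand-built frame (values from the head(6) output above), which also shows that the list can name several columns at once.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

as_frame = df[["sepal_length"]]   # list of labels -> DataFrame
as_series = df["sepal_length"]    # single label -> Series
print(type(as_frame).__name__, type(as_series).__name__)

# Several columns at once: put more labels in the list
both = df[["sepal_length", "sepal_width"]].describe()
print(both)
```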

Info Method

Another method you can use to check your dataset is:

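dataframe.info()

It provides a concise summary of your DataFrame: the index range, the number of columns, each column's name, non-null count and data type, and the memory usage. Note the parentheses: without them, iris.info only refers to the method, and Jupyter displays the method object together with a truncated view of the first and last rows instead of the summary.

Run iris.info() and check the output for yourself.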
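info() prints its report rather than returning it. If you want the text as a string, for example to log it, the buf parameter accepts any writable text stream. A small sketch on a hypothetical two-column frame with one missing value, so the non-null count differs from the row count.

```python
import io
import pandas as pd

# Hypothetical frame: one missing sepal_length so the non-null count is 2 of 3
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, None],
    "class1": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

buffer = io.StringIO()
df.info(buf=buffer)          # write the summary into buffer instead of printing it
report = buffer.getvalue()
print(report)
```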

By rlochan2021
