Here I am again with another chapter from the Data Analysis with Python online course at cognitiveclass.ai. In this post we will use pandas to import and export data in Python.
I hope you have gone through our previous posts in the Python starters archive.
The data used in the course, the Automobile Dataset, is available online in CSV (comma-separated values) format. Let's use this dataset as an example to practice reading data.
Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
Data type: CSV
How to Import Pandas
We import pandas as follows:
import pandas as pd
What is a DataFrame?
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
DataFrame accepts many different kinds of input:
- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.
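A minimal sketch of the input types listed above. The values, the column labels "A", "B", "C" and the row labels "Y", "Z" are made up for illustration:

```python
import numpy as np
import pandas as pd

# Dict of lists: each key becomes a column label, rows get labels 0, 1, ...
df_from_lists = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [3, 3]})

# Dict of Series: rows are aligned on the Series' index labels
df_from_series = pd.DataFrame({
    "A": pd.Series([1, 1], index=["Y", "Z"]),
    "B": pd.Series([2, 2], index=["Y", "Z"]),
})

# 2-D ndarray plus explicit row labels (index) and column labels (columns)
arr = np.array([[1, 2, 3], [1, 2, 3]])
df_from_array = pd.DataFrame(arr, index=["Y", "Z"], columns=["A", "B", "C"])

# Structured (record) ndarray: the field names become column labels
records = np.array([(1, 2.0), (1, 2.0)], dtype=[("A", "i4"), ("B", "f8")])
df_from_records = pd.DataFrame(records)

print(df_from_array)
```

The index and columns arguments are optional: when they are left out, pandas falls back to integer labels (or the dict keys / field names for columns).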
The structure of a dataframe
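[Figure: two example DataFrames. The first uses default integer row labels (0, 1); the second has columns A, B and C and custom row labels (Y, Z). In both, each row holds the values 1, 2, 3.]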
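The idea behind that structure can be sketched with made-up values: the same data once with default integer row labels and once with custom row labels Y and Z, plus the attributes that expose the labels and the size:

```python
import pandas as pd

data = {"A": [1, 1], "B": [2, 2], "C": [3, 3]}

# Without an index argument, rows get default integer labels 0, 1, ...
df_default = pd.DataFrame(data)

# With index=..., rows get the custom labels instead
df_labeled = pd.DataFrame(data, index=["Y", "Z"])

print(df_default.index.tolist())    # [0, 1]
print(df_labeled.index.tolist())    # ['Y', 'Z']
print(df_labeled.columns.tolist())  # ['A', 'B', 'C']
print(df_labeled.shape)             # (2, 3) -> (rows, columns)
```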
For a quick overview of pandas, read the "10 Minutes to pandas" tutorial in the official pandas documentation.
To read any data using Python’s pandas package, there are two important factors to consider: format and file path.
Format is the way the data is encoded. We can usually tell the encoding scheme from the file extension at the end of the file name. Common formats include csv, json, xlsx, hdf and so forth.
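To make the link between file extension and reader concrete, here is a sketch with a hypothetical helper, read_by_extension, that picks a pandas reader from the extension. The helper and its extension mapping are assumptions for illustration, not a pandas feature; the .xlsx and .h5 readers also need optional packages (openpyxl, PyTables) installed:

```python
import os
import tempfile

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader function
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,  # needs openpyxl
    ".h5": pd.read_hdf,      # needs PyTables
}

def read_by_extension(path):
    """Hypothetical helper: read a file with the reader matching its extension."""
    ext = os.path.splitext(path)[1].lower()  # "sample.csv" -> ".csv"
    if ext not in READERS:
        raise ValueError(f"No reader configured for extension {ext!r}")
    return READERS[ext](path)

# Usage: write a tiny made-up CSV file to a temporary folder and read it back
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "sample.csv")
    with open(csv_path, "w") as f:
        f.write("make,price\naudi,13950\nbmw,16430\n")
    df = read_by_extension(csv_path)

print(df)
```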
The (file) path tells us where the data is stored. Usually it is stored either on the computer we are using, or online on the internet.
In pandas, the "read_csv()" function reads a file whose columns are separated by commas into a pandas DataFrame. Reading data in pandas can be done quickly in three lines.
First, import pandas.
Then define a variable with the file path.
Finally, use read_csv() to import the data.
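The three steps can be sketched as follows. To keep the sketch runnable offline, it first writes a small made-up file with the hypothetical name "automobile_sample.csv"; to read the course data, set path to the data-source URL instead, since read_csv() accepts URLs as well as local paths:

```python
import pandas as pd  # step 1: import pandas

# Setup for this sketch only: create a small made-up CSV file with a header row
with open("automobile_sample.csv", "w") as f:
    f.write("make,fuel-type,price\naudi,gas,13950\nvolvo,diesel,22470\n")

path = "automobile_sample.csv"  # step 2: a variable holding the file path (or a URL)
df = pd.read_csv(path)          # step 3: read the CSV file into a DataFrame

print(df)
```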
For example, if you have saved the data as "automobile.csv" on your local machine, you can read it with the syntax: df = pd.read_csv("automobile.csv").
By default, read_csv() assumes that the first row of the file is a header. If the data has no column headers, as with the Automobile Dataset, we need to tell read_csv() not to assign headers by setting header=None.
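A small sketch of the difference, using shortened, made-up rows in the style of the header-less .data file (read from an in-memory string so it runs offline):

```python
import io

import pandas as pd

# Made-up rows with no header line, like the .data file
raw = "3,?,alfa-romero,gas\n2,164,audi,gas\n"

# Default behaviour: the first data row is (wrongly) used as the header
df_wrong = pd.read_csv(io.StringIO(raw))

# header=None: every row is data, and columns get integer labels 0, 1, 2, ...
df = pd.read_csv(io.StringIO(raw), header=None)

print(df_wrong.columns.tolist())  # ['3', '?', 'alfa-romero', 'gas']
print(df.columns.tolist())        # [0, 1, 2, 3]
print(df.shape)                   # (2, 4)
```

With the course data, the equivalent call would be pd.read_csv(url, header=None), where url holds the data-source address given at the top of this post.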
After reading the dataset, it is a good idea to look at the DataFrame to get a better intuition and to make sure everything happened the way you expected. We can use dataframe.head(n) to show the first n rows of the DataFrame (five by default). dataframe.describe() shows a quick statistical summary of your data.
Similarly, dataframe.tail(n) shows the last n rows of the DataFrame.
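A quick sketch of these inspection methods on a small made-up DataFrame standing in for the automobile data:

```python
import pandas as pd

# Made-up data standing in for the automobile dataset
df = pd.DataFrame({
    "make": ["audi", "bmw", "mazda", "volvo", "toyota", "honda"],
    "price": [13950, 16430, 6795, 12940, 5348, 6855],
})

print(df.head())      # first 5 rows (the default n)
print(df.head(2))     # first 2 rows
print(df.tail(2))     # last 2 rows
print(df.describe())  # count, mean, std, min, quartiles and max of numeric columns
```

By default describe() summarizes only the numeric columns, so the text column "make" does not appear in its output.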
To export your pandas DataFrame to a new CSV file, use the to_csv() method and specify the file path (including the file name) that you want to write to.
For example, to save the DataFrame "df" as "automobile.csv" on your own computer, you can use the syntax: df.to_csv("automobile.csv").
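A sketch of saving with to_csv() and reading the file back, on a made-up DataFrame. Note that it writes "automobile.csv" (and a second, hypothetically named "automobile_no_index.csv") into the current working directory, overwriting any files with those names:

```python
import pandas as pd

df = pd.DataFrame({"make": ["audi", "volvo"], "price": [13950, 22470]})

# By default to_csv also writes the row index as an unnamed first column
df.to_csv("automobile.csv")

# index=False leaves the row index out of the file
df.to_csv("automobile_no_index.csv", index=False)

# Read both files back to compare their columns
print(pd.read_csv("automobile.csv").columns.tolist())           # ['Unnamed: 0', 'make', 'price']
print(pd.read_csv("automobile_no_index.csv").columns.tolist())  # ['make', 'price']
```

Passing index=False is usually what you want when the index is just the default 0, 1, 2, ... row numbering.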
Pandas also supports importing and exporting many other file formats. The syntax for reading and saving them is very similar to the syntax for reading and saving CSV files.
Each column shows a different method to read and save files into a different format.
Read/Save Other Data Formats
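As with CSV files, we use similar methods to read and save other formats: for example, pd.read_json() and df.to_json() for JSON, pd.read_excel() and df.to_excel() for Excel (xlsx) files, and pd.read_hdf() and df.to_hdf() for HDF files.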
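A sketch of the same save/read pattern with JSON, on a made-up DataFrame and a hypothetical file name "automobile.json". Excel and HDF follow the same pattern but need optional packages (for example openpyxl for .xlsx and PyTables for HDF5), so those lines are left as comments:

```python
import pandas as pd

df = pd.DataFrame({"make": ["audi", "volvo"], "price": [13950, 22470]})

# JSON: a to_* method saves, and the matching pd.read_* function loads
df.to_json("automobile.json")
df_json = pd.read_json("automobile.json")
print(df_json)

# Same pattern for other formats (require optional packages, so not run here):
# df.to_excel("automobile.xlsx", index=False); pd.read_excel("automobile.xlsx")
# df.to_hdf("automobile.h5", key="autos");     pd.read_hdf("automobile.h5", key="autos")
```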