Before you start
- This is a beginner-level series.
- You should be comfortable with Python; I will be using Python 3.5.x.
- Exposure to Jupyter notebooks and NumPy is a plus.
What is pandas?
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. …from pandas documentation
Put more simply, the pandas library gives us a way to interact with data that can be pictured as an Excel worksheet: labelled and organized in rows and columns.
This post will cover
- Installing and importing pandas
- The basic data structure in pandas
- Using a pandas Series
Installing and getting started
- Option 1: install pandas with pip by running pip install pandas.
- Option 2: install Anaconda, a collection of libraries for data and scientific computing. pandas is included in the collection.
You can import pandas just as you import any other module in Python, using the import statement.
import pandas as pd
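A quick way to confirm that the installation worked is to print the installed version. This is only a small sanity check, and the variable name installed_version is my own choice for illustration.

```python
import pandas as pd

# pandas exposes its version as a plain string on the package itself.
installed_version = pd.__version__
print("pandas version:", installed_version)
```

If the import fails with ModuleNotFoundError, pandas is most likely not installed in the Python environment you are running.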
You can also view the Jupyter notebook for this post on Google Colab.
Building Blocks of Pandas
Creating a pandas Series object
- Creating a Series from a Python list
- Creating a Series from a Python dict
- Creating a Series from a NumPy ndarray
    # Import the pandas library
    import pandas as pd

    # Create a Series object from a list.
    # A Series has two parts: the data and the labels (its index).
    myNums = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
    print("Pandas series object with labels declared explicitly")
    print(myNums)

    # If we don't provide labels, pandas automatically assigns
    # integer labels starting from 0.
    myNums = pd.Series([1, 2, 3, 4, 5])
    print("Pandas series object")
    print(myNums)
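To make the "data plus labels" idea concrete, here is a minimal sketch that pulls the two parts of a Series apart. The variable names (my_nums, labels, data, default_labels) are my own and only serve the illustration.

```python
import pandas as pd

my_nums = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# The labels live in the .index attribute.
labels = list(my_nums.index)

# The underlying data can be pulled out as a NumPy array.
data = my_nums.to_numpy()

print("labels:", labels)
print("data:", data)

# Without an explicit index, the labels are the integers 0, 1, 2, ...
default_labels = list(pd.Series([10, 20, 30]).index)
print("default labels:", default_labels)
```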
    import pandas as pd

    # Create a pandas Series object from a Python dictionary.
    # The dictionary keys become the labels.
    mydict = {"row1": 12, "row2": 15, "row3": 20, "row4": 32}
    series_from_dict = pd.Series(mydict)
    print("Creating pandas series object from dict")
    print(series_from_dict)
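When building a Series from a dict, you can also pass an index to choose which keys to keep and in what order. The sketch below reuses the dictionary from the example; the key "row9" is a made-up label that is deliberately missing from the dict, to show that pandas fills it with NaN.

```python
import pandas as pd

mydict = {"row1": 12, "row2": 15, "row3": 20, "row4": 32}

# Only the requested labels are kept, in the requested order.
# "row9" is not a key in mydict, so its value becomes NaN.
subset = pd.Series(mydict, index=["row3", "row1", "row9"])
print(subset)

# pd.isna tells us which entries are missing.
print(pd.isna(subset))
```

Because NaN is a floating-point value, the resulting Series holds floats rather than integers.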
NumPy and pandas work very well together. We can easily convert a NumPy array into a pandas Series.
    import pandas as pd
    import numpy as np

    # Build a Series from an array of 10 random numbers
    numpy_array = np.random.randn(10)
    np_series = pd.Series(numpy_array)
    print("pandas object from numpy")
    print(np_series)
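The conversion works in both directions, and NumPy functions can be applied to a Series directly. The following is a small sketch with fixed values instead of random ones, so the result is easy to check by hand; the labels "x", "y" and "z" are arbitrary.

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 4.0, 9.0])
series = pd.Series(values, index=["x", "y", "z"])

# Going back from a Series to a NumPy array
back = series.to_numpy()
print(back)

# A NumPy function applied to a Series returns a Series
# that keeps the original labels.
roots = np.sqrt(series)
print(roots)
```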
Accessing data from Series object
We can access the data in a Series using square brackets, much as we do with lists and NumPy arrays. We can select items either by position or by label, and we can slice a Series just as we slice a Python list. In recent pandas versions, looking up a single item by integer position with plain brackets on a label-indexed Series is deprecated, so the example below uses .iloc for positional access.
    import pandas as pd

    fruit_series = pd.Series(
        [11, 23, 45, 54, 67],
        index=['banana', 'orange', 'apple', 'strawberry', 'grapes'])

    # Accessing the data using the positional index
    print("price of banana: {}".format(fruit_series.iloc[0]))

    # Accessing the data using the label
    print("price of banana: {}".format(fruit_series["banana"]))

    # Accessing the data using slices
    print("first two fruits in the list")
    print(fruit_series[0:2])
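pandas also offers two explicit accessors that remove any ambiguity between positions and labels: .iloc for positions and .loc for labels. Here is a short sketch using the same fruit data; the variable names are mine. One detail worth noticing is that a label slice includes its end label, while a positional slice excludes its end position.

```python
import pandas as pd

fruit_series = pd.Series(
    [11, 23, 45, 54, 67],
    index=["banana", "orange", "apple", "strawberry", "grapes"])

# .iloc always works with integer positions.
first_by_position = fruit_series.iloc[0]

# .loc always works with labels.
apple_by_label = fruit_series.loc["apple"]

# Positional slice: the end position (2) is excluded.
positional_slice = fruit_series.iloc[0:2]

# Label slice: the end label ("apple") is included.
label_slice = fruit_series.loc["banana":"apple"]

print(first_by_position, apple_by_label)
print(positional_slice)
print(label_slice)
```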
Operations on Series
    import pandas as pd

    # We are going to use the same fruit data
    fruits = pd.Series(
        [11, 23, 45, 54, 67],
        index=['banana', 'orange', 'apple', 'strawberry', 'grapes'],
        name="fruits")

    # Scalar operations
    # Let's say we want to offer a 10% discount on all our fruits.
    # In pandas, a mathematical operation with a scalar
    # is applied to every data item.
    discounted_fruits = fruits * .9
    print("discounted price through scalar multiplication")
    print(discounted_fruits)

    # We can also store the discount amount in a new Series object
    ten_percent_discount = fruits * .1

    # and subtract one Series from the other. pandas matches the
    # items by label, so both Series should carry the same labels.
    discounted_price = fruits - ten_percent_discount
    print("discounted price by subtracting two series obj")
    print(discounted_price)
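Arithmetic between two Series is matched up by label rather than by position. The sketch below uses a shortened fruit list and a made-up discounts Series (including a hypothetical "kiwi" entry) to show what happens when the labels do not fully overlap: labels present in only one Series produce NaN, unless a fill value is supplied.

```python
import pandas as pd

fruits = pd.Series([11, 23, 45], index=["banana", "orange", "apple"])

# Hypothetical discounts: different order, one extra label, one missing label.
discounts = pd.Series([1, 2, 3], index=["apple", "banana", "kiwi"])

# Items are paired by label; unmatched labels give NaN.
result = fruits - discounts
print(result)

# The .sub method does the same subtraction, and fill_value
# treats a label missing from one side as 0.
filled = fruits.sub(discounts, fill_value=0)
print(filled)
```

So "apple" is computed as 45 - 1 even though it sits at different positions in the two Series.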
more coming soon…