Python Pandas Tutorial : Learn Pandas for Data Science

What is Python Pandas?

Pandas is a Python software library developed for data manipulation and analysis. It is free software released under the three-clause BSD license. Pandas provides two main data structures, Series and DataFrame, along with operations for manipulating numerical tables and time series.


Python pandas is well suited for different kinds of data, such as:

  • Tabular data with heterogeneously typed columns
  • Ordered and unordered time series data
  • Arbitrary matrix data with row and column labels
  • Unlabelled data
  • Any other form of observational or statistical data sets

How to install Pandas?

To install pandas, open your command line or terminal and type "pip install pandas". If you have Anaconda installed on your system, type "conda install pandas" instead. Once the installation is complete, open your IDE (Jupyter Notebook is used here) and import the pandas library by typing:

#import pandas

import pandas as pd

#To get the current version of pandas
print(pd.__version__)

Output: 1.1.3
#Create a series from a list
list1=[10,20,30,40,50]
s1=pd.Series(list1)
print(s1)

Output:

0    10
1    20
2    30
3    40
4    50

dtype: int64

#Customize the index
list2=['a','b','c','d','e']
s2=pd.Series(list1,index=list2)
print(s2)

Output:

a    10
b    20
c    30
d    40
e    50

dtype: int64
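
With a custom index in place, values can be looked up by label as well as by position. The following is a minimal sketch that rebuilds s2 from the example above so it runs on its own; the variable names value_c, value_third and subset are only illustrative.

```python
import pandas as pd

# Rebuild the labelled Series from the example above
s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Look up a single value by its label
value_c = s2.loc['c']

# Look up the same value by its integer position
value_third = s2.iloc[2]

# Select a range of labels (with .loc the end label is included)
subset = s2.loc['b':'d']

print(value_c)
print(value_third)
print(subset)
```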

#Create a series from random numbers
#To get random numbers by invoking randn()
import numpy as np
x=np.random.randn(10)
x

Output:

#Then create a Series by taking random numbers
s3=pd.Series(x)
s3

Output:

0    0.146392
1    -0.748931
2    0.525632
3    0.296724
4    -1.251444
5    0.070018
6    -0.465796
7    0.784062
8    0.763505
9    1.062141

dtype: float64

#Create a series from a dictionary
dict1={'a':100,'b':200,'c':300}
s4=pd.Series(dict1)
s4

Output:

a   100
b   200
c   300

dtype: int64

print(s1)

0    10
1    20
2    30
3    40
4    50

dtype: int64

#To get the minimum value
s1.min()
Output
10
#To get the max value
s1.max()
Output
50
#To get mean value
s1.mean()
Output
30.0
#To get the median value
s1.median()
Output
30.0
s5=pd.Series([10,20,30])
s5

Output

0    10
1    20
2    30

dtype: int64

#Add two series
s1.add(s5)

Output

0    20.0
1    40.0
2    60.0
3    NaN
4    NaN

dtype: float64
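
The NaN values appear because s5 has no entries at index 3 and 4, so pandas has nothing to add there. If a missing entry should be treated as zero instead, add() accepts a fill_value argument. A small sketch, recreating s1 and s5 so it is self-contained (the names plain and filled are only for illustration):

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50])
s5 = pd.Series([10, 20, 30])

# Plain add: index labels 3 and 4 exist only in s1, so the result is NaN there
plain = s1.add(s5)

# fill_value=0 substitutes 0 for the missing side before adding
filled = s1.add(s5, fill_value=0)

print(plain)
print(filled)
```

The same fill_value argument is available on sub(), mul() and div().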

#subtract two series
s1.sub(s5)

Output

0    0.0
1    0.0
2    0.0
3    NaN
4    NaN

dtype: float64

#multiply two series
s1.mul(s5)

Output

0    100.0
1    400.0
2    900.0
3    NaN
4    NaN

dtype: float64

#Division of two series
s1.div(s5)

Output

0    1.0
1    1.0
2    1.0
3    NaN
4    NaN

dtype: float64

#Create a DataFrame from a List
list2=['java','python','php','ruby']
df1=pd.DataFrame(list2)
print(df1)

Output

        0
0    java
1  python
2     php
3    ruby

#Create a DataFrame from a dictionary
employee_detail={'eid':[101,102,103],'ename':['Rahul','Sachin','Sourav'],
'esal':[10000,20000,30000]}
df2=pd.DataFrame(employee_detail)
print(df2)

Output

   eid   ename   esal
0  101   Rahul  10000
1  102  Sachin  20000
2  103  Sourav  30000


#Create a DataFrame from a csv file
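#The r prefix makes this a raw string, so the backslashes in the Windows path are not read as escape characters
df3=pd.read_csv(r"E:\cipet\salary_info.csv")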
df3.head()

Output

df3.tail()

Output

df3.shape

Output

(30, 2)

df3.info()

df3.describe()

Output
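
Since salary_info.csv is not included here, the inspection calls above can be tried on an in-memory stand-in. The sketch below is only an illustration: the column names YearsExperience and Salary and all of the values are hypothetical, not the contents of the original file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for salary_info.csv
csv_text = """YearsExperience,Salary
1.1,39343
2.0,43525
3.2,54445
4.5,61111
5.9,81363
7.1,98273
"""

# read_csv accepts a file-like object, so StringIO works in place of a path
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(3))      # first 3 rows (head() alone returns the first 5)
print(df_demo.tail(2))      # last 2 rows
print(df_demo.shape)        # (number of rows, number of columns)
df_demo.info()              # column dtypes and non-null counts
print(df_demo.describe())   # count, mean, std, min, quartiles, max per numeric column
```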


#Concatenating two data frames
dict1={'regdno':[101,102,103],'name':['Sachin','Sourav','Rahul'],
'branch':['CSE','IT','EEE']}
df4=pd.DataFrame(dict1,index=[0,1,2])
df4

Output


dict2={'regdno':[104,105,106],'name':['Akshay','Ajay','Abhay'],
'branch':['ETC','Civil','Mechanical']}
df5=pd.DataFrame(dict2,index=[3,4,5])
df5

Output

#concatenate two data frame
result=pd.concat([df4,df5])
print(result)

Output

   regdno    name      branch
0     101  Sachin         CSE
1     102  Sourav          IT
2     103   Rahul         EEE
3     104  Akshay         ETC
4     105    Ajay       Civil
5     106   Abhay  Mechanical
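
In the example above the two frames were given non-overlapping indexes (0 to 2 and 3 to 5) by hand. When both frames carry the default index, a plain concat repeats the labels; passing ignore_index=True renumbers the rows. A minimal sketch with two small hypothetical frames, df_a and df_b:

```python
import pandas as pd

df_a = pd.DataFrame({'regdno': [101, 102], 'name': ['Sachin', 'Sourav']})
df_b = pd.DataFrame({'regdno': [104, 105], 'name': ['Akshay', 'Ajay']})

# Both frames carry the default index 0, 1, so a plain concat repeats labels
stacked = pd.concat([df_a, df_b])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df_a, df_b], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```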

#Merging two data frames
dict3={'regdno':[107,108,109],'name':['Lelin','Nirmal','Kalandi'],
'branch':['ETC','Civil','Mechanical']}
df6=pd.DataFrame(dict3)
df6

Output

dict4={'regdno':[107,108,109],'address':['BBSR','Cuttack','Paradeep'],
'age':[22,24,26]}
df7=pd.DataFrame(dict4)
df7

Output

result1=pd.merge(df6,df7,on='regdno')
print(result1)

Output

   regdno     name      branch   address  age
0     107    Lelin         ETC      BBSR   22
1     108   Nirmal       Civil   Cuttack   24
2     109  Kalandi  Mechanical  Paradeep   26
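
In the merge above every regdno appears in both frames, so no rows are lost. By default pd.merge() performs an inner join, which keeps only the keys found on both sides; the how argument changes that. The sketch below uses two hypothetical frames (students and addresses) whose keys only partly overlap, to show the difference:

```python
import pandas as pd

students = pd.DataFrame({'regdno': [107, 108, 109],
                         'name': ['Lelin', 'Nirmal', 'Kalandi']})

# Hypothetical address table: 109 is missing and 110 has no matching student
addresses = pd.DataFrame({'regdno': [107, 108, 110],
                          'address': ['BBSR', 'Cuttack', 'Puri']})

# Default how='inner': only regdno values present in both frames
inner = pd.merge(students, addresses, on='regdno')

# how='left': every student is kept, address is NaN where no match exists
left = pd.merge(students, addresses, on='regdno', how='left')

# how='outer': all keys from both frames are kept
outer = pd.merge(students, addresses, on='regdno', how='outer')

print(inner)
print(left)
print(outer)
```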



About the Author



Silan Software is one of India's leading providers of offline and online training for Java, Python, AI (Machine Learning, Deep Learning), Data Science, Software Development and many more emerging technologies.

We provide Academic Training || Industrial Training || Corporate Training || Internship || Java || Python || AI using Python || Data Science etc





