Introduction to Python pandas
pandas is a general-purpose, high-performance, open-source Python library used for analyzing many kinds of data. It relies on efficient underlying data structures to process large data sets.
pandas provides a variety of functionality:
1. Loading data from different data sources
2. Indexing and slicing of data
3. Transformation of data
4. Inserting new data or deleting existing data
Before going further, you should have a working knowledge of NumPy. You can read about it in our NumPy tutorial.
Data Structures in pandas
pandas mainly provides three data structures. They are built on top of NumPy, which is a large part of why they are fast.
1. Series
2. DataFrame
3. Panel
Series
A Series is a one-dimensional, labeled, homogeneous array. Series data looks like a one-dimensional array, as shown below:
Numeric data: 1,5,8,6,7,9,11
String data: data1,data2,data3,data4,data5,data6,data7,data8
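The two kinds of data above can be turned into Series objects directly. This is a minimal sketch; the variable names numbers and labels are made up for illustration, and the exact dtype that gets printed can vary with the platform and the pandas version.

```python
import pandas as pd

# Numeric data from the text above
numbers = pd.Series([1, 5, 8, 6, 7, 9, 11])

# String data from the text above
labels = pd.Series(["data1", "data2", "data3", "data4",
                    "data5", "data6", "data7", "data8"])

# Every element of a Series shares one dtype (homogeneous)
print(numbers.dtype)   # an integer dtype
print(labels.dtype)    # object, or a string dtype on newer releases
print(len(numbers), len(labels))
```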
DataFrame
A DataFrame is a two-dimensional tabular data structure whose columns can each hold a different data type.
An example of a DataFrame:
| ID | Name | Age | Gender | 
| 1 | Katie | 35 | Female | 
| 101 | James | 28 | Male | 
| 306 | Steve | 21 | Male | 
| 406 | Lia | 44 | Female | 
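As a rough sketch, the table above could be built as a DataFrame from a dictionary of columns. The variable name people is an assumption chosen for this illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; the values are the rows of the table above
people = pd.DataFrame({
    "ID": [1, 101, 306, 406],
    "Name": ["Katie", "James", "Steve", "Lia"],
    "Age": [35, 28, 21, 44],
    "Gender": ["Female", "Male", "Male", "Female"],
})

print(people)
# Columns keep their own types: numeric for ID and Age, text for Name and Gender
print(people.dtypes)
```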
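Panels
A Panel is a three-dimensional labeled array. Because panels are three-dimensional, it is difficult to show an example here; a panel can be thought of as a container of DataFrames. Note that Panel was deprecated and then removed from pandas, so it is not available in recent releases.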
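On pandas versions where Panel is not available, the "container of DataFrames" idea can be approximated with a DataFrame that has a two-level index. The sketch below is one possible way to do this, not the original Panel API; the data and the names q1, q2, quarter, and region are invented for illustration.

```python
import pandas as pd

# Two small, made-up DataFrames with the same columns and row labels
q1 = pd.DataFrame({"sales": [10, 20], "returns": [1, 2]}, index=["north", "south"])
q2 = pd.DataFrame({"sales": [30, 40], "returns": [3, 4]}, index=["north", "south"])

# Stack them under a new outer index level, which acts as the third dimension
stacked = pd.concat({"Q1": q1, "Q2": q2}, names=["quarter", "region"])
print(stacked)

# Selecting one outer label gives back an ordinary two-dimensional DataFrame
print(stacked.loc["Q2"])
```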
How to install pandas/NumPy?
1) pip install pandas
2) pip install numpy
[root]# python
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import pandas
>>>
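Besides importing the libraries, it can be useful to check which versions were installed. A minimal sketch, assuming both packages installed successfully:

```python
import numpy as np
import pandas as pd

# Both packages expose their version as a string
print("NumPy version:", np.__version__)
print("pandas version:", pd.__version__)
```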
Example 1: Creating a simple Series from a NumPy array.
#!/usr/bin/python
#import the pandas and numpy library
import pandas as pd
import numpy as np
# Creating a numpy array with some numeric data
numericdata = np.array([100,200,300,400])
#Assigning value to series
series = pd.Series(numericdata)
print(series)
===========Output===========
0    100
1    200
2    300
3    400
dtype: int64
In the above example, the index is assigned automatically by pandas, but pandas also lets us assign our own index labels.
#!/usr/bin/python
#import the pandas and numpy library
import pandas as pd
import numpy as np
# Creating a numpy array with some numeric data
numericdata = np.array([100,200,300,400])
#Assigning value and index to series
series = pd.Series(numericdata, index=[7,8,9,22])
print(series)
========output==========
7     100
8     200
9     300
22    400
dtype: int64
In the above example, the index labels are assigned manually.
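Once custom index labels are in place, values can be looked up either by label or by position. The following is a small sketch that reuses the data from the example above; .loc and .iloc are the standard pandas indexers for these two kinds of lookup.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.array([100, 200, 300, 400]), index=[7, 8, 9, 22])

# .loc looks values up by index label
print(series.loc[22])

# .iloc looks values up by integer position, ignoring the labels
print(series.iloc[0])

# A list of labels returns a smaller Series
print(series.loc[[7, 9]])
```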
Attributes of Series
A Series provides the following attributes and methods:
1. axes
2. size
3. values
4. head()
5. tail()
6. empty
Let’s see an example of the above attributes:
#!/usr/bin/python
#import the pandas and numpy library
import pandas as pd
import numpy as np
# Creating a numpy array with some numeric data
numericdata = np.array([100,200,304,400])
#Assigning value to series
series = pd.Series(numericdata)
print("axis of series {0}".format(series.axes))
print("Is series empty {0}".format(series.empty))
print("Size of series {0}".format(series.size))
print("values in series {0}".format(series.values))
# head provides first n elements of series
print("Head of series {0}".format(series.head(2)))
# Tail provides last n elements of series
print("Tail of series {0}".format(series.tail(2)))
=======output=======
axis of series [RangeIndex(start=0, stop=4, step=1)]
Is series empty False
Size of series 4
values in series [100 200 304 400]
Head of series 0    100
1    200
dtype: int64
Tail of series 2    304
3    400
dtype: int64
Attributes of DataFrames
DataFrames provide the following attributes and methods:
1. T (transpose)
2. empty
3. shape
4. size
5. values
6. head()
7. tail()
Let’s take an example of all the above:
#!/usr/bin/python
#import the pandas and numpy library
import pandas as pd
import numpy as np
# Creating a dictionary of columns with some sample data
data = {"ID":[4,3,2,1],
"Name":["Name1","name2","name3","Name4"],
"Age":[10,20,30,40]
}
# Create a dataframe
df =pd.DataFrame(data)
print("Actual data")
print(df)
#Transpose of data
print("Transpose of data")
print(df.T)
# Axes are
print("Axes of data")
print(df.axes)
# empty tells us whether the data frame has no data
print("Is empty ?")
print(df.empty)
# shape provides rows and columns contained in the data frame
print("Shape data")
print(df.shape)
# Head and tail get data from the top or bottom of the data frame
print("Top 2 records from data frame")
print(df.head(2))
print("last 2 records from dataframe")
print(df.tail(2))
=======output======
Actual data
   Age  ID   Name
0   10   4  Name1
1   20   3  name2
2   30   2  name3
3   40   1  Name4
Transpose of data
          0      1      2      3
Age      10     20     30     40
ID        4      3      2      1
Name  Name1  name2  name3  Name4
Axes of data
[RangeIndex(start=0, stop=4, step=1), Index([u'Age', u'ID', u'Name'], dtype='object')]
Is empty ?
False
Shape data
(4, 3)
Top 2 records from data frame
   Age  ID   Name
0   10   4  Name1
1   20   3  name2
last 2 records from dataframe
   Age  ID   Name
2   30   2  name3
3   40   1  Name4
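The example above does not print size or values. A minimal sketch of those two attributes, reusing the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"ID": [4, 3, 2, 1],
                   "Name": ["Name1", "name2", "name3", "Name4"],
                   "Age": [10, 20, 30, 40]})

# size is the total number of cells: rows multiplied by columns
print(df.size)

# values returns the data as a two-dimensional NumPy array
print(df.values)
```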
pandas provides a number of math functions. Some of them are listed below:
1. sum()
2. count()
3. mean()
4. min()
5. max()
6. prod()
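The functions listed above can be sketched on a small Series as follows. The variable name ages and its values are made up for illustration.

```python
import pandas as pd

ages = pd.Series([10, 20, 30, 40])

print(ages.sum())    # total of all values: 100
print(ages.count())  # number of non-missing values: 4
print(ages.mean())   # average: 25.0
print(ages.min())    # smallest value: 10
print(ages.max())    # largest value: 40
print(ages.prod())   # product of all values: 240000
```

The same methods are also available on a DataFrame, where they are applied column by column by default.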
Working with text data
pandas provides good support for text data. When processing text, we often need to split, sort, count, and perform similar operations.
These functions are available through the .str accessor of a Series. Some of them are listed below:
1. lower()
2. upper()
3. len()
4. split()
5. replace()
6. count()
7. find()
8. islower()
9. isupper()
10. isnumeric()
Example of above functions:
#!/usr/bin/python
#import the pandas library
import pandas as pd
s = pd.Series(['Raj kumar', 'Amit singh', 'John', 'katie', 20, '897456','Steve','smith'])
print("All text is in lower case")
print(s.str.lower())
print("All text is in upper case")
print(s.str.upper())
print("Length of each text in string")
print(s.str.len())
print("split string by space")
print(s.str.split(' '))
print("check string contains a specific data or not it return true and false")
print(s.str.contains(' '))
print("count a specific word and character in string ")
print(s.str.count('R'))
=======output==========
All text is in lower case
0     raj kumar
1    amit singh
2          john
3         katie
4           NaN
5        897456
6         steve
7         smith
dtype: object
All text is in upper case
0     RAJ KUMAR
1    AMIT SINGH
2          JOHN
3         KATIE
4           NaN
5        897456
6         STEVE
7         SMITH
dtype: object
Length of each text in string
0     9.0
1    10.0
2     4.0
3     5.0
4     NaN
5     6.0
6     5.0
7     5.0
dtype: float64
split string by space
0     [Raj, kumar]
1    [Amit, singh]
2           [John]
3          [katie]
4              NaN
5         [897456]
6          [Steve]
7          [smith]
dtype: object
check string contains a specific data or not it return true and false
0     True
1     True
2    False
3    False
4      NaN
5    False
6    False
7    False
dtype: object
count a specific word and character in string 
0    1.0
1    0.0
2    0.0
3    0.0
4    NaN
5    0.0
6    0.0
7    0.0
dtype: float64
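The example above does not cover replace(), find(), islower(), isupper(), or isnumeric(). A short sketch of these, using a smaller made-up Series that contains only strings:

```python
import pandas as pd

s = pd.Series(["Raj kumar", "JOHN", "katie", "897456"])

# Replace every space with an underscore (regex=False treats the pattern literally)
print(s.str.replace(" ", "_", regex=False))

# Position of the first "a" in each string, or -1 when it is absent
print(s.str.find("a"))

# Case and digit checks return one boolean per element
print(s.str.islower())
print(s.str.isupper())
print(s.str.isnumeric())
```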
SQL operations on pandas
Many SQL operations have a direct equivalent in pandas.
The example below focuses on the SELECT, WHERE, and LIMIT statements and shows how each of these queries can be expressed on a data frame.
The sample data and the code are below:
CSV File: (http://insight.dev.schoolwires.com/HelpAssets/C2Assets/C2Files/C2ImportGroupsSample.csv)
GroupName,Groupcode ,GroupOwner,GroupCategoryID
System Administrators,sysadmin,13456,100
Independence High Teachers,HS Teachers,,101
John Glenn Middle Teachers,MS Teachers,13458,102
Liberty Elementary Teachers,Elem Teachers,13559,103
1st Grade Teachers,1stgrade,,104
2nd Grade Teachers,2nsgrade,13561,105
3rd Grade Teachers,3rdgrade,13562,106
Guidance Department,guidance,,107
Independence Math Teachers,HS Math,13660,108
Independence English Teachers,HS English,13661,109
John Glenn 8th Grade Teachers,8thgrade,,110
John Glenn 7th Grade Teachers,7thgrade,13452,111
Elementary Parents,Elem Parents,,112
Middle School Parents,MS Parents,18001,113
High School Parents,HS Parents,18002,114
#!/usr/bin/python
#import the pandas library
import pandas as pd
# Read CSV
url = 'http://insight.dev.schoolwires.com/HelpAssets/C2Assets/C2Files/C2ImportGroupsSample.csv'
csvdata=pd.read_csv(url)
print(csvdata.head())
# Select specific columns, like a SQL SELECT
print("Select some specific columns")
print(csvdata[["GroupName","GroupOwner"]].head())
# Filter rows, like a SQL WHERE clause
print("Filer record")
print(csvdata[csvdata["GroupName"]=="System Administrators"])
# Select the top n rows, like a SQL LIMIT
print("Select only 2 rows like as SQL")
print(csvdata.head(2))
========output==========
                    GroupName     Groupcode   GroupOwner  GroupCategoryID 
0        System Administrators       sysadmin     13456.0               100
1   Independence High Teachers    HS Teachers         NaN               101
2   John Glenn Middle Teachers    MS Teachers     13458.0               102
3  Liberty Elementary Teachers  Elem Teachers     13559.0               103
4           1st Grade Teachers       1stgrade         NaN               104
Select some specific columns
                     GroupName  GroupOwner
0        System Administrators     13456.0
1   Independence High Teachers         NaN
2   John Glenn Middle Teachers     13458.0
3  Liberty Elementary Teachers     13559.0
4           1st Grade Teachers         NaN
Filer record
               GroupName Groupcode   GroupOwner  GroupCategoryID 
0  System Administrators   sysadmin     13456.0               100
Select only 2 rows like as SQL
                    GroupName   Groupcode   GroupOwner  GroupCategoryID 
0       System Administrators     sysadmin     13456.0               100
1  Independence High Teachers  HS Teachers         NaN               101
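The three operations can also be combined into a single query. The sketch below inlines the first few rows of the sample CSV so that it runs without a download; the header has been stripped of stray spaces for readability, and the names csv_text, groups, mask, and result are assumptions made for this illustration.

```python
import io
import pandas as pd

# A few rows copied from the sample CSV above, inlined so no download is needed
csv_text = """GroupName,Groupcode,GroupOwner,GroupCategoryID
System Administrators,sysadmin,13456,100
Independence High Teachers,HS Teachers,,101
John Glenn Middle Teachers,MS Teachers,13458,102
Liberty Elementary Teachers,Elem Teachers,13559,103
1st Grade Teachers,1stgrade,,104
"""
groups = pd.read_csv(io.StringIO(csv_text))

# Roughly equivalent to:
#   SELECT GroupName, GroupOwner FROM groups
#   WHERE GroupOwner IS NOT NULL AND GroupCategoryID > 100
#   LIMIT 2
mask = groups["GroupOwner"].notna() & (groups["GroupCategoryID"] > 100)
result = groups.loc[mask, ["GroupName", "GroupOwner"]].head(2)
print(result)
```

Each condition is wrapped in parentheses and joined with &, because Python's and keyword does not work element by element on a Series.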
pandas provides a variety of other functionality, such as merging, grouping, handling missing data, date functionality, time deltas, and many more.
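To give a flavor of these features, here is a brief sketch that touches on missing data, merging, grouping, and time deltas. All of the data and names (orders, customers, totals, start, later) are hypothetical and invented for this illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration
orders = pd.DataFrame({"customer_id": [1, 2, 2, 3],
                       "amount": [50.0, None, 30.0, 20.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Katie", "James", "Steve"]})

# Missing data: replace the missing amount with 0
orders["amount"] = orders["amount"].fillna(0)

# Merging: attach customer names, similar to a SQL JOIN
merged = orders.merge(customers, on="customer_id")

# Grouping: total amount per customer
totals = merged.groupby("name")["amount"].sum()
print(totals)

# Dates and time deltas: add seven days to a timestamp
start = pd.Timestamp("2020-01-01")
later = start + pd.Timedelta(days=7)
print(later)
```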
This article focuses on the basics of pandas and how to start working with it.
For detailed information, you can read the official pandas documentation (https://pandas.pydata.org/pandas-docs/stable/).
