Basic Python Programs
In this story, we will work through some questions on manipulating student data. The solutions use Python libraries such as pandas, re (regular expressions), NumPy, and random.
Question
Consider the following DataFrame:
data = pd.DataFrame({'names': ['tom', 'sam', …], 'email': ['tom21@gmail.com', 'samdr@yahoo.com', 'jk21456@abc.com', ..], 'Firstweekscore': [], 'secondweekscore': []})
1. Write a function that creates a new column holding the average of the two scores
2. List comprehension → create another column whose scores are raised by 3 (for example, 93 → 96)
3. 'gmail.com' → regular expressions in pandas
4. Select rows that have a Gmail address and a second week score greater than 90
5. Create a new column 'group' and randomly assign it the values 1, 2, and 3
6. Create a pivot table with the mean of the first week scores, by group
Solution
Let us walk through the solution step by step. The whole code is shared at the end of the story.
1. Write a function that creates a new column holding the average of the two scores
import pandas as pd
import numpy as np
import re
import random

data = pd.DataFrame({
    'name': ['nandha', 'tom', 'ram', 'jon', 'sam', 'robb'],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    'emailid': ['nandha12@abc.com', 'tom34@gmail.com', 'ram78@gmail.com',
                'jon21@yahoo.com', 'sam11@gmail.com', 'robb@abc.com'],
})

# Average of the two weekly scores, computed row by row
data['average'] = [
    (data["FirstWeekscore"][i] + data["SecondWeekscore"][i]) / 2
    for i in range(len(data["FirstWeekscore"]))
]
Here we first create a DataFrame and then add the average column using a list comprehension.
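Since the question asks for a function, here is a minimal sketch that wraps the same calculation in one. The function name `add_average` and its parameters are hypothetical and not part of the original solution. It relies on pandas column arithmetic instead of an index-based loop.

```python
import pandas as pd


def add_average(df, first_col, second_col, new_col="average"):
    """Return a copy of df with new_col holding the mean of two score columns."""
    out = df.copy()
    # Column arithmetic is element-wise, so no explicit loop is needed.
    out[new_col] = (out[first_col] + out[second_col]) / 2
    return out


data = pd.DataFrame({
    "name": ["nandha", "tom", "ram", "jon", "sam", "robb"],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
})
data = add_average(data, "FirstWeekscore", "SecondWeekscore")
print(data)
```

Returning a copy keeps the caller's DataFrame unchanged, which is a design choice of this sketch rather than a requirement of the task.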
2. List comprehension → create another column whose scores are raised by 3 (for example, 93 → 96)
import pandas as pd
import numpy as np
import re
import random

data = pd.DataFrame({
    'name': ['nandha', 'tom', 'ram', 'jon', 'sam', 'robb'],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    'emailid': ['nandha12@abc.com', 'tom34@gmail.com', 'ram78@gmail.com',
                'jon21@yahoo.com', 'sam11@gmail.com', 'robb@abc.com'],
})

data['average'] = [
    (data["FirstWeekscore"][i] + data["SecondWeekscore"][i]) / 2
    for i in range(len(data["FirstWeekscore"]))
]

# Third week score: each second week score plus 3
data['ThirdWeekscore'] = [j + 3 for j in data["SecondWeekscore"]]

print("Data after adding average and 3rd week score")
print(data)
The output is the original table with two new columns, average and ThirdWeekscore.
3. 'gmail.com' → regular expressions in pandas
Here we have to find and return the usernames of the students whose email addresses use the Gmail domain. We will use regex for this.
import pandas as pd
import numpy as np
import re
import random

data = pd.DataFrame({
    'name': ['nandha', 'tom', 'ram', 'jon', 'sam', 'robb'],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    'emailid': ['nandha12@abc.com', 'tom34@gmail.com', 'ram78@gmail.com',
                'jon21@yahoo.com', 'sam11@gmail.com', 'robb@abc.com'],
})

data['average'] = [
    (data["FirstWeekscore"][i] + data["SecondWeekscore"][i]) / 2
    for i in range(len(data["FirstWeekscore"]))
]
data['ThirdWeekscore'] = [j + 3 for j in data["SecondWeekscore"]]

print("Data after adding average and 3rd week score")
print(data)

o = []
print("Students using Gmail Domain are: ")
for i in range(len(data['emailid'])):
    x = data['emailid']
    y = data['SecondWeekscore']
    # The dot is escaped and '$' anchors the match at the end of the address
    a = re.findall(r'@gmail\.com$', x[i])
    if a != []:
        # The part before '@' is the username
        c = re.split('@', x[i])
        print(c[0])
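Because the question mentions regular expressions in pandas, the same lookup can also be sketched with the pandas string accessor instead of a Python loop. This is an alternative under the assumption that the email addresses are held in a Series. The variable names `emails` and `usernames` are illustrative only.

```python
import pandas as pd

emails = pd.Series([
    "nandha12@abc.com", "tom34@gmail.com", "ram78@gmail.com",
    "jon21@yahoo.com", "sam11@gmail.com", "robb@abc.com",
])

# Capture everything before "@gmail.com". The dot is escaped and "$" anchors
# the match at the end of the address. Rows that do not match become NaN.
usernames = emails.str.extract(r"^([^@]+)@gmail\.com$", expand=False).dropna()
print(usernames.tolist())
```

With `expand=False` and a single capture group, `str.extract` returns a Series, so `dropna()` leaves only the Gmail usernames.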
The output for the first three tasks is the updated table followed by the Gmail usernames tom34, ram78, and sam11.
4. Select rows that have a Gmail address and a second week score greater than 90
import pandas as pd
import numpy as np
import re
import random

data = pd.DataFrame({
    'name': ['nandha', 'tom', 'ram', 'jon', 'sam', 'robb'],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    'emailid': ['nandha12@abc.com', 'tom34@gmail.com', 'ram78@gmail.com',
                'jon21@yahoo.com', 'sam11@gmail.com', 'robb@abc.com'],
})

data['average'] = [
    (data["FirstWeekscore"][i] + data["SecondWeekscore"][i]) / 2
    for i in range(len(data["FirstWeekscore"]))
]
data['ThirdWeekscore'] = [j + 3 for j in data["SecondWeekscore"]]

print("Data after adding average and 3rd week score")
print(data)

o = []  # row labels of Gmail users whose second week score is above 90
print("Students using Gmail Domain are: ")
for i in range(len(data['emailid'])):
    x = data['emailid']
    y = data['SecondWeekscore']
    a = re.findall(r'@gmail\.com$', x[i])
    if a != []:
        c = re.split('@', x[i])
        print(c[0])
        # Keep the row only when the score condition also holds
        if y[i] > 90:
            o.append(i)

print("Gmail users with SecondWeekscore greater than 90:")
print(data.loc[o])
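The selection in step 4 can also be sketched as a boolean mask, which is a common pandas idiom for combining conditions. This is a hedged alternative, not necessarily the approach the original solution intended. The names `is_gmail`, `high_score`, and `selected` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["nandha", "tom", "ram", "jon", "sam", "robb"],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    "emailid": ["nandha12@abc.com", "tom34@gmail.com", "ram78@gmail.com",
                "jon21@yahoo.com", "sam11@gmail.com", "robb@abc.com"],
})

# One boolean Series per condition
is_gmail = data["emailid"].str.contains(r"@gmail\.com$", regex=True)
high_score = data["SecondWeekscore"] > 90

# "&" combines the two conditions row by row
selected = data[is_gmail & high_score]
print(selected)
```

Here tom has a Gmail address but a score of 81, so only the rows for ram and sam remain.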
5. Create a new column 'group' and randomly assign it the values 1, 2, and 3
Here we are going to use the random module.
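A minimal sketch of this step on its own, assuming one random group per row. The seed is a hypothetical addition that only makes the example repeatable. Leave it out to get a fresh assignment on every run.

```python
import random

import pandas as pd

data = pd.DataFrame({
    "name": ["nandha", "tom", "ram", "jon", "sam", "robb"],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
})

random.seed(0)  # hypothetical seed, only for repeatable output

# random.randint(1, 3) includes both endpoints, so it returns 1, 2, or 3
data["group"] = [random.randint(1, 3) for _ in range(len(data))]
print(data)
```

Using `len(data)` instead of a fixed count keeps the sketch valid if students are added or removed.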
6. Create a pivot table with the mean of the first week scores, by group
This is done with the pandas function pd.pivot_table().
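To show what the pivot table computes, here is a small sketch with fixed group values. The group assignment below is hypothetical and chosen by hand so that the result is predictable. In the actual solution the groups are random.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["nandha", "tom", "ram", "jon", "sam", "robb"],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "group": [1, 2, 3, 1, 2, 3],  # fixed, hypothetical groups
})

# One row per group, holding the mean FirstWeekscore of that group
table = pd.pivot_table(data, values="FirstWeekscore", index="group", aggfunc="mean")
print(table)
```

With these groups the means are 92.5 for group 1, 86.5 for group 2, and 93.0 for group 3.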
Full code
import pandas as pd
import numpy as np
import re
import random

data = pd.DataFrame({
    'name': ['nandha', 'tom', 'ram', 'jon', 'sam', 'robb'],
    "FirstWeekscore": [92, 82, 91, 93, 91, 95],
    "SecondWeekscore": [91, 81, 95, 93, 92, 96],
    'emailid': ['nandha12@abc.com', 'tom34@gmail.com', 'ram78@gmail.com',
                'jon21@yahoo.com', 'sam11@gmail.com', 'robb@abc.com'],
})

# Steps 1 and 2: average of the two scores and a third week score
data['average'] = [
    (data["FirstWeekscore"][i] + data["SecondWeekscore"][i]) / 2
    for i in range(len(data["FirstWeekscore"]))
]
data['ThirdWeekscore'] = [j + 3 for j in data["SecondWeekscore"]]
print("Data after adding average and 3rd week score")
print(data)

# Steps 3 and 4: Gmail usernames, and Gmail rows with a second week score above 90
o = []
print("Students using Gmail Domain are: ")
for i in range(len(data['emailid'])):
    x = data['emailid']
    y = data['SecondWeekscore']
    a = re.findall(r'@gmail\.com$', x[i])
    if a != []:
        c = re.split('@', x[i])
        print(c[0])
        if y[i] > 90:
            o.append(i)
print("Gmail users with SecondWeekscore greater than 90:")
print(data.loc[o])

# Step 5: one random group (1, 2, or 3) per student
r = []
for i in range(len(data)):
    n = random.randint(1, 3)
    r.append(n)
data['group'] = r
print("Data after adding Group column: ")
print(data)

# Step 6: mean first week score per group (aggfunc='mean' gives the same result)
table = pd.pivot_table(data=data, values='FirstWeekscore', index=["group"], aggfunc=np.mean)
print("Mean of 1st week score by group:")
print(table)
The final output shows the table with the new group column, followed by the pivot table of mean first week scores per group.
You may get different output because we are using random to assign the values of the group column.
That's it. I hope you found this article useful for getting a solid grasp of how pandas, regex, and random are used in Python. Thank you.