import pandas as pd
= 7 pd.options.display.max_rows
7 Intro to Functions and Conditionals
7.1 Intro
So far in this course you have mostly used functions written by others. In this lesson, you will learn how to write your own functions in Python.
7.2 Learning Objectives
By the end of this lesson, you will be able to:
- Create and use your own functions in Python.
- Design function arguments and set default values.
- Use conditional logic like
if
,elif
, andelse
within functions.
7.3 Packages
Run the following code to install and load the packages needed for this lesson:
# Import packages
import pandas as pd
import numpy as np
import vega_datasets as vd
7.4 Basics of a Function
Let’s start by creating a very simple function. Consider the following function that converts pounds (a unit of weight) to kilograms (another unit of weight):
def pounds_to_kg(pounds):
return pounds * 0.4536
If you execute this code, you will create a function named pounds_to_kg
, which can be used directly in a script or in the console:
print(pounds_to_kg(150))
68.04
Let’s break down the structure of this first function step by step.
First, a function is created using the def
keyword, followed by a pair of parentheses and a colon.
def function_name():
# Function body
Inside the parentheses, we indicate the arguments of the function. Our function only takes one argument, which we have decided to name pounds
. This is the value that we want to convert from pounds to kilograms.
def pounds_to_kg(pounds):
# Function body
Of course, we could have named this argument anything we wanted. E.g. p
or weight
.
The next element, after the colon, is the body of the function. This is where we write the code that we want to execute when the function is called.
def pounds_to_kg(pounds):
return pounds * 0.4536
We use the return
statement to specify what value the function should output.
You could also assign the result to a variable and then return that variable:
def pounds_to_kg(pounds):
= pounds * 0.4536
kg return kg
This is a bit more wordy, but it makes the function clearer.
We can now use our function like this with a named argument:
=150) pounds_to_kg(pounds
68.04
Or without a named argument:
150) pounds_to_kg(
68.04
To use this in a DataFrame, you can create a new column:
= pd.DataFrame({'pounds': [150, 200, 250]})
pounds_df 'kg'] = pounds_to_kg(pounds_df['pounds'])
pounds_df[ pounds_df
pounds | kg | |
---|---|---|
0 | 150 | 68.04 |
1 | 200 | 90.72 |
2 | 250 | 113.40 |
And that’s it! You have just created and usedyour first function in Python.
7.4.1 Age in Months Function
Create a simple function called years_to_months
that transforms age in years to age in months.
Use it on the riots_df
DataFrame imported below to create a new column called age_months
:
= vd.data.la_riots()
riots_df riots_df
first_name | last_name | age | gender | race | death_date | address | neighborhood | type | longitude | latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cesar A. | Aguilar | 18.0 | Male | Latino | 1992-04-30 | 2009 W. 6th St. | Westlake | Officer-involved shooting | -118.273976 | 34.059281 |
1 | George | Alvarez | 42.0 | Male | Latino | 1992-05-01 | Main & College streets | Chinatown | Not riot-related | -118.234098 | 34.062690 |
2 | Wilson | Alvarez | 40.0 | Male | Latino | 1992-05-23 | 3100 Rosecrans Ave. | Hawthorne | Homicide | -118.326816 | 33.901662 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
60 | Elbert O. | Wilkins | 33.0 | Male | Black | 1992-04-30 | Western Avenue & 92nd Street | Gramercy Park | Homicide | -118.310004 | 33.952767 |
61 | John H. | Willers | 37.0 | Male | White | 1992-04-29 | 10621 Sepulveda Blvd. | Mission Hills | Homicide | -118.467770 | 34.263184 |
62 | Willie Bernard | Williams | 29.0 | Male | Black | 1992-04-29 | Gage & Western avenues | Chesterfield Square | Death | -118.308952 | 33.982363 |
63 rows × 11 columns
7.5 Functions with Multiple Arguments
Most functions take multiple arguments rather than just one. Let’s look at an example of a function that takes three arguments:
def calc_calories(carb_grams, protein_grams, fat_grams):
= (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
result return result
=50, protein_grams=25, fat_grams=10) calc_calories(carb_grams
390
The calc_calories
function computes the total calories based on the grams of carbohydrates, protein, and fat. Carbohydrates and proteins are estimated to be 4 calories per gram, while fat is estimated to be 9 calories per gram.
If you attempt to use the function without supplying all the arguments, it will yield an error.
=50, protein_grams=25) calc_calories(carb_grams
TypeError: calc_calories() missing 1 required positional argument: 'fat_grams'
You can define default values for your function’s arguments. If an argument is called without a value assigned to it, then this argument assumes its default value. Let’s make all arguments optional by giving them all default values:
def calc_calories(carb_grams=0, protein_grams=0, fat_grams=0):
= (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
result return result
Now, we can call the function with only some arguments without getting an error:
=50, protein_grams=25) calc_calories(carb_grams
300
Let’s use this on a sample dataset:
= pd.DataFrame({
food_df 'food': ['Apple', 'Avocado'],
'carb_grams': [25, 10],
'protein_grams': [0, 1],
'fat_grams': [0, 14]
})'calories'] = calc_calories(food_df['carb_grams'], food_df['protein_grams'], food_df['fat_grams'])
food_df[ food_df
food | carb_grams | protein_grams | fat_grams | calories | |
---|---|---|---|---|---|
0 | Apple | 25 | 0 | 0 | 100 |
1 | Avocado | 10 | 1 | 14 | 170 |
7.5.1 BMI Function
Create a function named calc_bmi
that calculates the Body Mass Index (BMI) for one or more individuals, then apply the function by running the code chunk further below. The formula for BMI is weight (kg) divided by height (m) squared.
# Your code here
= pd.DataFrame({
bmi_df 'Weight': [70, 80, 100], # in kg
'Height': [1.7, 1.8, 1.2] # in meters
})'BMI'] = calc_bmi(bmi_df['Weight'], bmi_df['Height'])
bmi_df[ bmi_df
7.6 Intro to Conditionals: if
, elif
, and else
Conditional statements allow you to execute code only when certain conditions are met. The basic syntax in Python is:
if condition:
# Code to execute if condition is True
elif another_condition:
# Code to execute if the previous condition was False and this condition is True
else:
# Code to execute if all previous conditions were False
Let’s look at an example of using conditionals within a function. Suppose we want to write a function that classifies a number as positive, negative, or zero.
def class_num(num):
if num > 0:
return "Positive"
elif num < 0:
return "Negative"
else:
return "Zero"
print(class_num(10)) # Output: Positive
print(class_num(-5)) # Output: Negative
print(class_num(0)) # Output: Zero
Positive
Negative
Zero
If you try to use this function the way we have done above for, for example the BMI function, you will get an error:
= pd.DataFrame({'num': [10, -5, 0]})
num_df num_df
num | |
---|---|
0 | 10 |
1 | -5 |
2 | 0 |
'category'] = class_num(num_df['num']) num_df[
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The reason for this is that if statements are not built to work with series (they are not inherently vectorized); but rather work with single values. To get around this, we can use the np.vectorize
function to create a vectorized version of the function:
= np.vectorize(class_num)
class_num_vec 'category'] = class_num_vec(num_df['num'])
num_df[ num_df
num | category | |
---|---|---|
0 | 10 | Positive |
1 | -5 | Negative |
2 | 0 | Zero |
To get more practice with conditionals, let’s write a function that categorizes grades into simple categories:
- If the grade is 85 or above, the category is ‘Excellent’.
- If the grade is between 60 and 84, the category is ‘Pass’.
- If the grade is below 60, the category is ‘Fail’.
- If the grade is negative or invalid, return ‘Invalid grade’.
def categorize_grade(grade):
if grade >= 85 and grade <= 100:
return 'Excellent'
elif grade >= 60 and grade < 85:
return 'Pass'
elif grade >= 0 and grade < 60:
return 'Fail'
else:
return 'Invalid grade'
95) # Output: Excellent categorize_grade(
'Excellent'
We can apply this function to a column in a DataFrame but first we need to vectorize it:
= np.vectorize(categorize_grade) categorize_grade
= pd.DataFrame({'grade': [95, 82, 76, 65, 58, -5]})
grades_df 'grade_cat'] = categorize_grade(grades_df['grade'])
grades_df[ grades_df
grade | grade_cat | |
---|---|---|
0 | 95 | Excellent |
1 | 82 | Pass |
2 | 76 | Pass |
3 | 65 | Pass |
4 | 58 | Fail |
5 | -5 | Invalid grade |
7.6.1 Age Categorization Function
Now, try writing a function that categorizes age into different life stages as described earlier. You should use the following criteria:
- If the age is under 18, the category is ‘Minor’.
- If the age is greater than or equal to 18 and less than 65, the category is ‘Adult’.
- If the age is greater than or equal to 65, the category is ‘Senior’.
- If the age is negative or invalid, return ‘Invalid age’.
Use it on the riots_df
DataFrame printed below to create a new column called Age_Category
.
# Your code here
= vd.data.la_riots()
riots_df riots_df
first_name | last_name | age | gender | race | death_date | address | neighborhood | type | longitude | latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cesar A. | Aguilar | 18.0 | Male | Latino | 1992-04-30 | 2009 W. 6th St. | Westlake | Officer-involved shooting | -118.273976 | 34.059281 |
1 | George | Alvarez | 42.0 | Male | Latino | 1992-05-01 | Main & College streets | Chinatown | Not riot-related | -118.234098 | 34.062690 |
2 | Wilson | Alvarez | 40.0 | Male | Latino | 1992-05-23 | 3100 Rosecrans Ave. | Hawthorne | Homicide | -118.326816 | 33.901662 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
60 | Elbert O. | Wilkins | 33.0 | Male | Black | 1992-04-30 | Western Avenue & 92nd Street | Gramercy Park | Homicide | -118.310004 | 33.952767 |
61 | John H. | Willers | 37.0 | Male | White | 1992-04-29 | 10621 Sepulveda Blvd. | Mission Hills | Homicide | -118.467770 | 34.263184 |
62 | Willie Bernard | Williams | 29.0 | Male | Black | 1992-04-29 | Gage & Western avenues | Chesterfield Square | Death | -118.308952 | 33.982363 |
63 rows × 11 columns
7.6.2 Apply vs Vectorize
Another way to use functions with if statements on a dataframe is to use the apply
method. Here is how you can do the grade categorization function with apply
:
'grade_cat'] = grades_df['grade'].apply(categorize_grade)
grades_df[ grades_df
grade | grade_cat | |
---|---|---|
0 | 95 | Excellent |
1 | 82 | Pass |
2 | 76 | Pass |
3 | 65 | Pass |
4 | 58 | Fail |
5 | -5 | Invalid grade |
The vectorize
method is easier to use with multiple arguments, but you will encounter the apply
method further down the road.
7.7 Conclusion
In this lesson, we’ve introduced the basics of writing functions in Python and how to use conditional statements within those functions. Functions are essential building blocks in programming that allow you to encapsulate code for reuse and better organization. Conditional statements enable your functions to make decisions based on input values or other conditions.