7  Intro to Functions and Conditionals

import pandas as pd
pd.options.display.max_rows = 7

7.1 Intro

So far in this course you have mostly used functions written by others. In this lesson, you will learn how to write your own functions in Python.

7.2 Learning Objectives

By the end of this lesson, you will be able to:

  1. Create and use your own functions in Python.
  2. Design function arguments and set default values.
  3. Use conditional logic like if, elif, and else within functions.

7.3 Packages

Run the following code to load the packages needed for this lesson:

# Import packages
import pandas as pd
import numpy as np
import vega_datasets as vd

7.4 Basics of a Function

Let’s start by creating a very simple function. Consider the following function that converts pounds (a unit of weight) to kilograms (another unit of weight):

def pounds_to_kg(pounds):
    return pounds * 0.4536

If you execute this code, you will create a function named pounds_to_kg, which can be used directly in a script or in the console:

print(pounds_to_kg(150))
68.04

Let’s break down the structure of this first function step by step.

First, a function is created using the def keyword, followed by the function name, a pair of parentheses, and a colon.

def function_name():
    # Function body

Inside the parentheses, we indicate the arguments of the function. Our function only takes one argument, which we have decided to name pounds. This is the value that we want to convert from pounds to kilograms.

def pounds_to_kg(pounds):
    # Function body

Of course, we could have named this argument anything we wanted, such as p or weight.

The next element, after the colon, is the body of the function. This is where we write the code that we want to execute when the function is called.

def pounds_to_kg(pounds):
    return pounds * 0.4536

We use the return statement to specify what value the function should output.
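To illustrate why return matters, here is a minimal sketch using a hypothetical variant of our function, pounds_to_kg_no_return (a name invented for this example), that computes the value but never returns it. In Python, a function without a return statement outputs None:

```python
# Hypothetical variant for illustration: the result is computed but not returned
def pounds_to_kg_no_return(pounds):
    kg = pounds * 0.4536  # kg exists only inside the function

result = pounds_to_kg_no_return(150)
print(result)  # prints None, because nothing was returned
```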

You could also assign the result to a variable and then return that variable:

def pounds_to_kg(pounds):
    kg = pounds * 0.4536
    return kg

This is a bit wordier, but it can make the function easier to read.

We can now use our function like this with a named argument:

pounds_to_kg(pounds=150)
68.04

Or without a named argument:

pounds_to_kg(150)
68.04

To use this in a DataFrame, you can create a new column:

pounds_df = pd.DataFrame({'pounds': [150, 200, 250]})
pounds_df['kg'] = pounds_to_kg(pounds_df['pounds'])
pounds_df
   pounds      kg
0     150   68.04
1     200   90.72
2     250  113.40
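Passing a whole column works here because arithmetic on a pandas Series is applied element by element, so the function receives a Series and returns a Series. A small sketch of this, using a standalone Series named pounds_series (a name chosen for this example):

```python
import pandas as pd

def pounds_to_kg(pounds):
    return pounds * 0.4536

# A Series goes in, and the multiplication is applied to every element
pounds_series = pd.Series([150, 200, 250])
kg_series = pounds_to_kg(pounds_series)

print(type(kg_series))     # a pandas Series, not a single number
print(kg_series.round(2))  # one converted value per input value
```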

And that’s it! You have just created and used your first function in Python.

Practice

7.4.1 Age in Months Function

Create a simple function called years_to_months that transforms age in years to age in months.

Use it on the riots_df DataFrame imported below to create a new column called age_months:

riots_df = vd.data.la_riots()
riots_df 
first_name last_name age gender race death_date address neighborhood type longitude latitude
0 Cesar A. Aguilar 18.0 Male Latino 1992-04-30 2009 W. 6th St. Westlake Officer-involved shooting -118.273976 34.059281
1 George Alvarez 42.0 Male Latino 1992-05-01 Main & College streets Chinatown Not riot-related -118.234098 34.062690
2 Wilson Alvarez 40.0 Male Latino 1992-05-23 3100 Rosecrans Ave. Hawthorne Homicide -118.326816 33.901662
... ... ... ... ... ... ... ... ... ... ... ...
60 Elbert O. Wilkins 33.0 Male Black 1992-04-30 Western Avenue & 92nd Street Gramercy Park Homicide -118.310004 33.952767
61 John H. Willers 37.0 Male White 1992-04-29 10621 Sepulveda Blvd. Mission Hills Homicide -118.467770 34.263184
62 Willie Bernard Williams 29.0 Male Black 1992-04-29 Gage & Western avenues Chesterfield Square Death -118.308952 33.982363

63 rows × 11 columns

7.5 Functions with Multiple Arguments

Most functions take multiple arguments rather than just one. Let’s look at an example of a function that takes three arguments:

def calc_calories(carb_grams, protein_grams, fat_grams):
    result = (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
    return result

calc_calories(carb_grams=50, protein_grams=25, fat_grams=10)
390

The calc_calories function computes the total calories based on the grams of carbohydrates, protein, and fat. Carbohydrates and proteins are estimated to be 4 calories per gram, while fat is estimated to be 9 calories per gram.
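With several arguments, the way you pass them starts to matter. The sketch below (not part of the original lesson code) shows that named arguments can be given in any order, while unnamed arguments are matched by position:

```python
def calc_calories(carb_grams, protein_grams, fat_grams):
    result = (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
    return result

# Named arguments: the order does not matter
named = calc_calories(fat_grams=10, carb_grams=50, protein_grams=25)

# Unnamed arguments: matched by position (carbs, protein, fat)
positional = calc_calories(50, 25, 10)

# Same numbers in a different position give a different result,
# because 10 is now treated as carbs and 50 as fat
swapped = calc_calories(10, 25, 50)

print(named, positional, swapped)
```

Naming the arguments is a little more typing, but it protects against accidentally swapping values.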

If you call the function without supplying all the arguments, Python raises a TypeError:

calc_calories(carb_grams=50, protein_grams=25)
TypeError: calc_calories() missing 1 required positional argument: 'fat_grams'

You can define default values for your function’s arguments. If the function is called without a value for an argument, that argument takes its default value. Let’s make all arguments optional by giving them all default values:

def calc_calories(carb_grams=0, protein_grams=0, fat_grams=0):
    result = (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
    return result

Now, we can call the function with only some arguments without getting an error:

calc_calories(carb_grams=50, protein_grams=25)
300
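As a further sketch of how the defaults behave, the calls below leave out different arguments. The variable names no_args and fat_only are chosen for this example only:

```python
def calc_calories(carb_grams=0, protein_grams=0, fat_grams=0):
    result = (carb_grams * 4) + (protein_grams * 4) + (fat_grams * 9)
    return result

# No arguments at all: every argument falls back to its default of 0
no_args = calc_calories()

# Only fat is supplied: carbs and protein fall back to 0
fat_only = calc_calories(fat_grams=10)

print(no_args, fat_only)
```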

Let’s use this on a sample dataset:

food_df = pd.DataFrame({
    'food': ['Apple', 'Avocado'],
    'carb_grams': [25, 10],
    'protein_grams': [0, 1],
    'fat_grams': [0, 14]
})
food_df['calories'] = calc_calories(food_df['carb_grams'], food_df['protein_grams'], food_df['fat_grams'])
food_df
      food  carb_grams  protein_grams  fat_grams  calories
0    Apple          25              0          0       100
1  Avocado          10              1         14       170

Practice

7.5.1 BMI Function

Create a function named calc_bmi that calculates the Body Mass Index (BMI) for one or more individuals, then apply the function by running the code chunk further below. The formula for BMI is weight (kg) divided by height (m) squared.

# Your code here
bmi_df = pd.DataFrame({
    'Weight': [70, 80, 100],  # in kg
    'Height': [1.7, 1.8, 1.2]  # in meters
})
bmi_df['BMI'] = calc_bmi(bmi_df['Weight'], bmi_df['Height'])
bmi_df

7.6 Intro to Conditionals: if, elif, and else

Conditional statements allow you to execute code only when certain conditions are met. The basic syntax in Python is:

if condition:
    # Code to execute if condition is True
elif another_condition:
    # Code to execute if the previous condition was False and this condition is True
else:
    # Code to execute if all previous conditions were False

Let’s look at an example of using conditionals within a function. Suppose we want to write a function that classifies a number as positive, negative, or zero.

def class_num(num):
    if num > 0:
        return "Positive"
    elif num < 0:
        return "Negative"
    else:
        return "Zero"

print(class_num(10))    # Output: Positive
print(class_num(-5))    # Output: Negative
print(class_num(0))     # Output: Zero
Positive
Negative
Zero

If you try to use this function on a whole column, the way we did above with, for example, the BMI function, you will get an error:

num_df = pd.DataFrame({'num': [10, -5, 0]})
num_df
   num
0   10
1   -5
2    0

num_df['category'] = class_num(num_df['num'])
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
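To see where this error comes from, it helps to look at what the comparison inside the function produces when it is given a whole column. The sketch below uses a standalone Series named num_series (a name chosen for this example):

```python
import pandas as pd

num_series = pd.Series([10, -5, 0])

# Comparing a Series gives one True/False per element, not a single answer
is_positive = num_series > 0
print(is_positive.tolist())

# An if statement needs exactly one True or False, so pandas refuses to guess
raised = False
try:
    if is_positive:
        pass
except ValueError:
    raised = True

print(raised)
```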

The reason for this is that if statements are not built to work with Series (they are not inherently vectorized). Instead, they work with single values. To get around this, we can use the np.vectorize function to create a vectorized version of the function:

class_num_vec = np.vectorize(class_num)
num_df['category'] = class_num_vec(num_df['num'])
num_df
   num  category
0   10  Positive
1   -5  Negative
2    0      Zero
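Note that np.vectorize is mainly a convenience: internally it still calls the function once per element. As an alternative that is not used in this lesson, the same labels can be produced without writing an if statement by using np.select, which picks a label for each row based on a list of conditions. A minimal sketch:

```python
import numpy as np
import pandas as pd

num_df = pd.DataFrame({'num': [10, -5, 0]})

# Each condition is a boolean Series; each has a matching label
conditions = [num_df['num'] > 0, num_df['num'] < 0]
labels = ['Positive', 'Negative']

# Rows that match no condition receive the default label
num_df['category'] = np.select(conditions, labels, default='Zero')
print(num_df)
```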

To get more practice with conditionals, let’s write a function that categorizes grades into simple categories:

  • If the grade is between 85 and 100, the category is ‘Excellent’.
  • If the grade is at least 60 but below 85, the category is ‘Pass’.
  • If the grade is at least 0 but below 60, the category is ‘Fail’.
  • If the grade is negative or otherwise invalid (for example, above 100), return ‘Invalid grade’.

def categorize_grade(grade):
    if grade >= 85 and grade <= 100:
        return 'Excellent'
    elif grade >= 60 and grade < 85:
        return 'Pass'
    elif grade >= 0 and grade < 60:
        return 'Fail'
    else:
        return 'Invalid grade'

categorize_grade(95)  # Output: Excellent
'Excellent'
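Python also allows range checks to be written as chained comparisons, which some readers find closer to the written criteria. The sketch below is an equivalent version under a hypothetical name, categorize_grade_chained, invented for this example:

```python
# Hypothetical alternative for illustration: same logic, chained comparisons
def categorize_grade_chained(grade):
    if 85 <= grade <= 100:
        return 'Excellent'
    elif 60 <= grade < 85:
        return 'Pass'
    elif 0 <= grade < 60:
        return 'Fail'
    else:
        return 'Invalid grade'

print(categorize_grade_chained(84))   # just below the 'Excellent' cutoff
print(categorize_grade_chained(101))  # above the allowed range
```

Values that fail every check, such as a grade above 100 or a missing value (NaN), fall through to the else branch.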

We can apply this function to a column in a DataFrame, but first we need to vectorize it. We store the vectorized version under a new name so that the original function remains available:

categorize_grade_vec = np.vectorize(categorize_grade)
grades_df = pd.DataFrame({'grade': [95, 82, 76, 65, 58, -5]})
grades_df['grade_cat'] = categorize_grade_vec(grades_df['grade'])
grades_df
   grade      grade_cat
0     95      Excellent
1     82           Pass
2     76           Pass
3     65           Pass
4     58           Fail
5     -5  Invalid grade

Practice

7.6.1 Age Categorization Function

Now, try writing a function that categorizes age into different life stages. Use the following criteria:

  • If the age is under 18, the category is ‘Minor’.
  • If the age is greater than or equal to 18 and less than 65, the category is ‘Adult’.
  • If the age is greater than or equal to 65, the category is ‘Senior’.
  • If the age is negative or invalid, return ‘Invalid age’.

Use it on the riots_df DataFrame printed below to create a new column called Age_Category.

# Your code here

riots_df = vd.data.la_riots()
riots_df
first_name last_name age gender race death_date address neighborhood type longitude latitude
0 Cesar A. Aguilar 18.0 Male Latino 1992-04-30 2009 W. 6th St. Westlake Officer-involved shooting -118.273976 34.059281
1 George Alvarez 42.0 Male Latino 1992-05-01 Main & College streets Chinatown Not riot-related -118.234098 34.062690
2 Wilson Alvarez 40.0 Male Latino 1992-05-23 3100 Rosecrans Ave. Hawthorne Homicide -118.326816 33.901662
... ... ... ... ... ... ... ... ... ... ... ...
60 Elbert O. Wilkins 33.0 Male Black 1992-04-30 Western Avenue & 92nd Street Gramercy Park Homicide -118.310004 33.952767
61 John H. Willers 37.0 Male White 1992-04-29 10621 Sepulveda Blvd. Mission Hills Homicide -118.467770 34.263184
62 Willie Bernard Williams 29.0 Male Black 1992-04-29 Gage & Western avenues Chesterfield Square Death -118.308952 33.982363

63 rows × 11 columns

Side Note

7.6.2 Apply vs Vectorize

Another way to use functions with if statements on a DataFrame is the apply method, which calls the function once for each value in a column. Here is how you can use the grade categorization function with apply:

grades_df['grade_cat'] = grades_df['grade'].apply(categorize_grade)
grades_df
   grade      grade_cat
0     95      Excellent
1     82           Pass
2     76           Pass
3     65           Pass
4     58           Fail
5     -5  Invalid grade

The np.vectorize function is easier to use with multiple arguments, but you will encounter the apply method further down the road.
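To make the multiple-argument comparison concrete, here is a sketch with a hypothetical two-argument function, pass_or_fail, and a made-up scores_df DataFrame (neither appears elsewhere in this lesson). With np.vectorize, you pass one column per argument. With apply, you work row by row using axis=1 and pick the columns out of each row yourself:

```python
import numpy as np
import pandas as pd

# Hypothetical function for illustration: compares a grade to its own cutoff
def pass_or_fail(grade, cutoff):
    if grade >= cutoff:
        return 'Pass'
    else:
        return 'Fail'

# Made-up data: each row has its own cutoff
scores_df = pd.DataFrame({'grade': [72, 55, 64], 'cutoff': [60, 60, 70]})

# np.vectorize: pass one column per argument
pass_or_fail_vec = np.vectorize(pass_or_fail)
scores_df['result_vec'] = pass_or_fail_vec(scores_df['grade'], scores_df['cutoff'])

# apply: go row by row (axis=1) and extract each value from the row
scores_df['result_apply'] = scores_df.apply(
    lambda row: pass_or_fail(row['grade'], row['cutoff']), axis=1
)
print(scores_df)
```

Both columns contain the same labels; the difference is only in how the arguments are supplied.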

7.7 Conclusion

In this lesson, we’ve introduced the basics of writing functions in Python and how to use conditional statements within those functions. Functions are essential building blocks in programming that allow you to encapsulate code for reuse and better organization. Conditional statements enable your functions to make decisions based on input values or other conditions.