import pandas as pd
5 Data Structures in Python
5.1 Intro
So far in our Python explorations, we’ve been working with simple, single values, like numbers and strings. But, as you know, data usually comes in the form of larger structures. The structure most familiar to you is a table, with rows and columns.
In this lesson, we’re going to explore the building blocks for organizing data in Python, building up through lists, dictionaries, Series, and finally tables, or, more formally, DataFrames.
Let’s dive in!
5.2 Learning objectives
- Create and work with Python lists and dictionaries
- Understand and use Pandas Series
- Explore Pandas DataFrames for organizing structured data
5.3 Imports
We need pandas for this lesson. You can import it like this:
import pandas as pd
If you get an error, you probably need to install it. You can do this by running !pip install pandas in a cell.
5.4 Python Lists
Lists are like ordered containers that can hold different types of information. For example, you might have a list of things to buy:
shopping = ["apples", "bananas", "milk", "bread"]
shopping
['apples', 'bananas', 'milk', 'bread']
In Python, we use something called “zero-based indexing” to access items in a list. This means we start counting positions from 0, not 1.
Let’s see some examples:
print(shopping[0]) # First item (remember, we start at 0!)
print(shopping[1]) # Second item
print(shopping[2]) # Third item
apples
bananas
milk
It might seem odd at first, but it’s a common practice in many programming languages. The index works like an offset from the start of the list, which matches how computers store sequences in memory and makes many algorithms easier to write.
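To make the counting concrete, here is a small sketch using a hypothetical list called colors (it is not part of the lesson's shopping data). It shows that the last valid index is always one less than the length of the list, and that Python also accepts negative indexes that count from the end.

```python
# Hypothetical example list (not from the lesson's shopping data)
colors = ["red", "green", "blue"]

n_items = len(colors)        # number of items in the list
first = colors[0]            # index 0 is the first item
last = colors[n_items - 1]   # the last valid index is len - 1
also_last = colors[-1]       # a negative index counts from the end

print(n_items, first, last, also_last)
```

Asking for colors[3] here would raise an IndexError, because the valid positions are 0, 1, and 2.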
We can change the contents of a list after we’ve created it, using the same indexing system.
shopping[1] = "oranges"  # Replace the second item (at index 1)
shopping
['apples', 'oranges', 'milk', 'bread']
Lists come with many built-in methods. For example, we can add an element to the end of a list using the append() method.
shopping.append("eggs")
shopping
['apples', 'oranges', 'milk', 'bread', 'eggs']
In the initial stages of your Python data journey, you may not work with lists too often, so we’ll keep this intro brief.
5.4.1 Practice: Working with Lists
- Create a list called temps with these values: 1, 2, 3, 4
- Print the first element of the list
- Change the last element to 6
# Your code here
5.5 Python Dictionaries
Dictionaries are like labeled storage boxes for your data. Each piece of data (value) has a unique label (key). Below, we have a dictionary of grades for some students.
grades = {"Alice": 90, "Bob": 85, "Charlie": 92}
grades
{'Alice': 90, 'Bob': 85, 'Charlie': 92}
As you can see, dictionaries are defined using curly braces {}, with keys and values separated by colons :, and the key-value pairs separated by commas.
We use the key to get the associated value.
grades["Bob"]
85
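Looking up a key that does not exist raises a KeyError. As a minimal sketch, using a hypothetical dictionary called stock (separate from grades), here are two standard ways to check for a key before relying on it: the in operator and the get() method with a default value.

```python
# Hypothetical dictionary for illustration (separate from `grades`)
stock = {"pens": 12, "notebooks": 4}

has_pens = "pens" in stock          # True: "pens" is a key
has_erasers = "erasers" in stock    # False: there is no such key

# stock["erasers"] would raise a KeyError; get() returns a default instead
eraser_count = stock.get("erasers", 0)

print(has_pens, has_erasers, eraser_count)
```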
5.5.1 Adding/Modifying Entries
We can easily add new information or change existing data in a dictionary.
grades["David"] = 88  # Add a new student
grades
{'Alice': 90, 'Bob': 85, 'Charlie': 92, 'David': 88}
grades["Alice"] = 95  # Update Alice's grade
grades
{'Alice': 95, 'Bob': 85, 'Charlie': 92, 'David': 88}
5.5.2 Practice: Working with Dictionaries
- Create a dictionary called prices with these pairs: “apple”: 0.50, “banana”: 0.25, “orange”: 0.75
- Print the price of an orange by using the key
- Add a new fruit “grape” with a price of 1.5
- Change the price of “banana” to 0.30
# Your code here
5.6 Pandas Series
Pandas provides a data structure called a Series that is similar to a list, but with additional features that are particularly useful for data analysis.
Let’s create a simple Series:
temps = pd.Series([1, 2, 3, 4, 5])
temps
0 1
1 2
2 3
3 4
4 5
dtype: int64
We can use built-in Series methods to calculate summary statistics.
temps.mean()
temps.median()
temps.std()
np.float64(1.5811388300841898)
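The value displayed above comes from std(), the last expression in the cell. By default, pandas computes the sample standard deviation (it divides by n - 1). The sketch below rebuilds the same Series and compares that default with the population version, which you can request through the ddof parameter.

```python
import pandas as pd

temps = pd.Series([1, 2, 3, 4, 5])

sample_std = temps.std()             # pandas default: ddof=1 (divide by n - 1)
population_std = temps.std(ddof=0)   # divide by n instead

print(sample_std, population_std)
```

The two numbers differ noticeably for a small Series like this one and move closer together as the number of values grows.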
An important feature of Series is that they can have a custom index for intuitive access.
temps_labeled = pd.Series([1, 2, 3, 4], index=['Mon', 'Tue', 'Wed', 'Thu'])
temps_labeled
temps_labeled['Wed']
np.int64(3)
Looking up values by label in this way makes a Series similar to a dictionary.
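To illustrate the resemblance, here is a small sketch that builds a Series directly from a dictionary. The names temps_dict and temps_from_dict are hypothetical and reuse the values from temps_labeled. The dictionary keys become the index labels, and lookups by label work the same way in both structures.

```python
import pandas as pd

# Hypothetical data: the same values as temps_labeled, written as a dictionary
temps_dict = {"Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4}

temps_from_dict = pd.Series(temps_dict)    # keys become the index labels

wed_from_dict = temps_dict["Wed"]          # dictionary lookup by key
wed_from_series = temps_from_dict["Wed"]   # Series lookup by index label

print(wed_from_dict, wed_from_series)
print(temps_from_dict.mean())              # a Series also has numeric methods
```

Unlike a plain dictionary, the Series keeps its numeric methods, such as mean(), available alongside the labels.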
5.6.1 Practice: Working with Series
- Create a Series called rain with these values: 5, 4, 3, 2
- Get the mean and median rainfall
# Your code here
5.7 Pandas DataFrames
Next up, let’s consider Pandas DataFrames, which are like Series but in two dimensions - think spreadsheets or database tables.
This is the most important data structure for data analysis.
A DataFrame has rows and columns, which makes it well suited to organizing structured data.
Most of the time, you will be importing external data as DataFrames, but you should know how to create DataFrames from scratch within Python as well.
Let’s create three lists first:
# Create three lists
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 28]
cities = ["Lagos", "London", "Lima"]
Then we combine them into a dictionary, and finally into a DataFrame.
data = {'name': names,
        'age': ages,
        'city': cities}
people_df = pd.DataFrame(data)
people_df
| | name | age | city |
|---|---|---|---|
| 0 | Alice | 25 | Lagos |
| 1 | Bob | 30 | London |
| 2 | Charlie | 28 | Lima |
Note that we could have created the DataFrame without the intermediate lists and dictionary:
people_df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 28],
        "city": ["Lagos", "London", "Lima"],
    }
)
people_df
| | name | age | city |
|---|---|---|---|
| 0 | Alice | 25 | Lagos |
| 1 | Bob | 30 | London |
| 2 | Charlie | 28 | Lima |
We can select specific columns or rows from our DataFrame.
people_df["city"]  # Selecting a column. Note that this returns a Series.
people_df.loc[0]  # Selecting a row by its label. This also returns a Series.
name Alice
age 25
city Lagos
Name: 0, dtype: object
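Both selections return a Series, but the two are labeled differently. The sketch below rebuilds people_df so that it runs on its own, and uses the hypothetical names city_column and first_row. A column is indexed by the row labels, while a row is indexed by the column names.

```python
import pandas as pd

people_df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 28],
        "city": ["Lagos", "London", "Lima"],
    }
)

city_column = people_df["city"]   # one column: a Series indexed by row label
first_row = people_df.loc[0]      # one row: a Series indexed by column name

print(type(city_column).__name__, list(city_column.index))
print(type(first_row).__name__, list(first_row.index))
print(first_row["city"])          # look up a single value within the row
```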
We can call methods on the dataframe.
people_df.describe()  # This is a summary of the numerical columns
people_df.info()  # This is a summary of the data types
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null object
1 age 3 non-null int64
2 city 3 non-null object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
And we can call methods on the Series objects that result from selecting columns.
For example, we can get summary statistics on the “city” column and the mean of the “age” column.
people_df["city"].describe()  # This is a summary of the "city" column
people_df["age"].mean()  # This is the mean of the "age" column
np.float64(27.666666666666668)
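Only the last expression in a cell is displayed, so the describe() result for the “city” column does not appear above. As a hedged sketch (rebuilding people_df so that it runs on its own, with the hypothetical names city_summary and age_summary), printing both summaries shows that describe() reports different statistics for text columns and numeric columns.

```python
import pandas as pd

people_df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 28],
        "city": ["Lagos", "London", "Lima"],
    }
)

city_summary = people_df["city"].describe()   # text column: counts and unique values
age_summary = people_df["age"].describe()     # numeric column: mean, std, quartiles

print(city_summary)
print(age_summary)
```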
In a future series of lessons, we’ll dive deeper into slicing and manipulating DataFrames. Our goal in this lesson is just to get you familiar with the basic syntax and concepts.
5.7.1 Practice: Working with DataFrames
- Create a DataFrame called students with this information:
  - Columns: “Name”, “Age”, “Grade”
  - Alice’s grade is 90, Bob’s grade is 85, and Charlie’s grade is 70. You pick the ages.
- Show only the “Grade” column
- Calculate and show the average age of the students
- Display the row for Bob.
# Your code here
5.8 Wrap-up
We’ve explored the main data structures for Python data analysis. From basic lists and dictionaries to Pandas Series and DataFrames, these tools are essential for organizing and analyzing data. They will be the foundation for more advanced data work in future lessons.