9  Univariate Graphs with Plotly Express

9.1 Intro

In this lesson, you’ll learn how to create univariate graphs using Plotly Express. Univariate graphs are essential for understanding the distribution of a single variable, whether it’s categorical or quantitative.

Let’s get started!

9.2 Learning objectives

  • Create bar charts and pie charts for categorical data using Plotly Express
  • Generate histograms for quantitative data using Plotly Express
  • Customize graph appearance and labels

9.3 Imports

This lesson requires the plotly, pandas, and vega_datasets packages (plotly.express is included in plotly). Install them if you haven’t already.

import plotly.express as px
import pandas as pd
from vega_datasets import data

9.4 Quantitative Data

9.4.1 Histogram

Histograms are used to visualize the distribution of continuous variables.

Let’s make a histogram of the tip amounts in the tips dataset.

tips = px.data.tips()
tips.head() # view the first 5 rows
   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4
px.histogram(tips, x='tip')

We can see that the tallest bar, which covers tips between 1.75 and 2.24, has a frequency of 55. In other words, 55 tips in the dataset fall in that range.

Side-note

Notice that plotly charts are interactive. You can hover over the bars to see the exact number of tips in each bin.

Try playing with the buttons at the top right. The button to download the chart as a png is especially useful.

Practice

9.4.2 Practice Q: Speed Distribution Histogram

Following the example of the histogram of tips, create a histogram of the speed distribution (Speed_IAS_in_knots) using the birdstrikes dataset.

birdstrikes = data.birdstrikes()
birdstrikes.head()
# Your code here
Airport__Name Aircraft__Make_Model Effect__Amount_of_damage Flight_Date Aircraft__Airline_Operator Origin_State When__Phase_of_flight Wildlife__Size Wildlife__Species When__Time_of_day Cost__Other Cost__Repair Cost__Total_$ Speed_IAS_in_knots
0 BARKSDALE AIR FORCE BASE ARPT T-38A None 1/8/90 0:00 MILITARY Louisiana Climb Large Turkey vulture Day 0 0 0 300.0
1 BARKSDALE AIR FORCE BASE ARPT KC-10A None 1/9/90 0:00 MILITARY Louisiana Approach Medium Unknown bird or bat Night 0 0 0 200.0
2 BARKSDALE AIR FORCE BASE ARPT B-52 None 1/11/90 0:00 MILITARY Louisiana Take-off run Medium Unknown bird or bat Day 0 0 0 130.0
3 NEW ORLEANS INTL B-737-300 Substantial 1/11/90 0:00 SOUTHWEST AIRLINES Louisiana Take-off run Small Rock pigeon Day 0 0 0 140.0
4 BARKSDALE AIR FORCE BASE ARPT KC-10A None 1/12/90 0:00 MILITARY Louisiana Climb Medium Unknown bird or bat Day 0 0 0 160.0
px.histogram(birdstrikes, x='Speed_IAS_in_knots')

In a Jupyter notebook, we can view the help documentation for the function by typing px.histogram? in a cell and running it.

px.histogram?

From the help documentation, we can see that the px.histogram function has many arguments that we can use to customize the graph.

Let’s make the histogram a bit nicer by adding a title, customizing the x axis label, and changing the color.

px.histogram(
    tips,
    x="tip",
    labels={"tip": "Tip Amount ($)"},
    title="Distribution of Tips", 
    color_discrete_sequence=["lightseagreen"]
)

Color names follow the standard CSS named colors, which are documented by Mozilla (MDN). You can find the full list by searching for “CSS named colors”.

Alternatively, you can use hex color codes, like #1f77b4. You can get these easily by using a color picker. Search for “color picker” on Google.

px.histogram(
    tips,
    x="tip",
    labels={"tip": "Tip Amount ($)"},
    title="Distribution of Tips", 
    color_discrete_sequence=["#6a5acd"]
)
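If you are curious what a hex code means: each pair of hex digits stores one of the red, green, and blue channels as a number from 0 to 255. The helper below, hex_to_rgb, is a hypothetical function written for illustration (it is not part of Plotly) that decodes the code used above.

```python
def hex_to_rgb(hex_code):
    """Convert a hex color such as '#6a5acd' to an (R, G, B) tuple."""
    digits = hex_code.lstrip("#")
    # Each pair of hex digits encodes one channel from 0 to 255
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


rgb = hex_to_rgb("#6a5acd")
print(rgb)  # (106, 90, 205)

# Plotly also accepts colors written as "rgb(r, g, b)" strings
rgb_string = "rgb({}, {}, {})".format(*rgb)
print(rgb_string)
```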
Practice

9.4.3 Practice Q: Bird Strikes Histogram Custom

Update your birdstrikes histogram to use a hex code color, add a title, and change the x-axis label to “Speed (Nautical Miles Per Hour)”.

# Your code here
px.histogram(
    birdstrikes,
    x="Speed_IAS_in_knots",
    labels={"Speed_IAS_in_knots": "Speed (Nautical Miles Per Hour)"},
    title="Distribution of Bird Strike Speeds",
    color_discrete_sequence=["#4B0082"]  # Indigo color
)

9.4.4 Counts on bars

We can add counts to the bars with the text_auto argument.

px.histogram(tips, x='tip', text_auto=True)

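9.4.4.1 Bins and bin widths

We can adjust the number of bins, and therefore the bin width, using the nbins argument. Plotly treats this value as an upper limit and picks a round bin size, so the chart may show fewer bins than requested. Let’s make a histogram with at most 10 bins: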

px.histogram(tips, x='tip', nbins=10)

Now we have broader tip amount groups.

Practice

9.4.5 Practice Q: Speed Distribution Histogram Custom

Create a histogram of the speed distribution (Speed_IAS_in_knots) with 20 bins using the birdstrikes dataset. Add counts to the bars, use a color of your choice, and add an appropriate title.

# Your code here
px.histogram(
    birdstrikes,
    x="Speed_IAS_in_knots",
    nbins=20,
    text_auto=True,
    title="Distribution of Bird Strike Speeds",
    color_discrete_sequence=["#FF69B4"]  # Hot pink color
)

9.5 Categorical Data

9.5.1 Bar chart

Bar charts can be used to display the frequency of a single categorical variable.

Plotly has a px.bar function that we will see later. But for a single categorical variable, the function Plotly wants you to use is actually px.histogram, which counts the rows in each category for you. (Statisticians everywhere are crying; histograms are supposed to be used for just quantitative data!)

Let’s create a basic bar chart showing the distribution of sex in the tips dataset:

px.histogram(tips, x='sex')   

Let’s add counts to the bars.

px.histogram(tips, x='sex', text_auto=True)

We can enhance the chart by adding a color axis, and customizing the labels and title.

px.histogram(tips, x='sex', text_auto=True, color='sex', 
             labels={'sex': 'Gender'},
             title='Distribution of Customers by Gender')

Arguably, in this plot, we do not need the color axis, since the sex variable is already represented by the x axis. But public audiences like colors, so it may still be worth including.

However, the legend is now redundant, so we should remove it. Let’s also use custom colors, passed through the color_discrete_sequence argument.

For this, we first store the figure in a variable, then call the update_layout method of that figure to hide the legend.

tips_by_sex = px.histogram(
    tips,
    x="sex",
    text_auto=True,
    color="sex",
    labels={"sex": "Gender"},
    title="Distribution of Customers by Gender",
    color_discrete_sequence=["#1f77b4", "#ff7f0e"],
)

tips_by_sex.update_layout(showlegend=False)
Practice

9.5.2 Practice Q: Bird Strikes by Phase of Flight

Create a bar chart showing the frequency of bird strikes by the phase of flight, When__Phase_of_flight. Add appropriate labels and a title. Use colors of your choice, and remove the legend.

# Your code here
fig = px.histogram(
    birdstrikes,
    x="When__Phase_of_flight",
    text_auto=True,
    color="When__Phase_of_flight",
    labels={"When__Phase_of_flight": "Phase of Flight"},
    title="Bird Strikes by Phase of Flight",
    color_discrete_sequence=px.colors.qualitative.Set3
)
fig.update_layout(showlegend=False)

9.5.2.1 Sorting categories

It is sometimes useful to dictate a specific order for the categories in a bar chart.

Consider this bar chart of the election winners by district in the 2013 Montreal mayoral election.

election = px.data.election()
election.head()
                district  Coderre  Bergeron  Joly  total    winner     result  district_id
0     101-Bois-de-Liesse     2481      1829  3024   7334      Joly  plurality          101
1  102-Cap-Saint-Jacques     2525      1163  2675   6363      Joly  plurality          102
2   11-Sault-au-Récollet     3348      2770  2532   8650   Coderre  plurality           11
3           111-Mile-End     1734      4782  2514   9030  Bergeron   majority          111
4         112-DeLorimier     1770      5933  3044  10747  Bergeron   majority          112
px.histogram(election, x='winner')

Let’s define a custom order for the categories. “Bergeron” will be first, then “Joly” then “Coderre”.

custom_order = ["Bergeron", "Joly", "Coderre"]
election_chart = px.histogram(election, x='winner', category_orders={'winner': custom_order})
election_chart

We can also sort the categories by frequency, using the categoryorder attribute of the x axis. For descending order:

election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total descending")

Or in ascending order:

election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total ascending")
Practice

9.5.3 Practice Q: Sorted Origin State Bar Chart

Create a sorted bar chart showing the distribution of bird strikes by origin state. Sort the bars in descending order of frequency.

# Your code here
fig = px.histogram(birdstrikes, x="Origin_State")
fig.update_xaxes(categoryorder="total descending")

9.5.4 Horizontal bar chart

When you have many categories, horizontal bar charts are often easier to read than vertical bar charts. To make a horizontal bar chart, simply use the y axis instead of the x axis.

px.histogram(tips, y='day')
Practice

9.5.5 Practice Q: Horizontal Bar Chart of Origin State

Create a horizontal bar chart showing the distribution of bird strikes by origin state.

# Your code here
px.histogram(birdstrikes, y="Origin_State")

9.5.6 Pie chart

Pie charts are also useful for showing the proportion of categorical variables. They are best used when you have a small number of categories. For larger numbers of categories, pie charts are hard to read.

Let’s make a pie chart showing how the rows of the tips dataset are distributed across the days of the week.

px.pie(tips, names="day")

We can add labels to the pie chart to make it easier to read.

tips_by_day = px.pie(tips, names="day")
tips_by_day_with_labels = tips_by_day.update_traces(textposition="inside", textinfo="percent+label")
tips_by_day_with_labels

The legend is no longer needed, so we can remove it.

tips_by_day_with_labels.update_layout(showlegend=False)
Pro tip

If you forget how to make simple changes like this, don’t hesitate to consult the plotly documentation, Google or ChatGPT.

Practice

9.5.7 Practice Q: Wildlife Size Pie Chart

Create a pie chart showing the distribution of bird strikes by wildlife size. Include percentages and labels inside the pie slices.

# Your code here
fig = px.pie(birdstrikes, names="Wildlife__Size")
fig.update_traces(textposition="inside", textinfo="percent+label")
fig.update_layout(showlegend=False)

9.6 Summary

In this lesson, you learned how to create univariate graphs using Plotly Express. You should now feel confident in your ability to create bar charts, pie charts, and histograms. You should also feel comfortable customizing the appearance of your graphs.

See you in the next lesson.