import plotly.express as px
import pandas as pd
from vega_datasets import data
9 Univariate Graphs with Plotly Express
9.1 Intro
In this lesson, you’ll learn how to create univariate graphs using Plotly Express. Univariate graphs are essential for understanding the distribution of a single variable, whether it’s categorical or quantitative.
Let’s get started!
9.2 Learning objectives
- Create bar charts, pie charts, and treemaps for categorical data using Plotly Express
- Generate histograms for quantitative data using Plotly Express
- Customize graph appearance and labels
9.3 Imports
This lesson requires plotly.express, pandas, and vega_datasets. Install them if you haven’t already.
9.4 Quantitative Data
9.4.1 Histogram
Histograms are used to visualize the distribution of continuous variables.
Let’s make a histogram of the tip amounts in the tips dataset.
tips = px.data.tips()
tips.head()  # view the first 5 rows
 | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
px.histogram(tips, x='tip')
We can see that the highest bar, corresponding to tips between 1.75 and 2.24, has a frequency of 55. This means that there were 55 tips between 1.75 and 2.24.
Notice that plotly charts are interactive. You can hover over the bars to see the exact number of tips in each bin.
Try playing with the buttons at the top right. The button to download the chart as a png is especially useful.
9.4.2 Practice Q: Speed Distribution Histogram
Following the example of the histogram of tips, create a histogram of the speed distribution (Speed_IAS_in_knots) using the birdstrikes dataset.
birdstrikes = data.birdstrikes()
birdstrikes.head()
# Your code here
 | Airport__Name | Aircraft__Make_Model | Effect__Amount_of_damage | Flight_Date | Aircraft__Airline_Operator | Origin_State | When__Phase_of_flight | Wildlife__Size | Wildlife__Species | When__Time_of_day | Cost__Other | Cost__Repair | Cost__Total_$ | Speed_IAS_in_knots |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | BARKSDALE AIR FORCE BASE ARPT | T-38A | None | 1/8/90 0:00 | MILITARY | Louisiana | Climb | Large | Turkey vulture | Day | 0 | 0 | 0 | 300.0 |
1 | BARKSDALE AIR FORCE BASE ARPT | KC-10A | None | 1/9/90 0:00 | MILITARY | Louisiana | Approach | Medium | Unknown bird or bat | Night | 0 | 0 | 0 | 200.0 |
2 | BARKSDALE AIR FORCE BASE ARPT | B-52 | None | 1/11/90 0:00 | MILITARY | Louisiana | Take-off run | Medium | Unknown bird or bat | Day | 0 | 0 | 0 | 130.0 |
3 | NEW ORLEANS INTL | B-737-300 | Substantial | 1/11/90 0:00 | SOUTHWEST AIRLINES | Louisiana | Take-off run | Small | Rock pigeon | Day | 0 | 0 | 0 | 140.0 |
4 | BARKSDALE AIR FORCE BASE ARPT | KC-10A | None | 1/12/90 0:00 | MILITARY | Louisiana | Climb | Medium | Unknown bird or bat | Day | 0 | 0 | 0 | 160.0 |
px.histogram(birdstrikes, x='Speed_IAS_in_knots')
We can view the help documentation for the function by typing px.histogram? in a cell and running it.
px.histogram?
From the help documentation, we can see that the px.histogram function has many arguments that we can use to customize the graph.
Let’s make the histogram a bit nicer by adding a title, customizing the x axis label, and changing the color.
px.histogram(
    tips,
    x="tip",
    labels={"tip": "Tip Amount ($)"},
    title="Distribution of Tips",
    color_discrete_sequence=["lightseagreen"]
)
Color names follow the standard CSS named colors. The full list is available in Mozilla's documentation of CSS named colors.
Alternatively, you can use hex color codes, like #1f77b4. You can get these easily by using a color picker. Search for “color picker” on Google.
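A hex code is simply three two-digit hexadecimal numbers giving the red, green, and blue components of the color. The small helper below, hex_to_rgb, is a hypothetical function written for this example (it is not part of Plotly) that decodes a hex code into those three values.

```python
def hex_to_rgb(hex_code):
    """Convert a color like '#6a5acd' to a (red, green, blue) tuple."""
    digits = hex_code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#1f77b4"))  # (31, 119, 180)
```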
px.histogram(
    tips,
    x="tip",
    labels={"tip": "Tip Amount ($)"},
    title="Distribution of Tips",
    color_discrete_sequence=["#6a5acd"]
)
9.4.3 Practice Q: Bird Strikes Histogram Custom
Update your birdstrikes histogram to use a hex code color, add a title, and change the x-axis label to “Speed (Nautical Miles Per Hour)”.
# Your code here
px.histogram(
    birdstrikes,
    x="Speed_IAS_in_knots",
    labels={"Speed_IAS_in_knots": "Speed (Nautical Miles Per Hour)"},
    title="Distribution of Bird Strike Speeds",
    color_discrete_sequence=["#4B0082"]  # Indigo color
)
9.4.4 Counts on bars
We can add counts to the bars with the text_auto argument.
px.histogram(tips, x='tip', text_auto=True)
9.4.4.1 Bins and bandwidths
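We can adjust the number of bins, and therefore the bin width, to better represent the data using the nbins argument. Plotly treats this value as a maximum and may choose slightly fewer bins so that the bin edges fall on round numbers. Let’s make a histogram with about 10 bins:
px.histogram(tips, x='tip', nbins=10)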
Now we have broader tip amount groups.
9.4.5 Practice Q: Speed Distribution Histogram Custom
Create a histogram of the speed distribution (Speed_IAS_in_knots) with 20 bins using the birdstrikes dataset. Add counts to the bars, use a color of your choice, and add an appropriate title.
# Your code here
px.histogram(
    birdstrikes,
    x="Speed_IAS_in_knots",
    nbins=20,
    text_auto=True,
    title="Distribution of Bird Strike Speeds",
    color_discrete_sequence=["#FF69B4"]  # Hot pink color
)
9.5 Categorical Data
9.5.1 Bar chart
Bar charts can be used to display the frequency of a single categorical variable.
Plotly has a px.bar function that we will see later. But for single categorical variables, the function plotly wants you to use is actually px.histogram. (Statisticians everywhere are crying; histograms are supposed to be used for just quantitative data!)
Let’s create a basic bar chart showing the distribution of sex in the tips dataset:
px.histogram(tips, x='sex')
Let’s add counts to the bars.
px.histogram(tips, x='sex', text_auto=True)
We can enhance the chart by adding a color axis, and customizing the labels and title.
px.histogram(tips, x='sex', text_auto=True, color='sex',
             labels={'sex': 'Gender'},
             title='Distribution of Customers by Gender')
Arguably, in this plot, we do not need the color axis, since the sex variable is already represented by the x axis. But public audiences like colors, so it may still be worth including.
However, we should remove the legend. Let’s also use custom colors.
For this, we can first create a figure object, then use the .update_layout method of that object to hide the legend.
tips_by_sex = px.histogram(
    tips,
    x="sex",
    text_auto=True,
    color="sex",
    labels={"sex": "Gender"},
    title="Distribution of Customers by Gender",
    color_discrete_sequence=["#1f77b4", "#ff7f0e"],
)
tips_by_sex.update_layout(showlegend=False)
9.5.2 Practice Q: Bird Strikes by Phase of Flight
Create a bar chart showing the frequency of bird strikes by the phase of flight, When__Phase_of_flight. Add appropriate labels and a title. Use colors of your choice, and remove the legend.
# Your code here
fig = px.histogram(
    birdstrikes,
    x="When__Phase_of_flight",
    text_auto=True,
    color="When__Phase_of_flight",
    labels={"When__Phase_of_flight": "Phase of Flight"},
    title="Bird Strikes by Phase of Flight",
    color_discrete_sequence=px.colors.qualitative.Set3
)
fig.update_layout(showlegend=False)
9.5.2.1 Sorting categories
It is sometimes useful to dictate a specific order for the categories in a bar chart.
Consider this bar chart of the election winners by district in the 2013 Montreal mayoral election.
election = px.data.election()
election.head()
 | district | Coderre | Bergeron | Joly | total | winner | result | district_id |
---|---|---|---|---|---|---|---|---|
0 | 101-Bois-de-Liesse | 2481 | 1829 | 3024 | 7334 | Joly | plurality | 101 |
1 | 102-Cap-Saint-Jacques | 2525 | 1163 | 2675 | 6363 | Joly | plurality | 102 |
2 | 11-Sault-au-Récollet | 3348 | 2770 | 2532 | 8650 | Coderre | plurality | 11 |
3 | 111-Mile-End | 1734 | 4782 | 2514 | 9030 | Bergeron | majority | 111 |
4 | 112-DeLorimier | 1770 | 5933 | 3044 | 10747 | Bergeron | majority | 112 |
px.histogram(election, x='winner')
Let’s define a custom order for the categories. “Bergeron” will be first, then “Joly” then “Coderre”.
custom_order = ["Bergeron", "Joly", "Coderre"]
election_chart = px.histogram(election, x='winner', category_orders={'winner': custom_order})
election_chart
We can also sort the categories by frequency, using the categoryorder attribute of the x axis. Here they are in descending order:
election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total descending")
Or in ascending order:
election_chart = px.histogram(election, x="winner")
election_chart.update_xaxes(categoryorder="total ascending")
9.5.3 Practice Q: Sorted Origin State Bar Chart
Create a sorted bar chart showing the distribution of bird strikes by origin state. Sort the bars in descending order of frequency.
# Your code here
fig = px.histogram(birdstrikes, x="Origin_State")
fig.update_xaxes(categoryorder="total descending")
9.5.4 Horizontal bar chart
When you have many categories, horizontal bar charts are often easier to read than vertical bar charts. To make a horizontal bar chart, simply pass the variable to the y argument instead of the x argument.
px.histogram(tips, y='day')
9.5.5 Practice Q: Horizontal Bar Chart of Origin State
Create a horizontal bar chart showing the distribution of bird strikes by origin state.
# Your code here
px.histogram(birdstrikes, y="Origin_State")
9.5.6 Pie chart
Pie charts are also useful for showing the proportion of categorical variables. They are best used when you have a small number of categories. For larger numbers of categories, pie charts are hard to read.
Let’s make a pie chart of the distribution of tips by day of the week.
px.pie(tips, names="day")
We can add labels to the pie chart to make it easier to read.
tips_by_day = px.pie(tips, names="day")
tips_by_day_with_labels = tips_by_day.update_traces(textposition="inside", textinfo="percent+label")
tips_by_day_with_labels
The legend is no longer needed, so we can remove it.
tips_by_day_with_labels.update_layout(showlegend=False)
If you forget how to make simple changes like this, don’t hesitate to consult the plotly documentation, Google or ChatGPT.
9.5.7 Practice Q: Wildlife Size Pie Chart
Create a pie chart showing the distribution of bird strikes by wildlife size. Include percentages and labels inside the pie slices.
# Your code here
fig = px.pie(birdstrikes, names="Wildlife__Size")
fig.update_traces(textposition="inside", textinfo="percent+label")
fig.update_layout(showlegend=False)
9.6 Summary
In this lesson, you learned how to create univariate graphs using Plotly Express. You should now feel confident in your ability to create bar charts, pie charts, and histograms. You should also feel comfortable customizing the appearance of your graphs.
See you in the next lesson.