import plotly.express as px
import pandas as pd
import numpy as np
import plotly.io as pio
= px.data.tips() tips
8 Data Visualization Types
8.1 Slides
You can review the slides below, which was used to record the video. Or you can jump directly to the following sections, which contain more detail about each type of visualization.
8.2 Intro
In data visualization, there are often three main steps:
- Wrangle and clean your data.
- Pick the right type of visualization for your question.
- Write the code to implement that visualization.
In this lesson, we focus on step #2: understanding which types of plots best suit particular kinds of questions. We’ll look at examples from the built-in tips
dataset in Plotly, exploring:
- Univariate (single variable) numeric data
- Univariate (single variable) categorical data
- Bivariate numeric vs numeric
- Bivariate numeric vs categorical
- Bivariate categorical vs categorical
- Time series
We then practice with another dataset (gapminder
) to reinforce how to choose an effective plot for your data and your question.
8.3 Setup
Run the following code to set up the plotting environment and load the tips
dataset from Plotly Express.
9 1. Univariate Plots
When exploring one column (variable) in your data, your choice of visualization depends on whether that column is numeric (e.g. tip
, total_bill
) or categorical (e.g. sex
, day
).
9.1 1.1 Univariate Numeric
Questions of the form: > What is the distribution of a numeric variable?
> What’s its range? Its median? Where do most values lie?
Possible visualizations include:
- Histogram – good for seeing the overall distribution of values.
- Box Plot – highlights the median, quartiles, and outliers.
- Violin Plot – combines a box plot with a “smoothed” histogram (density) shape.
- Strip/Jitter Plot – shows each individual point, slightly “jittered” so they’re not on top of each other.
9.1.1 Histogram
= px.histogram(tips, x='tip')
fig_hist fig_hist.show()
9.1.2 Box Plot
= px.box(tips, x='tip')
fig_box fig_box.show()
9.1.3 Violin Plot (with Box + Points)
= px.violin(tips, x='tip', box=True, points='all')
fig_violin fig_violin.show()
9.2 1.2 Univariate Categorical
Questions of the form: > How many observations fall into each category? > In the case of the tips dataset, how many observations are there for each sex?
Possible visualizations include:
- Bar Chart – typically counts how many rows fall into each category.
- Pie Chart – can be used if you have only a few categories and want to highlight proportions.
9.2.1 Bar Chart
= px.histogram(tips, x='sex', color='sex', color_discrete_sequence=['#deb221', '#2f828a'])
fig_bar_cat =False)
fig_bar_cat.update_layout(showlegend fig_bar_cat.show()
9.2.2 Pie Chart
= px.pie(tips, names='sex', color='sex', values='tip',
fig_pie_cat =['#deb221', '#2f828a'])
color_discrete_sequence=False)
fig_pie_cat.update_layout(showlegend='none')
fig_pie_cat.update_traces(textposition fig_pie_cat.show()
10 2. Bivariate Plots
When comparing two variables, consider whether they’re numeric or categorical. We’ll look at:
- Numeric vs Numeric
- Numeric vs Categorical
- Categorical vs Categorical
10.1 2.1 Numeric vs Numeric
Typical question: > Do two numeric variables correlate? > In the case of the tips dataset, does the total bill correlate with the tip?
A scatter plot is usually the first choice.
= px.scatter(tips, x='total_bill', y='tip')
fig_scatter fig_scatter.show()
10.2 2.2 Numeric vs Categorical
Typical questions: > Which group has higher values on average?
> In the case of the tips dataset, do men or women tip more?
Visual options:
- Grouped Histograms – separate histograms for each category (often overlaid).
- Grouped Box/Violin/Strip – one distribution plot per category.
- Summary Bar Chart – show mean (or median) for each category, optionally with error bars.
10.2.1 Grouped Histogram
= px.histogram(tips, x='tip', color='sex', barmode='overlay',
fig_grouped_hist =['#deb221', '#2f828a'])
color_discrete_sequence fig_grouped_hist.show()
10.2.2 Grouped Violin/Box/Strip
= px.violin(tips, y='sex', x='tip', color='sex',
fig_grouped_violin =True, points='all',
box=['#deb221', '#2f828a'])
color_discrete_sequence=False)
fig_grouped_violin.update_layout(showlegend fig_grouped_violin.show()
10.2.3 Summary Bar (Mean + Std)
= tips.groupby('sex').agg({'tip': ['mean', 'std']}).reset_index()
summary_df = ['sex', 'mean_tip', 'std_tip']
summary_df.columns
= px.bar(
fig_mean_bar
summary_df,='sex',
y='mean_tip',
x='std_tip',
error_x='sex',
color=['#deb221', '#2f828a'],
color_discrete_sequence
)=False)
fig_mean_bar.update_layout(showlegend fig_mean_bar.show()
10.3 2.3 Categorical vs Categorical
Typical question: > How do two categorical variables overlap?
> In the case of the tips dataset, how does the sex ratio differ by day of the week?
Visual options:
- Stacked Bar Chart – total counts, stacked by category.
- Grouped/Clustered Bar Chart – side-by-side bars per category.
- Percent Stacked Bar Chart – each bar scaled to 100% so you see proportions.
10.3.1 Grouped Bar
= px.histogram(tips, x='day', color='sex', barmode='group',
fig_grouped_bar =['#deb221', '#2f828a'])
color_discrete_sequence fig_grouped_bar.show()
10.3.2 Stacked Bar
= px.histogram(tips, x='day', color='sex',
fig_stacked_bar =['#deb221', '#2f828a'])
color_discrete_sequence fig_stacked_bar.show()
10.3.3 Percent Stacked
= px.histogram(tips, x='day', color='sex',
fig_percent_stacked ='stack', barnorm='percent',
barmode=['#deb221', '#2f828a'])
color_discrete_sequence fig_percent_stacked.show()
11 3. Time Series
Although you can treat time as a numeric variable, time series are unique because they show how values change over continuous time. Common plots:
- Line Charts – ideal for continuous trends.
- Bar Charts – can work, but typically suggest discrete steps rather than a smooth progression.
Example with Nigeria’s population from Plotly’s gapminder
dataset:
= px.data.gapminder()
gap_dat = gap_dat.query('country == "Nigeria"')
nigeria_pop
# Bar Chart
= px.bar(nigeria_pop, x='year', y='pop')
fig_nigeria_bar
fig_nigeria_bar.show()
# Line Chart with Points
= px.line(nigeria_pop, x='year', y='pop', markers=True)
fig_nigeria_line fig_nigeria_line.show()
12 4. Practice with the Gapminder Dataset
Below, we filter gapminder
to the year 2007 and add a simple income group classification. Then we answer some common questions. Try thinking: “Is this numeric vs numeric, numeric vs categorical, or categorical vs categorical?” Then select your plot accordingly.
= (
gap_2007
gap_dat"year == 2007")
.query(=["iso_alpha", "iso_num", "year"])
.drop(columns=lambda df: np.where(df.gdpPercap > 15000, "High Income", "Low & Middle Income"))
.assign(income_group
) gap_2007.head()
country | continent | lifeExp | pop | gdpPercap | income_group | |
---|---|---|---|---|---|---|
11 | Afghanistan | Asia | 43.828 | 31889923 | 974.580338 | Low & Middle Income |
23 | Albania | Europe | 76.423 | 3600523 | 5937.029526 | Low & Middle Income |
35 | Algeria | Africa | 72.301 | 33333216 | 6223.367465 | Low & Middle Income |
47 | Angola | Africa | 42.731 | 12420476 | 4797.231267 | Low & Middle Income |
59 | Argentina | Americas | 75.320 | 40301927 | 12779.379640 | Low & Middle Income |
For each section below, try to answer the question on your own before looking at the solution.
12.1 4.1 How does GDP per capita vary across continents?
- GDP per capita = numeric
- Continent = categorical
A violin or box plot grouped by continent works well.
= px.violin(
fig_gdp_violin
gap_2007, ="gdpPercap",
x="continent",
y="continent",
color=True,
box="all",
points=px.colors.qualitative.G10
color_discrete_sequence
)=False)
fig_gdp_violin.update_layout(showlegend fig_gdp_violin.show()
12.2 4.2 Is there a relationship between a country’s GDP per capita and life expectancy?
- GDP per capita = numeric
- Life expectancy = numeric
A scatter plot is typically best.
= px.scatter(gap_2007, x='gdpPercap', y='lifeExp')
fig_scatter_gdp_life fig_scatter_gdp_life.show()
12.3 4.3 How does life expectancy vary between the income groups?
- Life expectancy = numeric
- Income group = categorical
Use grouped box, violin, or strip plots.
= px.violin(gap_2007, x="income_group", y="lifeExp", box=True, points="all")
fig_life_violin fig_life_violin.show()
12.4 4.4 What is the relationship between continent and income group?
- Continent = categorical
- Income group = categorical
A stacked or percent-stacked bar chart is a good choice.
= px.histogram(
fig_continent_income
gap_2007,='continent',
x='income_group',
color='stack',
barmode=['#deb221', '#2f828a']
color_discrete_sequence
) fig_continent_income.show()
13 Conclusion
Picking the right type of visualization starts with asking:
1. How many variables am I looking at?
2. Are they numeric or categorical?
3. What exact question am I trying to answer?
Most of the time, the plots shown above (histograms, box/violin plots, scatter plots, bar charts, etc.) will be enough to communicate your insights effectively. For time series or more specialized questions, you might pick line charts or other advanced visualizations.
Keep practicing with different datasets and always ask yourself: “Which plot would best answer the question at hand?”