23 Reshaping Data with melt() and pivot() – Introduction to Data Science with Python

23.1 Packages

import pandas as pd
import plotly.express as px

23.2 Data

Run the code below to load and define the datasets to be used in this lesson.

# Temperatures dataset
temperatures = pd.DataFrame(
    {
        "country": ["Sweden", "Denmark", "Norway"],
        "1994": [1, 2, 3],
        "1995": [3, 4, 5],
        "1996": [5, 6, 7],
    }
)

# Fuels Wide dataset
fuels_wide = pd.read_csv(
    "https://raw.githubusercontent.com/the-graph-courses/idap_book/main/data/oil_per_capita_wide.csv"
)

# Eurostat Births Wide dataset
euro_births_wide = pd.read_csv(
    "https://raw.githubusercontent.com/the-graph-courses/idap_book/main/data/euro_births_wide.csv"
)

# Contracts dataset
contracts = pd.read_csv(
    "https://raw.githubusercontent.com/the-graph-courses/idap_book/main/data/chicago_contracts_20_23.csv"
)

# Population dataset
population = pd.read_csv(
    "https://raw.githubusercontent.com/the-graph-courses/idap_book/main/data/tidyr_population.csv"
)

23.3 Introduction

Reshaping is a data manipulation technique that involves re-orienting the rows and columns of a dataset. This is often required to make data easier to analyze or understand.

In this lesson, we will cover how to effectively reshape data using pandas functions.

23.4 Learning Objectives

Understand what wide data format is and what long data format is.
Learn how to reshape wide data to long data using melt().
Learn how to reshape long data to wide data using pivot().

23.5 What Do “Wide” and “Long” Mean?

The terms “wide” and “long” are best understood in the context of example datasets. Let’s take a look at some now.

Imagine that you have three products for which you collect sales data over three months.

Wide Format:

Product	Jan	Feb	Mar
A	100	120	110
B	90	95	100
C	80	85	90

Long Format:

Product	Month	Sales
A	Jan	100
A	Feb	120
A	Mar	110
B	Jan	90
B	Feb	95
B	Mar	100
C	Jan	80
C	Feb	85
C	Mar	90

Take a minute to study the two datasets to make sure you understand the relationship between them.

In the wide dataset, each observational unit (each product) occupies only one row, and each measurement (sales in Jan, Feb, Mar) is in a separate column.

In the long dataset, on the other hand, each observational unit (each product) occupies multiple rows, with one row for each measurement.

Here’s another example with mock data, where the observational units are countries:

Long Format:

Country	Year	GDP
USA	2020	21433
USA	2021	22940
China	2020	14723
China	2021	17734

Wide Format:

Country	GDP_2020	GDP_2021
USA	21433	22940
China	14723	17734

The examples above are both time-series datasets because the measurements are repeated across time. But the concepts of wide and long are relevant to other kinds of data too.

Consider the example below, showing the number of employees in different departments of three companies:

Wide Format:

Company	HR	Sales	IT
A	10	20	15
B	8	25	20
C	12	18	22

Long Format:

Company	Department	Employees
A	HR	10
A	Sales	20
A	IT	15
B	HR	8
B	Sales	25
B	IT	20
C	HR	12
C	Sales	18
C	IT	22

In the wide dataset, each observational unit (each company) occupies only one row, with the repeated measurements (number of employees in different departments) spread across multiple columns.

In the long dataset, each observational unit is spread over multiple rows.

Vocab

The observational units, sometimes called statistical units, are the primary entities or items described by the dataset.

In the first example, the observational units were products; in the second example, countries; and in the third example, companies.

Practice

23.6 Practice Q: Wide or Long?

Consider the temperatures dataset created earlier:

temperatures

	country	1994	1995	1996
0	Sweden	1	3	5
1	Denmark	2	4	6
2	Norway	3	5	7

Is this data in a wide or long format?

23.7 When Should You Use Wide vs. Long Data?

The truth is, it really depends on what you want to do! The wide format is great for displaying data because it’s easy to visually compare values this way. Long data is best for some data analysis tasks, like grouping and plotting.

It is essential to know how to switch from one format to the other easily. Switching from the wide to the long format, or the other way around, is called reshaping.

23.8 From Wide to Long with `melt()`

To practice reshaping from a wide to a long format, we’ll consider data from Our World in Data on fossil fuel consumption per capita. You can find the data here.

Below, we view this data on fossil fuel consumption per capita:

fuels_wide

	Entity	Code	y_1970	y_1980	y_1990	y_2000	y_2010	y_2020
0	Algeria	DZA	1764.8470	3532.7976	4381.6636	3351.2180	5064.9863	4877.2680
1	Argentina	ARG	11677.9680	10598.3990	7046.2485	7146.8154	7966.7827	6399.2114
2	Australia	AUS	23040.4550	25007.4380	23046.9510	23976.3550	23584.3070	20332.4100
3	Austria	AUT	14338.8090	19064.0920	16595.1930	18189.0920	18424.1170	14934.0650
4	Azerbaijan	AZE	NaN	NaN	13516.0190	9119.3470	4031.9407	5615.1157
...	...	...	...	...	...	...	...	...
76	United States	USA	40813.9530	42365.6500	37525.5160	37730.1600	31791.3070	26895.4770
77	Uzbekistan	UZB	NaN	NaN	6324.8677	3197.1330	1880.1338	1859.1548
78	Venezuela	VEN	11138.2210	16234.0960	12404.5570	11239.9260	14948.3070	4742.6226
79	Vietnam	VNM	1757.6117	439.9465	523.2565	1280.3065	2296.7590	2927.7446
80	World	OWID_WRL	7217.8340	8002.0854	7074.2583	6990.4272	6879.6110	6216.8060

81 rows × 8 columns

We observe that each observational unit (each country) occupies only one row, with the repeated measurements of fossil fuel consumption (in kilowatt-hour equivalents) spread out across multiple columns. Hence, this dataset is in a wide format.

To convert it to a long format, we can use the convenient melt function. Within melt, can define the id variables, which we do not want to reshape:

fuels_long = fuels_wide.melt(id_vars=["Entity", "Code"])
fuels_long

	Entity	Code	variable	value
0	Algeria	DZA	y_1970	1764.8470
1	Argentina	ARG	y_1970	11677.9680
2	Australia	AUS	y_1970	23040.4550
3	Austria	AUT	y_1970	14338.8090
4	Azerbaijan	AZE	y_1970	NaN
...	...	...	...	...
481	United States	USA	y_2020	26895.4770
482	Uzbekistan	UZB	y_2020	1859.1548
483	Venezuela	VEN	y_2020	4742.6226
484	Vietnam	VNM	y_2020	2927.7446
485	World	OWID_WRL	y_2020	6216.8060

486 rows × 4 columns

Very easy!

Let’s sort it so it’s easier to read:

fuels_long = fuels_long.sort_values(by=['Entity', 'variable'])
fuels_long

	Entity	Code	variable	value
0	Algeria	DZA	y_1970	1764.8470
81	Algeria	DZA	y_1980	3532.7976
162	Algeria	DZA	y_1990	4381.6636
243	Algeria	DZA	y_2000	3351.2180
324	Algeria	DZA	y_2010	5064.9863
...	...	...	...	...
161	World	OWID_WRL	y_1980	8002.0854
242	World	OWID_WRL	y_1990	7074.2583
323	World	OWID_WRL	y_2000	6990.4272
404	World	OWID_WRL	y_2010	6879.6110
485	World	OWID_WRL	y_2020	6216.8060

486 rows × 4 columns

The years are now indicated in the variable variable, and all the consumption values occupy a single variable, value. We may wish to rename the variable column to year, and the value column to oil_consumption. This can be done directly in the melt function:

fuels_long = fuels_wide.melt(
    id_vars=['Entity', 'Code'],
    var_name='year',
    value_name='oil_consumption'
).sort_values(by=['Entity', 'year'])
fuels_long

	Entity	Code	year	oil_consumption
0	Algeria	DZA	y_1970	1764.8470
81	Algeria	DZA	y_1980	3532.7976
162	Algeria	DZA	y_1990	4381.6636
243	Algeria	DZA	y_2000	3351.2180
324	Algeria	DZA	y_2010	5064.9863
...	...	...	...	...
161	World	OWID_WRL	y_1980	8002.0854
242	World	OWID_WRL	y_1990	7074.2583
323	World	OWID_WRL	y_2000	6990.4272
404	World	OWID_WRL	y_2010	6879.6110
485	World	OWID_WRL	y_2020	6216.8060

486 rows × 4 columns

You may also want to remove the y_ in front of each year. This can be achieved with a string operation.

fuels_long['year'] = fuels_long['year'].str.replace('y_', '').astype(int)
fuels_long

	Entity	Code	year	oil_consumption
0	Algeria	DZA	1970	1764.8470
81	Algeria	DZA	1980	3532.7976
162	Algeria	DZA	1990	4381.6636
243	Algeria	DZA	2000	3351.2180
324	Algeria	DZA	2010	5064.9863
...	...	...	...	...
161	World	OWID_WRL	1980	8002.0854
242	World	OWID_WRL	1990	7074.2583
323	World	OWID_WRL	2000	6990.4272
404	World	OWID_WRL	2010	6879.6110
485	World	OWID_WRL	2020	6216.8060

486 rows × 4 columns

Here’s what we did above:

Used str.replace() to remove the y_ prefix from each year.
Converted the year column to integers using astype(int).
Sorted the data by “Entity” and “year” using sort_values().

Practice

23.9 Practice Q: Temperatures to Long

Convert the temperatures dataset shown below into a long format. Your answer should have the following column names: “country”, “year”, and “avg_temp”.

# Your code here
temperatures

	country	1994	1995	1996
0	Sweden	1	3	5
1	Denmark	2	4	6
2	Norway	3	5	7

Practice

23.10 Practice Q: Eurostat Births to Long

For this practice question, you will use the euro_births_wide dataset from Eurostat. It shows the annual number of births in 50 European countries:

euro_births_wide.head()

	country	x2015	x2016	x2017	x2018	x2019	x2020	x2021
0	Belgium	122274.0	121896.0	119690.0	118319.0	117695.0	114350.0	118349.0
1	Bulgaria	65950.0	64984.0	63955.0	62197.0	61538.0	59086.0	58678.0
2	Czechia	110764.0	112663.0	114405.0	114036.0	112231.0	110200.0	111793.0
3	Denmark	58205.0	61614.0	61397.0	61476.0	61167.0	60937.0	63473.0
4	Germany	737575.0	792141.0	784901.0	787523.0	778090.0	773144.0	795492.0

The data is in a wide format. Convert it to a long format DataFrame that has the following column names: “country”, “year”, and “births_count”.

Remove the x prefix from the year columns and convert them to integers.

23.11 Using Long Data for Analysis

Let’s see why long data is often better for analysis.

Consider again the fuels_wide dataset:

fuels_wide.head()

	Entity	Code	y_1970	y_1980	y_1990	y_2000	y_2010	y_2020
0	Algeria	DZA	1764.847	3532.7976	4381.6636	3351.2180	5064.9863	4877.2680
1	Argentina	ARG	11677.968	10598.3990	7046.2485	7146.8154	7966.7827	6399.2114
2	Australia	AUS	23040.455	25007.4380	23046.9510	23976.3550	23584.3070	20332.4100
3	Austria	AUT	14338.809	19064.0920	16595.1930	18189.0920	18424.1170	14934.0650
4	Azerbaijan	AZE	NaN	NaN	13516.0190	9119.3470	4031.9407	5615.1157

fuels_long.head()

	Entity	Code	year	oil_consumption
0	Algeria	DZA	1970	1764.8470
81	Algeria	DZA	1980	3532.7976
162	Algeria	DZA	1990	4381.6636
243	Algeria	DZA	2000	3351.2180
324	Algeria	DZA	2010	5064.9863

If we want to find the average fossil fuel consumption per country, this is very easy to do with the long format:

fuels_long.groupby('Entity')['oil_consumption'].mean()

Entity
Algeria           3828.796750
Argentina         8472.570833
Australia        23164.652667
Austria          16924.228000
Azerbaijan        8070.605600
                     ...     
United States    36187.010500
Uzbekistan        3315.322325
Venezuela        11784.621600
Vietnam           1537.604133
World             7063.503650
Name: oil_consumption, Length: 81, dtype: float64

But with the wide format, this is not so easy:

fuels_wide[['y_1970', 'y_1980', 'y_1990', 'y_2000', 'y_2010', 'y_2020']].mean(axis=1)

0      3828.796750
1      8472.570833
2     23164.652667
3     16924.228000
4      8070.605600
          ...     
76    36187.010500
77     3315.322325
78    11784.621600
79     1537.604133
80     7063.503650
Length: 81, dtype: float64

Imagine if you had 100 years of data!

And mean is a fairly simple operation. How would you calculate the standard deviation of fossil fuel consumption per country?

Long data is also very useful for plotting.

For example, to plot the average fossil fuel consumption per country over time, we can use the following code:

subset = fuels_long.query('Entity in ["Peru", "Iran", "China"]')
px.line(subset, x='year', y='oil_consumption', color='Entity', title='Average Fossil Fuel Consumption per Country')

To create a plot like this with the wide format is not directly possible, since the data you want to plot is scattered across multiple columns.

So as you can see, while wide data is great for display, long data is very useful for analysis and plotting.

23.12 From Long to Wide

Now you know how to reshape from wide to long with melt(). How about going the other way, from long to wide? For this, you can use the pivot() function.

But before we see how to use this function to manipulate long data, let’s first consider where you’re likely to run into long data.

While wide data tends to come from external sources (as we have seen above), long data, on the other hand, is likely to be created by you while data wrangling, especially in the course of grouped aggregations.

Let’s see an example of this now.

We will use a dataset of contracts granted by the city of Chicago from 2020 to 2023. You can find more information about the data here.

contracts

	year	approval_date	description	contract_num	revision_num	specification_num	contract_type	start_date	end_date	department	vendor_name	vendor_id	address_1	address_2	city	state	zip	award_amount	procurement_type	contract_pdf
0	2020	2020-01-02	LEASE	24406	32	96136	PROPERTY LEASE	NaN	NaN	NaN	8700 BUILDING LLC	89123305A	7300 S NARRAGANSETT	NaN	BEDFORD PARK	Illinois	60638	321.1	NaN	NaN
1	2020	2020-01-03	DFSS-HHS-CS-CEL:	113798	0	1070196	DELEGATE AGENCY	12/01/2019	11/30/2022	DEPT OF FAMILY AND SUPPORT SERVICES	CATHOLIC CHARITIES OF THE ARCHDIOCESE OF CHICAGO	102484615A	1 E BANKS ST	NaN	CHICAGO	Illinois	60670	17692515.0	NaN	NaN
2	2020	2020-01-03	DFSS-HHS-CS-CEL:	113819	0	1070196	DELEGATE AGENCY	12/01/2019	11/30/2022	DEPT OF FAMILY AND SUPPORT SERVICES	KIMBALL DAYCARE CENTER & KINDERGARTEN INC	105458567Z	1636-1638 N KIMBALL AVE	NaN	CHICAGO	Illinois	60647	11461500.0	NaN	http://ecm.cityofchicago.org/eSMARTContracts/s...
3	2020	2020-01-03	DFSS-HHS-CS-CEL:	113818	0	1070196	DELEGATE AGENCY	12/01/2019	11/30/2022	DEPT OF FAMILY AND SUPPORT SERVICES	JUDAH INTERNATIONAL OUTREACH MINISTRIES, INC	94219962X	856 N PULASKI RD	NaN	CHICAGO	Illinois	60651	2356515.0	NaN	http://ecm.cityofchicago.org/eSMARTContracts/s...
4	2020	2020-01-03	DFSS-HHS-CS-CEL:	113820	0	1070196	DELEGATE AGENCY	12/01/2019	11/30/2022	DEPT OF FAMILY AND SUPPORT SERVICES	Marillac St. Vincent Family Services Inc DBA S...	97791861L	212 S FRANCISCO AVENUE EFT	NaN	CHICAGO	Illinois	60612	3666015.0	NaN	http://ecm.cityofchicago.org/eSMARTContracts/s...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
28823	2023	2023-12-29	DFSS-CORP-HL-PSH:	220413	3	1221503	DELEGATE AGENCY	NaN	NaN	DEPT OF FAMILY AND SUPPORT SERVICES	INNER VOICE INC.	6231926M	1621 W WALNUT ST FL 1ST	NaN	CHICAGO	Illinois	60612	0.0	NaN	NaN
28824	2023	2023-12-29	DFSS-CORP-YS-OST:	253846	0	1247493	DELEGATE AGENCY	NaN	NaN	DEPT OF FAMILY AND SUPPORT SERVICES	AFTER-SCHOOL MATTERS, INC.\|CLEANED-UP	72580818P	66 E RANDOLPH ST FL 1ST	NaN	CHICAGO	Illinois	60601	32000.0	NaN	NaN
28825	2023	2023-12-29	DFSS-IDHS-HL-INTHS:	253843	0	1235949	DELEGATE AGENCY	NaN	NaN	DEPT OF FAMILY AND SUPPORT SERVICES	BREAKTHROUGH URBAN MINISTRIES, INC.	94722896V	402 N ST LOUIS AVENUE EFT	NaN	CHICAGO	Illinois	60624	14400.0	NaN	NaN
28826	2023	2023-12-29	CDPH-RW-PA: ESS-HRSA PO 116685 CHICAGO HOUSE A...	192085	1	1095441	DELEGATE AGENCY	NaN	NaN	DEPARTMENT OF HEALTH	CHICAGO HOUSE & SOCIAL SERVICE AGENCY	105470138T	2229 S MICHIGAN AVE 304 EFT	NaN	CHICAGO	Illinois	60616	-32025.2	NaN	NaN
28827	2023	2023-12-29	DFSS-HHS-CS-CEL:	222199	1	1070196	DELEGATE AGENCY	NaN	NaN	DEPT OF FAMILY AND SUPPORT SERVICES	ALLISON'S INFANT & TODDLER CENTER INC	62751817Z	234 E 115TH ST FL 1ST	NaN	CHICAGO	Illinois	60628	141923.0	NaN	NaN

28828 rows × 20 columns

Each row corresponds to one contract, and we have each contract’s ID number, the year in which it was granted, the amount of the contract, and the vendor’s name and address, among other variables.

Now, consider the following grouped summary of the contracts dataset, which shows the number of contracts by state of the vendor in each year:

contracts_summary = contracts.groupby("state")["year"].value_counts().reset_index()
contracts_summary

	state	year	count
0	Alabama	2023	7
1	Alabama	2021	2
2	Alabama	2020	1
3	Alabama	2022	1
4	Arizona	2020	3
...	...	...	...
128	Washington	2021	1
129	Wisconsin	2023	25
130	Wisconsin	2020	18
131	Wisconsin	2022	17
132	Wisconsin	2021	15

133 rows × 3 columns

The output of this grouped operation is a quintessentially “long” dataset. Each observational unit (each state) occupies multiple rows, with one row for each measurement (each year).

Now, let’s see how to convert such long data into a wide format with pivot().

The code is quite straightforward:

contracts_wide = contracts_summary.pivot(
    index="state", columns="year", values="count"
).reset_index()
contracts_wide.head()

year	state	2020	2021	2022	2023
0	Alabama	1.0	2.0	1.0	7.0
1	Arizona	3.0	1.0	3.0	2.0
2	Arkansas	1.0	NaN	1.0	NaN
3	British Columbia	NaN	1.0	NaN	NaN
4	California	36.0	42.0	43.0	38.0

As you can see, pivot() has three important arguments:

index defines which column(s) to use as the new index. In our case, it’s the “state” since we want each row to represent one state.
columns identifies which variable to use to define column names in the wide format. In our case, it’s the “year”. You can see that the years are now the column names.
values specifies which values will become the core of the wide data format. In our case, it’s the number of contracts “count”.

You might also want to have the years be your primary observational units, with each year occupying one row. This can be carried out similarly to the above example, but with year as the index and state as the columns:

contracts_wide_year = contracts_summary.pivot(
    index="year", columns="state", values="count"
).reset_index()
contracts_wide_year

state	year	Alabama	Arizona	Arkansas	British Columbia	California	Canada	Colorado	Connecticut	Delaware	...	Oregon	Pennsylvania	Rhode Island	South Carolina	Tennessee	Texas	Vermont	Virginia	Washington	Wisconsin
0	2020	1.0	3.0	1.0	NaN	36.0	1.0	6.0	1.0	NaN	...	5.0	20.0	1.0	2.0	2.0	25.0	NaN	4.0	NaN	18.0
1	2021	2.0	1.0	NaN	1.0	42.0	1.0	2.0	5.0	NaN	...	NaN	24.0	NaN	3.0	2.0	24.0	1.0	4.0	1.0	15.0
2	2022	1.0	3.0	1.0	NaN	43.0	1.0	7.0	3.0	1.0	...	NaN	31.0	NaN	2.0	3.0	37.0	NaN	7.0	NaN	17.0
3	2023	7.0	2.0	NaN	NaN	38.0	NaN	6.0	5.0	2.0	...	NaN	37.0	NaN	1.0	3.0	28.0	NaN	9.0	NaN	25.0

4 rows × 44 columns

Here, the unique observation units (our rows) are now the years (2020, 2021, 2022, 2023).

Practice

23.13 Practice Q: Temperatures back to Wide

Convert the long temperatures_long dataset you created above back to a wide format. Your answer should have the following column names: “country”, “1994”, “1995”, and “1996”.

# Your code here

Practice

23.14 Practice Q: Population to Wide

The population dataset shows the populations of 219 countries over time.

Reshape this data into a wide format.

population

	country	year	population
0	Afghanistan	1995	17586073
1	Afghanistan	1996	18415307
2	Afghanistan	1997	19021226
3	Afghanistan	1998	19496836
4	Afghanistan	1999	19987071
...	...	...	...
4040	Zimbabwe	2009	12888918
4041	Zimbabwe	2010	13076978
4042	Zimbabwe	2011	13358738
4043	Zimbabwe	2012	13724317
4044	Zimbabwe	2013	14149648

4045 rows × 3 columns

23.15 Reshaping Can Be Hard

We have mostly looked at very simple examples of reshaping here, but in the wild, reshaping can be difficult to do accurately.

When you run into such cases, we recommend looking at the official documentation of reshaping from the pandas team, as it is quite rich in examples.

23.16 Wrap-Up

Congratulations! You’ve gotten the hang of reshaping data with pandas.

You now understand the differences between wide and long formats and can skillfully use melt() and pivot() to transform your data as needed.

23.17 Solution Q: Wide or Long?

The data is in a wide format.

23.18 Solution Q: Temperatures to Long

# Melt the wide data into long format
temperatures_long = temperatures.melt(
    id_vars=["country"], var_name="year", value_name="avgtemp"
)

# Display the long format data
temperatures_long

	country	year	avgtemp
0	Sweden	1994	1
1	Denmark	1994	2
2	Norway	1994	3
3	Sweden	1995	3
4	Denmark	1995	4
5	Norway	1995	5
6	Sweden	1996	5
7	Denmark	1996	6
8	Norway	1996	7

23.19 Solution Q: Eurostat Births to Long

# Melt the wide data into long format
births_long = euro_births_wide.melt(
    id_vars=["country"], var_name="year", value_name="births_count"
)

# Display the long format data
births_long

	country	year	births_count
0	Belgium	x2015	122274.0
1	Bulgaria	x2015	65950.0
2	Czechia	x2015	110764.0
3	Denmark	x2015	58205.0
4	Germany	x2015	737575.0
...	...	...	...
345	Ukraine	x2021	212.0
346	Armenia	x2021	271983.0
347	Azerbaijan	x2021	NaN
348	Georgia	x2021	112284.0
349	NaN	x2021	45946.0

350 rows × 3 columns

23.20 Solution Q: Temperatures back to Wide

# Pivot the long data into wide format
temperatures_wide = temperatures_long.pivot(
    index="country", columns="year", values="avgtemp"
).reset_index()

# Display the wide format data
temperatures_wide

year	country	1994	1995	1996
0	Denmark	2	4	6
1	Norway	3	5	7
2	Sweden	1	3	5

23.21 Solution Q: Population to Wide

# Pivot the long data into wide format
population_wide = population.pivot(
    index="country", columns="year", values="population"
).reset_index()

# Display the wide format data
population_wide

year	country	1995	1996	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013
0	Afghanistan	17586073.0	18415307.0	19021226.0	19496836.0	19987071.0	20595360.0	21347782.0	22202806.0	23116142.0	24018682.0	24860855.0	25631282.0	26349243.0	27032197.0	27708187.0	28397812.0	29105480.0	29824536.0	30551674.0
1	Albania	3357858.0	3341043.0	3331317.0	3325456.0	3317941.0	3304948.0	3286084.0	3263596.0	3239385.0	3216197.0	3196130.0	3179573.0	3166222.0	3156608.0	3151185.0	3150143.0	3153883.0	3162083.0	3173271.0
2	Algeria	29315463.0	29845208.0	30345466.0	30820435.0	31276295.0	31719449.0	32150198.0	32572977.0	33003442.0	33461345.0	33960903.0	34507214.0	35097043.0	35725377.0	36383302.0	37062820.0	37762962.0	38481705.0	39208194.0
3	American Samoa	52874.0	53926.0	54942.0	55899.0	56768.0	57522.0	58176.0	58729.0	59117.0	59262.0	59117.0	58652.0	57919.0	57053.0	56245.0	55636.0	55274.0	55128.0	55165.0
4	Andorra	63854.0	64274.0	64090.0	63799.0	64084.0	65399.0	68000.0	71639.0	75643.0	79060.0	81223.0	81877.0	81292.0	79969.0	78659.0	77907.0	77865.0	78360.0	79218.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
212	Wallis and Futuna Islands	14143.0	14221.0	14309.0	14394.0	14460.0	14497.0	14501.0	14476.0	14422.0	14344.0	14246.0	14126.0	13988.0	13840.0	13697.0	13565.0	13451.0	13353.0	13272.0
213	West Bank and Gaza Strip	2598393.0	2722497.0	2851993.0	2980563.0	3099951.0	3204572.0	3291620.0	3363542.0	3426549.0	3489743.0	3559856.0	3638829.0	3725076.0	3817551.0	3914035.0	4012880.0	4114199.0	4218771.0	4326295.0
214	Yemen	15018201.0	15578640.0	16088019.0	16564235.0	17035531.0	17522537.0	18029989.0	18551068.0	19081306.0	19612696.0	20139661.0	20661714.0	21182162.0	21703571.0	22229625.0	22763008.0	23304206.0	23852409.0	24407381.0
215	Zambia	8841338.0	9073311.0	9320089.0	9577483.0	9839179.0	10100981.0	10362137.0	10625423.0	10894519.0	11174650.0	11470022.0	11781612.0	12109620.0	12456527.0	12825031.0	13216985.0	13633796.0	14075099.0	14538640.0
216	Zimbabwe	11639364.0	11846110.0	12045813.0	12229500.0	12384727.0	12503652.0	12586763.0	12640922.0	12673103.0	12693047.0	12710589.0	12724308.0	12740160.0	12784041.0	12888918.0	13076978.0	13358738.0	13724317.0	14149648.0

217 rows × 20 columns

23.1 Packages

23.2 Data

23.3 Introduction

23.4 Learning Objectives

23.5 What Do “Wide” and “Long” Mean?

23.6 Practice Q: Wide or Long?

23.7 When Should You Use Wide vs. Long Data?

23.8 From Wide to Long with melt()

23.9 Practice Q: Temperatures to Long

23.10 Practice Q: Eurostat Births to Long

23.11 Using Long Data for Analysis

23.12 From Long to Wide

23.13 Practice Q: Temperatures back to Wide

23.14 Practice Q: Population to Wide

23.15 Reshaping Can Be Hard

23.16 Wrap-Up

23.17 Solution Q: Wide or Long?

23.18 Solution Q: Temperatures to Long

23.19 Solution Q: Eurostat Births to Long

23.20 Solution Q: Temperatures back to Wide

23.21 Solution Q: Population to Wide

23.8 From Wide to Long with `melt()`