How to Use the Pandas Melt Command

2020-12-22

The Pandas melt command unpivots tabular data, transforming it from wide to long format. To understand this operation, you have to know how pivots work for tabular data.

But don’t worry! In this post, I will explain it to you. If you already understand this transformation well, you should skip to the usage section. But if you’re struggling to understand what the melt command actually does, read on.

How pivots and unpivots work

A pivot takes a list of records and aggregates them across two or more variables in order to show relationships between them.

Long format

Let’s say we have a list of sales transactions for our company in long format.

dateregionrevenue
1999-01-02APAC354
1999-01-02EMEA236
1999-01-03APAC700
1999-01-03APAC472
1999-01-04APAC642
1999-01-04APAC645
1999-01-05APAC255
1999-01-06EMEA711
1999-01-07APAC967
1999-01-07APAC280

Long format is common for storing data in databases because it’s efficient, but it’s not usually how humans want to consume the information.

Wide format

To get the table into a wide format view of total daily sales per region, we can perform a pivot.

dateAPACEMEA
1999-01-02354236
1999-01-0311720
1999-01-0412870
1999-01-052550
1999-01-060711
1999-01-0712470

This pivot sums the revenue from all transactions in each region for each day. You could also perform any other aggregate function like mean, min, max, etc.

Unpivoting

The melt command is all about converting a table from wide format (as above) into long format. This is called “unpivoting”. But watch out! Pivots are not invertible. Unpivoting is a valid operation on wide format tables, but we do not actually recover the original table in long format. This is because we actually threw some information away when we aggregated the transactions:

dateregiontotal_revenue
1999-01-02APAC354
1999-01-02EMEA236
1999-01-03APAC1172
1999-01-03EMEA0
1999-01-04APAC1287
1999-01-04EMEA0
1999-01-05APAC255
1999-01-05EMEA0
1999-01-06APAC0
1999-01-06EMEA711
1999-01-07APAC1247
1999-01-07EMEA0

Notice there are new records with 0 revenue and transactions that occured on the same day in the same region are summed. To make this clear, I recommend renaming the value column. In the new long table above I call the column total_revenue instead of revenue.

Usage

The melt() command comes in two forms. If you have a DataFrame called wide_df in wide format, you can unpivot it with:

  1. A top-level Pandas function: pd.melt(wide_df)
  2. A method on the DataFrame: wide_df.melt()

These do exactly the same thing.

For the most basic use case, you’ll probably want to pass in a few arguments:

  • id_vars should be the row identifier from your wide format table.
  • value_vars should be the list of columns you want to collapse.
  • var_name should be the new name of the value column.
pd.melt(
    wide_df,
    id_vars='date',
    value_vars=['APAC', 'EMEA'],
    value_name='total_revenue',
)

Or equivalently:

wide_df.melt(
    id_vars='date',
    value_vars=['APAC', 'EMEA'],
    value_name='total_revenue',
)

Note: if the row identifier you want to use is a Pandas index, you can use the reset_index() method to create a column with the same data:

wide_df = wide_df.reset_index()

Try it yourself!

There’s no better way to grok a new function than playing around with it yourself. Check out my Jupyter notebook on GitHub that creates a long format dummy dataset and pivots/unpivots it. You should experiment with the input arguments on both the pivot_table() and melt() methods. Enjoy!

Leave a Comment

Your email address will not be published. Required fields are marked *