There are three cases where you might want to use a cross-entropy loss function:

  1. You have a single-label binary target
  2. You have a single-label categorical target
  3. You have a multi-label categorical target

You can use binary cross-entropy for single-label binary targets and multi-label categorical targets (because it treats multi-label 0/1 indicator variables the same as single-label one-hot vectors). You can use categorical cross-entropy for single-label categorical targets.
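
To see this concretely: binary cross-entropy is computed elementwise and then averaged, so it doesn’t care whether the 0/1 targets happen to form a one-hot vector or an arbitrary indicator vector. Here’s a quick sanity check with made-up probabilities:

import torch
import torch.nn as nn

yhat = torch.tensor([0.8, 0.1, 0.1])        # predicted probabilities
one_hot = torch.tensor([1.0, 0.0, 0.0])     # single-label, one-hot target
indicators = torch.tensor([1.0, 0.0, 1.0])  # multi-label 0/1 indicator target

# Both calls compute the mean of three elementwise BCE terms
nn.BCELoss()(yhat, one_hot)
nn.BCELoss()(yhat, indicators)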

But a couple of things make it a little tricky to figure out which PyTorch loss you should reach for in each of these cases.

The first confusing thing is the naming pattern. The loss classes for binary and categorical cross-entropy are BCELoss and CrossEntropyLoss, respectively. It’s not a huge deal, but Keras uses a parallel naming pattern for both (BinaryCrossentropy and CategoricalCrossentropy), which is a little nicer for tab completion.

The second confusing thing is that the two losses expect their input and target tensors in different forms. For binary cross-entropy, you pass in two tensors of the same shape: the output tensor should hold probabilities in the range [0, 1] (e.g. from a sigmoid), and the target tensor should hold 0/1 indicators, with 0 for false and 1 for true (both tensors should be floats). For categorical cross-entropy, the output tensor should instead hold raw, unnormalized scores (logits), since CrossEntropyLoss applies log-softmax internally, and the target should be a one-dimensional tensor of class indices with type long.

Here’s a toy example in code you can use as a cheat sheet:

import torch
import torch.nn as nn

# Single-label binary
x = torch.randn(10)                             # raw logits
yhat = torch.sigmoid(x)                         # probabilities in [0, 1]
y = torch.randint(2, (10,), dtype=torch.float)  # 0/1 indicators, as floats
loss = nn.BCELoss()(yhat, y)

# Single-label categorical
x = torch.randn(10, 5)              # raw logits, one row per sample
y = torch.randint(5, (10,))         # class indices, dtype long
loss = nn.CrossEntropyLoss()(x, y)  # log-softmax is applied internally

# Multi-label categorical
x = torch.randn(10, 5)                            # raw logits, one row per sample
yhat = torch.sigmoid(x)                           # independent per-class probabilities
y = torch.randint(2, (10, 5), dtype=torch.float)  # 0/1 indicators, as floats
loss = nn.BCELoss()(yhat, y)
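
One closing note: in practice you’ll often skip the explicit sigmoid and hand raw logits straight to the loss. CrossEntropyLoss already works this way (as above), and BCEWithLogitsLoss does the same for the binary and multi-label cases, fusing the sigmoid into the loss for better numerical stability. For example, the single-label binary case becomes:

# Single-label binary, logits variant
x = torch.randn(10)
y = torch.randint(2, (10,), dtype=torch.float)
loss = nn.BCEWithLogitsLoss()(x, y)  # sigmoid is fused into the loss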