Schrödinger Bridges, Optimal Transport, and Diffusion Models
Introduction
Many recent generative modeling techniques can be seen as solving variants of a transport problem between probability distributions. Two central frameworks here are the Optimal Transport (OT) problem and the Schrödinger Bridge (SB) problem. Diffusion models, consistency models, and flow matching can all be viewed through these lenses. In this post, we bring these perspectives together and highlight the key mathematical connections and their modeling implications.
1. Optimal Transport: Deterministic Transport
The classical Optimal Transport (OT) problem aims to find the most cost-efficient way to transform one distribution \( p_0(x) \) into another \( p_1(x) \).
Formally, it seeks a transport map \( T: \mathbb{R}^d \to \mathbb{R}^d \) that minimizes: \[ \inf_T \mathbb{E}_{x \sim p_0} [c(x, T(x))] \quad \text{such that} \quad T_\# p_0 = p_1 \] where \( c(x, y) \) is a cost function (e.g., squared Euclidean distance) and \( T_\# p_0 \) denotes the pushforward of \( p_0 \) under \( T \).
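To make the map \( T \) concrete, here is a minimal sketch for the one-dimensional case with squared Euclidean cost, where the optimal map between two equal-size samples is known to be the monotone rearrangement (sorted source points go to sorted target points). The function names `ot_map_1d` and `transport_cost` and the two Gaussian samples are illustrative assumptions, not part of any library.

```python
import numpy as np

def ot_map_1d(source, target):
    """Empirical optimal map between equal-size 1D samples (squared cost).

    In 1D with a convex cost, the optimal assignment sends the i-th
    smallest source point to the i-th smallest target point.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    assert source.ndim == 1 and source.shape == target.shape
    order = np.argsort(source)        # indices that sort the source sample
    mapped = np.empty_like(source)
    mapped[order] = np.sort(target)   # match sorted source to sorted target
    return mapped

def transport_cost(source, mapped):
    """Average squared displacement, i.e. the empirical E[c(x, T(x))]."""
    return float(np.mean((source - mapped) ** 2))

# Illustrative data (assumed): p_0 = N(0, 1), p_1 = N(3, 0.5^2)
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=1000)
x1 = rng.normal(3.0, 0.5, size=1000)

T_x0 = ot_map_1d(x0, x1)
print("sorted matching cost:", transport_cost(x0, T_x0))
print("random pairing cost: ", transport_cost(x0, x1))
```

The sorted matching should never cost more than the arbitrary pairing in which the samples were drawn. In higher dimensions no such sorting shortcut exists and a general OT solver is needed.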
Viewed dynamically, the problem becomes one of finding a time-dependent velocity field \( v_t(x) \) whose induced density path \( p_t \) satisfies the continuity equation: \[ \frac{\partial p_t}{\partial t} + \nabla \cdot (p_t v_t) = 0 \] with boundary conditions \( p_{t=0} = p_0 \) and \( p_{t=1} = p_1 \).
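For the squared Euclidean cost, the dynamic view has a well-known variational form, the Benamou–Brenier formulation. Written in the notation above (with \( W_2 \) denoting the 2-Wasserstein distance, a symbol not used elsewhere in this post), it selects, among all pairs \( (p_t, v_t) \) that satisfy the continuity equation and the boundary conditions, the one with the least kinetic energy:

\[ W_2^2(p_0, p_1) = \inf_{(p_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} \| v_t(x) \|^2 \, p_t(x) \, dx \, dt \quad \text{such that} \quad \frac{\partial p_t}{\partial t} + \nabla \cdot (p_t v_t) = 0, \quad p_{t=0} = p_0, \quad p_{t=1} = p_1 \]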
2. Schrödinger Bridges: Stochastic Transport
The Schrödinger Bridge (SB) problem is a stochastic generalization of OT. It asks: among all stochastic processes that transform \( p_0 \) to \( p_1 \), which is closest to a given reference process (usually Brownian motion)?
Formally, given a reference path measure \( \mathbb{Q} \) (e.g., Brownian motion), we minimize the relative entropy: \[ \inf_{\mathbb{P}: \mathbb{P}_0 = p_0, \mathbb{P}_1 = p_1} \mathrm{KL}(\mathbb{P} \| \mathbb{Q}) \] where \( \mathbb{P}_0 \) and \( \mathbb{P}_1 \) are the marginals of \( \mathbb{P} \) at times 0 and 1. The minimizer is the path measure closest in KL to the reference among all processes consistent with the marginals \( p_0, p_1 \).
The SB solution can be written as an SDE with a forward drift or, equivalently, as a reverse-time SDE with a backward drift; the two descriptions are related by time reversal.
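When only the two endpoints are considered and the marginals are discretized, the KL minimization above reduces to finding a coupling of the form \( \pi_{ij} = u_i K_{ij} v_j \), where \( K \) is the transition kernel of the reference process. This can be solved with Sinkhorn iterations. The sketch below assumes a one-dimensional Brownian reference with noise scale `sigma`; the function name `sinkhorn_bridge`, the grids, and the two marginals are illustrative choices.

```python
import numpy as np

def sinkhorn_bridge(x, y, a, b, sigma=1.0, n_iters=1000):
    """Static Schrodinger bridge between discrete marginals a (on x), b (on y).

    K[i, j] is the (unnormalized) Gaussian transition density of a Brownian
    motion with scale sigma going from x[i] at t=0 to y[j] at t=1.
    """
    sq_dist = (x[:, None] - y[None, :]) ** 2
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows to match the marginal a
        v = b / (K.T @ u)    # rescale columns to match the marginal b
    return u[:, None] * K * v[None, :]

# Illustrative marginals (assumed): discretized Gaussians on a grid
x = np.linspace(-1.5, 1.5, 50)
y = np.linspace(-1.5, 1.5, 60)
a = np.exp(-0.5 * ((x + 0.5) / 0.5) ** 2)
a /= a.sum()
b = np.exp(-0.5 * ((y - 0.5) / 0.4) ** 2)
b /= b.sum()

coupling = sinkhorn_bridge(x, y, a, b, sigma=1.0)
print("max row-marginal error:", np.abs(coupling.sum(axis=1) - a).max())
print("max col-marginal error:", np.abs(coupling.sum(axis=0) - b).max())
```

For small `sigma` the kernel entries underflow, and a log-domain variant of the iterations is the usual remedy; that refinement is omitted here.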
3. Diffusion Models as Schrödinger Bridges
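Diffusion models define a forward process: \[ dx = f(x, t) dt + g(t) dW_t \] where the drift \( f(x, t) \) is typically chosen so that the process gradually destroys structure in the data (e.g., an Ornstein–Uhlenbeck process), and learn the corresponding reverse-time process: \[ dx = [f(x, t) - g(t)^2 \nabla_x \log p_t(x)] dt + g(t) d\bar{W}_t \] where \( p_t \) is the marginal density of the forward process at time \( t \) and \( \bar{W}_t \) is a reverse-time Brownian motion.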
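This backward SDE is the exact time reversal of the forward (reference) process started at the data distribution, with the learned score function \( \nabla \log p_t(x) \) supplying the correction to the reference drift. It coincides with the Schrödinger Bridge solution between the data distribution and the prior only when the forward process reaches the prior exactly at the terminal time; in practice the forward process only approaches the prior, so the match is approximate.
In that sense, score-based diffusion models can be read as approximate Schrödinger Bridges: the generative process stays close in KL to the reference diffusion while approximately matching the two endpoint marginals.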
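The forward and reverse SDEs above can be simulated in a toy setting where the score is available in closed form, so no network is needed. The sketch below assumes one-dimensional Gaussian data \( \mathcal{N}(M, S^2) \), an Ornstein–Uhlenbeck forward process with \( f(x, t) = -\tfrac{1}{2} \beta x \) and \( g(t) = \sqrt{\beta} \), and Euler–Maruyama steps for the reverse SDE. The constants `BETA`, `M`, `S` and the function names are illustrative assumptions.

```python
import numpy as np

BETA = 8.0        # constant noise rate of the forward OU process (assumed)
M, S = 2.0, 0.5   # data distribution N(M, S^2) at t = 0 (assumed)

def marginal(t):
    """Mean and variance of the forward marginal p_t for Gaussian data."""
    decay = np.exp(-BETA * t)
    mean = M * np.sqrt(decay)
    var = S ** 2 * decay + (1.0 - decay)
    return mean, var

def score(x, t):
    """Exact score grad_x log p_t(x); a trained network would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def sample_reverse(n_samples=20000, n_steps=500, seed=0):
    """Euler-Maruyama integration of the reverse SDE from t = 1 down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = rng.normal(size=n_samples)   # start from the prior N(0, 1), close to p_1
    for k in range(n_steps, 0, -1):
        t = k * dt
        f = -0.5 * BETA * x
        g2 = BETA
        drift = f - g2 * score(x, t)
        # Stepping backward in time: subtract the drift, add fresh noise
        x = x - drift * dt + np.sqrt(g2 * dt) * rng.normal(size=n_samples)
    return x

samples = sample_reverse()
print("generated mean/std:", samples.mean(), samples.std())
print("target mean/std:   ", M, S)
```

Starting from \( \mathcal{N}(0, 1) \) rather than from the exact terminal marginal is the same approximation discussed above: the forward process only gets close to the prior by \( t = 1 \).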
4. Stochastic Interpolants: A Toolkit for Bridge Construction
A stochastic interpolant defines a path between two distributions using an interpolating SDE or stochastic map. These interpolants are useful for training shortcut models or flow matchers.
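A classic example: \[ x(t) = (1 - t) x_0 + t x_1 + \sqrt{t(1 - t)} \cdot \epsilon \] with \( \epsilon \sim \mathcal{N}(0, I) \) drawn independently of the endpoints, defines a stochastic curve from \( x_0 \sim p_0 \) to \( x_1 \sim p_1 \). The noise term vanishes at \( t = 0 \) and \( t = 1 \), so the curve hits both endpoints exactly.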
The velocity and score along this path can be computed analytically or via sampling, enabling flow matching or score distillation.
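For the interpolant above, the per-sample velocity and conditional score follow by direct differentiation. The sketch below evaluates them with NumPy; the function names and the example Gaussian endpoints are illustrative assumptions. These are quantities conditioned on \( (x_0, x_1) \); the marginal velocity and score used in training are their conditional expectations given \( x(t) \).

```python
import numpy as np

def interpolant(x0, x1, eps, t):
    """x(t) = (1 - t) x0 + t x1 + sqrt(t (1 - t)) eps."""
    return (1.0 - t) * x0 + t * x1 + np.sqrt(t * (1.0 - t)) * eps

def interpolant_velocity(x0, x1, eps, t):
    """Time derivative of x(t); it diverges as t -> 0 or 1, so use 0 < t < 1."""
    return (x1 - x0) + (1.0 - 2.0 * t) / (2.0 * np.sqrt(t * (1.0 - t))) * eps

def conditional_score(eps, t):
    """Score of N((1 - t) x0 + t x1, t (1 - t) I) evaluated at x(t)."""
    return -eps / np.sqrt(t * (1.0 - t))

# Illustrative endpoints (assumed): two Gaussian clouds in 2D
rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, size=(1000, 2))
x1 = rng.normal(loc=4.0, size=(1000, 2))
eps = rng.normal(size=(1000, 2))

t = 0.3
xt = interpolant(x0, x1, eps, t)
vt = interpolant_velocity(x0, x1, eps, t)
st = conditional_score(eps, t)
```

Because the velocity blows up near the endpoints, implementations commonly restrict \( t \) to an interval slightly inside \( (0, 1) \).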
5. Unifying View
Let’s summarize how these methods relate:
- Optimal Transport: a deterministic map or flow that minimizes transport cost
- Schrödinger Bridge: a stochastic process that minimizes KL to a reference (e.g., Brownian motion) subject to both endpoint marginals
- Diffusion Models: learn the reverse-time drift of a fixed forward process via score matching, which approximates the SB backward drift
- Stochastic Interpolants: build paths between the two distributions directly, with a controllable amount of noise
- Flow Matching: directly regress the velocity field of the transport, bypassing score estimation
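To illustrate the last item, here is a minimal flow matching sketch in one dimension. It assumes Gaussian endpoints, an independent coupling of \( x_0 \) and \( x_1 \), and the noise-free linear path \( x_t = (1 - t) x_0 + t x_1 \), whose conditional velocity is \( x_1 - x_0 \). Instead of a neural network, it fits one affine velocity model per time step by least squares, which is sufficient here because the true marginal velocity is affine in \( x \) for Gaussian endpoints. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_steps = 20000, 100
dt = 1.0 / n_steps

x0 = rng.normal(0.0, 1.0, size=n)   # samples from p_0 (assumed Gaussian)
x1 = rng.normal(3.0, 0.5, size=n)   # samples from p_1 (assumed Gaussian)
target = x1 - x0                    # conditional velocity of the linear path

# Regress the velocity on x_t: v_k(x) = a_k + b_k * x at each midpoint time
coeffs = []
for k in range(n_steps):
    t = (k + 0.5) * dt
    xt = (1.0 - t) * x0 + t * x1
    features = np.stack([np.ones(n), xt], axis=1)
    ab, *_ = np.linalg.lstsq(features, target, rcond=None)
    coeffs.append(ab)

# Generate: push fresh p_0 samples through the learned ODE with Euler steps
x = rng.normal(0.0, 1.0, size=n)
for a_k, b_k in coeffs:
    x = x + dt * (a_k + b_k * x)
generated = x

print("generated mean/std:", generated.mean(), generated.std())
```

No score is estimated anywhere: the regression target is a velocity, and sampling is a deterministic ODE solve.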
Conclusion
The fields of stochastic control, optimal transport, and generative modeling are converging. Schrödinger bridges provide a powerful theoretical framework for understanding modern generative models, especially diffusion and flow-based ones. Going forward, we can expect new models that explicitly solve Schrödinger bridge problems, use stochastic interpolants for training and sampling, and generalize flow matching to richer noise and dynamics models.