Consistency Models, Flow Matching, and Stochastic Interpolants


Introduction

Recent advances in generative modeling have explored alternatives to diffusion and score-based models, which generate samples by numerically solving differential equations. In this post, we discuss a suite of ideas centered on modeling consistency across time, matching flows directly, and interpolating between distributions: Consistency Models, Flow Matching, Shortcut Models, and Stochastic Interpolants. These techniques share a common goal of avoiding slow, many-step sampling while preserving sample quality.


1. Consistency Models

Consistency Models (CMs) aim to learn a function \( f(x_t, t) \) that is consistent across time steps, allowing single-step or few-step generation of high-quality samples.

Instead of predicting a noise residual, score, or velocity as in standard diffusion models such as DDPM and DDIM, CMs train a function such that: \[ f(x_t, t) \approx f(x_s, s) \quad \text{for all } s < t \] where \( x_t \) and \( x_s \) are noisy versions of the same clean input \( x_0 \) at times \( t \) and \( s \), respectively.

The training objective is: \[ \mathcal{L}_{\text{consistency}} = \mathbb{E}_{x_0, t, s} \left[ \| f(x_t, t) - f(x_s, s) \|^2 \right] \] Combined with the boundary condition \( f(x, 0) = x \), which anchors the shared output to the clean data and rules out a trivial constant solution, this enforces that the denoising function gives the same output at every noise level, enabling far fewer sampling steps than standard diffusion models.
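To make this concrete, here is a minimal PyTorch sketch of the simplified objective above. The network `f` (any model taking `(x, t)`) and the additive noising `x_t = x0 + t * eps` are assumptions for illustration; the stop-gradient on the lower-noise branch mirrors how consistency training is stabilized in practice (often with an EMA copy of the network):

```python
import torch

def consistency_loss(f, x0, sigma=1.0):
    """Simplified consistency objective: outputs at two times on the
    same noisy trajectory should match.

    f: network f(x, t) -> denoised estimate (assumed interface);
    x0: batch of clean data; sigma: noise scale (illustrative).
    """
    b = x0.shape[0]
    # Sample two times per example and order them so that s < t.
    ts = torch.rand(b, 2, device=x0.device)
    s, t = ts.min(dim=1).values, ts.max(dim=1).values

    eps = torch.randn_like(x0) * sigma       # shared noise draw, so
    shape = (b,) + (1,) * (x0.dim() - 1)     # x_s, x_t lie on one trajectory
    x_t = x0 + t.view(shape) * eps           # noisier point
    x_s = x0 + s.view(shape) * eps           # less noisy point

    with torch.no_grad():                    # stop-gradient target branch
        target = f(x_s, s)
    return ((f(x_t, t) - target) ** 2).mean()
```

A single-step sample is then \( f(x_T, T) \) for \( x_T \) drawn from the noise distribution; few-step variants alternate re-noising and re-applying \( f \).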


2. Flow Matching

Flow Matching (FM) aims to directly learn a vector field \( v(x, t) \) that transports a base distribution (e.g., Gaussian) to a data distribution along a continuous flow.

The key idea is to match the vector field of a known flow: \[ \mathcal{L}_{\text{flow}} = \mathbb{E}_{x, t} \left[ \| \hat{v}(x, t) - v^*(x, t) \|^2 \right] \] where \( v^*(x, t) \) is an analytically known or sampleable vector field from a chosen stochastic interpolant (e.g., linear or Gaussian mixture paths).
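In practice the marginal field \( v^*(x, t) \) is rarely available in closed form, so one regresses onto the conditional velocity given sampled endpoints (conditional flow matching), which yields the same gradients as the marginal objective. Below is a minimal PyTorch sketch for the linear path \( x_t = (1 - t)x_0 + t x_1 \), whose conditional target is simply \( x_1 - x_0 \); the network interface `v_hat(x, t)` is an assumption for illustration:

```python
import torch

def flow_matching_loss(v_hat, x0, x1):
    """Conditional flow matching on the linear path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.

    v_hat: network v(x, t) -> velocity (assumed interface);
    x0: base samples (e.g. Gaussian noise); x1: data samples.
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)
    t_ = t.view((b,) + (1,) * (x0.dim() - 1))

    x_t = (1 - t_) * x0 + t_ * x1      # point on the linear path
    target = x1 - x0                   # conditional velocity d x_t / dt
    return ((v_hat(x_t, t) - target) ** 2).mean()
```

At sampling time one integrates \( \dot{x} = \hat{v}(x, t) \) from \( t = 0 \) to \( t = 1 \), e.g. with a handful of Euler steps.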

Flow Matching is related to Score Matching and Schrödinger bridges, but often bypasses the need to compute score functions by directly using transport velocities.


3. Shortcut Models

Shortcut Models take inspiration from Consistency Models but extend them by modeling the consistency between arbitrary pairs of time steps, not just adjacent ones.

Instead of relying on continuous consistency alone, the shortcut model is trained to denoise: \[ f(x_t, t) \approx x_0 \quad \text{and} \quad f(x_s, s) \approx x_0 \] while also enforcing: \[ f(x_t, t) \approx f(x_s, s) \] for randomly sampled \( s, t \in [0, 1] \). This makes shortcut models amenable to single-shot sampling from arbitrary time steps and composable with standard diffusion trajectories.
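Here is a minimal PyTorch sketch of one way to combine these terms. The additive weighting `lam`, the noising scheme, and treating `f` as an \( x_0 \)-predictor are illustrative assumptions rather than a fixed recipe:

```python
import torch

def shortcut_loss(f, x0, sigma=1.0, lam=1.0):
    """Shortcut-style objective: denoise from two arbitrary times
    and tie the two predictions together.

    f: network f(x, t) -> estimate of x0 (assumed interface);
    lam: weight on the pairwise consistency term (illustrative).
    """
    b = x0.shape[0]
    shape = (b,) + (1,) * (x0.dim() - 1)
    s = torch.rand(b, device=x0.device)
    t = torch.rand(b, device=x0.device)

    eps = torch.randn_like(x0) * sigma      # shared noise draw:
    x_s = x0 + s.view(shape) * eps          # same trajectory, two times
    x_t = x0 + t.view(shape) * eps

    pred_s, pred_t = f(x_s, s), f(x_t, t)
    denoise = ((pred_s - x0) ** 2).mean() + ((pred_t - x0) ** 2).mean()
    consistency = ((pred_s - pred_t) ** 2).mean()
    return denoise + lam * consistency
```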


4. Stochastic Interpolants

A stochastic interpolant is a process that smoothly connects two distributions \( p_0(x) \) and \( p_1(x) \) by a stochastic trajectory \( x(t) \), such that: \[ x(0) \sim p_0(x), \quad x(1) \sim p_1(x) \]

One common interpolant is the Gaussian path: \[ x(t) = (1 - t) x_0 + t x_1 + \sqrt{t(1 - t)} \cdot \epsilon \] with \( x_0 \sim p_0 \), \( x_1 \sim p_1 \), and \( \epsilon \sim \mathcal{N}(0, I) \).
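Here is a small PyTorch helper that samples this path and also returns its pathwise time derivative, which is what the velocity targets in the next paragraph are built from. The shapes are assumptions for illustration; note that the derivative of the \( \sqrt{t(1-t)} \) term diverges at \( t = 0, 1 \), so \( t \) should be sampled away from the endpoints:

```python
import torch

def gaussian_interpolant(x0, x1, t):
    """Sample x(t) on the Gaussian path above and return its pathwise
    time derivative (conditional on x0, x1, and the noise draw).

    x0 ~ p0, x1 ~ p1; t should lie strictly inside (0, 1).
    """
    shape = (x0.shape[0],) + (1,) * (x0.dim() - 1)
    t_ = t.view(shape)
    eps = torch.randn_like(x0)

    gamma = torch.sqrt(t_ * (1 - t_))                    # noise scale
    x_t = (1 - t_) * x0 + t_ * x1 + gamma * eps
    dx_dt = x1 - x0 + (1 - 2 * t_) / (2 * gamma) * eps   # d x(t) / dt
    return x_t, dx_dt
```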

These interpolants give rise to velocity fields with tractable conditional targets, which can be used to train Flow Matching or Shortcut Models.
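Concretely, in the stochastic interpolant framework the regression target at a point \( x \) is the expected pathwise derivative over all trajectories passing through it: \[ b(x, t) = \mathbb{E}\left[ \dot{x}(t) \,\middle|\, x(t) = x \right], \qquad \dot{x}(t) = x_1 - x_0 + \frac{1 - 2t}{2\sqrt{t(1 - t)}}\, \epsilon \] which for the Gaussian path above follows by differentiating \( x(t) \) in \( t \).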


Connections and Summary

These methods push generative modeling toward efficient, interpretable, and theoretically grounded alternatives to standard diffusion. They are particularly useful for reducing the number of sampling steps, often to just one or a few, and for improving model stability.