Sampling in Diffusion Models: SDEs vs. ODEs

Once a score-based diffusion model is trained, we can sample from it using either a stochastic differential equation (SDE) or a deterministic ordinary differential equation (ODE). This post explores both formulations and derives the probability flow ODE from the Fokker–Planck equation. We also explain the theoretical connection between DDPM (denoising diffusion probabilistic models) and DDIM (denoising diffusion implicit models).

1. The Forward and Reverse SDE

We define a forward SDE that perturbs data \( \mathbf{x}_0 \sim p_{\text{data}} \) into noise:

\[ d\mathbf{x}_t = \mathbf{f}(\mathbf{x}_t, t) \, dt + g(t) \, d\mathbf{w}_t \]

where \( \mathbf{w}_t \) is standard Brownian motion. For example, in the variance-exploding (VE) setting:

\[ d\mathbf{x}_t = \sqrt{\frac{d[\sigma^2(t)]}{dt}} \, d\mathbf{w}_t, \quad \text{with } \sigma(t) \text{ increasing,} \]

that is, \( \mathbf{f} = \mathbf{0} \) and \( g(t)^2 = d[\sigma^2(t)]/dt \).

Under mild regularity conditions, the time-reversal of the above SDE is:

\[ d\mathbf{x}_t = \left[ \mathbf{f}(\mathbf{x}_t, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}_t) \right] dt + g(t) d\bar{\mathbf{w}}_t \]

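where \( \bar{\mathbf{w}}_t \) is standard Brownian motion in reverse time and \( dt \) is a negative infinitesimal time step, so the process runs from \( t = T \) down to \( t = 0 \). If we have access to the score function \( \nabla_{\mathbf{x}} \log p_t(\mathbf{x}_t) \), we can simulate this reverse process to turn noise into data samples.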
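To make the reverse process concrete, here is a minimal sketch of Euler–Maruyama integration of the reverse SDE on a toy problem. Everything specific in it is an assumption for illustration and not part of the setup above: the data is one-dimensional Gaussian, \( x_0 \sim \mathcal{N}(\mu, s_0^2) \), the noise schedule is \( \sigma^2(t) = \sigma_{\max}^2 t \) on \( t \in [0, 1] \) (so \( g^2 = \sigma_{\max}^2 \) is constant), and the exact score of \( p_t = \mathcal{N}(\mu, s_0^2 + \sigma_{\max}^2 t) \) stands in for a trained network. The function names `ve_score` and `reverse_sde_sample` are hypothetical.

```python
import numpy as np

def ve_score(x, t, mu, s0, sigma_max):
    """Exact score of p_t = N(mu, s0^2 + sigma_max^2 * t) (toy stand-in for a network)."""
    return -(x - mu) / (s0**2 + sigma_max**2 * t)

def reverse_sde_sample(n_samples, n_steps, mu=2.0, s0=1.0, sigma_max=5.0, seed=0):
    """Euler-Maruyama integration of the reverse VE SDE from t = 1 down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    g2 = sigma_max**2  # g(t)^2 = d sigma^2 / dt, constant for this schedule
    # Assumption: start from the exact p_1, which is known only because the data is Gaussian.
    x = mu + np.sqrt(s0**2 + g2) * rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = 1.0 - k * dt
        # Reverse drift f - g^2 * score, with f = 0 in the VE setting.
        drift = -g2 * ve_score(x, t, mu, s0, sigma_max)
        # Stepping backward in time: subtract drift * dt, then add fresh noise.
        x = x - drift * dt + np.sqrt(g2 * dt) * rng.standard_normal(n_samples)
    return x

samples = reverse_sde_sample(n_samples=20_000, n_steps=1000)
print(samples.mean(), samples.std())  # should be close to mu and s0
```

With a real model, one would typically start from \( \mathcal{N}(0, \sigma_{\max}^2) \) instead of the exact \( p_1 \) and replace `ve_score` with the learned score.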

2. The Fokker–Planck Equation

The Fokker–Planck equation describes the evolution of the density \( p_t(\mathbf{x}) \) over time for an SDE of the form:

\[ d\mathbf{x}_t = \mathbf{f}(\mathbf{x}_t, t) \, dt + g(t) \, d\mathbf{w}_t \]

The associated Fokker–Planck PDE is:

\[ \frac{\partial p}{\partial t} = -\nabla \cdot (\mathbf{f} p) + \frac{1}{2} \nabla \cdot \left( g^2 \nabla p \right) \]

This equation governs the forward density evolution of the stochastic process.
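As a sanity check, we can verify numerically that a known density satisfies this PDE. The sketch below assumes a one-dimensional toy case that is not part of the text above: zero drift, constant \( g \), and Gaussian initial data with variance \( s_0^2 \), so that \( p_t = \mathcal{N}(0, s_0^2 + g^2 t) \) and the Fokker–Planck equation reduces to \( \partial_t p = \frac{1}{2} g^2 \partial_x^2 p \). The helper names `gaussian_pdf` and `fp_residual` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def fp_residual(x, t, s0=1.0, g=2.0, h=1e-3):
    """Finite-difference residual of dp/dt - 0.5 * g^2 * d2p/dx2 for f = 0."""
    def var(s):
        return s0**2 + g**2 * s  # variance of p_s for this toy process
    # Central difference in time.
    dp_dt = (gaussian_pdf(x, var(t + h)) - gaussian_pdf(x, var(t - h))) / (2.0 * h)
    # Central second difference in space.
    d2p_dx2 = (gaussian_pdf(x + h, var(t)) - 2.0 * gaussian_pdf(x, var(t))
               + gaussian_pdf(x - h, var(t))) / h**2
    return dp_dt - 0.5 * g**2 * d2p_dx2

xs = np.linspace(-3.0, 3.0, 13)
print(np.max(np.abs(fp_residual(xs, t=0.5))))  # small, limited by finite-difference error
```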

3. The Probability Flow ODE

Surprisingly, there exists a deterministic ODE whose solutions have the same marginal densities \( p_t \) as the SDE at every time \( t \). This is the Probability Flow ODE (PF-ODE).

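With the true score \( \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \), the PF-ODE reads as follows; in practice the score is replaced by the learned approximation \( s_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \):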

\[ \frac{d\mathbf{x}_t}{dt} = \mathbf{f}(\mathbf{x}_t, t) - \frac{1}{2} g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}_t) \]

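Proof sketch: Using the identity \( \nabla p_t = p_t \nabla \log p_t \), the diffusion term of the Fokker–Planck equation can be absorbed into the drift term:

\[ \frac{\partial p_t}{\partial t} = -\nabla \cdot (\mathbf{f} p_t) + \frac{1}{2} \nabla \cdot \left( g^2 p_t \nabla \log p_t \right) = -\nabla \cdot \left( \left[ \mathbf{f} - \frac{1}{2} g^2 \nabla \log p_t \right] p_t \right) \]

The right-hand side has the form \( -\nabla \cdot (\mathbf{v} p_t) \), which is the continuity equation for particles that move deterministically with velocity

\[ \mathbf{v}(\mathbf{x}, t) = \mathbf{f}(\mathbf{x}, t) - \frac{1}{2} g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \]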

This shows that the SDE and the PF-ODE induce the same evolution of \( p_t \), even though their individual trajectories differ: SDE paths are random, while each ODE path is fully determined by its starting point.
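The claim that both processes share the same marginals can be checked on a toy example. The sketch below is an illustration under stated assumptions, not a general implementation: one-dimensional Gaussian data \( \mathcal{N}(\mu, s_0^2) \), zero drift, constant \( g \), and the exact score of \( p_t = \mathcal{N}(\mu, s_0^2 + g^2 t) \). It pushes the same initial samples forward to \( t = 1 \) once with the SDE and once with the PF-ODE. The function names `forward_sde` and `forward_pf_ode` are hypothetical.

```python
import numpy as np

def forward_sde(x0, g, n_steps, rng):
    """Euler-Maruyama for dx = g dw (zero drift) on t in [0, 1]."""
    dt = 1.0 / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        x = x + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

def forward_pf_ode(x0, g, mu, s0, n_steps):
    """Euler integration of dx/dt = -0.5 * g^2 * score(x, t) on t in [0, 1]."""
    dt = 1.0 / n_steps
    x = x0.copy()
    for k in range(n_steps):
        t = k * dt
        score = -(x - mu) / (s0**2 + g**2 * t)  # exact score of the toy marginal p_t
        x = x + (-0.5 * g**2 * score) * dt
    return x

rng = np.random.default_rng(0)
mu, s0, g = 2.0, 1.0, 2.0
x0 = mu + s0 * rng.standard_normal(20_000)
x_sde = forward_sde(x0, g, 500, rng)
x_ode = forward_pf_ode(x0, g, mu, s0, 500)

# Both standard deviations should be close to sqrt(s0^2 + g^2).
print(x_sde.std(), x_ode.std(), np.sqrt(s0**2 + g**2))
# The ODE output is a deterministic function of x0; the SDE output is not.
print(np.corrcoef(x0, x_ode)[0, 1], np.corrcoef(x0, x_sde)[0, 1])
```

In this linear toy case the ODE simply rescales each sample around \( \mu \), which is why its output is perfectly correlated with its input, while the SDE output is only partially correlated with it.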

4. DDPM vs DDIM: A Special Case

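The original DDPM model samples via a stochastic Markov chain with Gaussian noise injection at every step. In the continuous-time limit, its forward noising process becomes a variance-preserving (VP) SDE, and its sampler can be viewed as a discretization of the corresponding reverse SDE.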

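In contrast, DDIM is built on a non-Markovian forward process with the same marginals, and its sampler becomes fully deterministic when no noise is injected. It turns out that this deterministic sampler can be interpreted as a discretization of the PF-ODE.

In particular, deterministic DDIM sampling approximately integrates the following ODE backward in time, from \( t = T \) to \( t = 0 \):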

\[ d\mathbf{x}_t = \left( \mathbf{f}(\mathbf{x}_t, t) - \frac{1}{2} g(t)^2 s_\theta(\mathbf{x}_t, t) \right) dt \]

Hence, DDIM is the ODE counterpart of DDPM. This unifies the picture: DDPM ↔ SDE, DDIM ↔ ODE.
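For readers who know DDIM in its discrete form, the following sketch shows the deterministic update (no injected noise) written in the usual \( \bar{\alpha}_t \) notation. The notation and all specifics are assumptions brought in for illustration and do not appear in the text above: a linear \( \beta \) schedule from 1e-4 to 0.02 over 1000 steps, one-dimensional Gaussian toy data \( \mathcal{N}(\mu, s_0^2) \), and the closed-form optimal noise prediction `eps_exact` in place of a trained network. The names `eps_exact` and `ddim_sample` are hypothetical.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal level per timestep

def eps_exact(x, ab, mu, s0):
    """Optimal noise prediction E[eps | x_t] for toy data x_0 ~ N(mu, s0^2)."""
    var_t = ab * s0**2 + (1.0 - ab)
    return np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * mu) / var_t

def ddim_sample(x_T, n_steps, mu=2.0, s0=0.5):
    """Deterministic DDIM sampling over a subsequence of n_steps timesteps."""
    steps = np.linspace(T - 1, 0, n_steps).round().astype(int)
    x = x_T.copy()
    for i, t in enumerate(steps):
        ab_t = alpha_bar[t]
        # The final update jumps to the clean-data level (alpha_bar = 1).
        ab_prev = alpha_bar[steps[i + 1]] if i + 1 < n_steps else 1.0
        eps = eps_exact(x, ab_t, mu, s0)
        # Predict x_0 from the current state, then re-noise to the previous level.
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x

rng = np.random.default_rng(0)
x_T = rng.standard_normal(20_000)
samples = ddim_sample(x_T, n_steps=50)
print(samples.mean(), samples.std())  # approaches mu and s0 as n_steps grows
```

Because no random numbers are drawn inside the loop, the same `x_T` always yields the same output, which is the discrete analogue of following a single PF-ODE trajectory.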

5. Practical Considerations

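Stochastic sampling (SDE): generally more diverse, but it typically needs more steps and is harder to control because fresh noise is injected at every step.
Deterministic sampling (ODE): typically needs fewer steps and is amenable to control/conditioning because each initial noise vector maps to exactly one sample, but it can be less diverse.
Higher-order solvers such as Heun's method (a second-order Runge–Kutta scheme) are commonly used to integrate the ODE numerically; the SDE is usually integrated with Euler–Maruyama or a stochastic variant of such schemes.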
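To illustrate why higher-order solvers help, the sketch below compares a plain Euler step with Heun's method on the PF-ODE of a toy problem where the exact answer is known. The setup is an assumption for illustration only: one-dimensional Gaussian data \( \mathcal{N}(\mu, s_0^2) \), zero drift, constant \( g^2 \), and the exact score. The names `velocity`, `integrate`, and `exact_solution` are hypothetical.

```python
import numpy as np

MU, S0, G2 = 2.0, 1.0, 25.0  # toy data N(MU, S0^2); constant g^2

def velocity(x, t):
    """PF-ODE velocity f - 0.5 * g^2 * score, with f = 0 and the exact Gaussian score."""
    score = -(x - MU) / (S0**2 + G2 * t)
    return -0.5 * G2 * score

def integrate(x_T, n_steps, method="heun"):
    """Integrate the PF-ODE backward from t = 1 to t = 0."""
    dt = 1.0 / n_steps
    x = x_T.copy()
    for k in range(n_steps):
        t = 1.0 - k * dt
        d1 = velocity(x, t)
        x_euler = x - dt * d1  # first-order (Euler) proposal
        if method == "euler":
            x = x_euler
        else:
            # Heun: average the slopes at the start and at the Euler proposal.
            d2 = velocity(x_euler, t - dt)
            x = x - 0.5 * dt * (d1 + d2)
    return x

def exact_solution(x_T):
    """Closed-form PF-ODE solution at t = 0 for this linear toy problem."""
    return MU + (x_T - MU) * S0 / np.sqrt(S0**2 + G2)

rng = np.random.default_rng(0)
x_T = MU + np.sqrt(S0**2 + G2) * rng.standard_normal(1000)
ref = exact_solution(x_T)
for method in ("euler", "heun"):
    err = np.max(np.abs(integrate(x_T, 50, method) - ref))
    print(method, err)
```

Heun's method costs two velocity evaluations per step instead of one, so a fair comparison with a neural network in the loop should count network evaluations and not steps.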

6. Conclusion

Once trained, a score-based diffusion model defines a family of evolving densities. We can sample from this family using either stochastic dynamics (SDE) or deterministic ones (ODE), both derived from the same Fokker–Planck equation. This duality underpins the connection between DDPM and DDIM, and offers flexibility in the tradeoff between diversity and speed in generative modeling.