Stochastic Processes, Brownian Motion, and Itô Calculus
This post is intended as a compact yet rigorous introduction to the stochastic calculus tools used in the study of diffusion models in machine learning. We'll cover the foundational elements of stochastic processes, focusing on Brownian motion and the Itô integral, which are key to understanding how diffusion models work.
1. Stochastic Processes
A stochastic process is a collection of random variables \( \{X_t\}_{t \geq 0} \) defined on a common probability space. The index \( t \) typically represents time. These processes model systems that evolve randomly over time.
Key properties to consider include:
- Stationarity: The joint distributions of the process are invariant under shifts in time.
- Markov property: The future state depends only on the present, not the past.
- Continuity: Whether the sample paths \( t \mapsto X_t(\omega) \) are continuous.
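For intuition, the simplest example is a Gaussian random walk: a discrete-time process whose next state depends only on the current one (the Markov property) and whose increments are identically distributed. Below is a minimal NumPy sketch; the step count and step scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_steps = 1000   # number of steps (illustrative choice)
step_std = 1.0   # standard deviation of each step (illustrative choice)

# Gaussian random walk: X_{k+1} = X_k + eps_k with eps_k ~ N(0, step_std^2).
# The next state depends only on the current one, i.e. the Markov property.
steps = rng.normal(0.0, step_std, size=n_steps)
path = np.concatenate([[0.0], np.cumsum(steps)])

print(path[:5])
```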
2. Brownian Motion
The most fundamental stochastic process for modeling random noise is Brownian motion, also known as a Wiener process. A standard Brownian motion \( \{W_t\}_{t \geq 0} \) satisfies:
- \( W_0 = 0 \)
- Independent increments: \( W_{t+s} - W_s \) is independent of \( \{W_u : u \leq s\} \) for all \( t, s \geq 0 \)
- Gaussian increments: \( W_t - W_s \sim \mathcal{N}(0, t - s) \) for \( t > s \)
- Continuous paths: \( t \mapsto W_t \) is continuous almost surely
This randomness underlies the noise component in diffusion models.
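The defining properties translate directly into a simulation recipe: on a time grid with spacing \( \Delta t \), draw independent increments from \( \mathcal{N}(0, \Delta t) \) and accumulate them. A minimal sketch (the horizon \( T \) and grid size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

T, n = 1.0, 1000   # time horizon and number of grid steps (illustrative)
dt = T / n

# Independent Gaussian increments with variance dt, per the definition above.
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# W_0 = 0; cumulative sums give a discretized Brownian path.
W = np.concatenate([[0.0], np.cumsum(dW)])
```

Averaged over many such paths, \( W_T \) has mean 0 and variance \( T \), matching the Gaussian-increment property.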
3. Itô Calculus
Classical calculus cannot be applied directly to stochastic processes like \( W_t \): Brownian paths are almost surely nowhere differentiable and have unbounded variation on every interval, so pathwise Riemann-Stieltjes integration fails. This motivates Itô calculus, a framework for defining integrals of the form:
\[ \int_0^t f(s, W_s) \, dW_s \]
This is called an Itô integral. It is constructed as a limit of sums in which the integrand is evaluated at the left endpoint of each subinterval, and because Brownian motion has nonzero quadratic variation (\( [W]_t = t \)), the resulting calculus picks up second-order correction terms that Riemann and Lebesgue integration lack. The central result is Itô’s lemma, the stochastic analogue of the chain rule:
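Numerically, the Itô integral is approximated by the left-endpoint sums \( \sum_i f(t_i, W_{t_i})(W_{t_{i+1}} - W_{t_i}) \). The sketch below checks this for the integrand \( W_s \) against the closed form \( \int_0^T W_s \, dW_s = \tfrac{1}{2}(W_T^2 - T) \), which follows from Itô’s lemma stated next; the grid size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)

T, n = 1.0, 100_000
dt = T / n

# Discretized Brownian path, as in the previous sketch.
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Itô integral: the integrand is evaluated at the LEFT endpoint of each step.
ito_sum = np.sum(W[:-1] * dW)

# Closed form from Itô's lemma: int_0^T W dW = (W_T^2 - T) / 2.
closed_form = 0.5 * (W[-1] ** 2 - T)

print(ito_sum, closed_form)  # agree up to discretization error
```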
Itô’s Lemma: If \( X_t \) follows an Itô process \[ dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dW_t \] and \( f(t, x) \) is once continuously differentiable in \( t \) and twice in \( x \), then \[ df(t, X_t) = \left( \frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2} \sigma^2 \frac{\partial^2 f}{\partial x^2} \right) dt + \sigma \frac{\partial f}{\partial x} \, dW_t, \] with all partial derivatives evaluated at \( (t, X_t) \).
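As a quick worked example, take \( X_t = W_t \) (so \( \mu = 0 \), \( \sigma = 1 \)) and \( f(t, x) = x^2 \). Then \( \partial f / \partial t = 0 \), \( \partial f / \partial x = 2x \), and \( \partial^2 f / \partial x^2 = 2 \), so \[ d(W_t^2) = dt + 2 W_t \, dW_t. \] Integrating from 0 to \( t \) gives \( \int_0^t W_s \, dW_s = \tfrac{1}{2}(W_t^2 - t) \), the identity verified numerically above; the extra \( -t/2 \) term has no counterpart in the classical chain rule.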
Itô’s lemma is essential when analyzing the reverse-time diffusion SDEs used in score-based generative modeling.
4. Relevance to Diffusion Models
In machine learning, particularly in generative models such as DDPMs and score-based models, we define a forward diffusion process \[ dx = f(x, t)\,dt + g(t)\,dW_t \] that gradually corrupts data into noise, and we aim to reverse it using a learned score function: \[ dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\tilde{W}_t \] Here time runs backward from \( T \) to \( 0 \) and \( \tilde{W}_t \) is a reverse-time Brownian motion. In practice, the score \( \nabla_x \log p_t(x) \) is approximated by a trained neural network, and the reverse SDE is integrated numerically.
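To make this concrete, here is a minimal Euler-Maruyama sketch of the forward process. The linear schedule `beta`, the VP-style drift, and the placeholder score function in the final comment are illustrative assumptions, not a specific published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta(t):
    # Hypothetical linear noise schedule (illustrative assumption).
    return 0.1 + 19.9 * t

def f(x, t):
    # Assumed VP-style drift: pulls x toward 0 as noise is added.
    return -0.5 * beta(t) * x

def g(t):
    # Diffusion coefficient paired with the drift above.
    return np.sqrt(beta(t))

# Euler-Maruyama for dx = f(x, t) dt + g(t) dW_t:
#   x_{k+1} = x_k + f(x_k, t_k) dt + g(t_k) sqrt(dt) z_k,  z_k ~ N(0, I)
T, n = 1.0, 1000
dt = T / n
x = rng.normal(size=2)  # toy 2-D "data point"
for k in range(n):
    t = k * dt
    x = x + f(x, t) * dt + g(t) * np.sqrt(dt) * rng.normal(size=x.shape)

# A reverse-time step has the same shape, with a learned score model
# score_fn(x, t) (hypothetical) standing in for grad_x log p_t(x):
#   x = x + (f(x, t) - g(t)**2 * score_fn(x, t)) * (-dt) \
#         + g(t) * np.sqrt(dt) * rng.normal(size=x.shape)
```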
5. Closing Thoughts
Understanding Brownian motion and Itô calculus is foundational for grasping how noise injection and removal work in diffusion models. These concepts also open doors to future research, such as using alternative stochastic processes (e.g., Lévy processes) or improving sampling through better discretization techniques.
If you’re interested in more technical details or applications in generative modeling, feel free to reach out or check back for part 2, where we’ll dive into Fokker–Planck equations and score matching!