Diffusion Models: Prerequisites, Theory, and Applications

Abstract

Diffusion models have emerged as a powerful class of generative models, achieving state-of-the-art results in image synthesis, text-to-image generation, and video generation. This study provides a comprehensive overview of diffusion models, covering their theoretical foundations, practical implementations, and applications across domains.

Study Plan

Part 1: Mathematical Foundation

Stochastic processes, Brownian motion, Itô calculus
SDEs and the Kolmogorov forward (Fokker–Planck) and backward equations
Girsanov theorem and time reversal

Part 2: Core Theory of Diffusion Models

Score matching (denoising, score estimation)
DDPM (Ho et al.), score-based models (Song et al.)
Reverse-time SDEs, ODE formulation
Latent diffusion, conditional generation

Part 3: Innovations & Acceleration

Consistency models, flow matching, stochastic interpolants
Schrödinger bridge & optimal transport formulation
Diffusion bridges, functional diffusion processes

Part 4: Applications

Protein folding/design (e.g., AlphaFold 3, RFdiffusion)
DNA/RNA generation
Text-to-image (DALL·E 2, Imagen, Stable Diffusion)
Text-to-3D, multimodal alignment with LLMs
Diffusion + LLM
Diffusion + RL (diffusion policy)

Part 5: Future Directions

Research Questions

Implementations

Training a Diffusion Model on CIFAR-10
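
As a starting point for this implementation, the sketch below shows the two pieces a DDPM-style training step is built from: the closed-form forward noising q(x_t | x_0) and the noise-prediction mean-squared-error target. It is a minimal illustration under stated assumptions, not a full training script. The function names (linear_beta_schedule, q_sample, ddpm_loss) are hypothetical, the schedule values are the commonly cited DDPM defaults, a random array stands in for a CIFAR-10 batch, and the denoising network is replaced by a placeholder that predicts zeros.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed defaults; tune for your setup)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I); return (x_t, noise)."""
    noise = rng.standard_normal(x0.shape)
    # One coefficient per sample, broadcast over channel/height/width axes.
    a = alpha_bar[t].reshape(-1, 1, 1, 1)
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise


def ddpm_loss(predicted_noise, true_noise):
    """Simplified DDPM objective: MSE between predicted and true noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))


rng = np.random.default_rng(0)
betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s)

# Stand-in for a CIFAR-10 batch: 8 images of shape 3x32x32 scaled to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(8, 3, 32, 32))
t = rng.integers(0, len(betas), size=8)  # one random timestep per image

x_t, noise = q_sample(x0, t, alpha_bar, rng)

# Placeholder for the denoising network eps_theta(x_t, t); a real model
# (typically a U-Net) would be trained to drive this loss down.
predicted_noise = np.zeros_like(x_t)
loss = ddpm_loss(predicted_noise, noise)
print(f"loss with a zero predictor: {loss:.3f}")
```

In an actual run, the random batch would be replaced by normalized CIFAR-10 images and the zero predictor by a trainable network whose parameters are updated by gradient descent on this loss.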