Diffusion Models: Prerequisites, Theory, and Applications


Abstract

Diffusion models have emerged as a powerful class of generative models, achieving state-of-the-art performance in tasks such as image synthesis, text-to-image generation, and video generation. This study surveys diffusion models, covering their mathematical prerequisites, theoretical foundations, practical implementations, and applications across domains.


Study Plan

Part 1: Mathematical Foundation
  • Stochastic processes, Brownian motion, Itô calculus
  • SDEs and Fokker–Planck equations (forward & backward)
  • Girsanov theorem and time reversal
Part 2: Core Theory of Diffusion Models
  • Score matching (denoising, score estimation)
  • DDPM (Ho et al.), score-based models (Song et al.)
  • Reverse-time SDEs, ODE formulation
  • Latent diffusion, conditional generation
Part 3: Innovations & Acceleration
  • Consistency models, Flow matching, stochastic interpolants
  • Schrödinger bridge & optimal transport formulation
  • Diffusion bridges, functional diffusion processes
Part 4: Applications
  • Protein folding/design (e.g., AlphaFold 3, RFdiffusion)
  • DNA/RNA generation
  • Text-to-image (DALL·E 2, Imagen, Stable Diffusion)
  • Text-to-3D, multimodal alignment with LLMs
  • Diffusion + LLM
  • Diffusion + RL (diffusion policy)
Part 5: Future Directions
  • Research questions

Implementations
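
As a starting point for the DDPM item in Part 2, the forward (noising) process can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names, the toy input `x0`, and the use of plain Python lists are assumptions made here for readability, and the linear schedule (1e-4 to 0.02 over 1000 steps) is the setting reported by Ho et al. The sketch uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps drawn from a standard normal.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (assumed linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, eps)."""
    signal = math.sqrt(alpha_bar[t])       # weight on the clean data
    noise = math.sqrt(1.0 - alpha_bar[t])  # weight on the Gaussian noise
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps


betas = linear_beta_schedule()
alpha_bar = cumulative_alphas(betas)
rng = random.Random(0)            # fixed seed so the sketch is reproducible
x0 = [1.0, -0.5, 0.25, 2.0]       # hypothetical toy "data point"
x_t, eps = forward_noise(x0, 500, alpha_bar, rng)
```

In a training loop, `eps` would serve as the regression target for a noise-prediction network evaluated at `(x_t, t)`. By the final step `alpha_bar` is close to zero, so `x_t` is nearly pure noise.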


References