Universal Approximation Theorems
2026.02.13.
Abstract
This post discusses various theorems on the approximation capabilities of neural networks. These results are known as universal approximation theorems (UATs).
Notation: Let $\mathbb{N}$ and $\mathbb{R}$ denote the sets of natural numbers and real numbers, respectively. Matrices and vectors are denoted by boldface letters $(\mathbf{A}, \mathbf{a})$, scalars by normal font $(A,a)$, and sets by blackboard bold font $(\mathbb{A})$.

The $p$-norm or $\ell^p$ norm of a vector $\mathbf{x}\in\mathbb{X}\subseteq \mathbb{R}^n$ is defined as: $$ \| \mathbf{x} \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad 1 \leq p < \infty $$

The $L^p$ norm of a function $f:\mathbb{X}\to\mathbb{R}$ with respect to a measure $\mu$ on $\mathbb{X}$ is defined as: $$ \| f \|_{L^p} = \left( \int_{\mathbb{X}} |f(\mathbf{x})|^p \, d\mu(\mathbf{x}) \right)^{1/p}, \quad 1 \leq p < \infty $$ For a vector-valued $f$, the absolute value $|f(\mathbf{x})|$ is replaced by the vector norm $\|f(\mathbf{x})\|_p$. Lebesgue spaces or $\mathbb{L}^p$ spaces are function spaces containing the measurable functions $f$ that satisfy $\| f \|_{L^p} < \infty$.

A subset $\mathbb{X}$ of $\mathbb{R}^n$ is said to be compact if it is closed and bounded. A subset $\mathbb{S}$ of a topological space $\mathbb{T}$ is said to be dense in $\mathbb{T}$ if every element of $\mathbb{T}$ is either in $\mathbb{S}$ or a limit point of $\mathbb{S}$; equivalently, the closure of $\mathbb{S}$ is $\mathbb{T}$.
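To make the two norms concrete, here is a minimal Python sketch. It is an illustration only: the helper names `p_norm` and `lp_norm_1d` are hypothetical, the domain is assumed to be an interval $[a, b]$ with $\mu$ the Lebesgue measure, and the integral is approximated with a simple midpoint rule rather than computed exactly.

```python
import math


def p_norm(x, p):
    """l^p norm of a finite sequence of reals, for 1 <= p < infinity."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p < infinity")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def lp_norm_1d(f, a, b, p, n=10_000):
    """Midpoint-rule approximation of the L^p norm of a scalar f on [a, b].

    Assumes the Lebesgue measure on [a, b]; n is the number of subintervals.
    """
    if p < 1:
        raise ValueError("p must satisfy 1 <= p < infinity")
    h = (b - a) / n
    # Sum |f|^p at the midpoint of each subinterval, then scale by the width h.
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


x = [3.0, -4.0]
print(p_norm(x, 1))  # |3| + |-4| = 7
print(p_norm(x, 2))  # sqrt(9 + 16) = 5
print(lp_norm_1d(lambda t: t, 0.0, 1.0, 2))  # approximately 1/sqrt(3)
```

For $f(t) = t$ on $[0, 1]$, the exact value is $\| f \|_{L^2} = \left( \int_0^1 t^2 \, dt \right)^{1/2} = 1/\sqrt{3}$, which the numerical estimate should match closely.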