# Gaussian tail bounds

812 words

## One-dimensional tail bounds

The standard normal distribution $N(0,1)$ has probability density function $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$. There is no way to integrate this function symbolically in a nice way, but we do at times want to (upper) bound expressions of the form $$\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t.$$ How can we do this?

One way is to follow this approach. Since $t\geq x$ everywhere, we can upper bound $$\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t \leq \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{t}{x} e^{-t^2/2} \mathrm{d}t = \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}.$$

There is another tail bound which is a bit weaker for large $x$, but I like the proof better. We’ll give a tail bound by looking at the moment-generating function $\lambda \mapsto \mathbb{E}[e^{\lambda X}]$, where $X \sim N(0,1)$ is our normally distributed random variable. We can explicitly calculate this expectation and find $$\mathbb{E}[e^{\lambda X}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{\lambda x - x^2/2}\mathrm{d}x = \frac{1}{\sqrt{2\pi}}e^{\lambda^2/2}\int_{-\infty}^\infty e^{-(x-\lambda)^2/2}\mathrm{d}x.$$ The last term is just the entire Gaussian integral shifted a bit and hence $\mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2}$ Now we use Chernoff’s bound (an easy corrollary of Markov’s inequality) to find $\mathbb{P}[X \geq t] \leq \mathbb{E}[e^{\lambda X}]e^{-\lambda t},$ which we can now minimize over the choice of $\lambda$, setting $\lambda=t$, and we conclude that $\mathbb{P}[X \geq t] \leq e^{-t^2/2}$.

## Multi-variate Gaussians

Let $X \in \mathbb{R}^d$ be $N(0,I_d)$ normally distributed, i.e., $X$ is a vector with iid Gaussian $N(0,1)$ entries. What tail bounds do we get on $\|X\|$? We start off with Markov’s inequality again. $$\mathbb{P}[\|X\| > t] = \mathbb{P}[e^{\lambda\|X\|^2} > e^{\lambda t^2}] \leq \frac{\mathbb{E}[e^{\lambda\|X\|^2}]}{e^{\lambda t^2}}.$$

Deriving the moment generating function $\lambda \mapsto \mathbb{E}[e^{\lambda\|X\|^2}]$ of $X^2$ is an elementary calculation. $$\int_{-\infty}^\infty e^{\lambda x^2} \cdot e^{-x^2/2} \mathrm{d}x = \int_{-\infty}^\infty e^{\frac{-x^2}{2(\sqrt{1-2/\lambda})^2}}\mathrm{d}x = \frac{\sqrt{2\pi}}{\sqrt{1-2\lambda}}.$$

The coordinates of $X$ are iid, so $\mathbb{E}[e^{\lambda\|X\|^2}] = \mathbb{E}[e^{\lambda X_1^2}]^d = (1-2\lambda)^{-d/2}.$ The minimizer is at $\lambda=(1-1/t^2)/2$, and we find, requiring $t \geq 1$ for the last inequality,$$\mathbb{P}[\|X\| > t] \leq e^{-d(t^2-2\log t - 1)/2} \leq e^{-d(t-1)^2}.$$

## Operator norm of Gaussian matrices

The operator norm or spectral norm of a $n \times n$ matrix $M$ is defined as $$\|M\| := \max_{x \in \mathbb{R}^n} \frac{\|Mx\|}{\|x\|}.$$

Now if $M$ were a matrix with every entry independently $N(0,1)$, what would the largest singular value of this random Gaussian matrix be? I’ll give an easy tail bound based on a net argument.

An $\eta$-net, $\eta > 0$, on the sphere is a subset $N \subset \mathbb{S}^{d-1}$ such that for every point $x \in \mathbb{S}^{d-1}$ there is a net element $n \in N$ such that $\|x-n\| \leq \eta$, but every two net elements are at distance at least $\eta$ from each other. A greedy algorithm can construct an $\eta$-net, and any $\eta$-net has size at most $(4/\eta)^d$. 1See e.g., Jiří Matoušek, Lectures on Discrete Geometry (Springer, 2002), page 314. The proof is based on a simple packing argument where balls of radius $\eta/2$ around each net element have to fit disjointly inside the ball of radius $1+\eta/2 \leq 1$ centered at the origin.

Now let $N\subset \mathbb{S}^{d-1}$ be a $1/2$-net. By the above, the size of the net is bounded by $|N| \leq 8^d$.

The function $x \mapsto \|Mx\|$ is $\|M\|$-Lipschitz. Hence we can bound $$\|M\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|\cdot\|x-\omega\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|/2.$$ So we have now proved that $\|M\| \leq 2\max_{\omega\in N} \|M\omega\|$.

Now, as $M\omega$ is $N(0,I_d)$ normally distributed for any $\omega\in\mathbb{S}^{d-1}$, we can use the union bound over all points of $N$ and conclude that, for all $t \geq 1$, $$\mathbb{P}[\|M\| \geq 2t\sqrt{d}] \leq 8^d e^{-d(t-1)^2/2}.$$

## Maximum of $n$ Gaussians

The distribution of the maximum $\mathbb{P}[\max_{i \leq n} X_i \geq t]$ of $n$ independent identically distributed variables $X_1,\ldots,X_n \sim N(0,1)$ is, up to a constant factor, tight with the union bound $$\mathbb{P}[\max_{i \leq n} X_i \geq t] \leq ne^{-t^2/2}.$$

From this, we can find that the expected maximum is $\mathbb{E}[\max_{i \leq n} X_i] = O(\sqrt{\ln n})$. We derive this by bounding the integral $$\mathbb{E}[\max_{i \leq n} X_i] = \int_0^\infty \mathbb{P}[\max_{i \leq n} X_i \geq t] {\rm d}t.$$ Split the integral at $t = \sqrt{\ln n}$, bound the integrand in the first part by 1 (it is a probability) and in the second part by $ne^{-t^2/2}.$

## Average width of the simplex

Let $x_1,\dots,x_{d+1} \in \mathbb{R}^d$ be the vertices of a regular simplex such that $\|x_i\| = 1$ for all $i \in [d+1]$. If $\omega \in \mathbb{S}^{d-1}$ is chosen uniformly at random, the expectation of the difference $\max_{i,j\in[d+1]} |\omega^{\mathsf{T}}(x_i-x_j)|$ is called the average width of the simplex. We can bound this up to a constant factor using our knowledge of Gaussians. Let $H_t := \{y\in\mathbb{R}^d : \omega^{\mathsf{T}}y = t\}$. The $d-2$-dimensional volume of $H_t\cap \mathbb{S}^{d-1}$ is $(1-t^2)^{(d-1)/2}$ times the volume of $\mathbb{S}^{d-2}$ by Pythatoras’ theorem. Recalling that $(1+1/\lambda)^\lambda \approx e$, you can prove that the distribution of $\omega^{\mathsf{T}}x_i$ is approximately $N(0,1/\sqrt{d-1})$. The Gaussian tail bound now says that the average width of the simplex is $O(\frac{\sqrt{\ln d}}{\sqrt d}).$