Gaussian tail bounds

787 words

This is all basic stuff, but at the moment it is too hard find through Google and sometimes I just want to know the answer without having to derive it on the spot. Hence, this post. I hope I’m not ruining anyone’s course’s hand-in exercises.

The standard normal distribution N(0,1) has probability density function \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. There is no way to integrate this function symbolically in a nice way, but we do at times want to (upper) bound expressions of the form \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t. How can we do this?

One way is to follow this approach. Since t\geq x everywhere, we can upper bound \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2} \mathrm{d}t \leq \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{t}{x} e^{-t^2/2} \mathrm{d}t = \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}.

There is another tail bound which is a bit weaker for large x, but I like the proof better. We’ll give a tail bound by looking at the moment-generating function \lambda \mapsto \mathbb{E}[e^{\lambda X}], where X \sim N(0,1) is our normally distributed random variable. We can explicitly calculate this expectation and find \mathbb{E}[e^{\lambda X}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{\lambda x - x^2/2}\mathrm{d}x = \frac{1}{\sqrt{2\pi}}e^{\lambda^2/2}\int_{-\infty}^\infty e^{-(x-\lambda)^2/2}\mathrm{d}x. The last thing is just the entire Gaussian integral shifted a bit and hence \mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2} Now we use Chernoff’s bound (an easy corrollary of Markov’s inequality) to find \mathbb{P}[X \geq t] \leq \mathbb{E}[e^{\lambda X}]e^{-\lambda t}, which we can now minimize over the choice of \lambda, setting \lambda=t, and we conclude that \mathbb{P}[X \geq t] \leq e^{-t^2/2}.

Multi-variate Gaussians

Now X \in \mathbb{R}^d is N(0,I_d) normally distributed, i.e., X is a vector with iid Gaussian N(0,1) entries. What tail bounds do we get on \|X\|? We start off with Markov’s inequality again. \mathbb{P}[\|X\| > t] = \mathbb{P}[e^{\lambda\|X\|^2} > e^{\lambda t^2}] \leq \frac{\mathbb{E}[e^{\lambda\|X\|^2}]}{e^{\lambda t^2}}.

Deriving the moment generating function \lambda \mapsto \mathbb{E}[e^{\lambda\|X\|^2}] of X^2 is an elementary calculation. \int_{-\infty}^\infty e^{\lambda x^2} \cdot e^{-x^2/2} \mathrm{d}x = \int_{-\infty}^\infty e^{\frac{-x^2}{2(\sqrt{1-2/\lambda})^2}}\mathrm{d}x = \frac{\sqrt{2\pi}}{\sqrt{1-2\lambda}}.

The coordinates of X are iid, so \mathbb{E}[e^{\lambda\|X\|^2}] = \mathbb{E}[e^{\lambda X_1^2}]^d = (1-2\lambda)^{-d/2}. The minimizer is at \lambda=(1-1/t^2)/2, and we find, requiring t \geq 1 for the last inequality,\mathbb{P}[\|X\| > t] \leq e^{-d(t^2-2\log t - 1)/2} \leq e^{-d(t-1)^2}.

Operator norm of Gaussian matrices

The operator norm or spectral norm of a n \times n matrix M is defined as \|M\| := \max_{x \in \mathbb{R}^n} \frac{\|Mx\|}{\|x\|}.

Now if M were a matrix with every entry independently N(0,1), what would the largest singular value of this random Gaussian matrix be? I’ll give an easy tail bound based on a net argument.

An \eta-net, \eta > 0, on the sphere is a subset N \subset \mathbb{S}^{d-1} such that, for every point x \in \mathbb{S}^{d-1} there is a net element n \in N such that \|x-n\| \leq \eta but every two net elements are at distance at least \eta from each other. A greedy algorithm can construct an \eta-net, and any \eta-net has size at most (4/\eta)^d.1

Now let N\subset \mathbb{S}^{d-1} be a 1/2-net. By the above, the size of the net is bounded by |N| \leq 8^d.

The function x \mapsto \|Mx\| is \|M\|-Lipschitz. Hence we can bound \|M\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|\cdot\|x-\omega\| \leq \max_{x\in\mathbb{S}^{d-1}} \min_{\omega \in N} \|M\omega\| + \|M\|/2. So we have now proved that \|M\| \leq 2\max_{\omega\in N} \|M\omega\|.

Now, as M\omega is N(0,I_d) normally distributed for any \omega\in\mathbb{S}^{d-1}, we can use the union bound over all points of N and conclude that, for all t \geq 1, \mathbb{P}[\|M\| \geq 2t\sqrt{d}] \leq 8^d e^{-d(t-1)^2/2}.

Maximum of n Gaussians

The distribution of the maximum \mathbb{P}[\max_{i \leq n} X_i \geq t] of n independent identically distributed variables X_1,\ldots,X_n \sim N(0,1) is, up to a constant factor, tight with the union bound \mathbb{P}[\max_{i \leq n} X_i \geq t] \leq ne^{-t^2/2}.

Hence the expected maximum is \mathbb{E}[\max_{i \leq n} X_i] = O(\sqrt{\ln n}).

Average width of the simplex

Let x_1,\dots,x_{d+1} \in \mathbb{R}^d be the vertices of a regular simplex such that \|x_i\| = 1 for all i \in [d+1]. If \omega \in \mathbb{S}^{d-1} is chosen uniformly at random, the difference \max_{i,j\in[d+1]} |\omega^{\mathsf{T}}(x_i-x_j| is called the average width of the simplex. We can bound this up to a constant factor using our knowledge of Gaussians. Let H_t := \{y\in\mathbb{R}^d : \omega^{\mathsf{T}}y = t\}. The d-2-dimensional volume of H_t\cap \mathbb{S}^{d-1} is (1-t^2)^{(d-1)/2} times the volume of \mathbb{S}^{d-2} by Pythatoras’ theorem. Recalling that (1+1/\lambda)^\lambda \approx e, you can prove that the distribution of \omega^{\mathsf{T}}x_i is approximately N(0,1/\sqrt{d-1}). The Gaussian tail bound now says that the average width of the simplex is O(\frac{\sqrt{\ln d}}{\sqrt d}).

  1. See e.g., Jiří Matoušek, Lectures on Discrete Geometry (Springer, 2002), page 314. The proof is based on a simple packing argument where balls of radius \eta/2 around each net element have to fit disjointly inside the ball of radius 1+\eta/2 \leq 1 centered at the origin. []

Leave a Reply