Instructor: Ahmed El Alaoui
Meeting time: Tuesdays and Thursdays, 2:55pm-4:10pm
Meeting location: Surge B 159
Office hours: Wednesday 1-2pm, or by appointment.
Description: This course will survey a selection of topics in modern high-dimensional statistics. We will start with the core theory of high-dimensional probability, including concentration inequalities, Gaussian processes, and elements of non-asymptotic random matrix theory. We will then turn our attention to analyzing statistical estimation problems such as principal component analysis, covariance matrix estimation, sparse linear estimation, non-parametric regression, and reproducing kernel Hilbert spaces. We will finish the course with a survey of recent theory on the performance of neural networks.
We will rely on the following excellent sources:
High-dimensional Probability by R. Vershynin.
High-dimensional Statistics by M. Wainwright.
Six Lectures on Linearized Neural Networks by T. Misiakiewicz and A. Montanari.
Evaluation: Students will be asked to scribe at least one lecture and to give a lecture towards the end of the semester.
Students will also be asked to complete three homework assignments.
Prerequisites: A graduate- or advanced-undergraduate-level course in probability and real analysis is highly recommended.
Lecture 1 (01/23): Introduction.
Lecture 2 (01/25): Approximate Carathéodory's theorem, covering number of a polytope, sub-Gaussian r.v.'s, Hoeffding's inequality.
Lecture 3 (01/30): Sub-exponential random variables, Bernstein's inequality, the Johnson-Lindenstrauss lemma.
Lecture 4 (02/01): Bounded difference inequality, concentration of Lipschitz functions of Gaussian r.v.'s.
Lecture 5 (02/06): Concentration via isoperimetry, covering and packing numbers.
Lecture 6 (02/08): Spectral norm of random matrices with independent entries, application to the stochastic block model.
Lecture 7 (02/13): Spectral norm of matrices with independent rows, application to covariance matrix estimation.
Lecture 8 (02/15): Principal component analysis, estimation rates in the spiked model.
Lecture 9 (02/20): Sparse principal component analysis, estimation rates in the spiked model.
Lecture 10 (02/22): Matrix concentration inequalities, the matrix Chernoff bound.
Lecture 11 (02/29): Matrix concentration inequalities, the matrix Bernstein bound.
Lecture 12 (03/05): Applications of matrix Bernstein: covariance matrix estimation, community detection.
Lecture 13 (03/07): Gaussian comparison inequalities. Slepian, Sudakov-Fernique, and Sudakov's minoration theorem.
Lecture 14 (03/12): Chaining, Dudley's bound, and Talagrand's majorizing measures theorem.
Lecture 15 (03/14): Uniform matrix deviation inequalities. The intersection size of a set with a random subspace.
Lecture 16 (03/19): Applications to compressed sensing and sparse linear estimation.
Lecture 17 (03/21): Reproducing Kernel Hilbert Spaces.
Lecture 18 (03/26): The representer theorem, kernel ridge regression.
Lecture 19 (03/28): Non-parametric least-squares, rates of estimation via metric entropy.
Lecture 20 (04/09): Theory of linearized neural networks I: infinite width.
Lecture 21 (04/11): Theory of linearized neural networks II: finite width.