MATH 581: Mathematics of Data Science

**Instructor:** Dmitriy Drusvyatskiy

**Lecture:** MW 1:00-2:20 PM

**Office hours:** F 12:00-1:00 PM and by appointment

**Office:** PDL C-434

**Email:** ddrusv at uw.edu

**Lecture location:** Mary Gates Hall (MGH) 238

**Description:** This is a three-term sequence covering the fundamentals of the mathematics of data science. The basic question we will address is how to "optimally" extract information from high-dimensional data. The course will cover material from a few different fields, most notably high-dimensional probability, statistics and machine learning, and optimization.

We will use a number of different texts:

Roman Vershynin, “High-dimensional probability: An introduction with applications in data science.”

Martin J. Wainwright, “High-dimensional statistics: A non-asymptotic viewpoint.”

Sébastien Bubeck, "Convex Optimization: Algorithms and Complexity."

Shai Shalev-Shwartz, Shai Ben-David, “Understanding Machine Learning: From Theory to Algorithms.”

**Grading:** Biweekly problem sets with solutions **written in LaTeX** (100% of the grade). Since there is no TA for this course, grading will be on the scale:

0 (did little work, if any),

1 (did at least 50% of the homework),

2 (did at least 80% of the homework).

You may work together on problem sets, **but you must write up your own solutions**. You must also cite any resources which helped you obtain your solutions.

Chapter 1: Concentration Inequalities

Introduction to the curse of dimensionality (Monte Carlo simulation, empirical risk minimization) (Lecture 1)
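To illustrate the Monte Carlo item above: an average of n i.i.d. samples estimates an integral with error of order n^{-1/2}, independent of the ambient dimension d. A minimal Python sketch (illustrative only; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, d, n):
    """Average f over n independent uniform samples from [0, 1]^d."""
    x = rng.random((n, d))
    return f(x).mean()

# f(x) = sum of coordinates; its exact mean over [0, 1]^d is d / 2.
d = 50
est = mc_estimate(lambda x: x.sum(axis=1), d, n=100_000)
print(est)  # close to d/2 = 25, even though d = 50
```

The point is that the accuracy is governed by n alone, which is why Monte Carlo is a natural entry point for the curse of dimensionality.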

Review of basic probability and limit theorems (Lecture 2)

Chernoff bound and sub-Gaussian random variables (Lecture 3)

Hoeffding and Chernoff inequalities; applications (robust mean estimation with median of means, approximate regularity of Erdős-Rényi graphs) (Lecture 4, Lecture 5)
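To make the median-of-means idea concrete (a hypothetical sketch, not code from the course): split the sample into k blocks, average each block, and return the median of the block averages, so that a few corrupted samples can ruin at most a few blocks.

```python
import numpy as np

def median_of_means(samples, k):
    """Split samples into k blocks, average each, return the median of the averages."""
    blocks = np.array_split(np.asarray(samples), k)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(1)
data = rng.normal(loc=1.0, size=1000)
data[:5] = 1e6                        # a few gross outliers
naive = data.mean()                   # dragged far from the true mean 1.0
robust = median_of_means(data, k=20)  # outliers corrupt at most one block
print(naive, robust)
```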

Sub-Exponential Random Variables and Bernstein's Inequality; applications (dimensionality reduction with Johnson-Lindenstrauss lemma) (Lecture 6)
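A quick numerical sanity check of the Johnson-Lindenstrauss idea (a sketch, assuming an i.i.d. Gaussian projection scaled by 1/√k; variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 10_000, 1_000
X = rng.normal(size=(n, d))               # n points in dimension d
G = rng.normal(size=(d, k)) / np.sqrt(k)  # random Gaussian projection to dimension k
Y = X @ G

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig
print(ratio)  # close to 1: the projection nearly preserves the distance
```

The concentration inequalities of this chapter quantify exactly how close this ratio is to 1, uniformly over all pairs of points.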

Fast Johnson-Lindenstrauss transformation (Lecture 7/8)

Functions of independent random variables: McDiarmid's inequality and Gaussian concentration (Lecture 9/10) (Nice reference on Talagrand and Gaussian concentration inequalities)

Concentration, moments, and Orlicz norms (Lecture 11)

Chapter 2: Random Vectors in High Dimensions (Lecture 12/13)

Concentration of the norm

Isotropic Distributions

Similarity of Normal and Spherical Distributions

Sub-Gaussian and Sub-Exponential Random Vectors

Chapter 3: Introduction to Statistical Inference (Lecture 14/15)

Mean square error and the Bias-Variance decomposition
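For reference (standard material): writing $\hat\theta - \theta = (\hat\theta - \mathbb{E}\hat\theta) + (\mathbb{E}\hat\theta - \theta)$ and noting that the cross term has mean zero gives

```latex
\mathbb{E}\,(\hat\theta-\theta)^2
  \;=\; \underbrace{\mathbb{E}\,(\hat\theta-\mathbb{E}\hat\theta)^2}_{\text{variance}}
  \;+\; \underbrace{(\mathbb{E}\hat\theta-\theta)^2}_{\text{bias}^2}.
```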

Maximum Likelihood Estimation and examples

Chapter 4: Linear Regression (Lecture 16/17)

Ordinary least squares

Ridge regression
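A minimal sketch contrasting the two estimators of this chapter via their closed forms: OLS solves the normal equations, while ridge uses $(X^\top X + \lambda I)^{-1} X^\top y$ (illustrative only; names and data are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Ordinary least squares: minimize ||Xw - y||^2.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: minimize ||Xw - y||^2 + lam * ||w||^2, closed form (X'X + lam I)^{-1} X'y.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.linalg.norm(w_ols - w_true))                   # small: OLS nearly recovers w_true
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # ridge shrinks the estimate
```

The shrinkage visible in the last line is the mechanism behind the bias-variance tradeoff for ridge regression.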

Chapter 5: Uniform Laws and Introduction to Statistical Learning (Lecture 18-20)

Uniform law via Rademacher complexity

Glivenko-Cantelli Theorem

Upper bounds on Rademacher complexity

Sample complexity of binary classification with VC dimension

Learning linear classes

Stable learning rules generalize

Learning without uniform convergence with convex losses

Chapter 6: Metric entropy and Matrix Concentration (Lecture 21-31)

Nets, Covering numbers, and the metric entropy

Eigenvalues and singular values of sub-Gaussian random matrices

Matrix concentration

Controlling the operator norm of a random matrix by row/column norms

Applications: community detection, (sparse) covariance estimation, matrix completion; see also the paper "Clustering mixtures with unknown covariances"

Chapter 7: Learning with kernels (Lecture 31-35)

Motivation: the Kernel trick and dimension-free generalization

Introduction to Hilbert spaces

Representer Theorem

Positive definite kernels and the reproducing kernel Hilbert space (RKHS)

Moore-Aronszajn theorem and continuity of point evaluations

Translation-invariant kernels

Generalization properties

Chapter 8: Optimization for learning (Lecture 36-45)

Two paradigms: empirical risk minimization vs stochastic approximation

Illustration: gradient descent for least squares
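The illustration above can be sketched as follows (a minimal, hypothetical implementation): gradient descent on $f(w) = \|Xw - y\|^2 / (2n)$ with step size $1/L$, where $L$ is the largest eigenvalue of $X^\top X / n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)   # noiseless targets, so the optimal residual is zero

# Gradient descent on f(w) = ||Xw - y||^2 / (2n) with step size 1/L.
L = np.linalg.eigvalsh(X.T @ X / n).max()   # smoothness constant of f
w = np.zeros(d)
for _ in range(500):
    w -= (X.T @ (X @ w - y) / n) / L
print(np.linalg.norm(X @ w - y))  # essentially zero: GD converges on this problem
```

On this well-conditioned problem the iterates converge linearly, which is the behavior the convexity interlude below makes precise.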

Convexity: an interlude

Gradient descent

Accelerated gradient descent

Projected subgradient method

Minimax lower bound for deterministic convex optimization

Stochastic gradient method and Polyak-Juditsky averaging

Stochastic variance reduced gradient (SVRG)

Chapter 9: Suprema of sub-Gaussian processes (Notes)

Motivation

Examples of sub-Gaussian processes

Dudley's entropy integral and chaining

From metric entropy to Gaussian width

Sudakov-Fernique inequality

Gordon's theorem and the matrix deviation inequality

Johnson Lindenstrauss lemma for infinite sets

M*-bound and escape through the mesh

Chapter 10: Minimax lower bounds for estimation (Notes)

Minimax risk

Reduction to hypothesis testing

Le Cam's method based on binary testing

Interlude: Deviations between probability distributions

Applications: lower bounds for mean estimation and linear regression

Chapter 11: Sparse recovery (Notes)

High-dimensional signal recovery

Signal recovery based on the M* bound

Exact recovery based on the escape theorem

Deterministic conditions for recovery

Recovery with noise (LASSO/BPDN)

Chapter 12: Overparametrized models and the Polyak-Łojasiewicz (PL) inequality

Polyak's inequality and linear convergence of gradient descent (Notes)

Łojasiewicz inequality for analytic functions and finite length of gradient orbits

Kurdyka's theorem and desingularization of tame functions

Some extensions to nonsmooth functions

Introduction to neural networks

Transition to linearity and the PL condition for wide neural networks (Paper 1, Paper 2)

HW 1 (due Friday October 20th (slide it under my door)):

Reading: Vershynin (Chapter 2, Sections 3.1-3.4)

Optional Reading: Wainwright (Chapter 1, Chapter 2) and Vershynin (Sections 3.5-3.8)

Exercises: Vershynin (1.3.3, 2.2.10, 2.5.10, 2.5.11, 3.3.5, 3.4.4)

HW 2 (due Friday November 10th (slide it under my door)):

Exercises: Vershynin (2.4.4, 2.5.7, 2.5.9, 2.6.5, 2.6.6, 3.3.3 )

HW 3 (due Friday December 1st (slide it under my door)):

Reading: Wainwright (Chapter 4)

Exercises: (HW3)

HW 4 (due Friday February 2nd (slide it under my door)):

Reading: Vershynin (Chapter 4)

Exercises: Vershynin (4.1.4, 4.4.6, 4.4.7, 4.5.2, 4.6.2, 4.7.6)

HW 5 (due Friday February 23rd (slide it under my door)):

Reading: Vershynin (Chapter 5.4), Wainwright (Chapter 6)

Exercises: Vershynin (the version at this link Book) (5.4.5 (f)-(g), 5.4.11, 5.4.13, 5.4.15)

HW 6 (due Friday April 3rd (slide it under my door)):

Exercises: 3.3, 3.5, 3.15, 5.12, 5.13, 5.14, 5.15 in the (text)