## Pacific Northwest Probability Seminar

The Twenty First Northwest Probability Seminar
October 19, 2019
Supported by the University of Oregon, Pacific Institute for the Mathematical Sciences (PIMS), and Milliman Fund.

The Birnbaum Lecture in Probability will be delivered by Dana Randall (Georgia Institute of Technology) in 2019.

Northwest Probability Seminars are mini-conferences held at the University of Washington and organized in collaboration with the Oregon State University, the University of British Columbia and the University of Oregon. There is no registration fee.

The Scientific Committee for the NW Probability Seminar 2019 consists of Omer Angel (U British Columbia), Chris Burdzy (U Washington), Zhenqing Chen (U Washington), Yevgeniy Kovchegov (Oregon State U) and David Levin (U Oregon).

HOTEL INFORMATION

Hotels near the University of Washington.

The talks will take place in Mary Gates Hall 241 (see the map).

Parking on UW campus is free on Saturdays only after noon. See parking information.

### Schedule

• 10:00 - 10:50 Coffee Mary Gates Commons
• 10:50 - 11:35 Jerry Li, Microsoft Research
Quantum entropy scoring for fast robust mean estimation and improved outlier detection
We study two problems in high-dimensional robust statistics: robust mean estimation and outlier detection. In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an epsilon fraction of which have been corrupted by a malicious adversary. In outlier detection the goal is to assign an outlier score to each element of a data set such that elements more likely to be outliers are assigned higher scores. Our algorithms for both problems are based on a new outlier scoring method we call QUE-scoring based on quantum entropy regularization. For robust mean estimation, this yields the first algorithm with optimal error rates and nearly-linear running time $\tilde O(nd)$ in all parameters. For outlier detection, we evaluate the performance of QUE-scoring via extensive experiments on synthetic and real data, and demonstrate that it often performs better than previously proposed algorithms.
Joint work with Yihe Dong and Samuel B. Hopkins.
• 11:40 - 12:25 Sayan Banerjee, University of North Carolina, Chapel Hill
Non-parametric change point detection in growing networks
Motivated by applications of modeling both real world and probabilistic systems such as recursive trees, the last few years have seen an explosion in models for dynamically evolving networks. In this talk, we consider models of growing networks which evolve via new vertices attaching to the pre-existing network according to one attachment function $f$ till the system grows to size $\tau(n) < n$, when new vertices switch their behavior to a different function $g$ till the system reaches size $n$. We explore the effect of this change point on the evolution and final degree distribution of the network. In particular, we consider two cases, the standard model where $\tau(n) = \gamma n$ as well as the quick big bang model when $\tau(n) = n^ \gamma$ for some $0 < \gamma < 1$. In the former case, we obtain deterministic "fluid limits" to track the degree evolution in the sup-norm metric and in the latter case, we show that the effect of the pre-change point dynamics "washes out" when the network reaches size $n$. We also devise non-parametric, consistent estimators to detect the change point. Our methods exploit and develop new techniques connecting inhomogeneous continuous time branching processes (CTBP) to the evolving networks.
Joint work with Shankar Bhamidi and Iain Carmichael.
• 12:30 - 2:00 Lunch, catered, Mary Gates Commons
• 2:00 - 2:50 Dana Randall, Georgia Institute of Technology
"Birnbaum Lecture": Phase transitions and emergent phenomena in algorithms and applications
Markov chain Monte Carlo methods have become ubiquitous across science and engineering as a means of exploring large configuration spaces. The idea is to walk among the configurations so that even though you explore a very small part of the space, samples will be drawn from a desirable distribution. Over the last 20 years there have been tremendous advances in the design and analysis of efficient sampling algorithms for this purpose, largely building on insights from statistical physics. One of the striking discoveries has been the realization that many natural Markov chains undergo a phase transition where they change from being efficient to inefficient as some parameter of the system is varied. In this talk we will explore this phenomenon, and show how insights from computing and probability reveal interesting results for asynchronous models of programmable matter.
• 3:00 - 3:55 Coffee Mary Gates Commons
• 3:55 - 4:40 Delphin Sénizergues, University of British Columbia
Geometry of the $\alpha$-stable component
For $\alpha \in (1,2]$, the $\alpha$-stable component arises as the universal scaling limit of a connected component of a critical random graphs with i.i.d. degrees having a given $\alpha$-dependent power-law tail behavior. In this talk, I will try to explain how this object arises in (Conchon-Kerjan, Goldschmidt 2019+), where it is constructed from a biased version of the $\alpha$-stable tree, by gluing a certain number of leaves along their paths to the root. I will then expose some of the results from (Goldschmidt, Haas, Sénizergues 2019+) where we understand the distribution of the object through its discrete "random finite-dimensional marginals". It allows us in particular to describe the object as a gluing of rescaled $\alpha$-stable trees along some random discrete multigraph.
Joint work with Christina Goldschmidt and Bénédicte Haas.
• 4:45 - 5:30 Peter Ralph, University of Oregon
Whole genomes and whole genealogies: progress and challenges with tree sequences
Genetic variation is shaped by genealogical relatedness, and inferring these genealogies would be an invaluable aid in extracting information from modern genomic datasets. I will describe a stochastic process that creates the genealogies, and introduce the succinct tree sequence, a means of storing and processing whole genealogies and genomes for large populations. Making use of the genealogies enables logarithmic compression, computation, and simulation, speeding up generation of and operations on hundreds of thousands of whole genomes by several orders of magnitude. I will also describe recent progress in inferring tree sequences from real data - i.e., learning the ancestry of everyone.
• 6:15 No-host dinner.
• Restaurant: Mamma Melina. There will be set menu, the first menu on this list. The cost per person will be 37 dollars (this includes tax and gratuity). Please bring cash. There will be a vegetarian option. Wine and coffee can be ordered and paid for individually (cash only). Address: 5101 25th Ave NE, Seattle, WA 98105. See Google map. The restaurant is a 20 minute walk from Mary Gates Hall.