# Distinguished Seminar in Optimization & Data

This is an interdepartmental talks series at the University of Washington,
focused on all aspects of optimization and data science.
It is a new incarnation of the CORE seminar
that ran from 2013-2020.

The series is funded by the Institute for the Foundations of
Data Science , the Pacific Institute of Mathematical Sciences (PIMS), and the home departments of the organizers.

### Organizers:

Sasha Aravkin, Applied Mathematics
Dima Drusvyatskiy, Mathematics
Maryam Fazel, Electrical Engineering & Computer Engineering * (current co-chair) *
Zaid Harchaoui, Statistics
Bamdad Hosseini, Applied Math
Kevin Jamieson, Computer Science & Engineering
Thomas Rothvoss, Mathematics & Computer Sience & Engineering
Rekha Thomas, Mathematics * (current co-chair) *
Cynthia Vinzant, Mathematics

## Amitabh Basu, Johns Hopkins University

* Monday June 3, 2024, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Information complexity of mixed-integer convex optimization

** Abstract**:We consider the problem of minimizing a convex
function under convex constraints, where some or all of the decision
variables are constrained to be integer, with access to first-order
oracles for the objective function and separation oracles for the
constraints. We focus on the information complexity of the problem,
i.e., the minimum number of oracle calls to solve the problem to a
desired accuracy. We will present nearly tight upper and lower bounds
for the problem, extending classical results from continuous convex
optimization. We also initiate the study of, and obtain the first set
of results on, information complexity under oracles that only reveal
partial first-order information, e.g., where one can only make a
binary query on the function value or (sub)gradient at a given
point. Our bounds in these new settings show that these oracles are
provably less informative than full first-order oracles for the
purpose of optimization. We will close the talk with some open
questions.

** Bio**:Amitabh Basu is Professor of Applied Mathematics and
Statistics at Johns Hopkins University. He received his Ph.D. from
Carnegie Mellon University in 2010 and did postdoctoral work in the
Dept. of Mathematics at the University of California, Davis from
2010-2013. His main research interests are in mathematical
optimization and its applications, with an emphasis on problems with a
combinatorial or discrete flavor. He serves on the editorial boards of
Mathematics of Operations Research, Mathematical Programming, SIAM
Journal on Optimization and the MOS-SIAM Series on Optimization. His
work has been recognized by the NSF Career award and the Egon Balas
Prize from the INFORMS Optimization Society.

* Monday October 14, 2024 *

* Monday November 25, 2024 *

* Monday January 27, 2025 *

* Monday April 14, 2025 *

* Monday May 19, 2025 *

## Past Talks:

* Monday May 6, 2024, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Complexity of Finding Local Minima in Continuous Optimization

** Abstract**: Can we efficiently find a local minimum of a
nonconvex continuous optimization problem? We give a rather complete
answer to this question for optimization problems defined by
polynomial data. In the unconstrained case, the answer remains
positive for polynomials of degree up to three: We show that while the
seemingly easier task of finding a critical point of a cubic
polynomial is NP-hard, the complexity of finding a local minimum of a
cubic polynomial is equivalent to the complexity of semidefinite
programming. In the constrained case, we prove that unless P=NP, there
cannot be a polynomial-time algorithm that finds a point within
Euclidean distance c^n (for any constant c >= 0) of a local minimum of
an n-variate quadratic polynomial over a polytope. This result (with
c=0) answers a question of Pardalos and Vavasis that appeared on a
list of seven open problems in complexity theory for numerical
optimization in 1992. Based on joint work with Jeffrey Zhang (Yale).

** Bio**: Amir Ali Ahmadi is a Professor at the Department of
Operations Research and Financial Engineering at Princeton University
and an Associated Faculty member of the Program in Applied and
Computational Mathematics, the Department of Computer Science, the
Department of Mechanical and Aerospace Engineering, the Department of
Electrical Engineering, and the Center for Statistics and Machine
Learning. He serves as the Director of the Certificate Program in
Optimization and Quantitative Decision Science. He has also held
visiting appointments with the industry, as a Visiting Senior
Optimization Fellow at Citadel, Global Quantitative Strategies, and a
Visiting Research Scientist at Google Brain (in the Robotics
group). Amir Ali received his PhD in EECS from MIT and was a Goldstine
Fellow at the IBM Watson Research Center prior to joining
Princeton. His research interests are in optimization theory,
computational aspects of dynamical systems, control-oriented learning,
and algorithms and complexity.
Amir Ali’s distinctions include the Sloan Fellowship in Computer
Science, the Presidential Early Career Award for Scientists and
Engineers (PECASE), the NSF CAREER Award, the AFOSR Young Investigator
Award, the DARPA Faculty Award, the Google Faculty Award, the MURI
award of the AFOSR, the Howard B. Wentz Junior Faculty Award, as well
as the Innovation Award of Princeton University, the Goldstine
Fellowship of IBM Research, and the Oberwolfach Fellowship of the
NSF. His undergraduate course at Princeton (ORF 363, Computing and
Optimization) is a four-time recipient of the Teaching Award of the
Princeton Engineering Council, as well as a recipient of the
Excellence in Teaching of Operations Research Award of the Institute
for Industrial and Systems Engineers, the Princeton SEAS Distinguished
Teaching Award, and the Phi Beta Kappa Award for Excellence in
Undergraduate Teaching at Princeton. His research has been
recognized by a number of best-paper awards, including the INFORMS
Optimization Society Young Researchers Prize, the INFORMS Computing
Society Prize (for best series of papers at the interface of
operations research and computer science), the best conference paper
award of the IEEE International Conference on Robotics and Automation,
and the best paper prize of the SIAM Journal on Control and
Optimization. Amir Ali was a plenary speaker at the 2021 SIAM
Conference on Optimization and the 2022 Colombian Conference on
Applied and Industrial Mathematics.

* Monday April 22, 2024 *

* Gates Commons (CSE 691) Allen Center *

### Nonsmooth Optimization on a Finer Scale

** Abstract**: Nonsmooth optimization problems are ubiquitous in
industrial and machine learning applications, arising from the need to
address tasks such as resource allocation, threat detection, and model
training. Within the realm of mathematical optimization, nonsmooth
problems stand as some of the most challenging; yet they offer the
possibility of developing efficient algorithms with provable
guarantees. The complexity of these problems, encompassing both lower
and upper bounds, has historically typically been examined under
generic assumptions, bounding the growth of the objective functions by
assuming Lipschitz continuity of either the objective itself or its
gradient. In this talk, I will argue that these traditional
assumptions defining classes of nonsmooth optimization problems
inadequately capture local properties of problems that may make them
amenable to more efficient algorithms. I will introduce a notion of
local bounded variation of the (sub)gradient and discuss how, under
this notion, we can obtain a more fine-grained characterization of the
complexity of nonsmooth problems. One consequence of these results is
that, contrary to prior belief, the complexity of nonsmooth
optimization problems, such as those with piecewise linear objectives
with polynomially many pieces, can be improved using parallelization
even in high dimensions, either if the problem is unconstrained or if
the function is allowed to be queried outside the feasible set. Based
on joint work with Cristobal Guzman; preprint available at
.

** Bio**:Jelena Diakonikolas is an assistant professor in the
Department of Computer Sciences at the University of
Wisconsin-Madison. Her primary research interests are in algorithm
design and complexity analysis in large-scale continuous optimization
settings, with a particular focus on challenges arising in machine
learning applications. Prior to joining UW-Madison, she held
postdoctoral positions at UC Berkeley and Boston University. She
received her PhD from Columbia University. Jelena is a recipient of an
AFOSR Young Investigator Award, Simons-Berkeley Research Fellowship,
and the Morton B. Friedman Prize for Excellence at Columbia
Engineering.

## Lin Xiao, Fundamental AI Research - Meta

* Monday November 6, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Non-negative Gauss-Newton Methods for Empirical Risk Minimization
( recording)

** Abstract: **
Training large-scale machine learning models, especially modern deep learning models, relies exclusively on the stochastic gradient method and its many variants. A major challenge in applying these methods is how to automatically schedule or adapt the step size. We exploit the fact that the loss functions in machine learning are mostly non-negative and thus can be written as the composition of the square and their real-valued square roots. With such a simple reformulation, we can apply the classical Gauss-Newton method, or the Levenberg-Marquardt method with an extra quadratic regularization. We show that the resulting algorithms are highly adaptive and can automatically warm up and decay the effective step size while tracking the loss landscape. We provide convergence analysis of these Non-negative Gauss-Newton (NGN) methods in convex, non-convex, and stochastic settings. Both the convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method. This is joint work with Antonio Orvieto.

** Bio: **
Lin Xiao is a Research Scientist Manager on the Fundamental AI Research (FAIR) team at Meta. He received his Ph.D. from Stanford University in 2004, spent two years as a postdoctoral fellow at California Institute of Technology, and then worked at Microsoft Research until 2020 before joining Meta. His current research focuses on optimization theory and algorithms for deep learning and reinforcement learning.

* Monday October 2, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### New directions in (quantum) distribution learning and testing (recording)

** Abstract: **
I will talk about:
- New efficient algorithms for quantum state tomography (the quantum analogue of estimating a probability distribution).
- Why you should care about the difference between total variation distance and Hellinger distance and KL divergence and chi-squared divergence.
- Quantum-inspired improvements to the classical problem of independence testing.

Includes joint work with Steven T. Flammia (Amazon)

** Bio: ** Ryan O'Donnell is a professor of computer science at Carnegie Mellon University. He received his Ph.D. in 2003 from the Department of Applied Mathematics at MIT. His research interests include quantum computation and information theory, approximability of optimization problems, spectral graph theory, analysis of Boolean functions, probability, and complexity theory.
## Ting-Kei Pong, Hong Kong Polytechnic University

* Monday June 5, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Error bounds for conic feasibility problems: case studies on the exponential cone ( recording )

** Abstract: **
A linear conic programming problem aims at minimizing a linear objective over the intersection of an affine set and a closed convex cone. The conic feasibility problems we study in this talk naturally arise from the optimality conditions of linear conic programming problems, which boil down to finding a point in the intersection of an affine set and a closed convex cone. While the distance to the intersection gives a precise measure of proximity of a point to being feasible, this distance can be much harder to compute than the distances to the affine set and the cone respectively. Thus, establishing bounds on the distance to the intersection based on the latter two distances (a.k.a. the error bound problem) is instrumental in the design of termination criteria for conic solvers and the study of convergence rate of algorithms. In this talk, we present a general framework for deriving error bounds for conic feasibility problems. Our framework is based on the classical concept of facial reduction, which is a fundamental tool for handling degeneracy in conic programs, and a new object called one-step facial residual function. We develop tools to compute these facial residual functions, which are applicable even when the projections onto the cones under study are not easy to analyze. We illustrate how our framework can be applied to obtain error bounds for the exponential cone. Exponential cone is a recent addition to the MOSEK commercial conic solver and allows the modeling of optimization problems involving power, exponential, logarithmic and entropy functions in the objective or constraint set. If time permits, we will also use our new error bound results to derive interesting new results in the study of Kurdyka-Lojasiewicz property.
This is joint work with Scott B. Lindstrom and Bruno F. Lourenco.

** Bio: ** Ting Kei Pong is an associate professor in Department of
Applied Mathematics at the Hong Kong Polytechnic University. He
received a bachelor's degree in 2004 and an MPhil degree in 2006 from
the Chinese University of Hong Kong. Pong received a Ph.D. from the
Department of Mathematics at UW in 2011 followed by visiting an
postdoctoral positions at Simon Fraser University, University of
Waterloo and UBC Vancouver. He joined the Hong Kong Polytechnic
University in 2014.
* Monday May 22, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Statistical applications of Wasserstein gradient flows (recording)

** Abstract: **
Otto calculus is a fundamental toolbox in mathematical optimal transport, imparting the Wasserstein space of probability measures with a Riemmanian structure. In particular, one can compute the Riemannian gradient of a functional over this space and, in turn, optimize it using Wasserstein gradient flows. The necessary background to define and compute Wasserstein gradient flows will be presented in the first part of the talk before moving to statistical applications such as variational inference and maximum likelihood estimation in Gaussian mixture models.

** Bio: **
Philippe Rigollet is a Professor of Mathematics at MIT. He received his Ph.D. in mathematics from the University of Paris VI. in 2006. His work is at the intersection of statistics, machine learning, and optimization, focusing primarily on the design and analysis of statistical methods for high-dimensional problems. Rigollet's recent research focuses on statistical optimal transport and its applications to geometric data analysis and sampling.
Please advertise it through your department's mailing list, and send it to anyone who may be interested.
## Misha Belkin , University of California, San Diego

* Monday May 1, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### The Challenges of Training Infinitely Large Neural Networks (recording)

** Abstract: **
Remarkable recent successes of deep learning have heavily relied on very large neural networks. Time and time again, larger networks have shown better performance than smaller networks trained on comparably sized data sets. Since large networks generally work better, why not train infinitely large networks directly to get best performance? This is not a rhetorical question. Recent work on the Neural Tangent Kernel showed that in certain regimes infinitely wide neural networks are equivalent to kernel machines with an explicitly computable kernel function. These machines can be trained directly by solving linear systems. However, there are two primary challenges in training infinitely large networks. First, such networks are not able to take advantage of feature learning, a key ingredient in the success of neural networks. Second, scaling kernels machines to large data sizes has been a serious computational challenge.
In this talk, I will first show how feature learning can be incorporated into kernel machines without backpropagation, resulting in Recursive Feature Machines (RFM) that outperform Multilayer Perceptrons. Just as standard kernel machines, RFMs can be trained by solving linear systems of equations. These machines demonstrate state-of-the art performance on tabular data and are considerably more efficient than neural networks for small and medium data sizes. Second, I will discuss some of our recent work on scaling kernel machines to much larger data. While the data sizes used to train modern LLMs remain beyond our reach, given sufficient resources, scaling to these data sizes is not out of the realm of possibility.

** Bio: **
Mikhail Belkin is a Professor at Halicioglu Data Science Institute and Computer Science and Engineering Department at UCSD and an Amazon
Scholar. He received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are broadly in theory and applications of machine learning and data analysis. Some of his well-known work includes widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in the context of deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings has been the "double descent" risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation.
* Monday April 3, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Leveraging "partial" smoothness for faster convergence in nonsmooth optimization

** Abstract: **
Nonsmoothness and nonconvexity are significant challenges in large-scale optimization problems, such as training neural networks and solving inverse problems in computational imaging. Although first-order methods are commonly used to address these problems, they are often considered slow. In this presentation, we introduce a (locally) accelerated first-order method that violates this perception by solving "generic" nonsmooth equations at a superlinear rate. The central insight is that nonsmooth functions often exhibit "partial" smoothness, which can be leveraged through a novel generalization of Newton's method.

** Bio: **
Damek Davis is an Associate Professor of Operations Research at Cornell University. He received his Ph.D. in mathematics from the University of California, Los Angeles in 2015. His research focuses on the interplay of optimization, signal processing, statistics, and machine learning. He has received several awards for his work, including a Sloan Research Fellowship in Mathematics (2020), the INFORMS Optimization Society Young Researchers Prize (2019), an NSF CAREER Award (2021), and the SIAM Activity Group on Optimization Best Paper Prize (2023).

* Monday January 9, 2023, 3:30 pm *

* Gates Commons (CSE 691) Allen Center *

### Stability and Learning in Strategic Queueing Systems (recording)

** Abstract: **
Over the last two decades we have developed good
understanding of how to quantify the impact of strategic user behavior on
outcomes in many games (including traffic routing and online auctions)
and showed that the resulting bounds extend to repeated games assuming
players use a form of no-regret learning to adapt to the
environment. Unfortunately, these results do not apply when outcomes
in one round affect the game in the future, as is the case in many
applications. In this talk, we study this phenomenon in the context of
a game modeling queuing systems: routers compete for servers, where
packets that do not get served need to be resent, resulting in a
system where the number of packets at each round depends on the
success of the routers in the previous rounds. In joint work with
Jason Gaitonde, we analyze the resulting highly dependent random
process. We find that if the capacity of the servers is high enough to
allow a centralized and knowledgeable scheduler to get all packets
served even with double the packet arrival rate, then despite selfish
behavior of the queues, the expected number of packets in the queues
will remain bounded throughout time, assuming older packets have
priority. Further, if queues are more patient in evaluating their
outcomes , maximizing their long-run success rate, stability can be
ensured with just 1.58 times extra capacity, strictly better than what
is possible assuming the no-regret property.

** Bio: ** Éva Tardos is a Jacob Gould Schurman Professor of
Computer Science, currently chair of the Department of Computer
Science for a second term after being chair 2006-2010. She was Interim
Dean for Computing and Information Sciences 2012-2013 and more
recently was Associate Dean for Diversity & Inclusion at Cornell
University. She received her BA and PhD from Eötvös University in
Budapest. She joined the faculty at Cornell in 1989. Tardos’s research
interest is algorithms and interface of algorithms and incentives.
She is most known for her work on network-flow algorithms and
quantifying the efficiency of selfish routing. She has been elected to
the National Academy of Engineering, the National Academy of Sciences,
the American Philosophical Society, the American Academy of Arts and
Sciences, and to the Hungarian Academy of Sciences. She is the
recipient of a number of fellowships and awards including the Packard
Fellowship, the Gödel Prize, Dantzig Prize, Fulkerson Prize, ETACS
prize, and the IEEE von Neumann Medal. She co-wrote the widely used
textbook Algorithms Design. She has been editor-in-Chief of the
Journal of the ACM and of the SIAM Journal of Computing, and was
editor of several other journals, and was program committee member and
chair for several ACM and IEEE conferences in her area.