Distinguished Seminar in Optimization & Data

This is an interdepartmental talks series at the University of Washington, focused on all aspects of optimization and data science. It is a new incarnation of the CORE seminar that ran from 2013-2020.
The series is funded by the Institute for the Foundations of Data Science , the Pacific Institute of Mathematical Sciences (PIMS), and the home departments of the organizers.


  • Sasha Aravkin, Applied Mathematics
  • Dima Drusvyatskiy, Mathematics
  • Maryam Fazel, Electrical Engineering & Computer Engineering (current co-chair)
  • Zaid Harchaoui, Statistics
  • Bamdad Hosseini, Applied Math
  • Kevin Jamieson, Computer Science & Engineering
  • Thomas Rothvoss, Mathematics & Computer Sience & Engineering
  • Rekha Thomas, Mathematics (current co-chair)
  • Cynthia Vinzant, Mathematics

    Upcoming Talks ( Live stream link , Playlist of past talks )

    Amitabh Basu, Johns Hopkins University

    Monday June 3, 2024, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Information complexity of mixed-integer convex optimization

    Abstract:We consider the problem of minimizing a convex function under convex constraints, where some or all of the decision variables are constrained to be integer, with access to first-order oracles for the objective function and separation oracles for the constraints. We focus on the information complexity of the problem, i.e., the minimum number of oracle calls to solve the problem to a desired accuracy. We will present nearly tight upper and lower bounds for the problem, extending classical results from continuous convex optimization. We also initiate the study of, and obtain the first set of results on, information complexity under oracles that only reveal partial first-order information, e.g., where one can only make a binary query on the function value or (sub)gradient at a given point. Our bounds in these new settings show that these oracles are provably less informative than full first-order oracles for the purpose of optimization. We will close the talk with some open questions.

    Bio:Amitabh Basu is Professor of Applied Mathematics and Statistics at Johns Hopkins University. He received his Ph.D. from Carnegie Mellon University in 2010 and did postdoctoral work in the Dept. of Mathematics at the University of California, Davis from 2010-2013. His main research interests are in mathematical optimization and its applications, with an emphasis on problems with a combinatorial or discrete flavor. He serves on the editorial boards of Mathematics of Operations Research, Mathematical Programming, SIAM Journal on Optimization and the MOS-SIAM Series on Optimization. His work has been recognized by the NSF Career award and the Egon Balas Prize from the INFORMS Optimization Society.

    Friedrich Eisenbrand , EPFL

    Monday October 14, 2024

    Andrea Bertozzi , UCLA

    Monday November 25, 2024

    Rachel Ward , UT Austin

    Monday January 27, 2025

    Tselil Schramm , Stanford

    Monday April 14, 2025

    Tamara Broderick , MIT

    Monday May 19, 2025

    Past Talks:

    Amirali Ahmadi, Princeton University

    Monday May 6, 2024, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Complexity of Finding Local Minima in Continuous Optimization

    Abstract: Can we efficiently find a local minimum of a nonconvex continuous optimization problem? We give a rather complete answer to this question for optimization problems defined by polynomial data. In the unconstrained case, the answer remains positive for polynomials of degree up to three: We show that while the seemingly easier task of finding a critical point of a cubic polynomial is NP-hard, the complexity of finding a local minimum of a cubic polynomial is equivalent to the complexity of semidefinite programming. In the constrained case, we prove that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance c^n (for any constant c >= 0) of a local minimum of an n-variate quadratic polynomial over a polytope. This result (with c=0) answers a question of Pardalos and Vavasis that appeared on a list of seven open problems in complexity theory for numerical optimization in 1992. Based on joint work with Jeffrey Zhang (Yale).

    Bio: Amir Ali Ahmadi is a Professor at the Department of Operations Research and Financial Engineering at Princeton University and an Associated Faculty member of the Program in Applied and Computational Mathematics, the Department of Computer Science, the Department of Mechanical and Aerospace Engineering, the Department of Electrical Engineering, and the Center for Statistics and Machine Learning. He serves as the Director of the Certificate Program in Optimization and Quantitative Decision Science. He has also held visiting appointments with the industry, as a Visiting Senior Optimization Fellow at Citadel, Global Quantitative Strategies, and a Visiting Research Scientist at Google Brain (in the Robotics group). Amir Ali received his PhD in EECS from MIT and was a Goldstine Fellow at the IBM Watson Research Center prior to joining Princeton. His research interests are in optimization theory, computational aspects of dynamical systems, control-oriented learning, and algorithms and complexity. Amir Ali’s distinctions include the Sloan Fellowship in Computer Science, the Presidential Early Career Award for Scientists and Engineers (PECASE), the NSF CAREER Award, the AFOSR Young Investigator Award, the DARPA Faculty Award, the Google Faculty Award, the MURI award of the AFOSR, the Howard B. Wentz Junior Faculty Award, as well as the Innovation Award of Princeton University, the Goldstine Fellowship of IBM Research, and the Oberwolfach Fellowship of the NSF. His undergraduate course at Princeton (ORF 363, Computing and Optimization) is a four-time recipient of the Teaching Award of the Princeton Engineering Council, as well as a recipient of the Excellence in Teaching of Operations Research Award of the Institute for Industrial and Systems Engineers, the Princeton SEAS Distinguished Teaching Award, and the Phi Beta Kappa Award for Excellence in Undergraduate Teaching at Princeton. His research has been recognized by a number of best-paper awards, including the INFORMS Optimization Society Young Researchers Prize, the INFORMS Computing Society Prize (for best series of papers at the interface of operations research and computer science), the best conference paper award of the IEEE International Conference on Robotics and Automation, and the best paper prize of the SIAM Journal on Control and Optimization. Amir Ali was a plenary speaker at the 2021 SIAM Conference on Optimization and the 2022 Colombian Conference on Applied and Industrial Mathematics.

    Jelena Diakonikolas, University of Wisconsin-Madison

    Monday April 22, 2024
    Gates Commons (CSE 691) Allen Center

    Nonsmooth Optimization on a Finer Scale

    Abstract: Nonsmooth optimization problems are ubiquitous in industrial and machine learning applications, arising from the need to address tasks such as resource allocation, threat detection, and model training. Within the realm of mathematical optimization, nonsmooth problems stand as some of the most challenging; yet they offer the possibility of developing efficient algorithms with provable guarantees. The complexity of these problems, encompassing both lower and upper bounds, has historically typically been examined under generic assumptions, bounding the growth of the objective functions by assuming Lipschitz continuity of either the objective itself or its gradient. In this talk, I will argue that these traditional assumptions defining classes of nonsmooth optimization problems inadequately capture local properties of problems that may make them amenable to more efficient algorithms. I will introduce a notion of local bounded variation of the (sub)gradient and discuss how, under this notion, we can obtain a more fine-grained characterization of the complexity of nonsmooth problems. One consequence of these results is that, contrary to prior belief, the complexity of nonsmooth optimization problems, such as those with piecewise linear objectives with polynomially many pieces, can be improved using parallelization even in high dimensions, either if the problem is unconstrained or if the function is allowed to be queried outside the feasible set. Based on joint work with Cristobal Guzman; preprint available at .

    Bio:Jelena Diakonikolas is an assistant professor in the Department of Computer Sciences at the University of Wisconsin-Madison. Her primary research interests are in algorithm design and complexity analysis in large-scale continuous optimization settings, with a particular focus on challenges arising in machine learning applications. Prior to joining UW-Madison, she held postdoctoral positions at UC Berkeley and Boston University. She received her PhD from Columbia University. Jelena is a recipient of an AFOSR Young Investigator Award, Simons-Berkeley Research Fellowship, and the Morton B. Friedman Prize for Excellence at Columbia Engineering.

    Lin Xiao, Fundamental AI Research - Meta

    Monday November 6, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Non-negative Gauss-Newton Methods for Empirical Risk Minimization ( recording)

    Abstract: Training large-scale machine learning models, especially modern deep learning models, relies exclusively on the stochastic gradient method and its many variants. A major challenge in applying these methods is how to automatically schedule or adapt the step size. We exploit the fact that the loss functions in machine learning are mostly non-negative and thus can be written as the composition of the square and their real-valued square roots. With such a simple reformulation, we can apply the classical Gauss-Newton method, or the Levenberg-Marquardt method with an extra quadratic regularization. We show that the resulting algorithms are highly adaptive and can automatically warm up and decay the effective step size while tracking the loss landscape. We provide convergence analysis of these Non-negative Gauss-Newton (NGN) methods in convex, non-convex, and stochastic settings. Both the convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method. This is joint work with Antonio Orvieto.

    Bio: Lin Xiao is a Research Scientist Manager on the Fundamental AI Research (FAIR) team at Meta. He received his Ph.D. from Stanford University in 2004, spent two years as a postdoctoral fellow at California Institute of Technology, and then worked at Microsoft Research until 2020 before joining Meta. His current research focuses on optimization theory and algorithms for deep learning and reinforcement learning.

    Ryan O'Donnell, Carnegie Mellon University

    Monday October 2, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    New directions in (quantum) distribution learning and testing (recording)

    Abstract: I will talk about: Includes joint work with Steven T. Flammia (Amazon)

    Bio: Ryan O'Donnell is a professor of computer science at Carnegie Mellon University. He received his Ph.D. in 2003 from the Department of Applied Mathematics at MIT. His research interests include quantum computation and information theory, approximability of optimization problems, spectral graph theory, analysis of Boolean functions, probability, and complexity theory.

    Ting-Kei Pong, Hong Kong Polytechnic University

    Monday June 5, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Error bounds for conic feasibility problems: case studies on the exponential cone ( recording )

    Abstract: A linear conic programming problem aims at minimizing a linear objective over the intersection of an affine set and a closed convex cone. The conic feasibility problems we study in this talk naturally arise from the optimality conditions of linear conic programming problems, which boil down to finding a point in the intersection of an affine set and a closed convex cone. While the distance to the intersection gives a precise measure of proximity of a point to being feasible, this distance can be much harder to compute than the distances to the affine set and the cone respectively. Thus, establishing bounds on the distance to the intersection based on the latter two distances (a.k.a. the error bound problem) is instrumental in the design of termination criteria for conic solvers and the study of convergence rate of algorithms. In this talk, we present a general framework for deriving error bounds for conic feasibility problems. Our framework is based on the classical concept of facial reduction, which is a fundamental tool for handling degeneracy in conic programs, and a new object called one-step facial residual function. We develop tools to compute these facial residual functions, which are applicable even when the projections onto the cones under study are not easy to analyze. We illustrate how our framework can be applied to obtain error bounds for the exponential cone. Exponential cone is a recent addition to the MOSEK commercial conic solver and allows the modeling of optimization problems involving power, exponential, logarithmic and entropy functions in the objective or constraint set. If time permits, we will also use our new error bound results to derive interesting new results in the study of Kurdyka-Lojasiewicz property. This is joint work with Scott B. Lindstrom and Bruno F. Lourenco.

    Bio: Ting Kei Pong is an associate professor in Department of Applied Mathematics at the Hong Kong Polytechnic University. He received a bachelor's degree in 2004 and an MPhil degree in 2006 from the Chinese University of Hong Kong. Pong received a Ph.D. from the Department of Mathematics at UW in 2011 followed by visiting an postdoctoral positions at Simon Fraser University, University of Waterloo and UBC Vancouver. He joined the Hong Kong Polytechnic University in 2014.

    Philippe Rigollet , MIT

    Monday May 22, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Statistical applications of Wasserstein gradient flows (recording)

    Abstract: Otto calculus is a fundamental toolbox in mathematical optimal transport, imparting the Wasserstein space of probability measures with a Riemmanian structure. In particular, one can compute the Riemannian gradient of a functional over this space and, in turn, optimize it using Wasserstein gradient flows. The necessary background to define and compute Wasserstein gradient flows will be presented in the first part of the talk before moving to statistical applications such as variational inference and maximum likelihood estimation in Gaussian mixture models.

    Bio: Philippe Rigollet is a Professor of Mathematics at MIT. He received his Ph.D. in mathematics from the University of Paris VI. in 2006. His work is at the intersection of statistics, machine learning, and optimization, focusing primarily on the design and analysis of statistical methods for high-dimensional problems. Rigollet's recent research focuses on statistical optimal transport and its applications to geometric data analysis and sampling. Please advertise it through your department's mailing list, and send it to anyone who may be interested.

    Misha Belkin , University of California, San Diego

    Monday May 1, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    The Challenges of Training Infinitely Large Neural Networks (recording)

    Abstract: Remarkable recent successes of deep learning have heavily relied on very large neural networks. Time and time again, larger networks have shown better performance than smaller networks trained on comparably sized data sets. Since large networks generally work better, why not train infinitely large networks directly to get best performance? This is not a rhetorical question. Recent work on the Neural Tangent Kernel showed that in certain regimes infinitely wide neural networks are equivalent to kernel machines with an explicitly computable kernel function. These machines can be trained directly by solving linear systems. However, there are two primary challenges in training infinitely large networks. First, such networks are not able to take advantage of feature learning, a key ingredient in the success of neural networks. Second, scaling kernels machines to large data sizes has been a serious computational challenge. In this talk, I will first show how feature learning can be incorporated into kernel machines without backpropagation, resulting in Recursive Feature Machines (RFM) that outperform Multilayer Perceptrons. Just as standard kernel machines, RFMs can be trained by solving linear systems of equations. These machines demonstrate state-of-the art performance on tabular data and are considerably more efficient than neural networks for small and medium data sizes. Second, I will discuss some of our recent work on scaling kernel machines to much larger data. While the data sizes used to train modern LLMs remain beyond our reach, given sufficient resources, scaling to these data sizes is not out of the realm of possibility.

    Bio: Mikhail Belkin is a Professor at Halicioglu Data Science Institute and Computer Science and Engineering Department at UCSD and an Amazon Scholar. He received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are broadly in theory and applications of machine learning and data analysis. Some of his well-known work includes widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in the context of deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings has been the "double descent" risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation.

    Damek Davis , Cornell University

    Monday April 3, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Leveraging "partial" smoothness for faster convergence in nonsmooth optimization

    Abstract: Nonsmoothness and nonconvexity are significant challenges in large-scale optimization problems, such as training neural networks and solving inverse problems in computational imaging. Although first-order methods are commonly used to address these problems, they are often considered slow. In this presentation, we introduce a (locally) accelerated first-order method that violates this perception by solving "generic" nonsmooth equations at a superlinear rate. The central insight is that nonsmooth functions often exhibit "partial" smoothness, which can be leveraged through a novel generalization of Newton's method.

    Bio: Damek Davis is an Associate Professor of Operations Research at Cornell University. He received his Ph.D. in mathematics from the University of California, Los Angeles in 2015. His research focuses on the interplay of optimization, signal processing, statistics, and machine learning. He has received several awards for his work, including a Sloan Research Fellowship in Mathematics (2020), the INFORMS Optimization Society Young Researchers Prize (2019), an NSF CAREER Award (2021), and the SIAM Activity Group on Optimization Best Paper Prize (2023).

    Éva Tardos, Cornell University

    Monday January 9, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Stability and Learning in Strategic Queueing Systems (recording)

    Abstract: Over the last two decades we have developed good understanding of how to quantify the impact of strategic user behavior on outcomes in many games (including traffic routing and online auctions) and showed that the resulting bounds extend to repeated games assuming players use a form of no-regret learning to adapt to the environment. Unfortunately, these results do not apply when outcomes in one round affect the game in the future, as is the case in many applications. In this talk, we study this phenomenon in the context of a game modeling queuing systems: routers compete for servers, where packets that do not get served need to be resent, resulting in a system where the number of packets at each round depends on the success of the routers in the previous rounds. In joint work with Jason Gaitonde, we analyze the resulting highly dependent random process. We find that if the capacity of the servers is high enough to allow a centralized and knowledgeable scheduler to get all packets served even with double the packet arrival rate, then despite selfish behavior of the queues, the expected number of packets in the queues will remain bounded throughout time, assuming older packets have priority. Further, if queues are more patient in evaluating their outcomes , maximizing their long-run success rate, stability can be ensured with just 1.58 times extra capacity, strictly better than what is possible assuming the no-regret property.

    Bio: Éva Tardos is a Jacob Gould Schurman Professor of Computer Science, currently chair of the Department of Computer Science for a second term after being chair 2006-2010. She was Interim Dean for Computing and Information Sciences 2012-2013 and more recently was Associate Dean for Diversity & Inclusion at Cornell University. She received her BA and PhD from Eötvös University in Budapest. She joined the faculty at Cornell in 1989. Tardos’s research interest is algorithms and interface of algorithms and incentives. She is most known for her work on network-flow algorithms and quantifying the efficiency of selfish routing. She has been elected to the National Academy of Engineering, the National Academy of Sciences, the American Philosophical Society, the American Academy of Arts and Sciences, and to the Hungarian Academy of Sciences. She is the recipient of a number of fellowships and awards including the Packard Fellowship, the Gödel Prize, Dantzig Prize, Fulkerson Prize, ETACS prize, and the IEEE von Neumann Medal. She co-wrote the widely used textbook Algorithms Design. She has been editor-in-Chief of the Journal of the ACM and of the SIAM Journal of Computing, and was editor of several other journals, and was program committee member and chair for several ACM and IEEE conferences in her area.