Distinguished Seminar in Optimization & Data

This is an interdepartmental talks series at the University of Washington, focused on all aspects of optimization and data science. It is a new incarnation of the CORE seminar that ran from 2013-2020.
The series is funded by the Institute for Foundations of Data Science, the Pacific Institute for the Mathematical Sciences (PIMS), and the home departments of the organizers.

Organizers:

  • Sasha Aravkin, Applied Mathematics
  • Dima Drusvyatskiy, Mathematics
  • Maryam Fazel, Electrical & Computer Engineering (current co-chair)
  • Zaid Harchaoui, Statistics
  • Kevin Jamieson, Computer Science & Engineering
  • Thomas Rothvoss, Mathematics and Computer Science & Engineering
  • Rekha Thomas, Mathematics (current co-chair)
  • Cynthia Vinzant, Mathematics



    Upcoming Talks:

    Rachel Ward, UT Austin

    Monday February 5, 2024

    Amitabh Basu, Johns Hopkins University

    Monday March 18, 2024

    Jelena Diakonikolas, University of Wisconsin-Madison

    Monday April 22, 2024

    Amirali Ahmadi, Princeton

    Monday May 6, 2024




    Past Talks:

    Lin Xiao, Fundamental AI Research - Meta

    Monday November 6, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Non-negative Gauss-Newton Methods for Empirical Risk Minimization (recording)

    Abstract: Training large-scale machine learning models, especially modern deep learning models, relies exclusively on the stochastic gradient method and its many variants. A major challenge in applying these methods is how to automatically schedule or adapt the step size. We exploit the fact that the loss functions in machine learning are mostly non-negative and thus can be written as the composition of the square and their real-valued square roots. With such a simple reformulation, we can apply the classical Gauss-Newton method, or the Levenberg-Marquardt method with an extra quadratic regularization. We show that the resulting algorithms are highly adaptive and can automatically warm up and decay the effective step size while tracking the loss landscape. We provide convergence analysis of these Non-negative Gauss-Newton (NGN) methods in convex, non-convex, and stochastic settings. Both the convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method. This is joint work with Antonio Orvieto.
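
    The reformulation described in the abstract essentially determines the update: writing a non-negative loss f as f = r^2 with r = sqrt(f), linearizing r, and adding a Levenberg-Marquardt regularization ||d||^2/(2*gamma) yields a plain gradient step whose effective step size adapts to the ratio between the loss and the squared gradient norm. The Python sketch below works out this calculation for a deterministic scalar loss; it is an illustration under these assumptions, and the damping convention and stochastic variants used in the actual NGN work may differ.

    import numpy as np

    def ngn_step(x, f, grad_f, gamma=1.0, eps=1e-12):
        """One Gauss-Newton/Levenberg-Marquardt step for a non-negative loss f,
        using the reformulation f(x) = r(x)^2 with r = sqrt(f).

        Linearizing r and adding the regularization ||d||^2 / (2*gamma) gives a
        gradient step with an automatically adapted step size:
            eta_eff = gamma / (1 + gamma * ||grad f(x)||^2 / (2 f(x))).
        """
        fx = f(x)
        g = grad_f(x)
        eta_eff = gamma / (1.0 + gamma * np.dot(g, g) / (2.0 * fx + eps))
        return x - eta_eff * g

    # Toy usage on a non-negative quadratic loss (hypothetical example).
    if __name__ == "__main__":
        A = np.array([[3.0, 0.0], [0.0, 1.0]])
        f = lambda x: 0.5 * x @ A @ x
        grad_f = lambda x: A @ x
        x = np.array([5.0, -3.0])
        for _ in range(50):
            x = ngn_step(x, f, grad_f, gamma=1.0)
        print(f(x))  # the loss decreases toward 0 as the step size decays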

    Bio: Lin Xiao is a Research Scientist Manager on the Fundamental AI Research (FAIR) team at Meta. He received his Ph.D. from Stanford University in 2004, spent two years as a postdoctoral fellow at the California Institute of Technology, and then worked at Microsoft Research until 2020 before joining Meta. His current research focuses on optimization theory and algorithms for deep learning and reinforcement learning.


    Ryan O'Donnell, Carnegie Mellon University

    Monday October 2, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    New directions in (quantum) distribution learning and testing (recording)

    Abstract: I will talk about new directions in (quantum) distribution learning and testing. This includes joint work with Steven T. Flammia (Amazon).

    Bio: Ryan O'Donnell is a professor of computer science at Carnegie Mellon University. He received his Ph.D. in 2003 from the Department of Applied Mathematics at MIT. His research interests include quantum computation and information theory, approximability of optimization problems, spectral graph theory, analysis of Boolean functions, probability, and complexity theory.

    Ting-Kei Pong, Hong Kong Polytechnic University

    Monday June 5, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Error bounds for conic feasibility problems: case studies on the exponential cone (recording)

    Abstract: A linear conic programming problem aims at minimizing a linear objective over the intersection of an affine set and a closed convex cone. The conic feasibility problems we study in this talk naturally arise from the optimality conditions of linear conic programming problems, which boil down to finding a point in the intersection of an affine set and a closed convex cone. While the distance to the intersection gives a precise measure of proximity of a point to being feasible, this distance can be much harder to compute than the distances to the affine set and the cone, respectively. Thus, establishing bounds on the distance to the intersection based on the latter two distances (a.k.a. the error bound problem) is instrumental in the design of termination criteria for conic solvers and the study of convergence rates of algorithms. In this talk, we present a general framework for deriving error bounds for conic feasibility problems. Our framework is based on the classical concept of facial reduction, which is a fundamental tool for handling degeneracy in conic programs, and a new object called the one-step facial residual function. We develop tools to compute these facial residual functions, which are applicable even when the projections onto the cones under study are not easy to analyze. We illustrate how our framework can be applied to obtain error bounds for the exponential cone. The exponential cone is a recent addition to the MOSEK commercial conic solver and allows the modeling of optimization problems involving power, exponential, logarithmic and entropy functions in the objective or constraint set. If time permits, we will also use our new error bound results to derive interesting new results in the study of the Kurdyka-Łojasiewicz property. This is joint work with Scott B. Lindstrom and Bruno F. Lourenço.
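
    The "error bound problem" above can be made concrete with a generic Hölderian bound of the following shape. This is an illustrative template rather than the talk's precise statements, whose residual functions for the exponential cone can be non-polynomial:

    \[
      \operatorname{dist}(x, A \cap K)
        \;\le\; \kappa \,\bigl(\operatorname{dist}(x, A) + \operatorname{dist}(x, K)\bigr)^{\gamma}
        \qquad \text{for all } x \in B(0, R),
    \]

    where $A$ is the affine set, $K$ is the closed convex cone, and $\kappa > 0$, $\gamma \in (0, 1]$ are constants depending on the bounded region $B(0, R)$ and on the facial structure of $K$. The framework in the talk replaces the power $(\cdot)^{\gamma}$ by more general facial residual functions, which for the exponential cone may involve, for example, logarithmic-type expressions.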

    Bio: Ting Kei Pong is an associate professor in the Department of Applied Mathematics at the Hong Kong Polytechnic University. He received a bachelor's degree in 2004 and an MPhil degree in 2006 from the Chinese University of Hong Kong. Pong received his Ph.D. from the Department of Mathematics at UW in 2011, followed by visiting and postdoctoral positions at Simon Fraser University, the University of Waterloo, and UBC Vancouver. He joined the Hong Kong Polytechnic University in 2014.

    Philippe Rigollet, MIT

    Monday May 22, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Statistical applications of Wasserstein gradient flows (recording)

    Abstract: Otto calculus is a fundamental toolbox in mathematical optimal transport, endowing the Wasserstein space of probability measures with a Riemannian structure. In particular, one can compute the Riemannian gradient of a functional over this space and, in turn, optimize it using Wasserstein gradient flows. The necessary background to define and compute Wasserstein gradient flows will be presented in the first part of the talk, before moving to statistical applications such as variational inference and maximum likelihood estimation in Gaussian mixture models.
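
    For orientation, the Wasserstein gradient flow of a functional $F$ on probability measures can be written, at a formal level, as a continuity equation driven by the first variation of $F$; the display below is standard background rather than material specific to the talk:

    \[
      \partial_t \mu_t \;=\; \nabla \cdot \Bigl( \mu_t \, \nabla \tfrac{\delta F}{\delta \mu}(\mu_t) \Bigr).
    \]

    For example, taking $F(\mu) = \mathrm{KL}(\mu \,\|\, \pi)$ for a target density $\pi$ recovers the Fokker-Planck equation, the prototype for the variational inference applications mentioned in the abstract.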

    Bio: Philippe Rigollet is a Professor of Mathematics at MIT. He received his Ph.D. in mathematics from the University of Paris VI in 2006. His work is at the intersection of statistics, machine learning, and optimization, focusing primarily on the design and analysis of statistical methods for high-dimensional problems. Rigollet's recent research focuses on statistical optimal transport and its applications to geometric data analysis and sampling.

    Misha Belkin, University of California, San Diego

    Monday May 1, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    The Challenges of Training Infinitely Large Neural Networks (recording)

    Abstract: Remarkable recent successes of deep learning have heavily relied on very large neural networks. Time and time again, larger networks have shown better performance than smaller networks trained on comparably sized data sets. Since large networks generally work better, why not train infinitely large networks directly to get the best performance? This is not a rhetorical question. Recent work on the Neural Tangent Kernel showed that in certain regimes infinitely wide neural networks are equivalent to kernel machines with an explicitly computable kernel function. These machines can be trained directly by solving linear systems. However, there are two primary challenges in training infinitely large networks. First, such networks are not able to take advantage of feature learning, a key ingredient in the success of neural networks. Second, scaling kernel machines to large data sizes has been a serious computational challenge. In this talk, I will first show how feature learning can be incorporated into kernel machines without backpropagation, resulting in Recursive Feature Machines (RFMs) that outperform multilayer perceptrons. Just like standard kernel machines, RFMs can be trained by solving linear systems of equations. These machines demonstrate state-of-the-art performance on tabular data and are considerably more efficient than neural networks for small and medium data sizes. Second, I will discuss some of our recent work on scaling kernel machines to much larger data. While the data sizes used to train modern LLMs remain beyond our reach, given sufficient resources, scaling to these data sizes is not out of the realm of possibility.
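
    To make the "feature learning without backpropagation" idea concrete, one can picture an alternating scheme: fit a kernel ridge regressor whose kernel is weighted by a feature matrix M, re-estimate M from the gradients of the fitted predictor (an average gradient outer product), and repeat. The toy numpy sketch below illustrates that loop; it uses a Gaussian Mahalanobis kernel and an ad hoc normalization for simplicity, so the kernel choice and update details of the actual RFM algorithm may differ.

    import numpy as np

    def mahalanobis_gauss_kernel(X, Z, M, bandwidth=1.0):
        """K[i, j] = exp(-(x_i - z_j)^T M (x_i - z_j) / (2 * bandwidth^2))."""
        D = X[:, None, :] - Z[None, :, :]                 # (n, m, d) pairwise differences
        sq = np.einsum('nmd,de,nme->nm', D, M, D)         # squared Mahalanobis distances
        return np.exp(-sq / (2.0 * bandwidth ** 2))

    def rfm_sketch(X, y, n_iters=5, reg=1e-3, bandwidth=1.0):
        """Toy alternating scheme: kernel ridge regression + gradient-outer-product
        update of the feature matrix M (a simplified stand-in for the RFM recipe)."""
        n, d = X.shape
        M = np.eye(d)
        for _ in range(n_iters):
            K = mahalanobis_gauss_kernel(X, X, M, bandwidth)
            alpha = np.linalg.solve(K + reg * np.eye(n), y)   # train by solving a linear system
            # Gradient of the predictor at each training point:
            # grad f(x) = sum_i alpha_i * K(x, x_i) * (-M (x - x_i) / bandwidth^2)
            D = X[:, None, :] - X[None, :, :]                 # (n, n, d)
            W = alpha[None, :] * K                            # (n, n) weights
            G = -np.einsum('nm,de,nme->nd', W, M, D) / bandwidth ** 2
            M = G.T @ G / n                                   # average gradient outer product
            M /= np.trace(M) / d + 1e-12                      # rescale (a stabilization choice in this sketch)
        return M, alpha

    # Hypothetical usage: only the first coordinate matters, and M should reflect that.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5))
        y = np.sin(X[:, 0])
        M, _ = rfm_sketch(X, y)
        print(np.round(np.diag(M), 3))  # weight concentrates on coordinate 0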

    Bio: Mikhail Belkin is a Professor at the Halicioglu Data Science Institute and the Computer Science and Engineering Department at UCSD, and an Amazon Scholar. He received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are broadly in theory and applications of machine learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in the context of deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings has been the "double descent" risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation.

    Damek Davis, Cornell University

    Monday April 3, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Leveraging "partial" smoothness for faster convergence in nonsmooth optimization

    Abstract: Nonsmoothness and nonconvexity are significant challenges in large-scale optimization problems, such as training neural networks and solving inverse problems in computational imaging. Although first-order methods are commonly used to address these problems, they are often considered slow. In this presentation, we introduce a (locally) accelerated first-order method that violates this perception by solving "generic" nonsmooth equations at a superlinear rate. The central insight is that nonsmooth functions often exhibit "partial" smoothness, which can be leveraged through a novel generalization of Newton's method.

    Bio: Damek Davis is an Associate Professor of Operations Research at Cornell University. He received his Ph.D. in mathematics from the University of California, Los Angeles in 2015. His research focuses on the interplay of optimization, signal processing, statistics, and machine learning. He has received several awards for his work, including a Sloan Research Fellowship in Mathematics (2020), the INFORMS Optimization Society Young Researchers Prize (2019), an NSF CAREER Award (2021), and the SIAM Activity Group on Optimization Best Paper Prize (2023).

    Éva Tardos, Cornell University

    Monday January 9, 2023, 3:30 pm
    Gates Commons (CSE 691) Allen Center

    Stability and Learning in Strategic Queueing Systems (recording)

    Abstract: Over the last two decades we have developed a good understanding of how to quantify the impact of strategic user behavior on outcomes in many games (including traffic routing and online auctions), and have shown that the resulting bounds extend to repeated games assuming players use a form of no-regret learning to adapt to the environment. Unfortunately, these results do not apply when outcomes in one round affect the game in the future, as is the case in many applications. In this talk, we study this phenomenon in the context of a game modeling queueing systems: routers compete for servers, where packets that do not get served need to be resent, resulting in a system where the number of packets at each round depends on the success of the routers in the previous rounds. In joint work with Jason Gaitonde, we analyze the resulting highly dependent random process. We find that if the capacity of the servers is high enough to allow a centralized and knowledgeable scheduler to get all packets served even with double the packet arrival rate, then despite selfish behavior of the queues, the expected number of packets in the queues will remain bounded throughout time, assuming older packets have priority. Further, if queues are more patient in evaluating their outcomes, maximizing their long-run success rate, stability can be ensured with just 1.58 times extra capacity, strictly better than what is possible assuming the no-regret property.
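
    As a reading aid for the model (not the strategic analysis), here is a minimal simulation of the kind of queueing dynamics described above: each round every non-empty queue sends its oldest packet to a chosen server, each server succeeds with some probability and, when several packets arrive, serves the oldest one; unserved packets return to their queues. All modeling details here (the Bernoulli arrival process, tie-breaking, and the fixed random routing in place of learned strategies) are simplifying assumptions for illustration.

    import random

    def simulate(arrival_rates, service_rates, rounds=10000, seed=0):
        """Toy queueing dynamics: Bernoulli arrivals, one packet sent per queue per
        round, oldest-packet priority at each server, failed packets are re-queued.
        Routing is uniformly random here, standing in for the learned strategies."""
        rng = random.Random(seed)
        n_queues, n_servers = len(arrival_rates), len(service_rates)
        queues = [[] for _ in range(n_queues)]   # each queue holds packet timestamps (older = smaller)
        for t in range(rounds):
            # Arrivals.
            for i, lam in enumerate(arrival_rates):
                if rng.random() < lam:
                    queues[i].append(t)
            # Each non-empty queue sends its oldest packet to a random server.
            bids = {}   # server -> list of (timestamp, queue index)
            for i, q in enumerate(queues):
                if q:
                    j = rng.randrange(n_servers)
                    bids.setdefault(j, []).append((q[0], i))
            # Each server succeeds with its service probability and clears the oldest bid.
            for j, packets in bids.items():
                if rng.random() < service_rates[j]:
                    _, winner = min(packets)
                    queues[winner].pop(0)
        return [len(q) for q in queues]

    # Hypothetical usage: total capacity comfortably exceeds total arrival rate,
    # so queue lengths stay small.
    print(simulate(arrival_rates=[0.2, 0.2, 0.2], service_rates=[0.9, 0.9, 0.9]))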

    Bio: Éva Tardos is a Jacob Gould Schurman Professor of Computer Science, currently chair of the Department of Computer Science for a second term after serving as chair from 2006 to 2010. She was Interim Dean for Computing and Information Sciences in 2012-2013 and more recently was Associate Dean for Diversity & Inclusion at Cornell University. She received her BA and PhD from Eötvös University in Budapest. She joined the faculty at Cornell in 1989. Tardos’s research interests are algorithms and the interface of algorithms and incentives. She is best known for her work on network-flow algorithms and for quantifying the efficiency of selfish routing. She has been elected to the National Academy of Engineering, the National Academy of Sciences, the American Philosophical Society, the American Academy of Arts and Sciences, and the Hungarian Academy of Sciences. She is the recipient of a number of fellowships and awards, including the Packard Fellowship, the Gödel Prize, the Dantzig Prize, the Fulkerson Prize, the EATCS prize, and the IEEE von Neumann Medal. She co-wrote the widely used textbook Algorithm Design. She has been Editor-in-Chief of the Journal of the ACM and of the SIAM Journal on Computing, was an editor of several other journals, and has served as a program committee member and chair for several ACM and IEEE conferences in her area.