Machine Learning: From Data to Mathematical Understanding
a CIME-EMS Course
Cetraro (Cosenza), 3–7 July 2023
Aim
The growing sophistication of machine learning algorithms, rich sources of data, and the power of modern computers have led to intense interest in leveraging machine learning in everything from automatic recognition of the content of images to identifying healthcare best practices, from improving agricultural yields to understanding how the human brain encodes information. On the one hand, in the past decade we have already achieved a mature mathematical understanding of some machine learning methods, such as sparse models and compressed sensing, kernel methods, and support vector machines. On the other hand, despite promising results, machine learning researchers still face significant challenges because many aspects of common algorithms, such as deep neural networks, are poorly understood. The mathematical difficulties lie specifically in the interplay of nonlinearity, nonsmoothness, and nonconvexity with high-dimensional probability and optimization.
The primary aim of this school is to teach students the mathematical foundations of machine learning and better equip them to develop new methodology to address critical and emerging challenges in machine learning. In particular, they will learn about the interplay between nonlinearity, nonsmoothness, and nonconvexity in high-dimensional probability and optimization, and they will acquire an understanding of relevant practical applications, such as image processing and classification. Armed with this knowledge, students will be well equipped to help address challenges of interpretability, fragility, adaptability, and fairness, and to use these methods to advance science, engineering, healthcare, and agriculture.
Lectures will be delivered in English.
Where
 Cetraro (Cosenza)
 Hotel S. Michele Cetraro (Cosenza)
 tel.: +39 0982 91012 – fax: +39 0982 91430
More information on the CIME website.
Admission
Application is open: please go here.
Lectures

An optimization perspective on sampling
Sampling is a fundamental question in statistics and machine learning, most notably in Bayesian methods. Sampling and optimization present many similarities, some obvious, others more mysterious. In particular, the seminal work of Jordan, Kinderlehrer and Otto ('98) unveiled a beautiful connection between Brownian motion and the heat equation on the one hand, and optimal transport on the other. They showed that certain stochastic processes may be viewed as gradient descent over the Wasserstein space of probability distributions. This connection opens the perspective of a novel approach to sampling that leverages the rich toolbox of optimization to derive and analyze sampling algorithms. The goal of this course is to bring together the many ingredients that make this perspective possible, starting from the basics and building to some of the most recent advances in sampling.
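As a concrete illustration of this viewpoint (my own sketch, not part of the course materials): the unadjusted Langevin algorithm discretizes the Langevin diffusion, whose law follows the Wasserstein gradient flow of the KL divergence. For a standard Gaussian target, whose score is simply -x, the iterates approximately sample the target; the step size and iteration counts below are illustrative assumptions.

```python
import numpy as np

def ula_sample(grad_log_density, x0, step=0.01, n_steps=1000, rng=None):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2*step) * N(0, I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * grad_log_density(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# Target: standard Gaussian, log pi(x) = -||x||^2 / 2, so grad log pi(x) = -x.
rng = np.random.default_rng(42)
samples = np.array([ula_sample(lambda x: -x, [5.0], step=0.05, n_steps=2000, rng=rng)[0]
                    for _ in range(500)])
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

Note the small discretization bias: for step size h the chain's stationary distribution is close to, but not exactly, the target, which is why step sizes are taken small (or a Metropolis correction is added).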

Regularization Approaches to Machine Learning
Understanding how intelligence works and how it can be emulated in machines is an age-old dream and arguably one of the biggest challenges in modern science. Learning, with its principles and computational implementations, is at the very core of this endeavor. Recently, for the first time, we have been able to develop artificial intelligence systems that solve complex tasks considered out of reach for decades. Modern cameras recognize faces, smartphones recognize voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. At the root of most of these success stories are machine learning algorithms that power software trained on data rather than solely programmed to solve a task. Among the variety of approaches to modern machine learning, we will focus on regularization techniques, which are key to high-dimensional learning. Indeed, learning from finite data requires incorporating a priori knowledge to ensure stability and generalization. In this course, we will introduce the problem of learning as a linear inverse problem and show how regularization theory provides a natural framework for algorithm design. Starting from classical approaches based on penalization, we will then consider different ideas, including implicit/iterative regularization and regularization with random projections. Throughout the course, the emphasis will be on the interplay between computational and statistical aspects. Theory classes will be complemented by practical laboratory sessions that will give students the opportunity to gain hands-on experience with different algorithmic solutions for large-scale problems.
References
Lorenzo Rosasco (2016) Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 Lecture Notes. Available online.
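To make the penalization and implicit/iterative regularization ideas concrete, here is a small sketch (my own illustration, not from the course): Tikhonov (ridge) regression on a toy underdetermined linear problem, compared with early-stopped gradient descent on the unpenalized least squares loss. The regularization weight and iteration count are hypothetical choices; in practice both would be tuned, e.g. by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 100  # fewer samples than features: ill-posed without regularization
w_true = np.zeros(d)
w_true[:5] = 1.0
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Penalization: w_lam = argmin ||Xw - y||^2 + lam * ||w||^2 (closed form).
lam = 1.0  # hypothetical choice
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Implicit regularization: early-stopped gradient descent on the least squares loss.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L for the smooth loss
for _ in range(200):                    # the stopping time acts as the regularizer
    w -= step * X.T @ (X @ w - y)

print(np.linalg.norm(w_ridge - w_true), np.linalg.norm(w - w_true))
```

Both estimators are stable despite n < d; the statistical/computational interplay the course emphasizes shows up here in that the number of iterations plays the role of the penalty parameter.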

Data-driven approaches to inverse problems
Inverse problems are about the reconstruction of an unknown physical quantity from indirect measurements. They appear in a variety of places, from medical imaging, for instance MRI or CT, to remote sensing, for instance radar, to material sciences and molecular biology, for instance electron microscopy. Here, inverse problems are a tool for looking inside specimens, resolving structures beyond the scale visible to the naked eye, and quantifying them. They are a means for diagnosis, prediction, and discovery. Most inverse problems of interest are ill-posed and require appropriate mathematical treatment to recover meaningful solutions. Classically, such approaches are derived almost exclusively in a knowledge-driven manner, constituting handcrafted mathematical models. Examples include variational regularization methods with Tikhonov regularization, the total variation, and several sparsity-promoting regularizers such as the L1 norm of the wavelet coefficients of the solution. While such handcrafted approaches deliver mathematically rigorous and computationally robust solutions to inverse problems, they are also limited by our ability to model solution properties accurately and to realise these approaches in a computationally efficient manner. Recently, a new paradigm has been introduced to the regularization of inverse problems, which derives solutions in a data-driven way. Here, the inversion approach is not mathematically modelled in the classical sense but is represented by highly overparametrised models, typically deep neural networks, that are adapted to the inverse problem at hand by appropriately selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracies, paired with computational efficiency, that were previously inconceivable. In this course I will give an introduction to this new data-driven paradigm for inverse problems.
Presented methods include data-driven variational models and plug-and-play approaches, learned iterative schemes (also known as learned unrolling), and learned post-processing. In the first part of the lecture I will give an introduction to inverse problems and classical solution strategies, and discuss applications. In the second part we will investigate learned variational models and plug-and-play approaches. In the third part we discuss the idea of unrolling an iterative reconstruction algorithm and turning it into a data-driven reconstruction approach through appropriate parametrisation and optimisation. While presenting these methodologies, we will discuss their theoretical properties and provide numerical examples for image denoising, deconvolution, and computed tomography reconstruction. The lecture series will finish with a discussion of open problems and future perspectives.
References
This lecture series is mainly based on Arridge, S., Maass, P., Öktem, O., & Schönlieb, C.-B. (2019) Solving inverse problems using data-driven models. Acta Numerica, 28, 1–174.
Slides

Randomized algorithms for linear algebra
Probabilistic algorithms have joined the mainstream of numerical linear algebra over the last 15 years. For certain problems, these methods are faster and more scalable than classical algorithms. They also give more flexibility in algorithm design, allowing for adaptations to new data access models and computer architectures. This course will cover basic principles and algorithms from the field of randomized linear algebra computations. Focus is placed on techniques with a record of good practical performance.
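A minimal sketch of one such method, the randomized range finder combined with an SVD of the small projected matrix (my own illustration; the oversampling parameter and problem sizes are hypothetical choices):

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Approximate truncated SVD via a randomized range finder:
    sample the range of A with a Gaussian test matrix, then factor the
    small projected matrix exactly."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                       # orthonormal basis for sampled range
    B = Q.T @ A                                          # small (rank+p) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Quick check on an exactly low-rank matrix, where the range is captured
# essentially to machine precision.
rng = np.random.default_rng(7)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 300))  # rank 10
U, s, Vt = randomized_svd(A, rank=10, rng=rng)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)
```

The expensive contact with A is two matrix-matrix products, which is what makes such schemes attractive for parallel and out-of-core settings; for matrices with slowly decaying spectra, power iterations on A before the QR step improve accuracy.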
Slides

Fundamental Optimization Algorithms for Data Science
Many problems in machine learning and data analysis are formulated as optimization problems. Such formulations allow trade-offs between model fit and generalization to be captured, and yield solutions that are useful in the context of the given learning task. In particular, regularization mechanisms can be incorporated into such formulations. Modern optimization algorithms solve these formulations in ways that are theoretically principled and have good practical performance. We will start this section by describing how many canonical problems in data analysis can be formulated as optimization problems, including the training of deep neural networks, kernel learning, matrix completion, and covariance estimation. We then introduce some basic concepts that underpin many optimization approaches, including convexity, Taylor's theorem, subgradients, convergence rates, and duality. Next, we discuss algorithms that make use of gradient and subgradient information, including first-order descent methods, accelerated gradient methods, subgradient methods, and the conditional gradient ("Frank–Wolfe") approach. Extensions of these techniques to regularized (or "sparse") optimization formulations will be discussed next, since such formulations are ubiquitous in data analysis applications. We then discuss methods that require the use or approximation of second-order derivative information, including Newton and quasi-Newton methods such as L-BFGS. Finally, we discuss methods for nonconvex problems that attain at least local minima with guaranteed complexity. Throughout, we highlight the interactions between optimization, machine learning, statistics, and theoretical computer science that have produced such a powerful toolbox of techniques for solving data analysis problems.
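One workhorse from this toolbox, sketched here as my own illustration (the regularization weight, noise level, and problem sizes are hypothetical): proximal gradient descent (ISTA) for the L1-regularized least squares ("lasso") problem, a canonical example of extending first-order methods to sparse formulations.

```python
import numpy as np

def ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = w - step * X.T @ (X @ w - y)                          # gradient step, smooth part
        w = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-thresholding prox
    return w

# Sparse recovery toy problem: 3 active features out of 200, 100 noisy samples.
rng = np.random.default_rng(0)
n, d = 100, 200
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, d))
y = X @ w_true + 0.05 * rng.standard_normal(n)
w_hat = ista(X, y, lam=2.0)
print(np.count_nonzero(w_hat), np.linalg.norm(w_hat - w_true))
```

Replacing the soft-thresholding step with a different proximal operator handles other regularizers in the same template, and adding Nesterov momentum gives the accelerated variant (FISTA).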
References
 Jorge Nocedal and Stephen J. Wright (2006) Numerical Optimization, Springer, Second edition.
 Stephen J. Wright and Benjamin Recht (2022) Optimization for Data Analysis, Cambridge.
Slides
Program
Day 0 (Sunday, 02 July)
 [18:00–20:00] Welcome and Registration
Day 1 (Monday, 03 July)
 [08:00–08:45] Registration Desk is open
 [08:45–09:00] Opening and Welcome
 [09:00–10:00] Joel Tropp (part 1) Video
 [10:15–11:15] Carola-Bibiane Schönlieb (part 1) Video
 [11:15–11:45] Coffee break
 [11:45–12:45] Lorenzo Rosasco (part 1) Video
 [13:00–14:00] Lunch
 [16:30–17:45] Lorenzo Rosasco (part 2) Video
 [18:00–19:15] Stephen Wright (part 1) Video
 [20:00–21:30] Dinner
Day 2 (Tuesday, 04 July)
 [09:00–10:00] Lorenzo Rosasco (part 3) Video
 [10:15–11:15] Joel Tropp (part 2) Video
 [11:15–11:45] Coffee break
 [11:45–12:45] Stephen Wright (part 2) Video
 [13:00–14:00] Lunch
 [16:30–17:45] Carola-Bibiane Schönlieb (part 2) Video
 [18:00–19:15] Lorenzo Rosasco (part 4) Video
 [20:00–21:30] Dinner
Day 3 (Wednesday, 05 July)
 [09:00–10:00] Philippe Rigollet (part 1) Video
 [10:15–11:15] Lorenzo Rosasco (part 5) Video
 [11:15–11:45] Coffee break
 [11:45–12:45] Carola-Bibiane Schönlieb (part 3) Video
 [13:00–14:00] Lunch
 [16:30–17:45] Philippe Rigollet (part 2) Video
 [18:00–19:15] Joel Tropp (part 3) Video
 [20:00–21:30] Dinner
Day 4 (Thursday, 06 July)
 [09:00–10:00] Stephen Wright (part 3) Video
 [10:15–11:15] Philippe Rigollet (part 3) Video
 [11:15–11:45] Coffee break
 [11:45–12:45] Joel Tropp (part 4) Video
 [13:00–14:00] Lunch
 [16:30–17:45] Philippe Rigollet (part 4) Video
 [18:00–19:15] Stephen Wright (part 4) Video
 [20:00–21:30] Dinner
Day 5 (Friday, 07 July)
 [09:00–10:00] Joel Tropp (part 5) Video
 [10:15–11:15] Philippe Rigollet (part 5) Video
 [11:15–11:45] Coffee break
 [11:45–12:45] Stephen Wright (part 5) Video
 [13:00–14:00] Lunch
Photos
Our Sponsors
Organizers
 Claudio Agostinelli (University of Trento) claudio.agostinelli@unitn.it
 Massimo Fornasier (Technical University of Munich) massimo.fornasier@ma.tum.de
 Lorenzo Rosasco (University of Genova, IIT, and MIT) lorenzo.rosasco@unige.it
 Rebecca Willett (University of Chicago) willett@uchicago.edu
Information
More information at the CIME website.