a CIME-EMS Course

Cetraro (Cosenza), 3-7 July 2023

Aim

The growing sophistication of machine learning algorithms, rich sources of data, and power of computers have led to intense interest in leveraging machine learning in everything from automatic recognition of the content of images to identifying healthcare best practices, from improving agricultural yields to understanding how the human brain encodes information. On the one hand, in the past decade we have already achieved a mature mathematical understanding of some machine learning methods, such as sparse models and compressed sensing, kernel methods, and support vector machines. On the other hand, despite the promising results, machine learning researchers still face significant challenges because many aspects of common algorithms, such as deep neural networks, are poorly understood. The mathematical difficulties lie specifically in the interplay between nonlinearity, nonsmoothness, and nonconvexity with high-dimensional probability and optimization.

The primary aim of this school is to teach students the mathematical foundations of machine learning and better equip them to develop new methodology to address critical and emerging challenges in machine learning. In particular, they will learn about the interplay between nonlinearity, nonsmoothness, and nonconvexity in high-dimensional probability and optimization, and they will acquire an understanding of relevant practical applications, such as image processing and classification. Armed with this knowledge, students will be well equipped to help address challenges of interpretability, fragility, adaptability, and fairness, and to use these methods to advance science, engineering, healthcare, and agriculture.

Lectures will be delivered in English.

Where

Cetraro (Cosenza)
Hotel S. Michele Cetraro (Cosenza)
tel.: +39-0982-91012 - fax: +39-0982-91430

More information on the CIME website.

Admission

Application is open

please go here

Lectures

Philippe Rigollet

(MIT, U.S.A.)

Personal website

An optimization perspective on sampling

Sampling is a fundamental question in statistics and machine learning, most notably in Bayesian methods. Sampling and optimization present many similarities, some obvious, others more mysterious. In particular, the seminal work of Jordan, Kinderlehrer and Otto (’98) has unveiled a beautiful connection between Brownian motion and the heat equation on the one hand, and optimal transport on the other. They showed that certain stochastic processes may be viewed as gradient descent over the Wasserstein space of probability distributions. This connection opens the perspective of a novel approach to sampling that leverages the rich toolbox of optimization to derive and analyze sampling algorithms. The goal of this course is to bring together the many ingredients that make this perspective possible, starting from the basics and building to some of the most recent advances in sampling.
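
As an illustration of the sampling-as-optimization viewpoint (a minimal sketch, not taken from the course material), the unadjusted Langevin algorithm discretizes the Langevin diffusion: each update is a gradient step on a potential V plus Gaussian noise. The standard Gaussian target, the step size, and the function names below are assumptions chosen for the example.

```python
import numpy as np

def grad_potential(x):
    """Gradient of V(x) = x^2 / 2; the target density is proportional to exp(-V)."""
    return x

def langevin_samples(n_particles=5000, n_steps=500, step=0.05, seed=0):
    """Run the unadjusted Langevin algorithm on independent particles."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=n_particles)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        # Gradient step on V plus Gaussian noise scaled by sqrt(2 * step)
        x = x - step * grad_potential(x) + np.sqrt(2.0 * step) * noise
    return x

samples = langevin_samples()
print(samples.mean(), samples.var())
```

For this quadratic potential the discretized chain has stationary variance 1 / (1 - step / 2) rather than exactly 1, so the empirical variance is expected to sit slightly above the target value; a smaller step size reduces this bias at the cost of slower mixing.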

Lorenzo Rosasco

(University of Genova, IIT, and MIT, Italy and U.S.A.)

Personal website

Regularization Approaches to Machine Learning

Understanding how intelligence works and how it can be emulated in machines is an age-old dream and arguably one of the biggest challenges in modern science. Learning, with its principles and computational implementations, is at the very core of this endeavor. Recently, for the first time, we have been able to develop artificial intelligence systems able to solve complex tasks considered out of reach for decades. Modern cameras recognize faces, smartphones recognize voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. In most cases, at the root of these success stories there are machine learning algorithms that power software that is trained on data rather than being solely programmed to solve a task. Among the variety of approaches to modern machine learning, we will focus on regularization techniques, which are key to high-dimensional learning. Indeed, learning from finite data requires incorporating prior knowledge to ensure stability and generalization. In this course, we will introduce the problem of learning as a linear inverse problem and show how regularization theory provides a natural framework for algorithm design. Starting from classical approaches based on penalization, we will then consider different ideas including implicit/iterative regularization and regularization with random projections. Throughout the course the emphasis will be on the interplay between computational and statistical aspects. Theory classes will be complemented by practical laboratory sessions that will give the opportunity to gain hands-on experience with different algorithmic solutions for large-scale problems.
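
The contrast between penalization and iterative regularization mentioned above can be sketched on a least-squares problem. This is an illustrative example under assumed settings (synthetic Gaussian data, hand-picked penalty and iteration count, hypothetical function names), not code from the laboratory sessions.

```python
import numpy as np

def ridge(X, y, lam):
    """Tikhonov-regularized least squares: argmin_w ||Xw - y||^2 / n + lam * ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def early_stopped_gd(X, y, n_iters):
    """Gradient descent on ||Xw - y||^2 / (2n) started at zero.

    No penalty is used: the number of iterations plays the role of the
    regularization parameter (fewer iterations, stronger regularization).
    """
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / L, L = largest eigenvalue of X^T X / n
    w = np.zeros(d)
    for _ in range(n_iters):
        w = w - step * X.T @ (X @ w - y) / n
    return w

rng = np.random.default_rng(0)
n, d = 50, 200  # fewer samples than dimensions
w_true = np.zeros(d)
w_true[:5] = 1.0
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_ridge = ridge(X, y, lam=0.1)
w_gd = early_stopped_gd(X, y, n_iters=20)
print(np.linalg.norm(w_ridge), np.linalg.norm(w_gd))
```

In both cases a single scalar controls the size of the solution: the norm of the ridge estimate shrinks as `lam` grows, and the norm of the gradient-descent iterate grows with `n_iters`.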

References

Lorenzo Rosasco (2016) Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 Lecture Notes. Available online

Carola-Bibiane Schoenlieb

(University of Cambridge, U.K.)

Personal website

Data-driven approaches to inverse problems

Inverse problems are about the reconstruction of an unknown physical quantity from indirect measurements. They appear in a variety of places, from medical imaging, for instance MRI or CT, to remote sensing, for instance radar, to material sciences and molecular biology, for instance electron microscopy. Here, inverse problems are a tool for looking inside specimens, resolving structures beyond the scale visible to the naked eye, and quantifying them. They are a means for diagnosis, prediction and discovery. Most inverse problems of interest are ill-posed and require appropriate mathematical treatment for recovering meaningful solutions. Classically, such approaches are derived almost exclusively in a knowledge-driven manner, constituting handcrafted mathematical models. Examples include variational regularization methods with Tikhonov regularization, the total variation and several sparsity-promoting regularizers such as the L1 norm of wavelet coefficients of the solution. While such handcrafted approaches deliver mathematically rigorous and computationally robust solutions to inverse problems, they are also limited by our ability to model solution properties accurately and to realise these approaches in a computationally efficient manner. Recently, a new paradigm has been introduced to the regularization of inverse problems, which derives solutions to inverse problems in a data-driven way. Here, the inversion approach is not mathematically modelled in the classical sense, but modelled by highly over-parametrised models, typically deep neural networks, that are adapted to the inverse problems at hand by appropriately selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracies paired with computational efficiency that were previously inconceivable. In this course I will give an introduction to this new data-driven paradigm for inverse problems.
Presented methods include data-driven variational models and plug-and-play approaches, learned iterative schemes (also known as learned unrolling), and learned post-processing. In the first part of the lecture I will give an introduction to inverse problems and classical solution strategies, and discuss applications. In the second part we will investigate learned variational models and plug-and-play approaches. In the third part we discuss the idea of unrolling an iterative reconstruction algorithm and turning it into a data-driven reconstruction approach by appropriate parametrisation and optimisation. While presenting these methodologies, we will discuss their theoretical properties and provide numerical examples for image denoising, deconvolution and computed tomography reconstruction. The lecture series will finish with a discussion of open problems and future perspectives.
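
To make the idea of unrolling concrete, the sketch below runs a fixed number of proximal-gradient (ISTA) iterations for an L1-regularized least-squares problem, with a separate step size and threshold for each iteration. This is a stand-in chosen for brevity, not a method from the lectures: in a learned unrolling scheme the per-iteration parameters (or a network replacing the thresholding step) would be fitted to training data, whereas here they are fixed by hand. The forward operator, signal, and all names are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal map of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unrolled_ista(A, y, steps, thresholds):
    """Run len(steps) proximal-gradient iterations for 0.5 * ||Ax - y||^2 + lam * ||x||_1.

    Iteration k uses its own step size steps[k] and threshold thresholds[k];
    these are the quantities a learned unrolling scheme would train.
    """
    x = np.zeros(A.shape[1])
    for step, thr in zip(steps, thresholds):
        x = soft_threshold(x - step * A.T @ (A @ x - y), thr)
    return x

rng = np.random.default_rng(0)
m, n = 40, 100  # underdetermined forward operator
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(m)

n_layers, lam = 200, 0.02
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / L for the data-fit term
x_hat = unrolled_ista(A, y, [step] * n_layers, [step * lam] * n_layers)
print(np.linalg.norm(x_hat - x_true))
```

Replacing `soft_threshold` with a general denoiser gives the basic structure of a plug-and-play iteration; whether such a scheme converges depends on properties of the denoiser.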

References

This lecture series is mainly based on Arridge, S., Maass, P., Öktem, O., & Schönlieb, C. B. (2019) Solving inverse problems using data-driven models. Acta Numerica, 28, 1-174.

Slides

Joel Tropp

(Caltech, Pasadena, U.S.A.)

Personal website

Topic: Randomized algorithms for linear algebra

Probabilistic algorithms have joined the mainstream of numerical linear algebra over the last 15 years. For certain problems, these methods are faster and more scalable than classical algorithms. They also give more flexibility in algorithm design, allowing for adaptations to new data access models and computer architectures. This course will cover basic principles and algorithms from the field of randomized linear algebra computations. Focus is placed on techniques with a record of good practical performance.
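
One representative technique from this field is the randomized SVD, which compresses a matrix with a random sketch before running a classical factorization on the much smaller result. The following is a minimal sketch assuming a Gaussian test matrix and a small oversampling parameter; the function name and default values are illustrative, not taken from the course.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate rank-`rank` SVD of A via a Gaussian random sketch."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)
    omega = rng.standard_normal((n, k))   # random test matrix
    Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis for the sketched range
    B = Q.T @ A                           # small (k x n) matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                       # lift left singular vectors back to R^m
    return U[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 300))  # exactly rank 8
U, s, Vt = randomized_svd(A, rank=8)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

The only passes over `A` are the two products `A @ omega` and `Q.T @ A`, which is what makes this style of algorithm adaptable to different data access models.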

Slides

Stephen Wright

(University of Wisconsin, U.S.A.)

Personal website

Fundamental Optimization Algorithms for Data Science

Many problems in machine learning and data analysis are formulated as optimization problems. Such formulations allow tradeoffs between model fit and generalization to be captured, and yield solutions that are useful in the context of the given learning task. In particular, regularization mechanisms can be incorporated into such formulations. Modern optimization algorithms solve these formulations in ways that are theoretically principled and that have good practical performance. We will start this section by describing how many canonical problems in data analysis can be formulated as optimization problems, including the training of deep neural networks, kernel learning, matrix completion, and covariance estimation. We then introduce some basic concepts that underpin many optimization approaches, including convexity, Taylor’s theorem, subgradients, convergence rates, and duality. Next, we discuss algorithms that make use of gradient and subgradient information, including first-order descent methods, accelerated gradient methods, subgradient methods, and the conditional gradient (“Frank-Wolfe”) approach. Extensions of these techniques to regularized (or “sparse”) optimization formulations will be discussed next, since such formulations are ubiquitous in data analysis applications. Next, we discuss methods that require the use or approximation of second-order derivative information, including Newton and quasi-Newton methods such as L-BFGS. Finally, we discuss methods for nonconvex problems that attain at least local minima with guaranteed complexity. Throughout, we highlight the interactions between optimization, machine learning, statistics, and theoretical computer science that have produced such a powerful toolbox of techniques for solving data analysis problems.
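
Two of the first-order methods listed above can be sketched on a least-squares objective. This is an illustrative example with assumed problem data and hypothetical function names; the accelerated method shown is one common variant of Nesterov's scheme (the momentum sequence also used in FISTA), not necessarily the form presented in the lectures.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters):
    """Plain gradient descent with a constant step size."""
    x = x0.copy()
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

def accelerated_gradient(grad, x0, step, n_iters):
    """Accelerated gradient method for smooth convex objectives."""
    x, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_next = z - step * grad(z)                       # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_f(x):
    return A.T @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
x0 = np.zeros(50)
n_iters = 100
x_gd = gradient_descent(grad_f, x0, 1.0 / L, n_iters)
x_agd = accelerated_gradient(grad_f, x0, 1.0 / L, n_iters)
print(f(x_gd), f(x_agd))
```

With step size 1 / L, the standard worst-case guarantees for smooth convex objectives are an O(1/k) decrease of the objective gap for gradient descent and O(1/k^2) for the accelerated method after k iterations.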

References

Jorge Nocedal and Stephen J. Wright (2006) Numerical Optimization, Springer, Second edition.
Stephen J. Wright and Benjamin Recht (2022) Optimization for Data Analysis, Cambridge.

Slides

Program

Day 0 (Sunday, 02 July)

[18:00-20:00] Welcome and Registration

Day 1 (Monday, 03 July)

[08:00-08:45] Registration Desk is open
[08:45-09:00] Opening and Welcome
[09:00-10:00] Joel Tropp (part 1) Video
[10:15-11:15] Carola-Bibiane Schoenlieb (part 1) Video
[11:15-11:45] Coffee break
[11:45-12:45] Lorenzo Rosasco (part 1) Video
[13:00-14:00] Lunch
[16:30-17:45] Lorenzo Rosasco (part 2) Video
[18:00-19:15] Stephen Wright (part 1) Video
[20:00-21:30] Dinner

Day 2 (Tuesday, 04 July)

[09:00-10:00] Lorenzo Rosasco (part 3) Video
[10:15-11:15] Joel Tropp (part 2) Video
[11:15-11:45] Coffee break
[11:45-12:45] Stephen Wright (part 2) Video
[13:00-14:00] Lunch
[16:30-17:45] Carola-Bibiane Schoenlieb (part 2) Video
[18:00-19:15] Lorenzo Rosasco (part 4) Video
[20:00-21:30] Dinner

Day 3 (Wednesday, 05 July)

[09:00-10:00] Philippe Rigollet (part 1) Video
[10:15-11:15] Lorenzo Rosasco (part 5) Video
[11:15-11:45] Coffee break
[11:45-12:45] Carola-Bibiane Schoenlieb (part 3) Video
[13:00-14:00] Lunch
[16:30-17:45] Philippe Rigollet (part 2) Video
[18:00-19:15] Joel Tropp (part 3) Video
[20:00-21:30] Dinner

Day 4 (Thursday, 06 July)

[09:00-10:00] Stephen Wright (part 3) Video
[10:15-11:15] Philippe Rigollet (part 3) Video
[11:15-11:45] Coffee break
[11:45-12:45] Joel Tropp (part 4) Video
[13:00-14:00] Lunch
[16:30-17:45] Philippe Rigollet (part 4) Video
[18:00-19:15] Stephen Wright (part 4) Video
[20:00-21:30] Dinner

Day 5 (Friday, 07 July)

[09:00-10:00] Joel Tropp (part 5) Video
[10:15-11:15] Philippe Rigollet (part 5) Video
[11:15-11:45] Coffee break
[11:45-12:45] Stephen Wright (part 5) Video
[13:00-14:00] Lunch

Machine Learning: From Data to Mathematical Understanding
