Machine Learning: From Data to Mathematical Understanding


Levico, TBA 2022


The growing sophistication of machine learning algorithms, rich sources of
data, and the power of modern computers have led to intense interest in leveraging
machine learning in everything from automatic recognition of the content of images
to identifying healthcare best practices, and from improving agricultural yields to
understanding how the human brain encodes information. On the one hand, in
the past decade we have already achieved a mature mathematical understanding of some machine learning methods, such as sparse models and compressed
sensing, kernel methods, and support vector machines. On the other hand, despite promising results, machine learning researchers still face significant
challenges because many aspects of common algorithms, such as deep neural
networks, are poorly understood. The mathematical difficulties lie specifically
in the interplay of nonlinearity, nonsmoothness, and nonconvexity with
high-dimensional probability and optimization.

The primary aim of this school is to teach students the mathematical foun-
dations of machine learning and to better equip them to develop new methodology
addressing critical and emerging challenges in machine learning. In particular,
they will learn about the interplay between nonlinearity, nonsmoothness, and
nonconvexity in high-dimensional probability and optimization, and they will ac-
quire an understanding of relevant practical applications, such as image processing
and classification. Armed with this knowledge, students will be well equipped to
help address challenges of interpretability, fragility, adaptability, and fairness, and to use these methods to advance science, engineering, healthcare, and agriculture.

Lectures will be delivered in English.


All activities are at BellaVista Relax Hotel, Via Vittorio Emanuele III, 7, 38056 Levico Terme (TN), Italy, see here


Application is open

Please go here


Philippe Rigollet

(MIT, U.S.A.)
Personal website

Statistical Optimal Transport

Over the last decade, the statistical and computational aspects of massive data processing have become increasingly intertwined. To sort heuristics from principled methods, the need for a theory of optimality that integrates both aspects is now acute. On the one hand, convex relaxations have provided fertile ground for developing efficient algorithms with provable guarantees. On the other hand, lower bounds that account for computability have recently been developed from different perspectives. In these lectures we will review aspects of statistical and computational optimality. These results blend tools from probability, statistics, and theoretical computer science that will be reviewed and applied to several high-dimensional learning problems.


Philippe Rigollet and Jan-Christian Huetter (2017) High-Dimensional Statistics, Lecture Notes, available online

Lorenzo Rosasco

(University of Genova, IIT, and MIT; Italy and U.S.A.)
Personal website

Regularization Approaches to Machine Learning

Understanding how intelligence works and how it can be emulated in machines is an age-old dream and arguably one of the biggest challenges in modern science. Learning, with its principles and computational implementations, is at the very core of this endeavor. Recently, for the first time, we have been able to develop artificial intelligence systems that solve complex tasks considered out of reach for decades: modern cameras recognize faces, smartphones understand voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. In most cases, at the root of these success stories are machine learning algorithms that power software trained on data rather than programmed solely to solve a task. Among the variety of approaches to modern machine learning, we will focus on regularization techniques, which are key to high-dimensional learning: learning from finite data requires incorporating a priori knowledge to ensure stability and generalization. In this course, we will introduce the problem of learning as a linear inverse problem and show how regularization theory provides a natural framework for algorithm design. Starting from classical approaches based on penalization, we will then consider different ideas, including implicit/iterative regularization and regularization with random projections. Throughout the course the emphasis will be on the interplay between computational and statistical aspects. Theory classes will be complemented by practical laboratory sessions that will give students hands-on experience with different algorithmic solutions for large-scale problems.
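As a minimal illustration of two of the regularization schemes named above, explicit penalization (Tikhonov/ridge) and implicit regularization via early-stopped gradient descent, the following NumPy sketch compares the two on a synthetic least-squares problem. The problem sizes, noise level, regularization parameter, and iteration count are all hypothetical choices for illustration, not material from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic underdetermined regression problem (hypothetical sizes).
n, d = 50, 100
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Explicit (Tikhonov/ridge) regularization: solve (X^T X + n*lam*I) w = X^T y.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Implicit regularization: plain gradient descent on the least-squares loss,
# stopped early -- the number of iterations plays the role of the
# regularization parameter.
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L, L = Lipschitz const of gradient
w_iter = np.zeros(d)
for _ in range(100):
    w_iter -= step * X.T @ (X @ w_iter - y)

print(np.linalg.norm(X @ w_ridge - y), np.linalg.norm(X @ w_iter - y))
```

Both estimators fit the data without inverting the ill-conditioned matrix X^T X directly; in the iterative scheme the stopping time trades data fit against stability, mirroring the penalty parameter in the ridge formulation.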


Lorenzo Rosasco (2016) Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 Lecture Notes. Available online

Carola-Bibiane Schoenlieb

(University of Cambridge, U.K.)
Personal website

Mathematical Imaging and Machine Learning

Mathematical imaging embraces mathematical methods for the analysis and processing of imaging data, including image reconstruction from indirect measurements, image segmentation and classification, image restoration, and spatio-temporal image analysis, just to name a few. Until recently, mathematical imaging mainly revolved around so-called model-based methods: variational models and partial differential equations (PDEs) in the continuum, and statistical models, discrete filters, and morphological image analysis in the discrete setting. How well these hand-crafted models perform depends on the complexity and ill-posedness of the task, the talent of the modeler and analyst, and the robustness and efficiency of the numerical schemes developed for their solution. They have been successfully applied to a variety of image-based problems in biomedical imaging, computer vision, imaging in chemical engineering, material sciences, remote sensing, the digital humanities and arts, and many more. Their power lies in their mathematical guarantees of stability and generalisability; their weakness lies in a lack of adaptivity to the imaging data at hand. In the last couple of years, a powerful stream of imaging methods founded solely in machine learning has arisen in parallel to model-based approaches: deep learning. Deep learning has had a transformative impact on a wide range of tasks related to artificial intelligence, from computer vision and speech recognition to games. This is due to the capability of deep neural networks to adapt to data by extracting the essential information and using it to form decisions in a black-box manner. What is required to make deep neural networks work is a sufficiently large and diverse dataset and possibly laborious design of the network architecture to, at least empirically, render trustworthy solutions.
These results are equally surprising in their apparent potential for a wide range of applications and in the almost complete lack of a stringent theoretical justification of these approaches in terms of approximation properties, convergence results, sampling rates, or mathematically justified, efficient algorithms.
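To make the model-based, variational side of the course concrete, here is a minimal sketch of Tikhonov (H^1) denoising: minimize 0.5||u - f||^2 + (lam/2)||grad u||^2 by gradient descent on a small synthetic image. The test image, the discretization (periodic boundary conditions via np.roll), and all parameter values are illustrative assumptions, not material from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic test image: a bright square, corrupted by Gaussian noise.
f = np.zeros((32, 32))
f[8:24, 8:24] = 1.0
f_noisy = f + 0.2 * rng.standard_normal(f.shape)

lam, step = 0.5, 0.2  # regularization weight and step size (illustrative)
u = f_noisy.copy()
for _ in range(200):
    # Discrete Laplacian with periodic boundary conditions.
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
           + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    # Gradient of 0.5||u - f_noisy||^2 + (lam/2)||grad u||^2 is
    # (u - f_noisy) - lam * lap; descend along it.
    u -= step * ((u - f_noisy) - lam * lap)

# The smoothed u typically sits closer to the clean image than f_noisy does.
print(np.linalg.norm(u - f), np.linalg.norm(f_noisy - f))
```

Replacing the quadratic gradient penalty by the total variation ||grad u||_1 yields the edge-preserving models the abstract alludes to; the quadratic version is used here only because its gradient flow fits in a few lines.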


Carola-Bibiane Schoenlieb (2015) Partial Differential Equation Methods for Image Inpainting, Cambridge University Press.

Joel Tropp

(Caltech, Pasadena, U.S.A.)
Personal website

Randomized Algorithms for Linear Algebra

Probabilistic algorithms have joined the mainstream of numerical linear algebra over the last 15 years. For certain problems, these methods are faster and more scalable than classical algorithms. They also give more flexibility in algorithm design, allowing for adaptations to new data access models and computer architectures. This course will cover basic principles and algorithms from the field of randomized linear algebra computations. Focus is placed on techniques with a record of good practical performance.
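A canonical example from this field is the randomized range finder that underlies the randomized SVD. The sketch below runs it on a synthetic exactly-low-rank matrix; the matrix sizes, target rank, and oversampling amount are illustrative choices, not prescriptions from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic test matrix of exact rank 10 (hypothetical sizes).
m, n, rank = 200, 150, 10
A = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))

# Randomized range finder: multiply A by a random Gaussian test matrix
# to sample its range, then orthonormalize the result.
k = rank + 5                          # target rank plus oversampling
Y = A @ rng.standard_normal((n, k))   # k random linear combinations of columns
Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sampled range

# Project onto the small subspace and compute an exact SVD there.
B = Q.T @ A                           # small k x n matrix
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                            # approximate left singular vectors of A

rel_err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

The expensive step is a single pass of matrix multiplication with a thin random matrix, which is why such schemes scale well and adapt easily to streaming and distributed data access models.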

Stephen Wright

(University of Wisconsin, U.S.A.)
Personal website

Fundamental Optimization Algorithms for Data Science

Many problems in machine learning and data analysis are formulated as optimization problems. Such formulations allow trade-offs between model fit and generalization to be captured, and yield solutions that are useful in the context of the given learning task. In particular, regularization mechanisms can be incorporated into such formulations. Modern optimization algorithms solve these formulations in ways that are theoretically principled and that have good practical performance. We will start this course by describing how many canonical problems in data analysis can be formulated as optimization problems, including the training of deep neural networks, kernel learning, matrix completion, and covariance estimation. We then introduce some basic concepts that underpin many optimization approaches, including convexity, Taylor's theorem, subgradients, convergence rates, and duality. Next, we discuss algorithms that make use of gradient and subgradient information, including first-order descent methods, accelerated gradient methods, subgradient methods, and the conditional gradient ("Frank-Wolfe") approach. Extensions of these techniques to regularized (or "sparse") optimization formulations will be discussed next, since such formulations are ubiquitous in data analysis applications. We then discuss methods that require the use or approximation of second-order derivative information, including Newton and quasi-Newton methods such as L-BFGS. Finally, we discuss methods for nonconvex problems that attain at least local minima with guaranteed complexity. Throughout, we highlight the interactions between optimization, machine learning, statistics, and theoretical computer science that have produced such a powerful toolbox of techniques for solving data analysis problems.
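As a small taste of the regularized ("sparse") formulations mentioned above, here is a minimal proximal-gradient (ISTA) sketch for the lasso problem min_w 0.5||Xw - y||^2 + lam*||w||_1. The data, problem sizes, regularization weight, and iteration count are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse regression problem (hypothetical sizes and noise level).
n, d = 80, 40
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.05 * rng.standard_normal(n)
lam = 0.5

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Proximal gradient (ISTA): a gradient step on the smooth least-squares term,
# followed by the prox of the nonsmooth l1 penalty.
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L for the least-squares gradient
w = np.zeros(d)
for _ in range(500):
    w = soft_threshold(w - step * X.T @ (X @ w - y), step * lam)

print(np.count_nonzero(w))  # most coordinates end up exactly zero
```

The prox step is where the regularizer enters the algorithm: for the l1 norm it has the closed form above, which is what produces exact zeros in the iterates rather than merely small values.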


Jorge Nocedal and Stephen Wright (2006) Numerical Optimization, Springer, Second edition.

Program (TBC)

Day 0

  • [18:00-20:00] Welcome and Registration

Day 1

  • [8:00-8:45] Registration Desk is open
  • [8:45-9:00] Opening and Welcome
  • [9:00-10:45] Carola-Bibiane Schoenlieb (part 1)
  • [10:45-11:30] Coffee break
  • [11:30-13:15] Carola-Bibiane Schoenlieb (part 2)
  • [13:15-15:00] Lunch
  • [15:00-16:45] Joel Tropp (part 1)
  • [16:45-17:00] Break
  • [17:00-18:45] Stephen Wright (part 1)
  • [19:00-20:00] Dinner

Day 2

  • [9:00-10:45] Carola-Bibiane Schoenlieb (part 3)
  • [10:45-11:30] Coffee break
  • [11:30-13:15] Carola-Bibiane Schoenlieb (part 4)
  • [13:15-15:00] Lunch
  • [15:00-16:45] Joel Tropp (part 2)
  • [16:45-17:00] Break
  • [17:00-18:45] Stephen Wright (part 2)
  • [19:00-20:00] Dinner

Day 3

  • [9:00-10:30] Lorenzo Rosasco (part 1)
  • [10:30-11:00] Coffee break
  • [11:00-12:30] Lorenzo Rosasco (part 2)
  • [12:30-14:00] Lunch
  • [14:00-15:30] Joel Tropp (part 3)
  • [15:30-15:45] Break
  • [15:45-17:15] Philippe Rigollet (part 1)
  • [17:15-17:30] Break
  • [17:30-19:00] Stephen Wright (part 3)
  • [19:00-20:00] Dinner

Day 4

  • [9:00-10:45] Lorenzo Rosasco (part 3)
  • [10:45-11:30] Coffee break
  • [11:30-13:15] Joel Tropp (part 4)
  • [13:15-15:00] Lunch
  • [15:00-16:45] Philippe Rigollet (part 2)
  • [16:45-17:00] Break
  • [17:00-18:45] Stephen Wright (part 4)
  • [19:00-20:00] Dinner

Day 5

  • [8:30-10:00] Lorenzo Rosasco (part 4)
  • [10:00-10:30] Coffee break
  • [10:30-12:00] Philippe Rigollet (part 3)
  • [12:00-12:15] Break
  • [12:15-13:45] Philippe Rigollet (part 4)
  • [13:45-15:00] Lunch

Our Sponsors



Department of Mathematics, University of Trento

Department of Mathematics, Technische Universität München

European Research Council



In case you need more information you can contact TBD ().