The growing sophistication of machine learning algorithms, rich sources of data, and the power of modern computers have led to intense interest in leveraging machine learning in everything from automatic recognition of the content of images to identifying healthcare best practices, from improving agricultural yields to understanding how the human brain encodes information. On the one hand, in the past decade we have already achieved a mature mathematical understanding of some machine learning methods, such as sparse models and compressed sensing, kernel methods, and support vector machines. On the other hand, despite promising results, machine learning researchers still face significant challenges because many aspects of common algorithms, such as deep neural networks, are poorly understood. The mathematical difficulties lie specifically in the interplay of nonlinearity, nonsmoothness, and nonconvexity with high-dimensional probability and optimization.
The primary aim of this school is to teach students the mathematical foundations of machine learning and to better equip them to develop new methodology addressing critical and emerging challenges in machine learning. In particular, students will learn about the interplay between nonlinearity, nonsmoothness, and nonconvexity in high-dimensional probability and optimization, and they will acquire an understanding of relevant practical applications, such as image processing and classification. Armed with this knowledge, students will be well equipped to help address the challenges of interpretability, fragility, adaptability, and fairness, and to use these methods to advance science, engineering, healthcare, and agriculture.
Lectures will be delivered in English.
All activities take place at the BellaVista Relax Hotel, Via Vittorio Emanuele III, 7, 38056 Levico Terme (TN), Italy.

Over the last decade, the statistical and computational aspects of massive data processing have become increasingly intertwined. To sort heuristics from principled methods, the need for a theory of optimality that integrates both aspects is now acute. On the one hand, convex relaxations have provided fertile ground for developing efficient algorithms with provable guarantees. On the other hand, lower bounds that account for computability have recently been developed from different perspectives. In this course we will review aspects of statistical and computational optimality. These results blend tools from probability, statistics, and theoretical computer science, which will be reviewed and applied to several high-dimensional learning problems.
Philippe Rigollet and Jan-Christian Hütter (2017) High-Dimensional Statistics, Lecture Notes, available online.
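A canonical example of the convex relaxations mentioned above is the Lasso, which replaces the combinatorial sparsity constraint with an l1 penalty and can be solved by proximal gradient descent (ISTA) with soft-thresholding. The sketch below is a minimal illustrative example, not part of the course material; the problem sizes, noise level, and regularization parameter are all arbitrary assumptions.

```python
import numpy as np

def ista(A, b, lam, step, iters=500):
    """Proximal gradient (ISTA) for the Lasso: min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)                                      # gradient of smooth part
        z = x - step * g                                           # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding (prox of l1)
    return x

rng = np.random.default_rng(0)
n, d, s = 50, 100, 5                           # fewer samples than features, s-sparse signal
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[:s] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(n)
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1/L with L the largest eigenvalue of A^T A
x_hat = ista(A, b, lam=0.02, step=step)
print(np.linalg.norm(x_hat - x_true))          # small recovery error despite n < d
```

Despite having half as many observations as unknowns, the convex program recovers the sparse vector accurately, which is the kind of provable guarantee the course analyzes.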

Understanding how intelligence works and how it can be emulated in machines is an age-old dream and arguably one of the biggest challenges in modern science. Learning, with its principles and computational implementations, is at the very core of this endeavor. Recently, for the first time, we have been able to develop artificial intelligence systems that solve complex tasks considered out of reach for decades. Modern cameras recognize faces, smartphones respond to voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. At the root of most of these success stories are machine learning algorithms, which power software that is trained on data rather than being solely programmed to solve a task. Among the variety of approaches to modern machine learning, we will focus on regularization techniques, which are key to high-dimensional learning. Indeed, learning from finite data requires incorporating a priori knowledge to ensure stability and generalization. In this course, we will introduce the problem of learning as a linear inverse problem and show how regularization theory provides a natural framework for algorithm design. Starting from classical approaches based on penalization, we will then consider different ideas, including implicit/iterative regularization and regularization with random projections. Throughout the course the emphasis will be on the interplay between computational and statistical aspects. Theory classes will be complemented by practical laboratory sessions offering hands-on experience with different algorithmic solutions for large-scale problems.
Lorenzo Rosasco (2016) Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 Lecture Notes. Available online.
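The contrast between classical penalization and implicit/iterative regularization can be sketched in a few lines: Tikhonov (ridge) regularization adds an explicit penalty and has a closed-form solution, while gradient descent started from zero and stopped early plays a comparable stabilizing role, with the stopping time acting roughly like an inverse regularization parameter. This is a minimal illustrative sketch; the data, step size, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 60                                   # underdetermined linear inverse problem
A = rng.standard_normal((n, d)) / np.sqrt(n)
w_true = rng.standard_normal(d)
y = A @ w_true + 0.1 * rng.standard_normal(n)

# Penalization: Tikhonov (ridge) regularization, closed form
lam = 0.1
w_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

# Implicit regularization: early-stopped gradient descent from zero
w = np.zeros(d)
step = 1.0 / np.linalg.norm(A, 2) ** 2          # step <= 1/L guarantees monotone descent
for _ in range(50):                             # the stopping time plays the role of 1/lam
    w -= step * A.T @ (A @ w - y)

print(np.linalg.norm(w - w_ridge))              # the two stabilized solutions are comparable
```

Running the gradient iteration far longer would instead converge to a minimum-norm interpolant of the noisy data, which is exactly the instability that both forms of regularization are designed to avoid.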

Mathematical imaging embraces mathematical methods for the analysis and processing of imaging data, including image reconstruction from indirect measurements, image segmentation and classification, image restoration, and spatiotemporal image analysis, just to name a few. Until recently, mathematical imaging mainly revolved around so-called model-based methods, composed of variational models and partial differential equations (PDEs) in the continuum, and of statistical models, discrete filters, and morphological image analysis in the discrete setting. How well these handcrafted models perform depends on the complexity and ill-posedness of the task, the talent of the modeler and analyst, and the robustness and efficiency of the numerical schemes developed for their solution. They have been successfully applied to a variety of image-based problems in biomedical imaging, computer vision, imaging in chemical engineering, material sciences, remote sensing, digital humanities and arts, and many more. Their power lies in their mathematical guarantees of stability and generalisability; their weakness lies in their lack of adaptivity to the imaging data at hand. In the last couple of years, a powerful stream of imaging methods founded solely in machine learning has arisen in parallel to model-based approaches: deep learning. Deep learning has had a transformative impact on a wide range of tasks related to artificial intelligence, ranging from computer vision and speech recognition to game playing. This is due to the capability of deep neural networks to adapt to data by extracting the essential information and using it to form decisions in a black-box manner. What is required to make deep neural networks work is a sufficiently large and diverse dataset and a possibly laborious design of the network architecture that, at least empirically, renders trustworthy solutions.
These results are as surprising in their apparent potential for a wide range of applications as they are in the almost complete lack of a stringent theoretical justification of these approaches in terms of approximation properties, convergence results, sampling rates, or mathematically justified, efficient algorithms.
Carola-Bibiane Schönlieb (2015) Partial Differential Equation Methods for Image Inpainting, Cambridge University Press.
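As a minimal taste of a PDE-based image model, the sketch below runs an explicit finite-difference scheme for the heat equation on a noisy synthetic image, the simplest diffusion model for image smoothing/denoising. This is an illustrative toy example (not the inpainting models of the reference above); the image, noise level, time step, and number of iterations are all arbitrary assumptions.

```python
import numpy as np

def heat_denoise(u0, tau=0.1, steps=20):
    """Explicit finite-difference scheme for the heat equation u_t = Laplace(u),
    with periodic boundary conditions; stable for tau <= 0.25."""
    u = u0.copy()
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)  # 5-point Laplacian stencil
        u = u + tau * lap                                        # one explicit Euler step
    return u

rng = np.random.default_rng(2)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                       # synthetic piecewise-constant image
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
smooth = heat_denoise(noisy)
print(np.mean((noisy - clean) ** 2), np.mean((smooth - clean) ** 2))
```

Diffusion removes the noise but also blurs the edges, a trade-off that motivates the more sophisticated variational and nonlinear-PDE models (e.g. total variation) that such courses study.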

Probabilistic algorithms have joined the mainstream of numerical linear algebra over the last 15 years. For certain problems, these methods are faster and more scalable than classical algorithms. They also give more flexibility in algorithm design, allowing for adaptations to new data access models and computer architectures. This course will cover basic principles and algorithms from the field of randomized linear algebra computations. Focus is placed on techniques with a record of good practical performance.
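A representative algorithm from this field is the randomized range finder followed by a small deterministic SVD, in the style of Halko, Martinsson, and Tropp. The sketch below is a minimal illustrative implementation; the oversampling parameter, Gaussian test matrix, and test problem are assumed choices, not prescriptions from the course.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Rank-k SVD approximation via a randomized range finder plus a small exact SVD."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random test matrix; p is oversampling
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sampled range of A
    B = Q.T @ A                               # small (k+p) x n matrix capturing the action of A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(3)
# Test matrix: rank 5 plus tiny perturbation
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150)) \
    + 1e-6 * rng.standard_normal((200, 150))
U, s, Vt = randomized_svd(A, k=5)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # near-exact low-rank recovery
```

The dominant cost is the two matrix-matrix products with tall thin matrices, which is what makes these methods scalable and well suited to modern memory hierarchies and parallel architectures.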

Many problems in machine learning and data analysis are formulated as optimization problems. Such formulations allow trade-offs between model fit and generalization to be captured, and yield solutions that are useful in the context of the given learning task. In particular, regularization mechanisms can be incorporated into such formulations. Modern optimization algorithms solve these formulations in ways that are theoretically principled and that have good practical performance. We will start this section by describing how many canonical problems in data analysis can be formulated as optimization problems, including the training of deep neural networks, kernel learning, matrix completion, and covariance estimation. We then introduce some basic concepts that underpin many optimization approaches, including convexity, Taylor’s theorem, subgradients, convergence rates, and duality. Next, we discuss algorithms that make use of gradient and subgradient information, including first-order descent methods, accelerated gradient methods, subgradient methods, and the conditional gradient (“Frank-Wolfe”) approach. Extensions of these techniques to regularized (or “sparse”) optimization formulations will be discussed next, since such formulations are ubiquitous in data analysis applications. We then discuss methods that require the use or approximation of second-order derivative information, including Newton and quasi-Newton methods such as L-BFGS. Finally, we discuss methods for nonconvex problems that attain at least local minima with guaranteed complexity. Throughout, we highlight the interactions between optimization, machine learning, statistics, and theoretical computer science that have produced such a powerful toolbox of techniques for solving data analysis problems.
Jorge Nocedal and Stephen Wright (2006) Numerical Optimization, Springer, Second edition.
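The difference between plain first-order descent and the accelerated gradient methods mentioned above can be seen on a simple least-squares problem. The sketch below is an illustrative comparison using Nesterov's standard momentum sequence; the problem dimensions, step size, and iteration count are arbitrary assumptions.

```python
import numpy as np

def gd(A, b, iters, step):
    """Plain gradient descent on f(x) = 0.5*||Ax - b||^2."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= step * A.T @ (A @ x - b)
    return x

def nesterov(A, b, iters, step):
    """Accelerated gradient method with the standard t_k momentum sequence."""
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        x_new = y - step * A.T @ (A @ y - b)            # gradient step at extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 100))
b = rng.standard_normal(200)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)          # reference minimizer
step = 1.0 / np.linalg.norm(A, 2) ** 2                  # 1/L for the Lipschitz gradient
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
gap_gd = f(gd(A, b, 100, step)) - f(x_star)
gap_agd = f(nesterov(A, b, 100, step)) - f(x_star)
print(gap_gd, gap_agd)                                  # acceleration gives a smaller gap
```

After the same number of iterations, the accelerated method reaches a much smaller optimality gap, reflecting its O(1/k^2) rate versus the O(1/k) rate of plain gradient descent on smooth convex problems.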