Genevera Allen

(Department of Statistics, Center for Theoretical Neuroscience, Zuckerman Institute, Irving Institute, Columbia University, New York, NY, USA)

Fast and Powerful Minipatch Ensemble Learning for Discovery and Inference

Enormous quantities of data are collected in many industries and disciplines; these data hold the key to solving critical societal and scientific problems. Yet fitting models to make discoveries from such massive data often poses both computational and statistical challenges. In this talk, we propose a new ensemble learning strategy primed for fast, distributed, and memory-efficient computation that also has many statistical advantages. Inspired by random forests, stability selection, and stochastic optimization, we build ensembles from tiny subsamples of both observations and features that we term minipatches.
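
To fix ideas, here is a minimal Python sketch of the minipatch idea (an illustration only, not the authors' implementation; the decision-tree base learner and the patch sizes are arbitrary choices): draw many tiny random subsamples of both observations and features, fit a weak learner on each, and average the predictions.

    # Illustrative minipatch ensemble (a sketch, not the authors' code).
    # Each minipatch is a tiny random subsample of observations AND features.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    n, p = 1000, 200
    X = rng.normal(size=(n, p))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

    n_patches, n_obs, n_feat = 200, 50, 10      # hypothetical minipatch sizes
    models, feat_sets = [], []
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_obs, replace=False)
        cols = rng.choice(p, size=n_feat, replace=False)
        m = DecisionTreeRegressor(max_depth=3).fit(X[np.ix_(rows, cols)], y[rows])
        models.append(m)
        feat_sets.append(cols)

    # Ensemble prediction: average over all minipatch learners.
    X_new = rng.normal(size=(5, p))
    pred = np.mean([m.predict(X_new[:, cols]) for m, cols in zip(models, feat_sets)], axis=0)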

While minipatch learning can easily be applied to prediction tasks, much as random forests are, this talk focuses on using minipatch ensemble approaches in unconventional ways: for making data-driven discoveries and for statistical inference. Specifically, we will discuss using this ensemble strategy for feature selection, clustering, and graph learning, as well as for distribution-free and model-agnostic inference for both predictions and important features. Through large-scale real-data examples from neuroscience, genomics, and biomedicine, we illustrate the computational and statistical advantages of our minipatch ensemble learning approaches.
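
As a rough illustration of the discovery side, the sketch below scores features by how often a simple sparse learner selects them across minipatches; the lasso selector, its penalty, and the patch sizes are placeholder choices, not the specific procedures discussed in the talk.

    # Illustrative minipatch feature-selection frequencies (a sketch only).
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    n, p = 500, 100
    X = rng.normal(size=(n, p))
    y = 3 * X[:, 0] - 2 * X[:, 5] + rng.normal(size=n)

    n_patches, n_obs, n_feat = 300, 60, 20
    selected = np.zeros(p)      # times a feature was selected
    sampled = np.zeros(p)       # times a feature appeared in a minipatch
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_obs, replace=False)
        cols = rng.choice(p, size=n_feat, replace=False)
        coef = Lasso(alpha=0.1).fit(X[np.ix_(rows, cols)], y[rows]).coef_
        sampled[cols] += 1
        selected[cols[np.abs(coef) > 1e-8]] += 1

    # Stability score: selection frequency given that the feature was sampled.
    score = np.divide(selected, sampled, out=np.zeros(p), where=sampled > 0)
    print(np.argsort(score)[::-1][:5])      # top-ranked features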

Keywords

  • Ensemble Learning
  • Double Subsampling
  • Conformal Inference
  • Feature Importance Inference
  • Graphical Models
  • Clustering

References

  • L. Gan, L. Zheng, and G.I. Allen, Model-Agnostic Confidence Intervals for Feature Importance: A Fast and Powerful Approach Using Minipatch Ensembles, (Submitted), arXiv:2206.02088, 2024+.
  • T. Yao, M. Wang, and G.I. Allen, Fast and Accurate Graph Learning for Huge Data via Minipatch Ensembles, (Submitted), arXiv:2110.12067, 2024+.
  • L. Gan and G.I. Allen, Fast and Interpretable Consensus Clustering via Minipatch Learning, PLoS Computational Biology, 18:10, e1010577, 2022.
   

Luis Angel Garcia-Escudero

(Departamento de Estadística e I.O. and IMUVA, University of Valladolid)

Robust clustering in (moderately) high dimensional cases

Outliers can negatively impact Cluster Analysis. One might view outliers as separate clusters, leading to the idea that simply increasing the number of clusters, \(K\), could be a natural way to manage them. However, this approach is often not the best strategy and can even be completely impractical. Consequently, several robust clustering techniques have been developed to address this issue. These techniques are also useful for highlighting potentially relevant anomalies in data, especially when dealing with datasets that may naturally include different subpopulations. In this talk, we will focus exclusively on robust clustering methods based on trimming (see García-Escudero and Mayo-Iscar, 2024, for a recent review). Among these methods, TCLUST (García-Escudero et al., 2008) stands out as a prominent approach, as it extends the well-known MCD method (Rousseeuw, 1985) for Cluster Analysis by incorporating both trimming and eigenvalue-ratio constraints.
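
To illustrate the role of trimming in its simplest form, the following Python sketch implements a toy trimmed k-means (spherical clusters only, with no scatter matrices or eigenvalue-ratio constraints, so far simpler than TCLUST): at each concentration step the proportion \(\alpha\) of points farthest from their nearest center is discarded before the centers are updated.

    # Toy trimmed k-means: a minimal instance of trimming-based robust clustering
    # (much simpler than TCLUST; shown only to illustrate the trimming idea).
    import numpy as np

    def trimmed_kmeans(X, K, alpha=0.1, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        keep = int(np.ceil((1 - alpha) * n))        # points retained after trimming
        centers = X[rng.choice(n, size=K, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            nearest = d.min(axis=1)
            kept = np.argsort(nearest)[:keep]       # trim the alpha farthest points
            labels = d.argmin(axis=1)
            for k in range(K):
                pts = X[kept][labels[kept] == k]
                if len(pts) > 0:
                    centers[k] = pts.mean(axis=0)
        labels = np.where(np.isin(np.arange(n), kept), labels, -1)   # -1 = trimmed
        return centers, labels

    # Example: two clusters plus scattered outliers.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2)),
                   rng.uniform(-10, 15, (20, 2))])
    centers, labels = trimmed_kmeans(X, K=2, alpha=0.1)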

TCLUST, along with the algorithms and packages used for its implementation, is known to be quite reliable when dealing with low-dimensional data. However, in Statistics, it is increasingly common to encounter problems arising from higher dimensionality, where outliers still occur. Detecting outliers while mitigating their harmful effects becomes more challenging. For instance, it is evident that the performance of TCLUST deteriorates significantly as dimensionality increases. This presentation will discuss these challenges, as well as some promising initial solutions for tackling this problem, at least in the case of moderately high dimensionality. The main difficulty in using TCLUST in high dimensions is the large number of parameters that arise when handling the \(K\) scatter matrices for the fitted components. Constraining the maximum ratio between the eigenvalues of these scatter matrices is a reasonable way to “regularize” the TCLUST objective function and has been shown to be useful in practice. However, this regularization unfortunately limits the detectable clusters to spherical clusters with the same dispersion, which can be overly restrictive.
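
What the eigenvalue-ratio constraint does can be sketched crudely as follows (an illustration only: actual TCLUST algorithms apply an optimal truncation of the eigenvalues, not this naive clipping): pool the eigenvalues of the \(K\) scatter matrices and force the largest-to-smallest ratio to stay below a constant \(c\).

    # Naive eigenvalue-ratio constraint (illustration only; TCLUST implementations
    # use an optimal truncation of the eigenvalues rather than simple clipping).
    import numpy as np

    def constrain_scatter(S_list, c=10.0):
        """Clip eigenvalues of the K scatter matrices so the max/min ratio is <= c."""
        eigs = [np.linalg.eigh(S) for S in S_list]
        top = max(w.max() for w, _ in eigs)      # largest eigenvalue over all clusters
        out = []
        for w, V in eigs:
            w_clipped = np.clip(w, top / c, top)
            out.append(V @ np.diag(w_clipped) @ V.T)
        return out

    # Example: two scatter matrices with a huge dispersion imbalance.
    S1 = np.diag([100.0, 1.0])
    S2 = np.diag([0.01, 0.5])
    S1c, S2c = constrain_scatter([S1, S2], c=20.0)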

An alternative approach, as dimensionality increases, is to assume that the different clusters are grouped around \(K\) subspaces of dimension lower than the ambient space. This approach is employed in the Robust Linear Grouping method (García-Escudero et al., 2009), which can be viewed as a simultaneous clustering and dimensionality reduction technique. However, Robust Linear Grouping does not take into account the information related to the specific “coordinates” of the projection of the observations onto the \(K\) approximating subspaces, as its objective function considers only orthogonal errors.
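
The distinction between orthogonal errors and within-subspace coordinates can be made concrete with a small sketch (an illustration only): project points onto a \(q\)-dimensional principal subspace and split each point into its coordinates inside the subspace, which Robust Linear Grouping ignores, and its orthogonal residual, the only term entering its objective.

    # Sketch: orthogonal error vs. within-subspace coordinates for one cluster.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
    mu = X.mean(axis=0)
    q = 2                                                     # subspace dimension
    U = np.linalg.svd(X - mu, full_matrices=False)[2][:q].T   # basis of the subspace

    scores = (X - mu) @ U                    # coordinates inside the subspace
    residual = (X - mu) - scores @ U.T       # orthogonal error component
    orth_err = np.linalg.norm(residual, axis=1)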

To find a compromise between TCLUST and Robust Linear Grouping, leveraging the dimensionality reduction power of Robust Linear Grouping and the ability of TCLUST to model the projections of observations onto the \(K\) approximating subspaces, we consider a robust extension of the HDD method (Bouveyron et al., 2007) through trimming and suitable constraints. An algorithm for implementing this methodology will be introduced, and its application will be illustrated with examples.

When dealing with increasing dimensionality in robust clustering based on trimming, it is essential to consider additional non-trivial aspects. One such issue is the proper initialization of the concentration steps typically applied at the algorithmic level. While using random initializations is feasible in principle, a large number of random initializations would be needed to ensure a reliable starting point, highlighting the need for improved initialization schemes. Another important consideration is the possibility of incorporating cellwise trimming rather than just casewise trimming, as trimming entire rows of the data matrix may discard too much valuable information. Some proposals to address these two key issues will be presented in the talk. Finally, it is important to emphasize that we are not attempting to solve the problem of handling extremely high-dimensional cases (limiting ourselves to moderately high dimensions). The problem of extremely high-dimensional cases is complex even without contamination, and making certain assumptions about sparsity may become essential in such situations. Some interesting approaches in this direction, such as those by Kondo et al. (2016) and Brodinová et al. (2019), will be briefly discussed.

Keywords

  • Robust clustering
  • Trimming
  • Model-based clustering
  • Cellwise contamination

References

  • Bouveyron, C., Girard, S., and Schmid, C. (2007). High-Dimensional Data Clustering. Computational Statistics and Data Analysis, 52, 502-519.
  • Brodinová, S., Filzmoser, P., Ortner, T., Breiteneder, C., and Rohm, M. (2019). Robust and sparse \(K\)-means clustering for high-dimensional data. Advances in Data Analysis and Classification, 13, 905-932.
  • García-Escudero, L.A., Gordaliza, A., Matrán, C., and Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. Annals of Statistics, 36, 1324-1345.
  • García-Escudero, L.A., Gordaliza, A., San Martín, R., Van Aelst, S., and Zamar, R. (2009). Robust linear clustering. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 71, 301-318.
  • García-Escudero, L.A., and Mayo-Iscar, A. (2024). Robust clustering based on trimming. Wiley Interdisciplinary Reviews. Computational Statistics, 16, e1658.
  • Kondo, Y., Salibian-Barrera, M., and Zamar, R. (2016). RSKC: An R package for a robust and sparse \(k\)-means clustering algorithm. Journal of Statistical Software, 72, 1-26.
  • Rousseeuw, P. (1985). Multivariate estimation with high breakdown point. In W. Grossmann, G. Pflug, I. Vincze, and W. Wertz (Eds.), Mathematical statistics and applications (Vol. B, pp. 283–297). Dordrecht: Reidel.
   

Marc Hallin

(Department of Mathematics, Université libre de Bruxelles, Belgium, and Czech Academy of Sciences, Prague, Czech Republic)

Directional Nonlinear Principal and Independent Components: a measure transportation approach

Traditional Principal and Independent Component Analysis (PCA and ICA) are inherently linear and bidirectional: principal directions, in both cases, are linear combinations defined up to their signs. While this approach is perfectly justified in a linear and symmetric context – essentially, under Gaussian or elliptical symmetry assumptions – a more flexible, nonlinear, and directional approach is more appropriate under more general distributions. Measure transportation offers the ideal tool for such an extension. Inspired by the measure-transportation-based concepts of Monge-Kantorovich depth and center-outward distribution functions introduced in Chernozhukov et al. (2017) and Hallin et al. (2021), we propose new, nonlinear and directional, notions of principal and independent components (grounded in monotone transports to the uniform over the unit ball \(\mathbb{S}_d\) and to the uniform over the unit cube \([-1, 1]^d\), respectively). Principal directions, in our approach, are curves originating from a (data-driven) central set (instead of running through some origin) and maximizing the dispersion of appropriate one-sided curvilinear projections; the underlying transports are not necessarily continuous at the center, making one-sidedness a natural feature. Contrary to the classical linear ones, our nonlinear independent components, under absolute continuity assumptions, always exist.
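
For readers who want a computational handle on the center-outward construction, the following sketch (dimension 2, an illustration only, ignoring refinements such as points assigned to the center) computes the empirical center-outward distribution function as an optimal assignment of the sample to a regular grid over the unit disk; the assigned grid points yield center-outward ranks (radii) and signs (directions).

    # Empirical center-outward distribution function in dimension 2, computed as
    # an optimal L2 assignment of the sample to a regular grid on the unit disk.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(0)
    n_r, n_s = 10, 20                           # numbers of radii and directions
    n = n_r * n_s
    X = rng.multivariate_normal([0, 0], [[2, 1], [1, 1]], size=n)

    # Regular grid over the unit disk: n_s equispaced directions, n_r radii.
    radii = np.arange(1, n_r + 1) / (n_r + 1)
    angles = 2 * np.pi * np.arange(n_s) / n_s
    grid = np.array([[r * np.cos(a), r * np.sin(a)] for r in radii for a in angles])

    # Optimal coupling minimizing the total squared distance sample-to-grid.
    cost = cdist(X, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    F_plus = np.empty_like(X)
    F_plus[rows] = grid[cols]                   # center-outward value of each X[i]

    ranks = np.linalg.norm(F_plus, axis=1)      # center-outward ranks (radii)
    signs = F_plus / ranks[:, None]             # center-outward signs (directions)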

Keywords

  • Nonlinear principal components
  • Nonlinear independent components
  • Measure transportation
  • Dimension reduction

References

  • Chernozhukov, V., Galichon, A., Hallin, M. and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Annals of Statistics 45, 223-256.

  • Hallin, M., del Barrio, E., Cuesta-Albertos, J. and Matrán, C. (2021). Center-outward distribution functions, quantiles, ranks, and signs in \({\mathbb R}^d\). Annals of Statistics 49, 1139-1165.

   

Johannes Lederer

(Department of Mathematics, University of Hamburg)

Data Science, Where Statistics Meets Optimization

Modern data science spans computer science, mathematics, and applications. Hence, these different fields need to support and nourish each other in order to reach the full potential of data science. This talk brings the roles of statistics and optimization into sharp focus. We will discuss two examples: we start with deep learning, where mathematical statistics can lead to a more profound understanding of computer-science pipelines; we then turn to extremes, where efficient computing algorithms can lead to mathematical models for contemporary data. You will walk away from this talk with a clear understanding of how statistics and optimization can work together to improve data science.

Keywords

  • Data science
  • Deep learning
  • Extreme-value theory
  • Optimization

References

  • Lederer, Johannes. Fundamentals of high-dimensional statistics. Springer Texts in Statistics, 2022.
  • Taheri, Mahsa, Néhémy Lim, and Johannes Lederer. Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression. IEEE Transactions on Information Theory 69.1 (2022): 316-333.
  • Taheri, Mahsa, Fang Xie, and Johannes Lederer. Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks. arXiv preprint arXiv:2205.04491 (2022).
  • Lederer, Johannes, and Marco Oesting. Extremes in high dimensions: Methods and scalable algorithms. arXiv preprint arXiv:2303.04258 (2023).

Valentin Todorov

(United Nations Industrial Development Organization (retired), Vienna, Austria)

Fortifying Statistical Analyses: Software Tools for Robust Methods

The practical deployment and success of robust methods are inconceivable without reliable and user-friendly software. This necessity was recognized early on, leading to the development of the first robust statistical software on platforms such as SAS, S-Plus, and MATLAB. This talk provides an overview of key software ecosystems, highlighting their features, use cases, and suitability for various audiences. Currently, two MATLAB toolboxes for robust statistics are popular: LIBRA, developed by the research groups in robust statistics of the Katholieke Universiteit Leuven (Department of Mathematics) and the University of Antwerp (Department of Mathematics and Computer Science), and FSDA, a joint effort by the University of Parma and the Joint Research Centre (JRC) of the European Commission. However, the R programming environment, a free software platform for statistical computing and graphics, has emerged as a viable alternative, offering developers and users extensive capabilities for creating and applying robust methods.

Many researchers have significantly contributed to making robust statistical methods accessible. On CRAN alone, over 700 R packages include the terms “robust” or “outlier” in their names, titles, or descriptions. This abundance of options can be overwhelming for both beginners and experienced users. To address this, we review the 25 most significant R packages for various tasks, briefly describing their functionalities. We also explore several key topics in robust statistics, presenting methodologies, implementations in R, and applications to real-world data. Particular attention is given to robust methods and algorithms suited for high-dimensional data.

While robust methods have long been available in R and MATLAB, Python users have only recently gained access to a comprehensive package – RobPy – that offers such methods within a cohesive framework. However, comparable development in Julia remains limited. Despite the progress in robust statistical software, challenges persist, including computational efficiency, ease of use, integration with big data frameworks, and compatibility with machine learning systems.
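
As a small illustration of the kind of robust building block now available to Python users (here scikit-learn's MinCovDet, a Fast-MCD implementation, shown instead of RobPy itself), the snippet below estimates a robust covariance matrix and flags outliers via robust Mahalanobis distances.

    # Robust covariance estimation and outlier flagging with scikit-learn's
    # MinCovDet (Fast-MCD); an illustration, independent of the RobPy package.
    import numpy as np
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
    X[:10] += 8                                  # contaminate 5% of the rows

    mcd = MinCovDet(random_state=0).fit(X)
    d2 = mcd.mahalanobis(X)                      # squared robust Mahalanobis distances
    outliers = d2 > chi2.ppf(0.975, df=X.shape[1])
    print(outliers.sum(), "flagged outliers")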

The future undoubtedly holds exciting advancements for R, MATLAB, Python, and Julia, promising to enrich the statistical community with even more powerful and versatile tools.

Keywords

  • Robustness
  • Software
  • R
  • MATLAB
  • Python
  • Julia

References

  • A.C. Atkinson, M. Riani, A. Corbellini, D. Perrotta, and V. Todorov. Robust Statistics through the Monitoring Approach: Applications in Regression. Springer-Verlag, Heidelberg (2025). In press.

  • S. Leyder, J. Raymaekers, P.J. Rousseeuw, T. Servotte, and T. Verdonck. RobPy: A Python package for robust statistical methods, arXiv preprint arXiv:2411.01954 (2024).

  • M. Riani, D. Perrotta, and F. Torti. FSDA: A MATLAB toolbox for robust analysis and interactive data exploration. Chemometrics and Intelligent Laboratory Systems, 116:17-32 (2012).

  • V. Todorov. The R package ecosystem for robust statistics. Wiley Interdisciplinary Reviews: Computational Statistics, 16(6):e70007 (2024).