This is a short course within the Mathematics for Data Science study plan.

Abhik Ghosh

(Indian Statistical Institute)
Personal website
Bio: Abhik Ghosh is currently an Associate Professor at the Indian Statistical Institute (ISI), Kolkata, India. He received the B.Stat. (Hons.) and M.Stat. (Specialization: Mathematical Statistics and Probability) degrees from ISI Kolkata in 2010 and 2012, respectively, with gold medals. After completing his PhD in Statistics from ISI Kolkata in 2015 under the guidance of Prof. Ayanendranath Basu, he joined the University of Oslo, Norway, for his post-doctoral research. His main research interests include robust minimum divergence inference for complex data structures, including robust high-dimensional statistical methods, with applications to biostatistics and biometrics. He received the 2017 ISCB Conference Award for Scientists from the International Society for Clinical Biostatistics, first place in the Jan Tinbergen Award (2013) from the International Statistical Institute, several international travel awards from the Institute of Mathematical Statistics, the International Biometric Society and the Asian Development Bank, and many other national awards.


Prerequisites

  • Lasso and its properties under high-dimensional set-ups
  • Generalizations of Lasso with the SCAD and other penalties
  • Basic knowledge of robust inference
  • M-estimators and minimum divergence estimators
  • Basic knowledge of the R software
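As a quick refresher on the first prerequisite, the sketch below illustrates sparse recovery with the Lasso in a p >> n setting. It is shown in Python with scikit-learn purely for illustration (the course itself works in R), and all data sizes and parameter values are invented for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200  # high-dimensional regime: many more predictors than samples

# Sparse ground truth: only the first three coefficients are nonzero
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

# L1-penalized least squares; alpha controls the sparsity of the fit
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of nonzero estimates
print(selected)
```

With a strong signal and a suitable penalty level, the Lasso keeps the three true predictors and discards most of the irrelevant ones; its behaviour under contamination, and robust alternatives, are the subject of the course.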

Pre-Course Description

To cover part of the prerequisites, there will be a pre-course by Prof. Claudio Agostinelli on the following topics:

  • Basic idea of robust estimation
  • Classical measures of robustness
    • Influence function
    • Breakdown point
  • M-estimators
  • Minimum divergence estimation
    • Density Power Divergence
  • Examples
  • References: R.A. Maronna, R.D. Martin, V.J. Yohai and M. Salibián-Barrera (2019) Robust Statistics: Theory and Methods (with R). Wiley.

Course Description

This course will provide an overview of parametric statistical procedures for high-dimensional data, focusing primarily on robustness against data contamination. Our main concern is the problem of simultaneous variable selection and parameter estimation in high-dimensional regression, under both linear and generalized linear models (GLMs). We will start by discussing the need for appropriate robust statistical methodologies to derive stable and correct inferences from noisy high-dimensional data. A brief review of existing methods for robust and sparse estimation will follow, with details on the two major classes of such procedures, namely penalized M-estimators and regularized minimum divergence estimators, with particular emphasis on the density power divergence. The oracle consistency and asymptotic normality of these robust estimators will be discussed under appropriate conditions. The final part of the course will cover several practical aspects of these procedures, particularly robust and adaptive procedures to reduce the number of false discoveries, and robust variable screening for ultra-high dimensional data in real-life applications.
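As a rough preview of the kind of objective studied in the course (following the cited references; the notation here is schematic), the penalized minimum density power divergence estimator minimizes an empirical DPD loss plus a sparsity penalty:

$$
Q_{n,\lambda}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left\{\int f_{\theta}^{1+\alpha}(x)\,dx \;-\; \left(1+\frac{1}{\alpha}\right) f_{\theta}(X_i)^{\alpha}\right\} \;+\; \sum_{j=1}^{p} p_{\lambda}(|\theta_j|), \qquad \alpha > 0,
$$

where $f_{\theta}$ is the model density and $p_{\lambda}$ is a penalty function such as the $L_1$ (Lasso) or SCAD penalty. The parameter $\alpha$ tunes the robustness-efficiency trade-off: as $\alpha \to 0$ the loss reduces to the negative log-likelihood, recovering the penalized MLE.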

List of topics

  • Robust and sparse variable selection
  • Penalized M-estimators
  • Minimum penalized divergence estimators
  • Density power divergence and its applications in high dimensions
  • Adaptive robust procedures
  • Robust variable screening
  • Case studies and R implementations

References & study material

  1. Ghosh, A., & Majumdar, S. (2020). Ultrahigh-dimensional robust and efficient sparse regression using non-concave penalized density power divergence. IEEE Trans. Info. Theory, 66(12), 7812-7827.
  2. Ghosh, A., Jaenada, M., & Pardo, L. (2020). Robust adaptive variable selection in ultra-high dimensional linear regression models. arXiv preprint, arXiv:2004.05470.
  3. Basu, A., Ghosh, A., Jaenada, M., & Pardo, L. (2021). Robust adaptive Lasso in high-dimensional logistic regression with an application to genomic classification of cancer patients. arXiv preprint, arXiv:2109.03028.
  4. Ghosh, A., Ponzi, E., Sandanger, T., & Thoresen, M. (2020). Robust sure independence screening for non-polynomial dimensional generalized linear models. arXiv preprint, arXiv:2005.12068.
  5. Ghosh, A., & Thoresen, M. (2021). A robust variable screening procedure for ultra-high dimensional data. Stat. Meth. Med. Res., 30(8), 1816-1832.
  6. Avella-Medina, M. (2017). Influence functions for penalized M-estimators. Bernoulli, 23, 3178-3196.
  7. Loh, P. L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Stat., 45(2), 866-896.
  8. Loh, P. L., & Wainwright, M. J. (2015). Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res., 16(1), 559-616.
  9. Negahban, S. N., Ravikumar, P., Wainwright, M. J., & Yu, B. (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci., 27(4), 538-557.



Pre-course schedule

  • Monday November 21st, 10:30-12:30 room A220
  • Monday November 28th, 10:30-12:30 room A220


Course schedule

  • Wednesday November 30th, 16:30-18:30 room A209
  • Thursday December 1st, 8:30-10:30 room A219
  • Friday December 2nd, 9:30-11:30 room A209


  • Venue: Polo Scientifico e Tecnologico F. Ferrari
  • Language: English
  • Participation is free. Please send an email to Prof. Claudio Agostinelli; this is important for booking an appropriate room.
  • For further information, please contact Prof. Claudio Agostinelli


All slides are available at

(Restricted access, user: RHDS2022)