This is a short course within the Mathematics for Data Science study plan.

Abhik Ghosh

(Indian Statistical Institute)
Personal website
Bio: Abhik Ghosh is currently an Associate Professor at the Indian Statistical Institute (ISI), Kolkata, India. He received the B.Stat. (Hons.) and M.Stat. (Specialization: Mathematical Statistics and Probability) degrees from ISI Kolkata in 2010 and 2012, respectively, with gold medals. After completing his PhD in Statistics from ISI Kolkata in 2015 under the guidance of Prof. Ayanendranath Basu, he joined the University of Oslo, Norway, for his post-doctoral research. His main research interests include robust minimum divergence inference for complex data structures, including robust high-dimensional statistical methods, with applications to biostatistics and biometrics. He received the 2017 ISCB Conference Award for Scientists from the International Society for Clinical Biostatistics, first place in the Jan Tinbergen Award (2013) from the International Statistical Institute, several international travel awards from the Institute of Mathematical Statistics, the International Biometric Society and the Asian Development Bank, and many other national awards.


Prerequisites

  • Lasso and its properties under high-dimensional set-ups
  • Generalizations of Lasso with the SCAD and other penalties
  • Basic knowledge of robust inference
  • M-estimators and minimum divergence estimators
  • Basic knowledge of the R software
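As a quick refresher on the first prerequisite, the sketch below illustrates sparse recovery with the Lasso in a p >> n setting. It is shown in Python with scikit-learn purely for illustration (the course itself works in R), and all data sizes and parameter values are invented for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200  # high-dimensional regime: many more predictors than samples

# Sparse ground truth: only the first three coefficients are nonzero
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

# L1-penalized least squares; alpha controls the sparsity of the fit
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of nonzero estimates
print(selected)
```

With a strong signal and a suitable penalty level, the Lasso keeps the three true predictors and discards most of the irrelevant ones; its behaviour under contamination, and robust alternatives, are the subject of the course.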

Pre-Course Description

To cover part of the prerequisites, there will be a pre-course by Prof. Claudio Agostinelli on the following topics:

  • Basic idea of robust estimation
  • Classical measures of robustness
    • Influence function
    • Breakdown point
  • M-estimators
  • Minimum divergence estimation
    • Density Power Divergence
  • Examples
  • References: R.A. Maronna, R.D. Martin, V.J. Yohai and M. Salibián-Barrera (2019) Robust Statistics: Theory and Methods (with R). Wiley.

Course Description

This course will provide an overview of parametric statistical procedures for high-dimensional data, focusing primarily on robustness against data contamination. Our main concern is the problem of simultaneous variable selection and parameter estimation in high-dimensional regression, under both linear and generalized linear models (GLMs). We will start by discussing the need for appropriate robust statistical methodologies to derive stable and correct inferences from noisy high-dimensional data. A brief review of existing methods for robust and sparse estimation will follow, with details on the two major classes of such procedures, namely penalized M-estimators and regularized minimum divergence estimators, with particular emphasis on the density power divergence. The oracle consistency and asymptotic normality of these robust estimators will be discussed under appropriate conditions. The final part of the course will cover several practical aspects of these procedures, particularly robust and adaptive procedures to reduce the number of false discoveries, and robust variable screening for ultra-high dimensional data in real-life applications.
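As a rough preview of the kind of objective studied in the course (following the cited references; the notation here is schematic), the penalized minimum density power divergence estimator minimizes an empirical DPD loss plus a sparsity penalty:

$$
Q_{n,\lambda}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left\{\int f_{\theta}^{1+\alpha}(x)\,dx \;-\; \left(1+\frac{1}{\alpha}\right) f_{\theta}(X_i)^{\alpha}\right\} \;+\; \sum_{j=1}^{p} p_{\lambda}(|\theta_j|), \qquad \alpha > 0,
$$

where $f_{\theta}$ is the model density and $p_{\lambda}$ is a penalty function such as the $L_1$ (Lasso) or SCAD penalty. The parameter $\alpha$ tunes the robustness-efficiency trade-off: as $\alpha \to 0$ the loss reduces to the negative log-likelihood, recovering the penalized MLE.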

List of topics

  • Robust and sparse variable selection
  • Penalized M-estimators
  • Minimum penalized divergence estimators
  • Density power divergence and its applications in high dimensions
  • Adaptive robust procedures
  • Robust variable screening
  • Case studies and R implementations

References & study material

  1. Ghosh, A., & Majumdar, S. (2020). Ultrahigh-dimensional robust and efficient sparse regression using non-concave penalized density power divergence. IEEE Trans. Info. Theory, 66(12), 7812-7827.
  2. Ghosh, A., Jaenada, M., & Pardo, L. (2020). Robust adaptive variable selection in ultra-high dimensional linear regression models. arXiv preprint, arXiv:2004.05470.
  3. Basu, A., Ghosh, A., Jaenada, M., & Pardo, L. (2021). Robust adaptive Lasso in high-dimensional logistic regression with an application to genomic classification of cancer patients. arXiv preprint, arXiv:2109.03028.
  4. Ghosh, A., Ponzi, E., Sandanger, T., & Thoresen, M. (2020). Robust sure independence screening for non-polynomial dimensional generalized linear models. arXiv preprint, arXiv:2005.12068.
  5. Ghosh, A., & Thoresen, M. (2021). A robust variable screening procedure for ultra-high dimensional data. Stat. Meth. Med. Res., 30(8), 1816-1832.
  6. Avella-Medina, M. (2017). Influence functions for penalized M-estimators. Bernoulli, 23, 3178-3196.
  7. Loh, P. L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Stat., 45(2), 866-896.
  8. Loh, P. L., & Wainwright, M. J. (2015). Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res., 16(1), 559-616.
  9. Negahban, S. N., Ravikumar, P., Wainwright, M. J., & Yu, B. (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci., 27(4), 538-557.



Pre-course schedule

  • Monday November 21st, 10:30-12:30 room A220
  • Monday November 28th, 10:30-12:30 room A220


Course schedule

  • Wednesday November 30th, 16:30-18:30 room A209
  • Thursday December 1st, 8:30-10:30 room A219
  • Friday December 2nd, 9:30-11:30 room A209


  • Venue: Polo Scientifico e Tecnologico F. Ferrari
  • Language: English
  • Participation is free. Please send an email to Prof. Claudio Agostinelli; this is important for booking an appropriate room.
  • For further information, please contact Prof. Claudio Agostinelli


All slides are available at

(Restricted access, user: RHDS2022)