Fast Approximate MM-Estimation for the Robust Bootstrap
- School of Mathematics and Statistics, University of Sydney, Sydney, Australia [martin.huang@sydney.edu.au]
- Faculty of Science and Engineering, Macquarie University, Sydney, Australia [samuel.muller@mq.edu.au]
- School of Mathematics and Statistics, University of Sydney, Sydney, Australia [garth.tarr@sydney.edu.au]
Keywords: approximate MM-estimation – robust bootstrap – model selection
1 Background
Classical estimation methods in linear models are highly sensitive to outliers in the response variable, often leading to biased or unreliable results. The bootstrap can exacerbate this issue by inadvertently increasing the proportion of outliers in resampled datasets. A solution to this problem is to stratify the dataset based on the sample's residuals, enabling control over the proportion of outliers in each bootstrapped sample. To account for outliers in model selection, it is common for researchers to modify the model selection criterion, for example by combining a robust penalised criterion with a robust conditional expected prediction loss function, estimated using the stratified bootstrap [Muller and Welsh, 2005, Rabbi et al., 2022].
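A minimal sketch of the stratification idea: flag likely outliers using MAD-standardised residuals, then resample within each stratum so the bootstrap sample keeps the original outlier proportion. The 2.5 cutoff and all function names here are illustrative choices, not values taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_bootstrap_indices(residuals, cutoff=2.5, rng=rng):
    """Resample within outlier / non-outlier strata so the bootstrapped
    sample preserves the original proportion of flagged outliers.
    Residuals are standardised by the MAD; cutoff=2.5 is a common
    convention, not a value from the abstract."""
    scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
    flagged = np.abs(residuals) > cutoff * scale
    idx = np.arange(len(residuals))
    clean, outl = idx[~flagged], idx[flagged]
    parts = [rng.choice(clean, size=len(clean), replace=True)]
    if len(outl):
        parts.append(rng.choice(outl, size=len(outl), replace=True))
    sample = np.concatenate(parts)
    rng.shuffle(sample)  # avoid a systematic ordering of the strata
    return sample
```

By construction, each bootstrap sample contains exactly as many flagged observations as the original dataset, which is what gives control over the breakdown behaviour of the resampled fits.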
2 Fast approximate MM-estimation to speed up the bootstrap
While these robust statistical methods provide solutions, many rely on computationally intensive MM-estimators, whose iterative fitting algorithms must be run to convergence and can be impractical for large datasets [Yohai, 1987]. The MM-estimator is typically computed via the iteratively reweighted least squares (IRLS) algorithm, which is the main source of computational cost. Because the robust bootstrap refits the MM-estimator on each bootstrapped sample, this cost is multiplied by the typically large number of repetitions.
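The IRLS iteration at the heart of MM-estimation can be sketched as follows: each step solves a weighted least squares problem whose weights downweight large residuals. The Tukey bisquare weight function with c = 4.685 (approximately 95% Gaussian efficiency) is a standard choice, but all names and defaults here are illustrative.

```python
import numpy as np

def tukey_weights(r, c=4.685):
    """Tukey bisquare weights; observations with |r| >= c get weight 0."""
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (1.0 - u**2) ** 2

def irls(X, y, beta0, scale, tol=1e-8, max_iter=100):
    """Minimal IRLS sketch for the M-step of MM-estimation: iterate
    weighted least squares until the coefficients stop changing.
    `scale` is the (fixed) robust residual scale from the S-step."""
    beta = beta0.copy()
    for _ in range(max_iter):
        r = (y - X @ beta) / scale
        w = tukey_weights(r)
        Xw = X * w[:, None]                      # row-weighted design
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each iteration costs a full weighted least squares solve, which is why repeating the whole loop for every bootstrap sample dominates the running time of the robust bootstrap.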
To reduce the number of MM-estimators that need to be computed, we present a novel fast approximate MM-estimation (FAMM) method tailored to the bootstrap, which significantly reduces computational time while maintaining robust model selection performance.
In this talk, we show that we can approximate each bootstrapped sample's MM-estimator via a weighted least squares estimator, where the weights are extracted from an initial MM-estimator fitted on the full dataset. With this approximation, the robust bootstrap requires only one MM-estimator, substantially reducing the computation time.
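A minimal sketch of this idea, assuming the robustness weights have already been extracted from a single full-data MM fit (the function name and interface are illustrative, not the authors' implementation): each bootstrap replicate then costs one weighted least squares solve instead of a full IRLS run.

```python
import numpy as np

def famm_bootstrap(X, y, weights, B=500, rng=None):
    """Approximate the bootstrap distribution of the MM-estimator by
    reusing the final IRLS weights from one full-data MM fit: each
    replicate is a single weighted least squares solve on the
    resampled rows (a sketch of the FAMM idea, not the exact method)."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    betas = np.empty((B, p))
    for b in range(B):
        idx = rng.choice(n, size=n, replace=True)
        Xb, yb, wb = X[idx], y[idx], weights[idx]
        Xw = Xb * wb[:, None]                    # row-weighted design
        betas[b] = np.linalg.solve(Xb.T @ Xw, Xw.T @ yb)
    return betas
```

Because the weights are held fixed across replicates, the per-replicate cost drops from many IRLS iterations to one linear solve, which is where the speed-up over the standard robust bootstrap comes from.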
Through an extensive simulation study, we demonstrate FAMM’s advantages in terms of efficiency, breakdown point, and accuracy across various regression settings. Our findings provide practical insights into improving robustness in resampling-based inference while ensuring scalability for large-scale applications.
References
- Muller and Welsh [2005] S. Muller and A. H. Welsh. Outlier Robust Model Selection in Linear Regression. Journal of the American Statistical Association, 100(472):1297–1310, 2005.
- Rabbi et al. [2022] F. Rabbi, A. Khalil, I. Khan, M. A. Almuqrin, U. Khalil, and M. Andualem. Robust model selection using the out-of-bag bootstrap in linear regression. Scientific Reports, 12(1):10992, 2022.
- Yohai [1987] V. J. Yohai. High Breakdown-Point and High Efficiency Robust Estimates for Regression. The Annals of Statistics, 15(2):642–656, 1987.