Cellwise outliers: from identification to regression
J. Raymaekers and P.J. Rousseeuw
University of Antwerp, Belgium
KU Leuven, Belgium
Modern datasets often contain cellwise outliers, where individual entries of the data matrix are contaminated, rather than entire rows as in the classical casewise setting. This distinction is important because the unaffected cells in a contaminated row may still contain valuable information. We briefly discuss the cellHandler method for identifying cellwise outliers. We explain how it led to the development of cellMCD, a cellwise robust extension of the Minimum Covariance Determinant estimator, which estimates a location vector and a scatter matrix under cellwise contamination. The main focus of the presentation is a new cellwise robust regression methodology called cellLTS. The method achieves the first breakdown result for cellwise robust regression and is specifically designed to provide reliable out-of-sample predictions. A real-data example modelling cancer rates in US counties illustrates how cellLTS can provide fresh insights.
Keywords: Cellwise outliers, MCD estimator.