Cellwise outliers: from identification to regression

J. Raymaekers (a) and P.J. Rousseeuw (b)

(a) University of Antwerp, Belgium; (b) KU Leuven, Belgium

Modern datasets often contain cellwise outliers, in which individual entries of the data matrix are contaminated rather than entire rows. This distinction from rowwise outliers is important because the unaffected cells in a contaminated row may still contain valuable information. We briefly discuss the cellHandler method for identifying cellwise outliers and explain how it led to the development of cellMCD, a cellwise robust extension of the Minimum Covariance Determinant (MCD) estimator that estimates location and scatter under cellwise contamination. The main focus of the presentation is cellLTS, a new cellwise robust regression method. It achieves the first breakdown result for cellwise robust regression and is specifically designed to provide reliable out-of-sample predictions. A real-data example modelling cancer rates in US counties illustrates how cellLTS can provide fresh insights.

Keywords: Cellwise outliers, MCD estimator.