Graphical Tools for Visualizing Cellwise and Casewise Outliers
Section of Statistics and Data Science, Department of Mathematics, KU Leuven, Belgium;
[Mehdi.Hirari@kuleuven.be, Mia.Hubert@kuleuven.be, Peter.Rousseeuw@kuleuven.be]
Keywords: Outlier Detection, Cellwise Outliers
Dimension reduction is a fundamental task in high-dimensional data analysis. However, the presence of outliers poses a significant challenge, as classical methods are highly sensitive to anomalies and can produce biased results. To address this issue, robust methods aim to downweight outlying cells or observations, ensuring a more reliable estimation process.
The outputs of these robust methods are often used for outlier detection through graphical tools, helping to identify and inspect anomalous cases. For example, the cellmap [Rousseeuw and Van den Bossche, 2018] highlights outlying cells based on standardized residuals, where large residuals indicate potential anomalies. Additionally, metrics such as score distance and orthogonal distance are commonly used to visualize the outlyingness of a case.
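As a rough illustration of these diagnostics, the sketch below computes score distances, orthogonal distances, and standardized cell residuals from a rank-k PCA fit. It rests on several assumptions that are not part of the cited methods: the fit is a classical (non-robust) PCA obtained by SVD, the residuals are standardized per column with the median and MAD, and the function names `pca_distances` and `flag_cells` as well as the default cutoff of 2.576 (the square root of the 0.99 quantile of a chi-squared distribution with one degree of freedom) are illustrative choices. In practice a robust fit would replace the classical one.

```python
import numpy as np

def pca_distances(X, k):
    """Score distance, orthogonal distance and residuals from a classical rank-k PCA.

    Assumption: a plain SVD-based PCA stands in for a robust fit here.
    """
    center = X.mean(axis=0)
    Xc = X - center
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                  # p x k loadings
    T = Xc @ P                                    # n x k scores
    eigvals = s[:k] ** 2 / (X.shape[0] - 1)       # variances of the scores
    # Score distance: Mahalanobis-type distance inside the PCA subspace.
    sd = np.sqrt((T ** 2 / eigvals).sum(axis=1))
    # Orthogonal distance: Euclidean distance of a case to the subspace.
    resid = Xc - T @ P.T
    od = np.linalg.norm(resid, axis=1)
    return sd, od, resid

def flag_cells(resid, cutoff=2.576):
    """Flag cells whose standardized residual exceeds the cutoff in absolute value.

    Assumption: each column is standardized by its median and MAD.
    """
    med = np.median(resid, axis=0)
    mad = 1.4826 * np.median(np.abs(resid - med), axis=0)
    z = (resid - med) / mad
    return np.abs(z) > cutoff
```

Plotting `od` against `sd` gives a casewise outlier map, while the Boolean matrix returned by `flag_cells` can be displayed as a heatmap in the spirit of a cellmap.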
The weights generated by robust algorithms have recently gained attention as an alternative way to detect outliers. One example is cellPCA [Centofanti et al., 2024], a PCA method that is robust to both cellwise and casewise outliers. It assigns weights to individual cells and cases, downweights the anomalous ones, and imputes their values to achieve a robust fit. Building on this, the enhanced outlier map has been introduced as a new diagnostic tool that uses these cellwise and casewise weights to visualize the degree and nature of the anomalies.
To further exploit the potential of weights in graphical displays for outlier detection, we propose incorporating imputed values into the analysis. While weights and residuals provide insights into the final state of the algorithm, tracking imputed values reveals how data points are adjusted throughout the estimation process. This approach enhances transparency and provides deeper insights into the nature of outliers.
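To make the idea of tracking imputed values concrete, here is a minimal toy sketch. It is not the cellPCA algorithm: it assumes that the set of flagged cells is already given and fixed, and it uses a simple iterative scheme in which flagged cells are repeatedly replaced by their fitted values from a classical rank-k PCA. The function name `impute_with_history` and its parameters are hypothetical.

```python
import numpy as np

def impute_with_history(X, flagged, k, n_iter=20):
    """Iteratively impute flagged cells from a rank-k PCA fit and record every iterate.

    Assumptions: `flagged` is a fixed Boolean mask of suspicious cells, and a
    classical SVD-based PCA is refitted after each imputation step.
    """
    X_imp = X.astype(float).copy()
    # Initialize flagged cells with the column medians of the unflagged cells.
    col_med = np.nanmedian(np.where(flagged, np.nan, X_imp), axis=0)
    X_imp[flagged] = np.broadcast_to(col_med, X_imp.shape)[flagged]
    history = [X_imp[flagged].copy()]
    for _ in range(n_iter):
        center = X_imp.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_imp - center, full_matrices=False)
        # Rank-k reconstruction of the currently imputed data.
        fit = center + (U[:, :k] * s[:k]) @ Vt[:k]
        # Only flagged cells are overwritten; all other cells keep their observed values.
        X_imp[flagged] = fit[flagged]
        history.append(X_imp[flagged].copy())
    # history has one row per iteration (plus the start) and one column per flagged cell.
    return X_imp, np.array(history)
```

Plotting each column of `history` against the iteration number shows how far and how quickly a flagged cell moves away from its observed value, which is the kind of trajectory the proposed displays would draw on.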
References
- Centofanti et al. [2024] F. Centofanti, M. Hubert, and P. J. Rousseeuw. Robust principal components by casewise and cellwise weighting. arXiv preprint arXiv:2408.13596, 2024.
- Rousseeuw and Van den Bossche [2018] P. J. Rousseeuw and W. Van den Bossche. Detecting deviating data cells. Technometrics, 60:135–145, 2018.