Kernel Outlier Detection
Department of Mathematics, KU Leuven, Leuven, Belgium [canhakan.dagidir@kuleuven.be]
Keywords: Outlier Detection – Kernel Transformation – Projection Depth
High-dimensional datasets with complex distributions and nonlinear relationships are increasingly common in the literature. These characteristics pose significant challenges for outlier detection, as they violate classical statistical assumptions and incur high computational costs. Various outlier detection methods employ kernel transformations to handle these complexities. Among them, the One-Class Support Vector Machine (OCSVM) [Schölkopf et al., 1999] is a fundamental method that implicitly models the shape of the data in a high-dimensional feature space. However, kernel transformations are sensitive to outliers in the original space, which can distort the feature space representation. Additionally, OCSVM's performance depends strongly on its cost hyperparameter, and its computational complexity remains a concern for large datasets, even before the kernel transformation step is taken into account.
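As a point of comparison (and not part of the proposed method), OCSVM's dependence on its cost hyperparameter can be illustrated with scikit-learn's `OneClassSVM`, where `nu` upper-bounds the fraction of training points flagged as outliers. The dataset below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_in = rng.standard_normal((200, 2))          # bulk of the data
X_out = 0.3 * rng.standard_normal((10, 2)) + 6  # small shifted outlier cluster
X = np.vstack([X_in, X_out])

# nu bounds the fraction of flagged training points; results change
# noticeably with nu (and with the kernel bandwidth gamma).
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)

labels = clf.predict(X)            # +1 = inlier, -1 = outlier
scores = clf.decision_function(X)  # lower = more outlying
```

Because the model is trained on contaminated data, the fitted boundary itself is influenced by the outliers, which is the robustness concern raised above.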
An alternative approach, Kernel Minimum Regularized Covariance Determinant (KMRCD) [Schreurs et al., 2021], combines kernel transformation with robust statistics. This allows it to operate beyond the assumption of elliptical distributions and obtain a robust feature space representation. However, KMRCD relies on the assumption that the transformed space retains a roughly elliptical distribution, and its computational cost scales significantly with dataset size.
To address these challenges, we propose a lightweight and parameter-free alternative that does not rely on distributional assumptions. Our method builds on the principle that outliers deviate significantly along at least one direction. We compute the Stahel-Donoho outlyingness [Stahel, 1981] of the data points using four complementary sets of directions: (1) directions that connect each data point to a robust center estimate, (2) a random subset of directions connecting pairs of data points, (3) directions derived from the feature space, and (4) randomly generated directions. Additional direction sets can also be employed, but this combination achieves a balance between accuracy and computational efficiency. We then introduce a simple yet effective ensembling strategy to robustly combine the obtained outlyingness values. This results in a flexible framework that can easily be extended.
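The direction-based outlyingness can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses the coordinatewise median as the robust center, omits direction set (3), which depends on the chosen kernel, and omits the ensembling step. For each unit direction v, the data are projected onto v and a point's deviation is standardized by the median and MAD of the projections; its outlyingness is the maximum over all directions:

```python
import numpy as np

def stahel_donoho_outlyingness(X, n_pairs=200, n_random=200, seed=0):
    """Sketch of Stahel-Donoho outlyingness over several direction sets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    center = np.median(X, axis=0)           # simple robust center estimate
    dirs = [X - center]                     # (1) point-to-center directions

    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    dirs.append(X[i] - X[j])                # (2) random point-pair directions

    dirs.append(rng.standard_normal((n_random, p)))  # (4) random directions

    V = np.vstack(dirs)
    norms = np.linalg.norm(V, axis=1)
    V = V[norms > 1e-12] / norms[norms > 1e-12, None]  # unit directions

    proj = X @ V.T                          # projections, shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad = np.maximum(mad, 1e-12)            # guard against degenerate directions

    # Outlyingness: worst robust standardized deviation over all directions.
    return np.max(np.abs(proj - med) / mad, axis=1)
```

A point far from the bulk along any single direction receives a large value, which is exactly the deviation principle the method builds on.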
On synthetic datasets, our method successfully captures the true structure of the non-outlying observations in the original space, similar to OCSVM but with improved robustness. Additionally, we conduct detailed experiments on real-world image datasets (MNIST, MNIST-C, Fashion-MNIST). Our results demonstrate consistently competitive performance against a wide selection of outlier detection methods.
References
- Schölkopf et al. [1999] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt. Support Vector Method for Novelty Detection. Advances in Neural Information Processing Systems, 12, 1999.
- Schreurs et al. [2021] J. Schreurs, I. Vranckx, M. Hubert, J. A. K. Suykens, and P. J. Rousseeuw. Outlier detection in non-elliptical data by kernel MRCD. Statistics and Computing, 31(5):66, 2021. doi: 10.1007/s11222-021-10041-7.
- Stahel [1981] W. A. Stahel. Robust Estimation: Infinitesimal Optimality and Covariance Matrix Estimators. PhD thesis, ETH, Zürich, 1981.