Robust clustering for matrix-variate data
M. Mayrhofer, L. A. García Escudero and A. Mayo Íscar
TU Wien, University of Valladolid
Matrix-valued data arise naturally when a sample has more structure than a vector can carry, e.g., a grayscale image is a matrix of pixel intensities. Such data are often vectorized by stacking rows or columns, which yields high-dimensional vectors that many multivariate data analysis procedures handle poorly. Alternatively, they can be treated as samples from a matrix-variate distribution, which enables simultaneous modeling of row and column covariances via a Kronecker-product covariance structure.
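As an illustration of the Kronecker-product structure mentioned above, the following is a minimal NumPy sketch and not part of the proposed method. It assumes column-stacking vectorization, and the function names (matrix_mahalanobis, vectorized_mahalanobis, covariance_parameter_counts) are hypothetical. It shows that the squared Mahalanobis distance can be computed from the row and column covariances alone, and compares the number of free covariance parameters in both representations.

```python
import numpy as np


def matrix_mahalanobis(X, M, row_cov, col_cov):
    """Squared Mahalanobis distance of a p x q matrix X under a
    Kronecker-structured covariance: tr(C^-1 (X-M)^T R^-1 (X-M))."""
    R = X - M
    left = np.linalg.solve(col_cov, R.T)   # q x p
    right = np.linalg.solve(row_cov, R)    # p x q
    return float(np.trace(left @ right))


def vectorized_mahalanobis(X, M, row_cov, col_cov):
    """Same distance computed on the column-stacked vector with the
    full (pq x pq) covariance kron(col_cov, row_cov)."""
    r = (X - M).reshape(-1, order="F")     # stack columns
    full_cov = np.kron(col_cov, row_cov)
    return float(r @ np.linalg.solve(full_cov, r))


def covariance_parameter_counts(p, q):
    """Free parameters of an unstructured pq x pq covariance versus the
    row and column covariances (ignoring the scale indeterminacy
    between the two Kronecker factors)."""
    d = p * q
    return d * (d + 1) // 2, p * (p + 1) // 2 + q * (q + 1) // 2
```

For a 28 x 28 image, for example, covariance_parameter_counts(28, 28) gives 307720 parameters for the unstructured covariance against 812 for the two Kronecker factors.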
We propose MTCLUST, a new robust clustering method for matrix-valued data. It combines the recently introduced matrix minimum covariance determinant (MMCD) estimators of [1], which provide robust mean and covariance estimates for matrix-valued data, with the trimmed clustering approach of [2]. MTCLUST trims the most outlying samples and assigns each regular sample to one of k groups. For each group, it estimates the mean as well as the row and column covariances. Finally, it restricts the ratio between the maximum and minimum eigenvalues of the row and column covariance matrices, which makes the problem well-defined and avoids both ill-conditioned covariances and spurious clusters.
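The two ingredients described above, trimming with group assignment and an eigenvalue-ratio restriction, can be sketched as follows. This is a simplified illustration under stated assumptions, not the MTCLUST algorithm itself: the function names are hypothetical, the scores are assumed to be per-group log-densities supplied by the caller, and the eigenvalue step simply raises small eigenvalues of a single matrix to a floor rather than using the optimal truncation of the trimmed clustering literature.

```python
import numpy as np


def clip_condition_number(cov, max_ratio):
    """Return a symmetric matrix with the same eigenvectors as `cov`
    whose eigenvalue ratio (largest / smallest) is at most `max_ratio`.
    Simplification: eigenvalues below max_eig / max_ratio are raised
    to that floor; larger eigenvalues are left unchanged."""
    if max_ratio < 1:
        raise ValueError("max_ratio must be at least 1")
    eigvals, eigvecs = np.linalg.eigh(cov)
    floor = eigvals.max() / max_ratio
    clipped = np.maximum(eigvals, floor)
    return (eigvecs * clipped) @ eigvecs.T


def trim_and_assign(scores, alpha):
    """Assign each sample to its best-scoring group and trim the
    fraction `alpha` of samples whose best score is lowest.
    `scores` is an (n_samples, n_groups) array of log-densities.
    Trimmed samples receive the label -1."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    labels = scores.argmax(axis=1)
    best = scores.max(axis=1)
    n_trim = int(np.floor(alpha * n))
    if n_trim > 0:
        trimmed = np.argsort(best)[:n_trim]
        labels[trimmed] = -1
    return labels
```

In an iterative procedure, steps of this kind would typically alternate with re-estimating each group's mean and row and column covariances from its currently assigned, untrimmed samples.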
In the single-group case, MTCLUST yields a condition-number-regularized version of MMCD. In the multi-group setting, MTCLUST has computational advantages over vectorized methods due to lower sample-size requirements, especially for initialization.
Keywords: Condition number regularization, Trimmed maximum likelihood, Spurious clusters
References
- [1] Mayrhofer, M., Radojičić, U., & Filzmoser, P. (2025). Robust Covariance Estimation and Explainable Outlier Detection for Matrix-Valued Data. Technometrics, 67(3), 516–530.
- [2] García-Escudero, L. A., Gordaliza, A., Matrán, C., & Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. The Annals of Statistics, 36(3), 1324–1345.