Cellwise robust Gaussian mixture model for multi-group data with label noise
P. Puchhammer, I. Wilms and P. Filzmoser
Technische Universität Wien, Maastricht University
Do expert-defined or diagnostically labeled data groups align with clusters inferred through statistical modeling? If not, where do discrepancies between predefined labels and model-based groupings occur, and why? We introduce the multi-group Gaussian mixture model (MG-GMM), the first model developed to investigate these questions. It incorporates prior group information while remaining flexible enough to reassign observations to alternative groups based on data-driven evidence.
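To this end, we model the data with Gaussian mixture models. Let $X_1, \ldots, X_N$ be data sets from $N$ groups, where group $g$ consists of $n_g$ independent observations $x_{g,i}$, for $i = 1, \ldots, n_g$, of the same $p$ variables. Assume that each observation from group $g$ originates from a Gaussian mixture
$$f_g(x) = \sum_{k=1}^{N} \pi_{g,k} \, \varphi(x; \mu_k, \Sigma_k)$$
for $g = 1, \ldots, N$, and where $\varphi(\cdot\,; \mu_k, \Sigma_k)$ is defined as the multivariate normal density with mean $\mu_k$ and covariance $\Sigma_k$ for $k = 1, \ldots, N$. Based on the assumption that each individual group is coherent, assume $\pi_{g,g} \geq \alpha \geq 0.5$. Thus each group $g$ has a main distribution $\mathcal{N}(\mu_g, \Sigma_g)$. However, the flexibility of the mixture model allows data-driven reassignment of observations that are outlying in their original group.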
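The reassignment idea can be illustrated with a minimal sketch. It assumes known parameters and uses notation chosen here for illustration: `weights[g, k]` is the weight of component k in the mixture of group g, and `means[k]`, `covs[k]` parametrize component k. The helper names and the toy values are hypothetical. The sketch shows only the posterior group probabilities of a plain Gaussian mixture, not the cellwise robust penalized estimation of the MG-GMM.

```python
import numpy as np


def log_gaussian_density(x, mean, cov):
    """Log of the multivariate normal density, evaluated at each row of x."""
    x = np.atleast_2d(x)
    p = mean.shape[0]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve chol @ z = diff^T; the squared norm of each column of z is the
    # squared Mahalanobis distance of the corresponding observation.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z**2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + maha)


def reassignment_probabilities(x, g, weights, means, covs):
    """Posterior probability that each row of x, labeled g, belongs to component k."""
    log_terms = np.stack(
        [
            np.log(weights[g, k]) + log_gaussian_density(x, means[k], covs[k])
            for k in range(len(means))
        ],
        axis=1,
    )
    # Normalize in log space for numerical stability.
    log_terms -= log_terms.max(axis=1, keepdims=True)
    probs = np.exp(log_terms)
    return probs / probs.sum(axis=1, keepdims=True)


# Hypothetical toy setting: two groups, two variables.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
# The diagonal weights dominate, so each group keeps its own main distribution.
weights = np.array([[0.9, 0.1],
                    [0.1, 0.9]])

# Two observations labeled as group 0; the second lies close to group 1.
x_group0 = np.array([[0.2, -0.1],
                     [4.8, 5.1]])
posterior = reassignment_probabilities(x_group0, 0, weights, means, covs)
new_labels = posterior.argmax(axis=1)
```

In this toy setting the first observation keeps its label, while the second is reassigned to the other group even though its own group carries the larger prior weight.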
Moreover, through a penalized likelihood approach, our model is robust against cellwise outliers that may obscure or distort the underlying group structure. The proposed methodology, implemented via an EM-type algorithm, performs well in simulations, and its potential is illustrated on wine data.
Acknowledgements: Co-funded by the European Union (SEMACRET, Grant Agreement no. 101057741) and UKRI (UK Research and Innovation). Ines Wilms is supported by a grant from the Dutch Research Council (NWO), research program Vidi under the grant number VI.Vidi.211.032.
Keywords: Gaussian mixture models, cellwise outliers, labeled data