Sparse Weighted Multi-view Clustering
N. Niang (a), M. L. Ndao (a) and M. Ouattara (b)
(a) Conservatoire National des Arts et Métiers (CNAM), Paris; (b) Université San Pédro, Côte d’Ivoire
We address multi-view clustering, in which individuals are described by variables partitioned into several homogeneous and meaningful blocks. Because the blocks are intended to be homogeneous, preserving this block structure should help reveal the underlying structure of the individuals. In a first step, the individuals are clustered according to each block separately; in a second step, the resulting partitions (contributory partitions) are aggregated into a consensus partition Niang and Ouattara [2019]. The multi-view clustering problem is thus reformulated as a consensus-of-partitions problem. The choice of the first-step clustering method is not addressed here; we focus only on the aggregation of the obtained partitions. These partitions are treated as categorical variables and are associated with indicator matrices and with connectivity matrices, whose entries are 1 if two individuals belong to the same cluster and 0 otherwise. Using connectivity matrices avoids the label-switching issue. It has been pointed out that simple consensus methods, such as CSPA (Cluster-based Similarity Partitioning Algorithm) Strehl and Ghosh [2002], can yield unstable results when the contributory partitions differ significantly and when some of them are highly correlated. This redundancy can bias the final partition towards the correlated partitions. To address these limitations, weighted consensus methods have been proposed, such as Weighted Nonnegative Matrix Factorization (WNMF) Li and Ding [2008]. We propose a sparse weighted consensus method, based on the Constrained Singular Value Decomposition Guillemot et al. [2019] and on the RV correlation coefficient Robert and Escoufier [1976] between the connectivity matrices, to find a unique partition from the contributory ones. Results on simulated and real data show the relevance of the proposed method, particularly when dealing with redundant partitions. In addition, the RV-based STATIS method Lavit et al. [1994] allows visualisation of the multi-view data as well as of the clustering results.
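
As an illustration of the two building blocks named above, the following minimal sketch builds connectivity matrices from cluster labels and computes an RV coefficient between them. It is not the proposed sparse weighted consensus method. The function names and the toy partitions are hypothetical, and the RV coefficient is computed here in its uncentered form, tr(AB) / sqrt(tr(AA) tr(BB)), which is a simplifying assumption.

```python
import numpy as np

def connectivity_matrix(labels):
    """Return the n x n matrix with 1 where two individuals share a cluster, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def rv_coefficient(a, b):
    """Uncentered RV coefficient between two symmetric matrices (assumption: no centering)."""
    # For symmetric matrices, trace(a @ b) equals the sum of elementwise products.
    num = np.sum(a * b)
    den = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return num / den

# Toy contributory partitions of four individuals (hypothetical data).
p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]  # same partition as p1 with the labels swapped
p3 = [0, 1, 0, 1]  # a different partition

c1, c2, c3 = (connectivity_matrix(p) for p in (p1, p2, p3))

rv_12 = rv_coefficient(c1, c2)
rv_13 = rv_coefficient(c1, c3)
print(rv_12, rv_13)
```

Because p1 and p2 differ only by a relabelling, their connectivity matrices are identical and their RV coefficient is 1, which is the sense in which connectivity matrices avoid label switching. A high RV coefficient between two contributory partitions signals the kind of redundancy that a weighted consensus is meant to account for.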

Keywords: Sparsity; Multi-view Clustering; RV Coefficient

References

  • Guillemot et al. [2019] Vincent Guillemot, Derek Beaton, Arnaud Gloaguen, Tommy Löfstedt, Brian Levine, Nicolas Raymond, Arthur Tenenhaus, and Hervé Abdi. A constrained singular value decomposition method that integrates sparsity and orthogonality. PLoS ONE, 14(3):e0211463, 2019.
  • Lavit et al. [1994] Christine Lavit, Yves Escoufier, Robert Sabatier, and Pierre Traissac. The ACT (STATIS method). Computational Statistics & Data Analysis, 18(1):97–119, 1994.
  • Li and Ding [2008] Tao Li and Chris Ding. Weighted consensus clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 798–809. SIAM, 2008.
  • Niang and Ouattara [2019] Ndèye Niang and Mory Ouattara. Weighted consensus clustering for multiblock data. In Actes SFC 2019, Paris, France, September 2019. URL https://cnam.hal.science/hal-02471611.
  • Robert and Escoufier [1976] Paul Robert and Yves Escoufier. A unifying tool for linear multivariate statistical methods: the RV-coefficient. Journal of the Royal Statistical Society Series C: Applied Statistics, 25(3):257–265, 1976.
  • Strehl and Ghosh [2002] Alexander Strehl and Joydeep Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(Dec):583–617, 2002.