Principal component analysis for interval-valued data with multiple number of observations from each subject

A. Roy

The University of Texas at San Antonio

We propose a novel method for obtaining principal components (PCs) from interval-valued data with multiple observations per subject. The method uses a patterned variance-covariance matrix that partitions the actual variance-covariance matrix into between-subject variation, within-interval variation, and within-multiple-observations variation ([2]). We apply the method to a Face dataset (Table 1, [1]) obtained from a study of face recognition patterns for surveillance purposes: sequences of images (video frames from a video source) were recorded with six features and three sequences per face. The first PC represents the ‘landmark triangle’, specified by three features that do not move and are used to quantify a face; the second PC represents the ‘movable portions’, specified by the other three features of a face.

We study a ‘circle of correlation plot’, which gives a quick visual interpretation of how the features are correlated with the PCs. The plot clearly shows that the features related to the ‘landmark triangle’ contribute mostly to PC1, while the features related to the ‘movable portions’ contribute mostly to PC2.
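
As a minimal sketch of how the coordinates in a circle of correlation plot can be computed, the Python example below runs an ordinary PCA on the correlation matrix of classical (point-valued) data and returns the correlation of each feature with PC1 and PC2. It is not the patterned-covariance method proposed here: the function name `correlation_circle_coords` and the synthetic six-feature data are assumptions made for illustration only, and interval-valued data would first have to be reduced to point values (for example, interval midpoints) before this sketch applies.

```python
import numpy as np

def correlation_circle_coords(X, n_components=2):
    """Correlation of each column of X with the leading PC scores (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Standardize columns so that PCA is carried out on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # For standardized data, corr(feature j, PC k) = eigvec[j, k] * sqrt(eigval[k]).
    return eigvecs * np.sqrt(eigvals)

# Synthetic stand-in data (not the Face dataset): two latent factors,
# each driving three of the six features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
noise = 0.3 * rng.normal(size=(200, 6))
X = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

coords = correlation_circle_coords(X)
# Each row holds (corr with PC1, corr with PC2); every point lies inside the unit circle.
for j, (c1, c2) in enumerate(coords):
    print(f"feature {j + 1}: PC1 = {c1:+.2f}, PC2 = {c2:+.2f}")
```

Plotting each row of `coords` as a point or arrow inside the unit circle gives the plot: features lying close to the circle and near one axis are strongly correlated with the corresponding PC.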

We present a comparison study of the correlations between the original variables and PC1 as generated by several previous methods and by our proposed method. The component correlations from our method are, in absolute value, mostly stronger than or as strong as those from the previous methods.

Keywords: Interval-valued data, Circle of correlation plot.

References

  • [1] A. Douzal-Chouakria, L. Billard, and E. Diday (2011). Principal component analysis for interval-valued observations. Stat Anal Data Min 4(2), 229–246.
  • [2] A. Roy (2025). Two-stage principal component analysis on interval-valued data using patterned covariance structures. Adv Data Anal Classif, https://doi.org/10.1007/s11634-025-00650-9