Robust marginal functional PCA for repeated functional measurements

B. Brune¹, U. Radojičić¹, S. Greven², and P. Filzmoser¹

  • ¹ Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria [Peter.Filzmoser@tuwien.ac.at]
  • ² Chair of Statistics, School of Business and Economics, Humboldt-Universität zu Berlin

Keywords: Dependent Functional Data – Second-Generation Functional Data – Functional Outliers

We consider so-called second-generation functional data [Koner and Staicu, 2023], where functional data are sampled under assumptions that deviate from independence. More specifically, we consider functions sampled in longitudinal or repeated-measurement designs. This form of functional data can be interpreted as a realization of a bivariate stochastic process $X(s,t)$, $s \in \mathcal{S}$, $t \in \mathcal{T}$, with $\mathcal{S}, \mathcal{T} \subset \mathbb{R}$. We assume the process $X$ has a mean function $\mu: \mathcal{S} \times \mathcal{T} \to \mathbb{R}$ and covariance function

$$c(\{s,t\},\{s',t'\}) = \mathbb{E}\bigl(X(s,t)\,X(s',t')\bigr) - \mu(s,t)\,\mu(s',t').$$

One can think of 𝒮 as the spatial domain, and of 𝒯 as the time domain. The goal could then be to represent the functional data in a lower-dimensional space. However, instead of analyzing two-dimensional surfaces, it is simpler to separate the analysis of the dynamics in 𝒮 and 𝒯, as suggested in Park and Staicu [2015] and Chen et al. [2017], and to consider a decomposition of the form

$$X(s,t) - \mu(s,t) = \sum_{k=1}^{\infty} \xi_k(t)\,\phi_k(s).$$

The basis functions $\phi_k$ capture the dynamics in the spatial domain $\mathcal{S}$ and are obtained from an eigendecomposition of the marginal covariance function

$$\Sigma(s,s') = \int_{\mathcal{T}} c(\{s,t\},\{s',t\})\,dt. \tag{1}$$

The resulting eigenpairs $\{\lambda_k, \phi_k\}_k$ are referred to as marginal eigenvalues and functional principal components. They are optimal in the sense that they minimize the mean squared average reconstruction error, where the averaging is with respect to the time domain. Thus, the time dynamics are captured by functions $\{\xi_k\}_k$, formed by the scores on the corresponding components
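As a rough illustration of this eigendecomposition, the sketch below discretizes the marginal covariance on a grid. It assumes, beyond what the abstract states, that every subject is observed on the same equally spaced grids in both domains, and it uses a plain (non-robust) sample average; the function names `marginal_covariance` and `marginal_eigenpairs` are hypothetical.

```python
import numpy as np

def marginal_covariance(X, mu):
    """Grid estimate of Sigma(s, s') from X of shape (n, m, p).

    n subjects, m time points, p points on the S-grid; mu has shape (m, p).
    """
    n, m, p = X.shape
    centered = (X - mu).reshape(n * m, p)
    # Averaging over subjects and time points stands in for the integral
    # over T (up to a constant factor when time points are equally spaced).
    return centered.T @ centered / (n * m)

def marginal_eigenpairs(Sigma, s_grid, n_components):
    """Leading eigenpairs of the discretized covariance operator."""
    ds = s_grid[1] - s_grid[0]          # assumes an equally spaced S-grid
    evals, evecs = np.linalg.eigh(Sigma * ds)
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(ds)  # so that sum(phi**2) * ds == 1
    return lam, phi

# Simulated data with two known components (illustrative only).
rng = np.random.default_rng(0)
s_grid = np.linspace(0, 1, 50)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * s_grid)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * s_grid)
xi1 = rng.normal(scale=2.0, size=(100, 20))
xi2 = rng.normal(scale=1.0, size=(100, 20))
X = xi1[:, :, None] * phi1 + xi2[:, :, None] * phi2

mu_hat = X.mean(axis=0)                  # crude pointwise mean, shape (m, p)
Sigma_hat = marginal_covariance(X, mu_hat)
lam, phi = marginal_eigenpairs(Sigma_hat, s_grid, n_components=2)
```

Scaling the matrix by the grid spacing before the eigendecomposition makes the eigenvalues approximate those of the integral operator rather than of the raw matrix.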

$$\xi_k(t) = \langle X(\cdot,t), \phi_k \rangle_{\mathcal{S}} = \int_{\mathcal{S}} \phi_k(s)\,X(s,t)\,ds.$$
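On a grid, this inner product becomes a numerical integral. The following sketch uses the trapezoidal rule and subtracts a mean surface first, which is an assumption made here so that the scores match the centered decomposition; `trapezoid_weights` and `marginal_scores` are hypothetical names.

```python
import numpy as np

def trapezoid_weights(grid):
    """Quadrature weights of the trapezoidal rule on a (possibly uneven) grid."""
    steps = np.diff(grid)
    weights = np.zeros(len(grid))
    weights[:-1] += steps / 2
    weights[1:] += steps / 2
    return weights

def marginal_scores(X, mu, phi, s_grid):
    """Scores xi_k(t_j) per subject.

    X: (n, m, p) observations, mu: (m, p) mean surface, phi: (p, K) components.
    Returns an array of shape (n, m, K).
    """
    w = trapezoid_weights(s_grid)
    # Sum over the S-grid index s of (X - mu)[i, j, s] * w[s] * phi[s, k].
    return np.einsum("ijs,s,sk->ijk", X - mu, w, phi)

# Curves built from one known component, so the scores are known exactly.
s_grid = np.linspace(0, 1, 101)
phi = (np.sqrt(2) * np.sin(2 * np.pi * s_grid))[:, None]   # shape (p, 1)
rng = np.random.default_rng(0)
true_scores = rng.normal(size=(5, 8))
X = true_scores[:, :, None] * phi[:, 0]
mu = np.zeros((8, 101))
scores = marginal_scores(X, mu, phi, s_grid)
```

Each slice `scores[i, :, k]` is then one discretely observed score function, which is what allows the scores to be treated as functional data in their own right.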

The score functions are random functions and they can be treated as functional data as well. We denote their covariance functions by $\gamma_k(t,t') = \mathbb{E}(\xi_k(t)\,\xi_k(t'))$, for $k \geq 1$, with eigenpairs $\{\eta_{kl}, \psi_{kl}\}_{l \geq 1}$ for each $k \geq 1$. Thus, the score functions $\xi_k$ admit a Karhunen-Loève expansion

$$\xi_k(t) = \sum_{l=1}^{\infty} \zeta_{kl}\,\psi_{kl}(t), \quad \text{for } k \geq 1.$$
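This second-stage expansion can be sketched in the same way as the first. The code below assumes that the k-th score function of every subject is available on a common equally spaced time grid and has mean zero; it writes `eta`, `psi`, and `zeta` for the second-stage eigenvalues, eigenfunctions, and coefficients, and `score_function_kl` is a hypothetical name.

```python
import numpy as np

def score_function_kl(xi_k, t_grid, n_components):
    """Truncated Karhunen-Loeve expansion of one score function.

    xi_k: (n, m) values of the k-th score function for n subjects on t_grid.
    Returns eigenvalues (L,), eigenfunctions (m, L), coefficients (n, L).
    """
    n, m = xi_k.shape
    dt = t_grid[1] - t_grid[0]           # assumes an equally spaced T-grid
    gamma_k = xi_k.T @ xi_k / n          # E(xi_k(t) xi_k(t')), mean zero assumed
    evals, evecs = np.linalg.eigh(gamma_k * dt)
    order = np.argsort(evals)[::-1][:n_components]
    eta = evals[order]
    psi = evecs[:, order] / np.sqrt(dt)  # unit L2 norm on the grid
    zeta = xi_k @ psi * dt               # Riemann sum for int xi_k(t) psi(t) dt
    return eta, psi, zeta

# Score functions generated from two known time patterns (illustrative only).
rng = np.random.default_rng(0)
t_grid = np.linspace(0, 1, 40)
basis = np.stack(
    [np.ones_like(t_grid), np.sqrt(2) * np.cos(np.pi * t_grid)], axis=1
)
coefs = rng.normal(size=(200, 2)) * np.array([1.5, 0.5])
xi_k = coefs @ basis.T

eta, psi, zeta = score_function_kl(xi_k, t_grid, n_components=2)
xi_k_rebuilt = zeta @ psi.T              # truncated expansion evaluated on t_grid
```

Because the simulated score functions lie in a two-dimensional space, two components reproduce them up to rounding error; with real data the truncation level would have to be chosen.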

The model estimation is performed in three steps:

Step 1: Estimation of the mean function $\mu$ by employing a bivariate smoothing algorithm.

Step 2: Estimation of the marginal covariance function (1) and the eigenpairs $\{\lambda_k, \phi_k\}_{k \geq 1}$.

Step 3: Estimation of the scores $\xi_{ik}(t_{ij})$, followed by smoothing, to obtain estimates of the score functions $\xi_{ik}(t)$ at all points $t \in \mathcal{T}$.
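The smoothing in the last step can be pictured with a simple kernel smoother. The abstract does not say which smoother is used, so the Nadaraya-Watson estimator with a Gaussian kernel below is only a stand-in, and `smooth_scores` and `bandwidth` are hypothetical names.

```python
import numpy as np

def smooth_scores(t_obs, xi_obs, t_eval, bandwidth):
    """Nadaraya-Watson estimate of one subject's score function.

    t_obs: (m,) observation times, xi_obs: (m,) raw scores at those times,
    t_eval: (q,) times at which the smooth score function is wanted.
    """
    diffs = (t_eval[:, None] - t_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)    # Gaussian kernel weights, shape (q, m)
    return weights @ xi_obs / weights.sum(axis=1)

# Noisy raw scores of one subject at irregular time points (illustrative only).
rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0, 1, size=15))
xi_obs = np.sin(2 * np.pi * t_obs) + rng.normal(scale=0.1, size=15)
t_eval = np.linspace(0, 1, 50)
xi_smooth = smooth_scores(t_obs, xi_obs, t_eval, bandwidth=0.1)
```

Each smoothed value is a weighted average of the raw scores, so the result always stays within their range; the bandwidth controls how local that average is.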

To achieve robust estimates, we propose to replace the estimators of the components suggested in Park and Staicu [2015] and Chen et al. [2017] by robust counterparts. Based on the robust estimates, we introduce outlier diagnostic tools that allow outlying observations to be identified. The proposed estimation procedure is outlined in detail in the presentation, and simulation results and real data examples will underline the usefulness of this procedure.
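The abstract does not name the robust estimators that are used. Purely to illustrate why a robust covariance estimate matters for the principal components, the sketch below compares the classical sample covariance with the spatial sign covariance, a well-known robust alternative chosen here as an assumption, on simulated curves where 5% are replaced by outliers. All names are hypothetical.

```python
import numpy as np

def spatial_sign_covariance(curves, ds):
    """Covariance of curves projected onto the unit sphere in L2.

    curves: (N, p) discretized curves, ds: spacing of the equally spaced grid.
    Centering uses the pointwise median as a simple robust location.
    """
    centered = curves - np.median(curves, axis=0)
    norms = np.sqrt(np.sum(centered**2, axis=1) * ds)
    keep = norms > 0                     # curves equal to the center carry no direction
    signs = centered[keep] / norms[keep, None]
    return signs.T @ signs / keep.sum()

def leading_direction(cov):
    """Eigenvector of the largest eigenvalue (eigh sorts in ascending order)."""
    _, evecs = np.linalg.eigh(cov)
    return evecs[:, -1]

rng = np.random.default_rng(0)
s_grid = np.linspace(0, 1, 50)
ds = s_grid[1] - s_grid[0]
phi1 = np.sqrt(2) * np.sin(2 * np.pi * s_grid)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * s_grid)
outlier_shape = np.sqrt(2) * np.sin(4 * np.pi * s_grid)

# Clean curves vary mostly along phi1; 25 of 500 are replaced by large outliers.
curves = (rng.normal(scale=2.0, size=(500, 1)) * phi1
          + rng.normal(scale=0.5, size=(500, 1)) * phi2)
curves[:25] = 50.0 * outlier_shape

classical = np.cov(curves, rowvar=False)
robust = spatial_sign_covariance(curves, ds)

# Absolute cosine between each leading eigenvector and the true direction phi1.
unit_phi1 = phi1 / np.linalg.norm(phi1)
align_classical = abs(leading_direction(classical) @ unit_phi1)
align_robust = abs(leading_direction(robust) @ unit_phi1)
```

In this setup the outliers dominate the classical covariance, so its leading eigenvector follows the outlier shape, whereas the sign-based estimate bounds the influence of each curve and its leading eigenvector stays close to `phi1`.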

References

  • Chen et al. [2017] K. Chen, P. Delicado, and H.-G. Müller. Modelling function-valued stochastic processes, with applications to fertility dynamics. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(1):177–196, 2017.
  • Koner and Staicu [2023] S. Koner and A.-M. Staicu. Second-generation functional data. Annual Review of Statistics and Its Application, 10(1):547–572, 2023.
  • Park and Staicu [2015] S.Y. Park and A.-M. Staicu. Longitudinal functional data analysis. Stat, 4(1):212–226, 2015.