Robust estimation of structural equation models
-
Department of Psychology, University of Zurich, Zurich, Switzerland [max.welz@uzh.ch]
-
Department of Psychology, Harvard University, Cambridge, USA [mair@fas.harvard.edu]
-
Department of Econometrics, Erasmus University Rotterdam, Rotterdam, Netherlands [alfons@ese.eur.nl]
Keywords: Careless responding – Latent variables – Rating scales – Survey data
1 Introduction
Structural equation models (SEMs) are ubiquitous in the social and behavioral sciences, where latent constructs of interest are commonly measured by multiple (discrete and bounded) rating-scale items in a survey [e.g., Bollen, 1989]. In this setting, a SEM can be considered to consist of two components: (i) the measurement model, which links the observable variables to the latent variables, and (ii) the structural model, which describes the relationships between the latent variables and contains the parameters of primary interest. However, such data may contain noisy or low-quality observations, such as (but not limited to) careless responses [for an overview and recent developments, see, e.g., Alfons and Welz, 2024, Welz and Alfons, 2024, and references therein]. It is common practice to compute mean scores of the respective observable variables as measurements of the latent constructs, after which robust regression estimators can be applied to estimate the parameters of the structural model [see, e.g., Alfons et al., 2022a, b, for the special case of mediation analysis]. This approach has two main disadvantages. First, using mean scores implicitly assumes that the observable variables measuring a certain latent construct have equal correlation, which may result in biased estimates. Second, standard inference for robust regression estimators does not take measurement uncertainty into account.
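For reference, the two components described above can be written compactly. The notation below is an assumption made for illustration (a common LISREL-style convention), not notation taken from this abstract, and it ignores for the moment that rating-scale items are discrete: x collects the observable variables, \eta the latent variables, \Lambda the loadings, B the structural coefficients, and \varepsilon and \zeta the error terms.

```latex
% Notation assumed for illustration; not taken from the source text.
\begin{align}
  x    &= \Lambda \eta + \varepsilon, && \text{(measurement model)} \\
  \eta &= B \eta + \zeta.             && \text{(structural model)}
\end{align}
```

In this notation, a mean score weights all items of a construct equally, regardless of the corresponding entries of \Lambda, whereas the parameters of primary interest sit in B.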
2 Robust estimation
SEMs are commonly estimated based on the covariance or correlation matrix, from which estimates of the model parameters are derived [e.g., Bollen, 1989]. However, the presence of careless responses can introduce a sizable bias in traditional (maximum-likelihood based) estimates of the correlation matrix [cf. Welz et al., 2024a]. This bias is inherited by the SEM estimate, possibly leading to deteriorated model fit and biased estimates of the factor structure. Although the term robustness is used in various ways in the SEM literature [see Alfons and Schley, 2024, for an overview], we are unaware of any robust estimation procedures that can handle observable variables on discrete rating scales. As a remedy, we propose to use a robust estimate of the so-called polychoric correlation matrix [see Welz et al., 2024b, for details]. This robust estimator applies the C-estimation framework of Welz [2024] and generalizes the commonly used maximum likelihood estimator of the polychoric correlation matrix at no additional computational cost. We demonstrate through simulation studies and empirical applications that fitting a SEM to a robustly estimated polychoric correlation matrix can substantially improve SEM fit, enhance the accuracy of parameter estimates, and help identify potentially low-quality responses.
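The bias mechanism described above can be illustrated with a minimal simulation sketch. It rests on several assumptions that are not part of the proposed method: careless respondents are modeled as answering uniformly at random, the thresholds and latent correlation are arbitrary illustrative values, and the plain Pearson sample correlation stands in for a non-robust correlation estimate (it is neither the maximum likelihood polychoric estimator nor the robust estimator implemented in robcat). All function names are hypothetical.

```python
import random


def discretize(z, thresholds):
    """Map a latent continuous value to a rating-scale category 1..K."""
    return 1 + sum(z > t for t in thresholds)


def pearson(x, y):
    """Plain Pearson sample correlation (a non-robust stand-in estimator)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n, rho, thresholds, careless_share, rng):
    """Simulate two rating-scale items with a given share of careless responses.

    Attentive responses discretize a latent bivariate normal with correlation
    rho; careless responses are drawn uniformly at random (an assumption).
    """
    n_categories = len(thresholds) + 1
    x, y = [], []
    for _ in range(n):
        if rng.random() < careless_share:
            x.append(rng.randint(1, n_categories))
            y.append(rng.randint(1, n_categories))
        else:
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            x.append(discretize(z1, thresholds))
            y.append(discretize(z2, thresholds))
    return x, y


rng = random.Random(42)
thresholds = [-1.5, -0.5, 0.5, 1.5]  # five answer categories (illustrative)

x_clean, y_clean = simulate(5000, 0.7, thresholds, 0.0, rng)
x_cont, y_cont = simulate(5000, 0.7, thresholds, 0.2, rng)

r_clean = pearson(x_clean, y_clean)
r_contaminated = pearson(x_cont, y_cont)

print(f"correlation without careless responses: {r_clean:.3f}")
print(f"correlation with 20% careless responses: {r_contaminated:.3f}")
```

Under these assumptions, the correlation computed from the contaminated sample is attenuated toward zero relative to the clean sample. Any model fitted to a correlation matrix built from such estimates inherits that attenuation, which is the motivation for replacing the input matrix with a robust estimate.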
3 Implementation
The proposed methodology is implemented in the open-source R package robcat (for “ROBust CATegorical data analysis”) [Welz et al., 2025], which is freely available from https://github.com/mwelz/robcat. To optimize computational speed and performance, robcat is primarily developed in C++ and integrated into R via Rcpp [Eddelbuettel, 2013].
References
- Alfons and Schley [2024] A. Alfons and D.R. Schley. Robust mediation analysis: What we talk about when we talk about robustness. PsyArXiv, 2024. doi: 10.31234/osf.io/2hqdy.
- Alfons and Welz [2024] A. Alfons and M. Welz. Open science perspectives on machine learning for the identification of careless responding: A new hope or phantom menace? Social and Personality Psychology Compass, 18(2):e12941, 2024. doi: 10.1111/spc3.12941.
- Alfons et al. [2022a] A. Alfons, N.Y. Ateş, and P.J.F. Groenen. A robust bootstrap test for mediation analysis. Organizational Research Methods, 25(3):591–617, 2022a. doi: 10.1177/1094428121999096.
- Alfons et al. [2022b] A. Alfons, N.Y. Ateş, and P.J.F. Groenen. Robust mediation analysis: The R package robmed. Journal of Statistical Software, 103(13):1–45, 2022b. doi: 10.18637/jss.v103.i13.
- Bollen [1989] K.A. Bollen. Structural Equations with Latent Variables. John Wiley & Sons, 1989. ISBN 0-471-01171-1.
- Eddelbuettel [2013] D. Eddelbuettel. Seamless R and C++ Integration with Rcpp. Springer-Verlag, 2013.
- Welz [2024] M. Welz. Robust estimation and inference for categorical data, 2024. arXiv:2403.11954. doi: 10.48550/arXiv.2403.11954.
- Welz and Alfons [2024] M. Welz and A. Alfons. When respondents don’t care anymore: Identifying the onset of careless responding. arXiv:2303.07167, 2024. doi: 10.48550/arXiv.2303.07167.
- Welz et al. [2024a] M. Welz, A. Archimbaud, and A. Alfons. How much carelessness is too much? Quantifying the impact of careless responding. PsyArXiv, 2024a. doi: 10.31234/osf.io/8fj6p.
- Welz et al. [2024b] M. Welz, P. Mair, and A. Alfons. Robust estimation of polychoric correlation, 2024b. arXiv:2407.18835. doi: 10.48550/arXiv.2407.18835.
- Welz et al. [2025] M. Welz, A. Alfons, and P. Mair. robcat: Robust Categorical Data Analysis, 2025. R package version 0.1.0. URL https://github.com/mwelz/robcat.