Robust multiple imputation with GAM

M. Templ¹

Abstract

Missing data imputation is a crucial step in data analysis and an established practice in data science. When the relationship between a response and its predictors cannot be adequately captured through linear transformations, interaction terms, or polynomial expansions, nonlinear imputation methods become essential. Generalized Additive Models (GAM) model the mean of the response through smooth functions of the predictors, and their extension, Generalized Additive Models for Location, Scale, and Shape (GAMLSS), allows each distributional parameter (such as mean, variance, skewness, and kurtosis) to be modeled as a function of predictors. However, standard GAM and GAMLSS imputation approaches are sensitive to outliers: both representative outliers (extreme but valid data points) and non-representative outliers (measurement errors) can distort the fitted model and bias the imputations. Robust imputation methods mitigate these effects and make the handling of missing data more reliable.

This work presents a novel robust imputation algorithm built on three components: (1) it employs a robust bootstrap procedure to account for model uncertainty in the random imputation process, (2) it integrates robust fitting techniques (based on a BACON-EM approach) to enhance stability, and (3) it explicitly accounts for imputation uncertainty in a robust manner. The proposed algorithm is highly adaptable, accommodating complex models for variables with missing values. On real-world datasets and in extensive simulations, our method, imputeRobust, outperforms existing GAMLSS-based imputation techniques, particularly in the presence of outliers. A remaining challenge lies in extending robust imputation methods to categorical variables.
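The interplay of the three components can be illustrated with a deliberately simplified sketch. It is not the imputeRobust implementation. As stated assumptions, an ordinary nonparametric bootstrap stands in for the robust bootstrap, a Huber-weighted straight-line fit stands in for the BACON-EM-based robust GAM fit, and imputation uncertainty is added as Gaussian noise with a MAD-based residual scale. The function names `huber_line` and `robust_multiple_imputation` are hypothetical.

```python
import random
import statistics


def huber_line(x, y, c=1.345, n_iter=50):
    """Straight-line fit by iteratively reweighted least squares with Huber weights.

    Assumption: a stand-in for a robust GAM fit, not the BACON-EM procedure.
    """
    w = [1.0] * len(x)
    a = b = 0.0
    scale = 1.0
    for _ in range(n_iter):
        # Weighted least squares for intercept a and slope b.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        # Robust residual scale: median absolute deviation, scaled for normality.
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(res)
        scale = 1.4826 * statistics.median([abs(r - med) for r in res]) + 1e-12
        # Huber weights: 1 for small residuals, downweighted beyond c * scale.
        w = [c / max(abs(r) / scale, c) for r in res]
    return a, b, scale


def robust_multiple_imputation(x, y, m=5, seed=0):
    """Return m completed copies of y, where None marks a missing value."""
    rng = random.Random(seed)
    observed = [i for i, yi in enumerate(y) if yi is not None]
    completed = []
    for _ in range(m):
        # (1) Model uncertainty: refit on a bootstrap sample of observed cases.
        boot = [rng.choice(observed) for _ in observed]
        # (2) Robust fit, so outliers in the resample have bounded influence.
        a, b, scale = huber_line([x[i] for i in boot], [y[i] for i in boot])
        # (3) Imputation uncertainty: add noise with the robust residual scale.
        completed.append([
            yi if yi is not None else a + b * xi + rng.gauss(0.0, scale)
            for xi, yi in zip(x, y)
        ])
    return completed
```

Each of the `m` completed vectors would then be analyzed separately and the results pooled, as is usual in multiple imputation. Observed values are left untouched, so only the missing entries differ between copies.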

¹ School of Economics, University of Applied Sciences and Arts, Northwestern Switzerland, Olten, Switzerland [matthias.templ@fhnw.ch]

Keywords: Multiple Imputation – Nonlinear estimation – GAM – Robustness
