A stacked approach for chained equations multiple imputation incorporating the substantive model

Research paper by Lauren Beesley, Jeremy M G Taylor

Indexed on: 11 Feb '21 | Published on: 10 Oct '19 | Published in: arXiv - Statistics - Methodology


Multiple imputation by chained equations (MICE) has emerged as a popular approach for handling missing data. A central challenge for applying MICE is determining how to incorporate outcome information into covariate imputation models, particularly for complicated outcomes. Often, we have a particular analysis model in mind, and we would like to ensure congeniality between the imputation and analysis models. We propose a novel strategy for directly incorporating the analysis model into the handling of missing data. In our proposed approach, multiple imputations of missing covariates are obtained without using outcome information. We then utilize the strategy of imputation stacking, where multiple imputations are stacked on top of each other to create a large dataset. The analysis model is then incorporated through weights. Instead of applying multiple imputation combining rules, we obtain parameter estimates by fitting a weighted version of the analysis model on the stacked dataset. We propose a novel estimator for obtaining standard errors for this stacked and weighted analysis. Our estimator is based on the observed data information principle in Louis (1982) and can be applied for analyzing stacked multiple imputations more generally. Our approach for analyzing stacked multiple imputations is the first well-motivated method that can be easily applied for a wide variety of standard analysis models and missing data settings. In simulations, the proposed strategy produced unbiased parameter estimates when the analysis model was correctly specified. We developed an R package, StackImpute, allowing this imputation approach to be easily implemented for many standard analysis models.
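The stack-then-weight idea described in the abstract can be illustrated with a minimal Python sketch. This is not the StackImpute package and not necessarily the authors' exact procedure. Several choices here are assumptions made for illustration: a single partially missing covariate `x`, a linear analysis model, imputation from the observed marginal distribution of `x`, an initial parameter estimate taken from a complete-case fit, and weights proportional to the outcome density f(y | imputed x), normalized within each subject. The function name `stacked_weighted_fit` is hypothetical.

```python
import numpy as np

def stacked_weighted_fit(x, y, n_imp=20, seed=0):
    """Illustrative sketch: impute x without y, stack, weight, fit weighted model."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(x)
    obs = ~miss

    # Initial analysis-model estimate from complete cases (assumed choice).
    X_cc = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta0, *_ = np.linalg.lstsq(X_cc, y[obs], rcond=None)
    resid = y[obs] - X_cc @ beta0
    sigma0 = np.sqrt(resid @ resid / (obs.sum() - 2))

    # Step 1: impute missing x from its observed marginal, ignoring the outcome.
    mu_x, sd_x = x[obs].mean(), x[obs].std(ddof=1)
    draws = rng.normal(mu_x, sd_x, size=(n_imp, miss.sum()))

    # Step 2: stack. Complete rows appear once; incomplete rows appear n_imp times.
    x_stack = np.concatenate([x[obs], draws.ravel()])
    y_stack = np.concatenate([y[obs], np.tile(y[miss], n_imp)])

    # Step 3: weights proportional to the normal density of y given imputed x
    # (assumed form), normalized so each subject's weights sum to 1.
    z = (y[miss] - beta0[0] - beta0[1] * draws) / sigma0
    dens = np.exp(-0.5 * z ** 2)
    w_miss = dens / dens.sum(axis=0, keepdims=True)
    w = np.concatenate([np.ones(obs.sum()), w_miss.ravel()])

    # Step 4: fit the weighted analysis model (weighted least squares) on the stack.
    X = np.column_stack([np.ones(x_stack.size), x_stack])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y_stack)
    return beta, w_miss
```

The sketch returns point estimates only. The naive standard errors from a weighted fit on the stacked data would not be valid, which is why the abstract proposes a dedicated estimator based on Louis (1982); that estimator is not reproduced here.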