Sequential imputation for models with latent variables assuming latent ignorability
Lauren J. Beesley (corresponding author, to whom correspondence should be addressed)
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109 USA
Jeremy M. G. Taylor
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109 USA
Roderick J. A. Little
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109 USA
Summary
Models that involve an outcome variable, covariates, and latent variables are frequent targets of estimation and inference. Missing covariate or outcome data present a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random (LMAR) and is a generalisation of missing at random. Several authors have proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence-related issues, and we present simulation results in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. In short, the procedure imputes missing data for models with latent variables under latent-dependent missingness without specifying a full joint model.
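As a brief illustration of how latent ignorability generalises missing at random, the two conditions can be written side by side. The notation here is an assumption made for this sketch and is not taken from the paper: $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$ denotes the observed and missing parts of the outcome and covariate data, $R$ the missingness indicators, $L$ the latent variables, and $\phi$ the parameters of the missingness mechanism.

```latex
% Notation is assumed for illustration only; it may differ from the paper's.
% Missing at random (MAR): missingness may depend on observed data only.
\[
  f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)
    = f(R \mid Y_{\mathrm{obs}}, \phi)
\]
% Latent ignorable / latent missing at random (LMAR):
% missingness may also depend on the latent variables L.
\[
  f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, L, \phi)
    = f(R \mid Y_{\mathrm{obs}}, L, \phi)
\]
```

Under this formulation, missing at random corresponds to the special case in which the right-hand side of the second condition does not depend on $L$.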
Supporting Information
Filename | Description |
---|---|
anzs12264-sup-0001-Supinfo.pdf (PDF document, 4.4 MB) | Appendix S1. Ignorability under a joint model (properties 1–5). Appendix S2. Motivating the algorithm and performing parameter draws. Appendix S3. Bias of complete case analysis under LMAR. Appendix S4. Simulation study. Appendix S5. Example 1: identifiability for joint normal models. Appendix S6. Example 2: identifiability under LMAR for a mixture of GLMs. Appendix S7. Implementation of the SMC imputation algorithm. |