Missing data is a problem that many researchers face, particularly when working with large surveys. When a dataset with missing data is analyzed, information is lost, which leads to less precise estimates. Multiple imputation (MI) using chained equations handles missing values by using all available information in the dataset to predict them. In this study, we used data from the Survey of Midlife Development in the United States (MIDUS), a large national study of health and well-being that contains missing data. We created a complete dataset using MI. We then performed multiple regression analyses probing the relationships between sociodemographic and psychosocial factors and the number of chronic conditions. Importantly, we compared the results from analyses using imputed data to those from the original dataset. We found that multiple imputation substantially increased the sample size from 3,204 to 7,108 participants and decreased standard errors by an average of 4.81%. This research supports the use of appropriate methods of multiple imputation to facilitate more accurate estimates of associations between disease risk factors and health outcomes in survey research.
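The abstract does not say how results from the imputed data were combined. One common convention in multiple imputation is to fit the same regression on each of m imputed datasets and pool the results with Rubin's rules. The sketch below illustrates that pooling step only, under that assumption. The function name `pool_rubin` and all numbers are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates  -- the coefficient estimated on each imputed dataset
    std_errors -- the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")

    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance of the pooled estimate
    return q_bar, sqrt(total)


# Hypothetical values for a single regression coefficient from m = 5 imputations.
example_estimates = [0.50, 0.52, 0.48, 0.51, 0.49]
example_std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled_estimate, pooled_se = pool_rubin(example_estimates, example_std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.4f}")
```

The pooled standard error is slightly larger than any single within-imputation standard error because it also carries the uncertainty introduced by the imputation itself.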
Mark Daniel Ward, Department of Statistics, Purdue University
Peterson, Ashley and Martin, Emily
"Filling in the Gaps: Using Multiple Imputation to Improve Statistical Accuracy,"
Rose-Hulman Undergraduate Mathematics Journal: Vol. 17: Iss. 2, Article 11.
Available at: http://scholar.rose-hulman.edu/rhumj/vol17/iss2/11