Comparison of multiple imputation and maximum likelihood methods and evaluation of their efficacy in handling missing data in mental health surveys
Abstract
This research was an analytical study, specifically an experimental design, whose purpose was to compare two methods, multiple imputation and maximum likelihood, for dealing with missing data in mental health surveys. The objectives were to identify the type of missing data and to assess the extent of variation in model estimates after applying maximum likelihood and multiple imputation. Secondary data were used for this study, downloaded from the World Health Organization NCD microdata repository, https://extranet.who.int/ncdsmicrodata/index.php/catalog/6/data-dictionary/F2?file_name=UGH2003_public_use.
A binary logistic regression model was applied to test the hypothesis about the nature of the missingness. A model was then fitted under each method, Multiple Imputation using Chained Equations (MICE) and Full Information Maximum Likelihood (FIML), to determine which produced more accurate results. The major findings were that the missing data were Missing At Random (MAR) and that MICE handled missing values in categorical variables exceptionally well, whereas FIML handled only linear models, which made it very cumbersome to write code for it to handle missing data in non-linear models. MICE was therefore the more accurate of the two methods for handling missing data in the variables of interest. Researchers in mental health are advised to use FIML only for linear models when handling missing data, and to use MICE with any model appropriate for the data.
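As an illustration of the missingness check described above, the following minimal Python sketch regresses a missingness indicator on an observed covariate using binary logistic regression. It is a sketch under stated assumptions, not the code used in this study: the data are simulated rather than taken from the WHO file, and the names `x`, `r`, and `fit_logistic` are hypothetical. A slope clearly different from zero indicates that missingness depends on observed data, which is evidence against Missing Completely At Random and is consistent with MAR; it cannot by itself rule out dependence on unobserved values.

```python
import math
import random


def fit_logistic(x, r, iterations=25):
    """Fit P(r=1) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the standard error of the slope
    taken from the inverse information matrix at the last Newton iterate.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ri - p              # score for the intercept
            g1 += (ri - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Simulated example (assumption): the chance that a survey item is missing
# rises with an observed covariate x, i.e. the data are MAR by construction.
random.seed(0)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
r = []  # r[i] = 1 if the item is missing for respondent i, else 0
for xi in x:
    p_missing = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * xi)))
    r.append(1 if random.random() < p_missing else 0)

b0, b1, se_b1 = fit_logistic(x, r)
z = b1 / se_b1  # Wald statistic for the slope

print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}")
if abs(z) > 1.96:
    print("Missingness is associated with x: evidence against MCAR.")
else:
    print("No evidence that missingness depends on x.")
```

In practice the same regression would be run with the survey's observed variables as predictors, and an established statistics package would normally be preferred over a hand-written fitting routine.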