International Journal of Statistics in Medical Research, volume 4, issue 3, pages 287-295

Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study

Yang Liu
Anindya De
Publication type: Journal Article
Publication date: 2015-08-19
scimago Q2
SJR: 0.252
CiteScore: 0.4
ISSN: 1929-6029
Statistics and Probability
Health Informatics
Health Professions (miscellaneous)
Health Information Management
Abstract
Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and alter important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate due to the loss of precision and risk of bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets which include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis and offers a principled yet flexible method of addressing missing data, which is particularly useful for large data sets with complex data structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in the implementation of this technique. We demonstrate the application of FCS MI in support of a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country. A number of practical tips and guidelines for implementing FCS MI based on this experience are described.
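The variable-by-variable idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes all variables are continuous, uses an ordinary linear regression for every conditional model, and skips the draw of regression parameters that a proper multiple imputation procedure would include. The function names `fcs_impute` and `multiple_impute` are hypothetical.

```python
import numpy as np

def fcs_impute(X, n_iter=10, rng=None):
    """Return one completed copy of X (2-D float array, NaN marks missing).

    Simplifying assumptions (not from the source article): continuous
    variables only, linear conditional models, no parameter draws.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = X.copy()
    # Initialise every gap with the observed mean of its column.
    col_means = np.nanmean(X, axis=0)
    filled[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        # Cycle through the variables, one conditional model at a time.
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Predictors: intercept plus all other columns as currently filled.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - A[obs] @ beta
            dof = max(obs.sum() - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Impute with the prediction plus a random residual draw.
            n_mis = miss[:, j].sum()
            noise = rng.normal(0.0, sigma, n_mis)
            filled[miss[:, j], j] = A[miss[:, j]] @ beta + noise
    return filled

def multiple_impute(X, m=5, n_iter=10, seed=0):
    """Create m completed datasets from independent random streams."""
    seeds = np.random.SeedSequence(seed).spawn(m)
    return [fcs_impute(X, n_iter=n_iter, rng=s) for s in seeds]
```

Each of the `m` completed datasets would then be analysed separately and the estimates pooled. A data set that mixes categorical and continuous variables, as in the study, would need a different conditional model for each variable type (for example, logistic regression for binary variables).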
Haff N., Horn D.M., Bhatkhande G., Sung M., Colling C., Wood W., Robertson T., Gaposchkin D., Simmons L., Yang J., Yeh J., Crum K.L., Hanken K.E., Lauffenburger J.C., Choudhry N.K.
American Heart Journal scimago Q1 wos Q1
2025-07-01 citations by CoLab: 0
Fields E.L., Evans K.N., Liu Y., Thornton N., Long A., Uzzi M., Gaul Z., Buchacz K., King H., Jennings J.M.
AIDS and Behavior scimago Q1 wos Q2
2025-03-24 citations by CoLab: 0
Obeng-Gyasi B., Gokun Y., Elsaid M.I., Chen J., Andersen B.L., Carson W.E., Jhawar S., Anampa J.D., Quiroga D., Skoracki R., Obeng-Gyasi S.
Supportive Care in Cancer scimago Q1 wos Q1
2025-03-21 citations by CoLab: 0
Purpose: Allostatic load (AL), a measure of physiological dysregulation secondary to chronic exposure to socioenvironmental stressors, is associated with 30-day postoperative complications and mortality in patients with breast cancer. This study aimed to examine the association between AL at diagnosis and development of breast cancer-related lymphedema (BCRL). Methods: Patients aged 18 years or older who received surgical treatment for stage I-III breast cancer between 2012 and 2020 were identified from The Ohio State University Cancer Registry. AL was calculated using biomarkers from the cardiovascular, metabolic, renal, and immunologic systems. A high AL was defined as AL > median. Logistic regression analyses examined the association between AL and BCRL, adjusting for sociodemographic, clinical, and treatment factors. Results: Among 3,609 patients, 18.86% (n = 681) developed lymphedema. A higher proportion of patients with lymphedema were Black (11.89% vs. 7.38%, p < 0.0001), Medicaid insured (12.19% vs. 6.97%, p < 0.0001), had stage III disease (7.05% vs. 1.57%, p < 0.0001), and had a high AL (53.63% vs. 46.90%, p = 0.0018). In adjusted analysis, high AL was associated with higher odds of developing lymphedema than low AL (OR 1.281, 95% CI 1.06–1.55). Moreover, a 1-unit increase in AL was associated with 10% higher odds of lymphedema (OR 1.10, 95% CI 1.04–1.16). There was no statistically significant association between AL and severity of lymphedema (OR 1.02, 95% CI 0.82–1.23). Conclusion: In this retrospective cohort of breast cancer survivors, high AL at diagnosis was associated with higher odds of developing lymphedema. Future research should elucidate the pathways by which AL influences lymphedema.
Khan Chowdhury M.R., Stub D., Karim M.N., Brennan A., Reid C.M., Nanayakkara S., Lefkovits J., Moni M.A., Islam M.S., Chew D.P., Dinh D., Billah B.
2025-03-06 citations by CoLab: 0
Background: Percutaneous coronary intervention (PCI) is an effective treatment for coronary artery disease. Pre-procedural prediction of 30-day mortality risk after PCI aids clinical decision-making and the benchmarking of hospital performance. This study aimed to identify pre-procedural factors that predict the risk of 30-day mortality following PCI using machine learning (ML) approaches. Methods: The study analysed 93,055 consecutive PCI procedures from the Victorian Cardiac Outcomes Registry (VCOR) in Australia to develop a pre-procedural 30-day mortality prediction model. Five ML approaches, namely Adaptive Booster (AdB), Decision Tree (DT), Gradient Booster (GB), Random Forest (RF), and Extreme Gradient Booster (XGB), were employed, with Logistic Regression (LR) used for comparison. Model performance was evaluated using k-fold cross-validation, with metrics including sensitivity, specificity, accuracy, ROC curve, Brier score, and calibration curve. Results: The RF model outperformed the other ML models in predicting 30-day mortality, achieving an accuracy of 98.4% and a ROC of 94.3%. Using the SHapley Additive exPlanations method, the RF model identified cardiogenic shock, ejection fraction, acute coronary syndrome, estimated GFR, cardiac arrest, age, mechanical ventricular support, complex lesion, lesion location, BMI, sex, and diabetes as the variables associated with 30-day mortality post-PCI. In comparison, the traditional LR model exhibited an accuracy of 98.2% and a ROC of 92.9%. Conclusion: A 30-day mortality post-PCI risk prediction model was developed with high accuracy using an ML method. It is essential to underscore the need for further validation with external data to ensure the applicability of the model to other populations.
WHAT IS ALREADY KNOWN ON THIS TOPIC: A risk-adjustment model for an Australian PCI patient population was previously developed to predict 30-day mortality using a traditional regression model. Medical knowledge, patient characteristics, and clinical practices evolve over time, requiring frequent model updates to reflect new evidence, guidelines, and interventions. WHAT THIS STUDY ADDS: A machine learning (ML)-based pre-procedural risk prediction model for 30-day mortality following PCI was developed. The ML-based model was compared with the traditional regression model. HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY: Risk prediction models aid clinical decision-making, enhance patient counselling, improve care quality, inform healthcare policies, and advance research.
Cheng C., Lian T., Zhu X., Virdone S., Sun K., Camm J., Li X., Goto S., Pieper K., Kayani G., Fang X., Jing Z., Kakkar A.K.
Open Heart scimago Q1 wos Q2 Open Access
2025-02-06 citations by CoLab: 1
Background: Differences in the clinical outcomes and level of risk among Asian versus non-Asian patients with atrial fibrillation (AF) have been sparsely investigated. Objective: To provide a contemporary prospective comparison of outcomes for newly diagnosed patients with AF, between Asian and non-Asian regions. Methods: Six Asian countries (China, Japan, India, Singapore, South Korea and Thailand) and 29 countries outside Asia participated in the Global Anticoagulant Registry in the FIELD-AF (GARFIELD-AF) study. Newly diagnosed patients with AF, enrolled between 2010 and 2016, were followed up for ≥2 years. The outcomes studied were all-cause, cardiovascular and non-cardiovascular mortality, non-haemorrhagic stroke/systemic embolism (SE), and major bleeding. The associations of geographical region with clinical outcomes (event rates per 100 person-years) were estimated using multivariable Cox models. Results: 13 841/52 057 (26.6%) GARFIELD-AF participants were enrolled in Asia. Average age and prevalence of cardiovascular comorbidities were lower than in non-Asian countries, and patients at high risk of stroke (ie, CHA2DS2-VASc≥2 excl. sex) were less frequently anticoagulated (60.1% vs 73.2%). Non-vitamin K oral anticoagulant (NOAC) use was similar in both regions (∼28%), though Asian patients were more frequently underdosed. Both Asian and non-Asian patients who received NOAC at enrolment experienced lower all-cause mortality and non-haemorrhagic stroke/SE compared with patients on other treatments or none. All-cause mortality, non-cardiovascular mortality and major bleeding were less frequent in patients from Asia versus non-Asia (HR (95% CI): 0.62 (0.39 to 0.99), 0.52 (0.28 to 0.97), 0.58 (0.36 to 0.96), respectively). Associations of moderate-to-severe chronic kidney disease and vascular disease with increased risk of all-cause mortality were stronger in Asian versus non-Asian patients (interaction p values: 0.0250 and 0.0076, respectively). There was notable heterogeneity in oral anticoagulant (OAC) usage within the Asian countries. Conclusions: Patients in Asian countries had a lower risk of all-cause mortality and major bleeding compared to the rest of the world. NOAC had evident benefits for reducing mortality and stroke across populations. Further studies on sociocultural impacts on OAC outcomes are needed. Trial registration number: ClinicalTrials.gov NCT01090362.
Expósito-Álvarez C., Roldán-Pardo M., Vargas V., Maeda M., Lila M.
Behavioral Sciences scimago Q2 wos Q2 Open Access
2025-02-01 citations by CoLab: 0
(1) Background: Alcohol and/or other drug use problems (ADUPs) and trauma are key risk factors for intimate partner violence (IPV) that should be addressed in perpetrator programs. Participants with ADUPs and trauma histories may display greater difficulties in emotion regulation, which may increase the likelihood of IPV recidivism. The study aimed to examine differences among participants with trauma, ADUPs, ADUPs and trauma, and without such factors in dropout, IPV, and variables related to emotion regulation at pre- and post-intervention; (2) Methods: A sample of 312 men court-mandated to attend a perpetrator program (Contexto Program) was used. Variables related to emotion regulation difficulties included alexithymia, depressive symptomatology, and clinical syndromes. IPV variables included self-reported physical and psychological IPV and IPV recidivism risk assessed by facilitators. Comparisons between groups were made using one-way ANOVA, chi-square tests, and two-way repeated measures ANOVAs; (3) Results: Participants with ADUPs and trauma presented greater difficulties on variables related to emotion regulation, higher risk of IPV at pre-intervention, and higher dropout rates. At post-intervention, all participants improved their emotion regulation skills and reduced IPV recidivism risk, with participants with ADUPs and trauma maintaining a higher risk of IPV; (4) Conclusions: IPV perpetrators with ADUPs and trauma are high-risk participants. Interventions should target trauma and ADUPs to improve their effectiveness.
Junaid K.P., Kiran T., Gupta M., Kishore K., Siwatch S.
Population Health Metrics scimago Q1 wos Q2 Open Access
2025-02-01 citations by CoLab: 0
Multiple imputation by chained equations (MICE) is a widely used approach for handling missing data. However, its robustness, especially for high missing proportions in health indicators, is under-researched. The study aimed to provide a preliminary guideline on how large a missing proportion can be imputed in longitudinal health-related data using the MICE method. The study obtained complete data on five mortality-related health indicators for 100 countries (2015–2019) from the Global Health Observatory. Nine incomplete datasets with missing rates from 10 to 90% were generated and imputed using MICE. The robustness of MICE was assessed through three approaches: comparison of means using repeated-measures analysis of variance, estimation of evaluation metrics (root mean square error, mean absolute deviation, bias, and proportionate variance), and visual inspection of box plots of imputed and non-imputed data. The repeated-measures analysis of variance revealed significant differences between complete and imputed data, primarily in imputed data with over 50% missing proportions. Evaluation metrics exhibited ‘high performance’ for the dataset with a 50% missing proportion for various health indicators. However, with missing proportions exceeding 70%, the majority of indicators demonstrated a ‘low’ performance level in terms of most evaluation metrics. The visual inspection of the box plots revealed severe variance shrinkage in imputed datasets with missing proportions beyond 70%, corroborating the findings from the evaluation metrics. MICE demonstrates high robustness up to 50% missing values, with marginal deviations from complete datasets. Caution is warranted for missing proportions between 50 and 70%, as moderate alterations are observed. Proportions beyond 70% lead to significant variance shrinkage and compromised data reliability, emphasizing the importance of acknowledging imputation limitations for practical decision-making.
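The mask-impute-compare design described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: column-mean imputation stands in for MICE, the data are simulated rather than taken from the Global Health Observatory, "mean absolute deviation" is read as the mean absolute error of the imputed cells, and `variance_ratio` is a hypothetical stand-in for the abstract's "proportionate variance", whose exact definition is not given here.

```python
import numpy as np

def mask_at_random(complete, rate, rng):
    """Copy `complete` and set roughly `rate` of its cells to NaN."""
    masked = complete.astype(float).copy()
    masked[rng.random(complete.shape) < rate] = np.nan
    return masked

def mean_impute(masked):
    """Placeholder imputer (not MICE): fill gaps with the column mean."""
    filled = masked.copy()
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = np.nanmean(masked, axis=0)[cols]
    return filled

def evaluate_imputation(complete, masked, imputed):
    """Compare imputed values with the held-back true values."""
    miss = np.isnan(masked)
    err = imputed[miss] - complete[miss]
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mad": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
        # Assumed definition: variance after imputation / true variance.
        "variance_ratio": float(imputed.var() / complete.var()),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    complete = rng.normal(size=(100, 5))  # simulated stand-in data
    for rate in (0.1, 0.5, 0.9):
        masked = mask_at_random(complete, rate, rng)
        print(rate, evaluate_imputation(complete, masked, mean_impute(masked)))
```

A variance ratio well below 1 corresponds to the variance shrinkage the abstract reports at high missing proportions. Mean imputation shrinks variance by construction, so the numbers from this placeholder say nothing about MICE itself.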
Mertens M.G., van Kuijk S.M., Beckers L.W., Zmudzki F., Winkens B., Smeets R.J.
2025-02-01 citations by CoLab: 0
Romano M.E., Buckley J.P., Li X., Herbstman J.B., Kannan K., Lee S., Schantz S.L., Trasande L., Karagas M.R., Perera F.
PLoS ONE scimago Q1 wos Q1 Open Access
2025-01-24 citations by CoLab: 0
Previous research indicates that the COVID-19 pandemic catalyzed alterations in behaviors that may impact exposures to environmental endocrine-disrupting chemicals. This includes changes in the use of chemicals found in consumer products, food packaging, and exposure to air pollutants. Within the Environmental influences on Child Health Outcomes (ECHO) program, a national consortium initiated to understand the effects of environmental exposures on child health and development, our objective was to assess whether urinary concentrations of a wide range of potential endocrine-disrupting chemicals varied before and during the pandemic. Drawing from three racially, ethnically, and socioeconomically diverse ECHO cohorts, we assessed key differences in urinary chemical concentrations related to environmental exposures through food packaging, use of disinfectants, personal care products and air pollutants using repeated urine samples in a subset of 47 participants, who contributed a urine sample prior to the pandemic (between October 2018 and February 2020) and a subsequent urine sample after the pandemic began (between March 2020 and April 2021). We measured urinary concentrations of analytes across several chemical groups, including polycyclic aromatic hydrocarbons (PAHs), phthalates/alternative plasticizers, synthetic phenols (parabens, bisphenols, triclosan, benzophenones), organophosphate esters (OPEs), insecticides and fungicides. Multivariable linear mixed models accounting for key covariates and clustering within cohort and across repeated samples were used to estimate the change in urinary analyte concentrations across time points. We observed decreases in urinary concentrations of some PAHs, bisphenols, benzophenones, and triclosan, and increases in specific OPEs. 
These biomarker data mirror some of the behavior changes reported in our prior work and support the observation that the pandemic-related behavior changes lead to alterations in chemical exposures that have been linked to adverse health outcomes.
Rowh M.A., Giller T.A., Bliton J.N., Smith R.N., Moran T.P.
Injury Epidemiology scimago Q2 wos Q2 Open Access
2025-01-24 citations by CoLab: 0
Background: Cycling promotes health but carries significant injury risks, especially for older adults. In the U.S., cycling fatalities have increased since 1990, with adults over 50 now at the highest risk. As the population ages, the burden of cycling-related trauma is expected to grow, yet age-specific factors associated with mortality risk remain unclear. This study identifies age-specific mortality risk thresholds to inform targeted public health strategies. Methods: We conducted a cross-sectional analysis of the National Trauma Data Bank (NTDB) data (2017–2023) on non-motorized cycling injuries. A total of 185,960 records were analyzed using logistic regression with splines to evaluate the relationship between age and mortality risk. The dataset was split into training (80%) and testing (20%) sets. Age thresholds where mortality risk changed were identified, and models were adjusted for injury severity, comorbidities, and helmet use. Results: The median patient age was 43 years (IQR 20–58). Four key age thresholds (12, 17, 31, and 69) were identified, with the largest mortality increase after age 69. Our model achieved an AUC of 0.93, surpassing traditional age cutoff models, with 84.6% sensitivity and 88.0% specificity. Conclusions: Age is a significant predictor of mortality in cycling trauma, with marked increases in risk during adolescence and for adults over 69. These findings underscore the need for age-targeted interventions, such as improved cycling infrastructure for teens and enhanced safety measures for older adults. Public health initiatives should prioritize these vulnerable age groups to reduce cycling-related mortality.

Top-30

[Charts: top-30 journals and top-30 publishers of citing publications. Chart data not preserved.]
  • We do not take into account publications without a DOI.
  • Statistics recalculated only for publications connected to researchers, organizations and labs registered on the platform.
  • Statistics recalculated weekly.
