Open Access
Data Science and Management, volume 8, issue 3, pages 361-373

Effective and Efficient Handling of Missing Data in Supervised Machine Learning

Publication type: Journal Article
Publication date: 2025-09-01
Scimago quartile: Q1
SJR: 1.370
CiteScore: 11.9
Impact factor
ISSN: 2666-7649
Abstract
The prevailing consensus in the statistical literature is that multiple imputation is generally the most suitable method for addressing missing data in statistical analyses, whereas a complete case analysis is deemed appropriate only when the rate of missingness is negligible or when the missingness mechanism is missing completely at random (MCAR). This study investigates the applicability of this consensus within the context of supervised machine learning, with particular emphasis on the interactions between the imputation method, missingness mechanism, and missingness rate. Furthermore, we examine the time efficiency of these “state-of-the-art” imputation methods, given the time-sensitive nature of certain machine learning applications. Using ten real-world datasets, we introduce missingness at rates ranging from approximately 5% to 75% under the MCAR, missing at random (MAR), and missing not at random (MNAR) mechanisms. We then address the missing data using five methods: complete case analysis (CCA), mean imputation, hot deck imputation, regression imputation, and multiple imputation (MI). Statistical tests are conducted on the machine learning outcomes, and the findings are presented and analyzed. Our investigation reveals that in nearly all scenarios, CCA performs comparably to MI, even with substantial levels of missingness under the MAR and MNAR conditions and with missingness in the output variable for regression problems. Under some conditions, CCA outperforms MI. Thus, given the considerable computational demands associated with MI, the application of CCA is recommended within the broader context of supervised machine learning, particularly in big-data environments.
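To make the three missingness mechanisms and the two simplest handling methods concrete, here is a minimal Python sketch. It is not the authors' experimental code. The two-column toy data, the function names (`inject_missing`, `complete_case_analysis`, `mean_imputation`), the noise level, and the rule "rows with the highest score lose their value" are all assumptions made for illustration. Under MCAR the score is pure noise, under MAR it depends on the always-observed column `x1`, and under MNAR it depends on the value that goes missing (`x2`).

```python
import random
from statistics import mean


def inject_missing(rows, rate, mechanism, rng):
    """Return a copy of rows [(x1, x2), ...] with x2 set to None in about
    `rate` of the rows. Hypothetical helper, for illustration only."""
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    n = len(rows)
    k = round(rate * n)  # number of rows that lose x2

    def score(row):
        x1, x2 = row
        if mechanism == "MCAR":
            return rng.random()  # unrelated to the data
        driver = x1 if mechanism == "MAR" else x2  # observed vs. missing value
        return driver + rng.gauss(0.0, 0.5)  # noisy dependence (assumed)

    scores = [score(r) for r in rows]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:k])
    return [(x1, None) if i in drop else (x1, x2)
            for i, (x1, x2) in enumerate(rows)]


def complete_case_analysis(rows):
    """CCA: keep only the rows with no missing value."""
    return [(x1, x2) for x1, x2 in rows if x2 is not None]


def mean_imputation(rows):
    """Replace each missing x2 with the mean of the observed x2 values."""
    fill = mean(x2 for _, x2 in rows if x2 is not None)
    return [(x1, fill if x2 is None else x2) for x1, x2 in rows]


rng = random.Random(0)
full = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
for mech in ("MCAR", "MAR", "MNAR"):
    damaged = inject_missing(full, 0.25, mech, rng)
    cca = complete_case_analysis(damaged)
    imputed = mean_imputation(damaged)
    print(mech, "rows kept by CCA:", len(cca), "rows after imputation:", len(imputed))
```

CCA shrinks the dataset while mean imputation keeps every row but fills them with a single value. Hot deck, regression, and multiple imputation, the other methods named in the abstract, are not sketched here.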

GOST
Popoola P. A. et al. Effective and Efficient Handling of Missing Data in Supervised Machine Learning // Data Science and Management. 2025. Vol. 8. No. 3. pp. 361-373.
GOST (all authors)
Popoola P. A., Tapamo J., Assounga A. G. H. Effective and Efficient Handling of Missing Data in Supervised Machine Learning // Data Science and Management. 2025. Vol. 8. No. 3. pp. 361-373.
RIS
TY - JOUR
DO - 10.1016/j.dsm.2024.12.002
UR - https://linkinghub.elsevier.com/retrieve/pii/S2666764924000663
TI - Effective and Efficient Handling of Missing Data in Supervised Machine Learning
T2 - Data Science and Management
AU - Popoola, Peter Ayokunle
AU - Tapamo, Jules-Raymond
AU - Assounga, Alain Guy Honoré
PY - 2025
DA - 2025/09/01
PB - Elsevier
SP - 361
EP - 373
IS - 3
VL - 8
SN - 2666-7649
ER -
BibTeX
@article{2025_Popoola,
author = {Peter Ayokunle Popoola and Jules-Raymond Tapamo and Alain Guy Honoré Assounga},
title = {Effective and Efficient Handling of Missing Data in Supervised Machine Learning},
journal = {Data Science and Management},
year = {2025},
volume = {8},
publisher = {Elsevier},
month = {sep},
url = {https://linkinghub.elsevier.com/retrieve/pii/S2666764924000663},
number = {3},
pages = {361--373},
doi = {10.1016/j.dsm.2024.12.002}
}
MLA
Popoola, Peter Ayokunle, et al. “Effective and Efficient Handling of Missing Data in Supervised Machine Learning.” Data Science and Management, vol. 8, no. 3, Sep. 2025, pp. 361-373. https://linkinghub.elsevier.com/retrieve/pii/S2666764924000663.