Open Access
Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction
Mahmut Burak Karadeniz 1, Ebru Efeoglu 2, Burak Çelik 1, Adem Kocyigit 1, 3, Bahattin Türetken 1
1
3
Department of Electronics and Automation, Bilecik Seyh Edebali University, Bilecik, 11000, Turkiye
Publication type: Journal Article
Publication date: 2025-06-01
SCImago: Q1
WoS: Q2
SJR: 1.050
CiteScore: 11.8
Impact factor: 4.3
ISSN: 1110-8665, 2090-4754
Abstract
The exponential rise in clinical research costs can potentially be mitigated by half through the implementation of machine learning-driven efficient data processing techniques. Traditional methods like data preprocessing and hyperparameter tuning, which are effective for model optimization, often introduce complexities that can diminish the benefits of machine learning integration. To overcome this issue, we present Clipper: a novel, cluster-based data pruning approach designed specifically for biomedical data, aiming to enhance the predictive accuracy of machine learning models. Clipper’s key advantage lies in its ability to automate the data pruning process, optimizing accuracy without the need for manual hyperparameter adjustments—a typically cumbersome aspect of machine learning tasks. Upon comprehensive comparative analysis, the proposed Clipper methodology demonstrates superior performance across various medical and biological datasets. Our experiments reveal Clipper’s consistent superiority over baseline models, with significant accuracy improvements: 44% for Heart Disease, 7% for Breast Cancer, 40% for Parkinson’s, and 20% for Raisin classification. Specifically, the model achieves remarkable predictive accuracy, with classification rates of 99.5% for Heart Disease, 99.64% for Breast Cancer, 99.47% for Parkinson’s Disease, and 93% for Raisin Classification, thereby substantially outperforming contemporary state-of-the-art computational techniques. The empirical evidence suggests that Clipper serves as an effective accuracy enhancer for baseline models, eliminating the need for parameter tuning or complex preprocessing steps. Furthermore, Clipper produces robust outputs even at very low split rates, where baseline models typically perform poorly.
Cite this
GOST
Karadeniz M. B. et al. Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction // Egyptian Informatics Journal. 2025. Vol. 30. p. 100641.
GOST all authors (up to 50)
Karadeniz M. B., Efeoglu E., Çelik B., Kocyigit A., Türetken B. Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction // Egyptian Informatics Journal. 2025. Vol. 30. p. 100641.
RIS
TY - JOUR
DO - 10.1016/j.eij.2025.100641
UR - https://linkinghub.elsevier.com/retrieve/pii/S1110866525000349
TI - Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction
T2 - Egyptian Informatics Journal
AU - Karadeniz, Mahmut Burak
AU - Efeoglu, Ebru
AU - Çelik, Burak
AU - Kocyigit, Adem
AU - Türetken, Bahattin
PY - 2025
DA - 2025/06/01
PB - Elsevier
SP - 100641
VL - 30
SN - 1110-8665
SN - 2090-4754
ER -
BibTeX (up to 50 authors)
@article{2025_Karadeniz,
author = {Mahmut Burak Karadeniz and Ebru Efeoglu and Burak Çelik and Adem Kocyigit and Bahattin Türetken},
title = {Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction},
journal = {Egyptian Informatics Journal},
year = {2025},
volume = {30},
publisher = {Elsevier},
month = {jun},
url = {https://linkinghub.elsevier.com/retrieve/pii/S1110866525000349},
pages = {100641},
doi = {10.1016/j.eij.2025.100641}
}