CORKI: A Correlation-Driven Imputation Method for Partial Annotation Scenarios in Multi-label Clinical Problems

Ricardo Santos 1, 2
Bruno Ribeiro 1
Isabel Curioso 1
Marília Barandas 1, 2
André V. Carreiro 1
Hugo Gamboa 1, 2
Pedro Coelho 3, 4
José Fragata 3, 4
Inês Sousa 1
2 LIBPhys-UNL, NOVA School of Science and Technology, Caparica, Portugal
3 Comprehensive Health Research Center, NOVA Medical School, Lisboa, Portugal
4 Hospital de Santa Marta, Centro Hospitalar Universitário Lisboa Central, Lisboa, Portugal
Publication type: Book Chapter
Publication date: 2025-01-01
Scimago quartile: Q4
SJR: 0.182
CiteScore: 1.1
ISSN: 1865-0929, 1865-0937
Abstract
Multi-label classification tasks are relevant in healthcare, as data samples are commonly associated with multiple interdependent, non-mutually exclusive outcomes. Incomplete label information often arises from outcomes left unrecorded at planned checkpoints, disease testing that varies across patients, data collection constraints, or human error. Dropping partially annotated samples reduces the size of the dataset and can introduce bias and compromise accuracy. To address these issues, this study introduces CORKI (Correlation-Optimised and Robust K Nearest Neighbours Imputation for Multi-label Classification), a data-centric method for imputing partial annotations in multi-label data. The method employs proximity measures and an optional weighting term for outcome prevalence to tackle imbalanced labels. Additionally, it leverages different modalities of correlation that consider not only variable values but also missingness patterns. CORKI's performance was compared with that of a domain-knowledge-based rule system and of the standard sample-dropping approach on three public and one private cardiothoracic surgery datasets with diverse missing-label rates. CORKI achieved performance comparable to that of the domain-knowledge approach, establishing itself as a reliable method while remaining highly generalizable. Moreover, it maintained imputation accuracy in demanding partial annotation scenarios, with drops of only 5% at a missing-label rate of 50%.
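The abstract names CORKI's ingredients (proximity measures, an optional outcome-prevalence weight, correlation modalities) but not the algorithm itself, so the following is only a minimal sketch of the general idea of nearest-neighbour label imputation, not the authors' method. It assumes complete numeric features, a 0/1 label matrix with missing annotations stored as NaN, Euclidean distance, and inverse-distance voting; the prevalence weighting is modelled, as an assumption, by inverse-prevalence class weights. The function `knn_impute_labels` and its parameters are hypothetical, and CORKI's correlation-based components are not modelled.

```python
import numpy as np


def knn_impute_labels(X, Y, k=5, prevalence_weighting=False, eps=1e-9):
    """Hypothetical sketch: fill NaN entries of a 0/1 multi-label matrix Y.

    X: (n_samples, n_features) numeric features, assumed complete (no NaNs).
    Y: (n_samples, n_labels) with values 0, 1 or NaN (missing annotation).
    Each missing entry Y[i, j] is set by an inverse-distance-weighted vote of
    the k nearest samples whose label j was actually recorded.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Y_imp = Y.copy()

    # Pairwise Euclidean distances between samples in feature space.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Observed prevalence of each label, ignoring missing entries.
    prevalence = np.nanmean(Y, axis=0)

    # Votes are read from the original Y, so imputed values never act as donors.
    for i, j in zip(*np.where(np.isnan(Y))):
        donors = np.flatnonzero(~np.isnan(Y[:, j]))
        if donors.size == 0:
            continue  # no recorded values for this label; leave it missing
        nearest = donors[np.argsort(D[i, donors])[:k]]

        # Inverse-distance weights; eps avoids division by zero for duplicates.
        w = 1.0 / (D[i, nearest] + eps)
        score = np.sum(w * Y[nearest, j]) / np.sum(w)

        # Inverse-prevalence class weighting is equivalent to thresholding the
        # weighted positive fraction at the label's prevalence instead of 0.5.
        if prevalence_weighting:
            threshold = np.clip(prevalence[j], 1e-6, 1 - 1e-6)
        else:
            threshold = 0.5
        Y_imp[i, j] = 1.0 if score >= threshold else 0.0

    return Y_imp
```

Weighting positive votes by 1/p and negative votes by 1/(1 - p) favours a positive exactly when the weighted positive fraction s satisfies s(1 - p) >= p(1 - s), that is, s >= p. This is why the sketch only lowers the decision threshold for rare outcomes rather than reweighting each vote.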
Cite this
GOST
Santos R. et al. CORKI: A Correlation-Driven Imputation Method for Partial Annotation Scenarios in Multi-label Clinical Problems // Communications in Computer and Information Science. 2025. pp. 3-18.
GOST (all authors, up to 50)
Santos R., Ribeiro B., Curioso I., Barandas M., Carreiro A. V., Gamboa H., Coelho P., Fragata J., Sousa I. CORKI: A Correlation-Driven Imputation Method for Partial Annotation Scenarios in Multi-label Clinical Problems // Communications in Computer and Information Science. 2025. pp. 3-18.
RIS
TY  - GENERIC
DO  - 10.1007/978-3-031-74640-6_1
UR  - https://link.springer.com/10.1007/978-3-031-74640-6_1
TI  - CORKI: A Correlation-Driven Imputation Method for Partial Annotation Scenarios in Multi-label Clinical Problems
T2  - Communications in Computer and Information Science
AU  - Santos, Ricardo
AU  - Ribeiro, Bruno
AU  - Curioso, Isabel
AU  - Barandas, Marília
AU  - Carreiro, André V.
AU  - Gamboa, Hugo
AU  - Coelho, Pedro
AU  - Fragata, José
AU  - Sousa, Inês
PY  - 2025
DA  - 2025/01/01
PB  - Springer Nature
SP  - 3
EP  - 18
SN  - 1865-0929
SN  - 1865-0937
ER  - 
BibTeX
@incollection{2025_Santos,
  author = {Ricardo Santos and Bruno Ribeiro and Isabel Curioso and Marília Barandas and André V. Carreiro and Hugo Gamboa and Pedro Coelho and José Fragata and Inês Sousa},
  title = {CORKI: A Correlation-Driven Imputation Method for Partial Annotation Scenarios in Multi-label Clinical Problems},
  booktitle = {Communications in Computer and Information Science},
  publisher = {Springer Nature},
  year = {2025},
  pages = {3--18},
  month = jan,
  doi = {10.1007/978-3-031-74640-6_1}
}