Distribution-free tests for lossless feature selection in classification and regression

Publication type: Journal Article
Publication date: 2024-11-26
SCImago quartile: Q2
WoS quartile: Q2
SJR: 0.505
CiteScore: 2.0
Impact factor: 1.3
ISSN: 1133-0686, 1863-8260
Abstract
We study the problem of lossless feature selection for a d-dimensional feature vector $$X=(X^{(1)},\dots,X^{(d)})$$ and label Y for binary classification as well as nonparametric regression. For an index set $$S\subset \{1,\dots,d\}$$, consider the selected |S|-dimensional feature subvector $$X_S=(X^{(i)}, i\in S)$$. If $$L^*$$ and $$L^*(S)$$ stand for the minimum risk based on X and $$X_S$$, respectively, then $$X_S$$ is called lossless if $$L^*=L^*(S)$$. For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor-based test statistics to test the hypothesis that $$X_S$$ is lossless. These test statistics are estimates of the excess risk $$L^*(S)-L^*$$. Surprisingly, estimating this excess risk turns out to be a functional estimation problem that does not suffer from the curse of dimensionality, in the sense that the convergence rate does not depend on the dimension d. For the threshold $$a_n=\log n/\sqrt{n}$$, the corresponding tests are proved to be consistent under conditions on the distribution of (X, Y) that are significantly milder than in previous work. Also, our threshold is universal (dimension independent), in contrast to earlier methods where, for large d, the threshold becomes too large to be useful in practice.
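The abstract does not spell out the test statistic, so what follows is only a minimal sketch of how a nearest-neighbor estimate of the excess risk and the threshold rule could fit together, for the regression case only. It rests on a standard identity rather than on the paper's own construction: with $$m(x)=\mathbb{E}[Y\mid X=x]$$ and $$m_S(x_S)=\mathbb{E}[Y\mid X_S=x_S]$$, the residual variances are $$L^*=\mathbb{E}[Y^2]-\mathbb{E}[m(X)^2]$$ and $$L^*(S)=\mathbb{E}[Y^2]-\mathbb{E}[m_S(X_S)^2]$$, so $$L^*(S)-L^*=\mathbb{E}[m(X)^2]-\mathbb{E}[m_S(X_S)^2]$$. Each second moment is estimated by averaging $$Y_iY_{N(i)}$$, where $$N(i)$$ is the index of the nearest neighbor of the i-th feature vector (computed on X or on $$X_S$$), and the hypothesis that $$X_S$$ is lossless is rejected when the estimate exceeds $$a_n$$. The function names, the brute-force 1-NN search, and the toy model in `make_demo_data` are hypothetical illustrations; the statistic analyzed by the authors may differ (for example in the number of neighbors or in sample splitting), and the classification case is not covered.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_neighbor(points: Sequence[Vector], i: int) -> int:
    """Index of the nearest neighbor of points[i] among the other points
    (squared Euclidean distance, brute-force search)."""
    best_j, best_dist = -1, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue  # a point is never its own neighbor
        dist = sum((a - b) ** 2 for a, b in zip(points[i], p))
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j


def second_moment(points: Sequence[Vector], y: Sequence[float]) -> float:
    """1-NN estimate of E[m(X)^2]: the average of Y_i * Y_{N(i)}."""
    n = len(y)
    return sum(y[i] * y[nearest_neighbor(points, i)] for i in range(n)) / n


def excess_risk(x: Sequence[Vector], y: Sequence[float],
                subset: Sequence[int]) -> float:
    """Estimate L*(S) - L* = E[m(X)^2] - E[m_S(X_S)^2].
    Coordinates in `subset` are 0-based (the paper indexes them 1..d)."""
    x_s = [[row[k] for k in subset] for row in x]
    return second_moment(x, y) - second_moment(x_s, y)


def threshold(n: int) -> float:
    """Universal threshold a_n = log(n) / sqrt(n)."""
    return math.log(n) / math.sqrt(n)


def is_lossless(x: Sequence[Vector], y: Sequence[float],
                subset: Sequence[int]) -> bool:
    """Keep the hypothesis "X_S is lossless" unless the estimate exceeds a_n."""
    return excess_risk(x, y, subset) <= threshold(len(y))


def make_demo_data(n: int, seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical toy model with d = 2: Y = 2 X^(1) + N(0, 0.5^2) noise.
    X^(2) is irrelevant, so keeping only X^(1) (index 0) is lossless."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in x]
    return x, y
```

On the toy model, keeping only the first coordinate should give an estimate near zero and be accepted, while keeping only the irrelevant second coordinate should give an estimate near $$\mathbb{E}[m(X)^2]=4/3$$ and be rejected. The brute-force search costs O(n^2 d) time, which is adequate for a demonstration but not for large samples.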

Cite this
GOST
Györfi L., Linder T., Walk H. Distribution-free tests for lossless feature selection in classification and regression // Test. 2024.
RIS
TY  - JOUR
DO  - 10.1007/s11749-024-00958-2
UR  - https://link.springer.com/10.1007/s11749-024-00958-2
TI  - Distribution-free tests for lossless feature selection in classification and regression
T2  - Test
AU  - Györfi, László
AU  - Linder, Tamás
AU  - Walk, Harro
PY  - 2024
DA  - 2024/11/26
PB  - Springer Nature
SN  - 1133-0686
SN  - 1863-8260
ER  -
BibTeX
@article{2024_Györfi,
author = {László Györfi and Tamás Linder and Harro Walk},
title = {Distribution-free tests for lossless feature selection in classification and regression},
journal = {Test},
year = {2024},
publisher = {Springer Nature},
month = nov,
url = {https://link.springer.com/10.1007/s11749-024-00958-2},
doi = {10.1007/s11749-024-00958-2}
}