Distribution-free tests for lossless feature selection in classification and regression
Publication type: Journal Article
Publication date: 2024-11-26
scimago Q2
wos Q2
SJR: 0.505
CiteScore: 2.0
Impact factor: 1.3
ISSN: 1133-0686, 1863-8260
Abstract
We study the problem of lossless feature selection for a d-dimensional feature vector $$X=(X^{(1)},\dots,X^{(d)})$$ and label Y for binary classification as well as nonparametric regression. For an index set $$S\subset \{1,\dots,d\}$$, consider the selected |S|-dimensional feature subvector $$X_S=(X^{(i)}, i\in S)$$. If $$L^*$$ and $$L^*(S)$$ stand for the minimum risk based on X and $$X_S$$, respectively, then $$X_S$$ is called lossless if $$L^*=L^*(S)$$. For classification, the minimum risk is the Bayes error probability, while in regression, the minimum risk is the residual variance. We introduce nearest-neighbor-based test statistics to test the hypothesis that $$X_S$$ is lossless. This test statistic is an estimate of the excess risk $$L^*(S)-L^*$$. Surprisingly, estimating this excess risk turns out to be a functional estimation problem that does not suffer from the curse of dimensionality in the sense that the convergence rate does not depend on the dimension d. For the threshold $$a_n=\log n/\sqrt{n}$$, the corresponding tests are proved to be consistent under conditions on the distribution of (X, Y) that are significantly milder than in previous work. Also, our threshold is universal (dimension independent), in contrast to earlier methods where for large d the threshold becomes too large to be useful in practice.
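To make the regression case concrete, the following is a minimal Python sketch, not the statistic defined in the paper. It assumes the standard identity that the residual variance equals $$E[Y^2]-E[m(X)^2]$$ with $$m(x)=E[Y\mid X=x]$$, so the excess risk $$L^*(S)-L^*$$ equals $$E[m(X)^2]-E[m_S(X_S)^2]$$, and it assumes that each second moment can be estimated by averaging $$Y_i$$ times the label of the nearest neighbor of $$X_i$$. The function names (`nn_index`, `excess_risk_statistic`, `is_lossless`), the one-sided decision rule and the synthetic data are illustrative assumptions; only the threshold $$a_n=\log n/\sqrt{n}$$ is taken from the abstract.

```python
import numpy as np


def nn_index(Z):
    """Index of each row's nearest neighbor among the other rows (brute force)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    return np.argmin(d2, axis=1)


def excess_risk_statistic(X, y, S):
    """Assumed 1-NN estimate of E[m(X)^2] - E[m_S(X_S)^2] (regression case)."""
    nn_full = nn_index(X)        # neighbors using all d features
    nn_sub = nn_index(X[:, S])   # neighbors using only the features in S
    return float(np.mean(y * (y[nn_full] - y[nn_sub])))


def is_lossless(X, y, S):
    """Accept losslessness of X_S when the statistic is at most a_n = log(n)/sqrt(n)."""
    n = len(y)
    a_n = np.log(n) / np.sqrt(n)
    return bool(excess_risk_statistic(X, y, S) <= a_n)


# Synthetic data (an assumption for illustration): Y depends on feature 0 only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
y = 4.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

keeps_signal = is_lossless(X, y, [0])     # subset containing the relevant feature
drops_signal = is_lossless(X, y, [1, 2])  # subset missing the relevant feature
```

On this synthetic example one would expect the subset `[0]` to be accepted and `[1, 2]` to be rejected, since dropping feature 0 raises the residual variance by the variance of `4 * X[:, 0]`. The brute-force neighbor search uses quadratic memory, so a tree-based search would be the natural replacement for large n.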
Cite this
GOST
Györfi L., Linder T., Walk H. Distribution-free tests for lossless feature selection in classification and regression // Test. 2024.
RIS
TY  - JOUR
DO  - 10.1007/s11749-024-00958-2
UR  - https://link.springer.com/10.1007/s11749-024-00958-2
TI  - Distribution-free tests for lossless feature selection in classification and regression
T2  - Test
AU  - Györfi, László
AU  - Linder, Tamás
AU  - Walk, Harro
PY  - 2024
DA  - 2024/11/26
PB  - Springer Nature
SN  - 1133-0686
SN  - 1863-8260
ER  - 
BibTeX
@article{Gyorfi2024,
  author = {László Györfi and Tamás Linder and Harro Walk},
  title = {Distribution-free tests for lossless feature selection in classification and regression},
  journal = {Test},
  year = {2024},
  publisher = {Springer Nature},
  month = {nov},
  url = {https://link.springer.com/10.1007/s11749-024-00958-2},
  doi = {10.1007/s11749-024-00958-2}
}