ACM Transactions on Asian and Low-Resource Language Information Processing, volume 24, issue 4, pages 1-26

New Bagging Based Ensemble Learning Algorithm Distinguishing Short and Long Texts for Document Classification

Publication type: Journal Article
Publication date: 2025-03-23
SCImago: Q2
SJR: 0.535
CiteScore: 3.6
Impact factor: 1.8
ISSN: 2375-4699, 2375-4702
Abstract

To improve the classification accuracy of ensemble learning, a new bootstrap aggregating (Bagging) ensemble learning algorithm that distinguishes short and long texts for document classification is proposed. First, the performance of typical deep learning methods on long and short texts is compared, and the best base classifier is selected for each of the two text types. Second, the random sampling step of traditional Bagging classification algorithms is improved: a threshold-group-based random sampling method is proposed that balances the numbers of long-text and short-text subsets. Moreover, to improve both inference speed and classification accuracy, the long-text and short-text subsets are trained using knowledge distillation. Finally, the classification probabilities of each sample across categories are taken into account, and category similarity information is combined with the traditional weighted voting ensemble method, so that any accuracy loss introduced by the sampling process is avoided. Experimental results on multiple datasets show that the algorithm effectively improves document classification accuracy and outperforms typical deep learning algorithms and ensemble learning algorithms.
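
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes documents are split into short and long groups by a token-count threshold, that equally sized bootstrap subsets are drawn from each group, and that base-classifier outputs are merged by plain weighted soft voting. The function names, the whitespace tokenization, and the single fixed threshold are all hypothetical. Knowledge distillation and the category similarity term are omitted.

```python
import random
from collections import defaultdict


def split_by_length(docs, threshold):
    """Partition (text, label) pairs into short and long groups.

    Assumption: length is the whitespace token count, compared to one threshold.
    """
    short = [d for d in docs if len(d[0].split()) <= threshold]
    long_ = [d for d in docs if len(d[0].split()) > threshold]
    return short, long_


def balanced_bootstrap(short, long_, n_per_group, rng):
    """Draw one bootstrap subset (sampling with replacement) from each group.

    Both subsets have the same size, so neither text type dominates.
    """
    short_subset = [rng.choice(short) for _ in range(n_per_group)]
    long_subset = [rng.choice(long_) for _ in range(n_per_group)]
    return short_subset, long_subset


def weighted_soft_vote(prob_dicts, weights):
    """Sum weighted class probabilities and return the top-scoring label."""
    totals = defaultdict(float)
    for probs, weight in zip(prob_dicts, weights):
        for label, p in probs.items():
            totals[label] += weight * p
    return max(totals, key=totals.get)


if __name__ == "__main__":
    docs = [("a b", 0), ("x", 0), ("a b c d e f", 1), ("p q r s t u v", 1)]
    short, long_ = split_by_length(docs, threshold=3)
    rng = random.Random(0)
    short_subset, long_subset = balanced_bootstrap(short, long_, 5, rng)
    print(len(short_subset), len(long_subset))
    print(weighted_soft_vote([{"a": 0.6, "b": 0.4}, {"a": 0.2, "b": 0.8}], [1.0, 1.0]))
```

In a full pipeline, each bootstrap subset would train its own base classifier (one model type for short texts, another for long texts), and the voting weights would typically come from validation accuracy; both choices are assumptions here rather than details stated in the abstract.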
