Software Defect Prediction Based on Feature Subset Selection and Ensemble Classification
Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. Both are of particular concern in software engineering, and more specifically in software defect prediction. This research presents a procedure that combines a feature selection technique, to single out relevant attributes, with an ensemble technique, to handle the class-imbalance issue. To determine the advantages of feature selection and ensemble methods, we consider two scenarios: (1) ensemble models constructed from the original datasets, without feature selection; (2) ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise Forward selection, and Information Gain (IG). The aim of this research is to assess the effectiveness of feature selection techniques when they are used with ensemble techniques. Five datasets obtained from the PROMISE software repository are analyzed. Tentative results indicate that ensemble methods can improve model performance without the use of feature selection techniques. PCA feature selection and bagging based on K-NN perform better than both bagging based on SVM and boosting based on K-NN and SVM, whereas feature selection with Pearson’s correlation, Greedy Stepwise, or IG weakens the ensemble models’ performance.
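The two scenarios described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a tiny synthetic imbalanced dataset (20 clean modules, 5 defective ones), and hypothetical helper names (`knn_predict`, `bagging_knn_fit`, `bagging_knn_predict`, `select_by_correlation`). Of the four selection techniques, only a simple Pearson's correlation filter is shown, and the ensemble is plain bagging over K-NN base learners.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def bagging_knn_fit(X, y, n_estimators=11, seed=0):
    """Build one bootstrap sample (drawn with replacement) per base learner."""
    rng = random.Random(seed)
    n = len(X)
    bags = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags

def bagging_knn_predict(bags, x, k=3):
    """Majority vote over the K-NN predictions of all bootstrap samples."""
    votes = Counter(knn_predict(bX, by, x, k) for bX, by in bags)
    return votes.most_common(1)[0][0]

def pearson(xs, ys):
    """Pearson's correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_by_correlation(X, y, top_n):
    """Indices of the top_n features ranked by |correlation| with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_n])

# Synthetic, imbalanced data (an assumption for illustration only).
# Feature 0 separates the classes; feature 1 is uninformative noise.
X = [[0.1 * (i % 5), float(i % 2)] for i in range(20)]   # clean modules
X += [[10 + 0.1 * j, float(j % 2)] for j in range(5)]    # defective modules
y = [0] * 20 + [1] * 5

# Scenario 1: ensemble built on the original feature set.
bags = bagging_knn_fit(X, y)
print(bagging_knn_predict(bags, [10.2, 0.0]))

# Scenario 2: ensemble built on the reduced feature set.
keep = select_by_correlation(X, y, top_n=1)
X_reduced = [[row[j] for j in keep] for row in X]
bags_reduced = bagging_knn_fit(X_reduced, y)
print(bagging_knn_predict(bags_reduced, [10.2]))
```

In a real study the two scenarios would be compared with a proper evaluation protocol (for example cross-validation and an imbalance-aware metric) on the actual defect datasets; the sketch only shows where feature selection sits relative to the ensemble.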
Top-30
Journals
- F1000Research: 2 publications, 15.38%
- Complex & Intelligent Systems: 1 publication, 7.69%
- Engineering Applications of Artificial Intelligence: 1 publication, 7.69%
- Lecture Notes in Networks and Systems: 1 publication, 7.69%
- Connection Science: 1 publication, 7.69%
- AIP Conference Proceedings: 1 publication, 7.69%
Publishers
- Institute of Electrical and Electronics Engineers (IEEE): 6 publications, 46.15%
- Springer Nature: 2 publications, 15.38%
- F1000 Research: 2 publications, 15.38%
- Elsevier: 1 publication, 7.69%
- Taylor & Francis: 1 publication, 7.69%
- AIP Publishing: 1 publication, 7.69%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.