Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results

Publication type: Proceedings Article
Publication date: 2020-04-28
Abstract
Data imbalance in machine learning refers to an unequal distribution of classes within a dataset. This issue arises mostly in classification tasks, where the distribution of classes or labels in a given dataset is not uniform. The most straightforward remedy is resampling: adding records to the minority class or deleting records from the majority class. In this paper, we experiment with the two widely adopted resampling techniques, oversampling and undersampling. To explore both techniques, we chose a public imbalanced dataset from the Kaggle website, Santander Customer Transaction Prediction, and applied a group of well-known machine learning algorithms, using the hyperparameters that give the best results for each resampling technique. One of the key findings of this paper is that oversampling performs better than undersampling across different classifiers and obtains higher scores on different evaluation metrics.
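As an illustration of the two resampling ideas described in the abstract, the following is a minimal sketch of random oversampling and random undersampling using only the Python standard library. It is not the authors' implementation; the abstract does not say which resampling variants or libraries were used. The function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample without replacement, discarding the remaining rows.
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data (assumed): 9 majority rows with label 0, 3 minority rows with label 1.
rows = [[i] for i in range(12)]
labels = [0] * 9 + [1] * 3

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))
print(Counter(under_labels))
```

In this sketch, oversampling keeps every original record and grows the minority class, while undersampling discards majority records, so the undersampled set is smaller than the original. Resampling of this kind is normally applied to the training split only, so that evaluation reflects the original class distribution.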

Top-30

Journals

IEEE Access: 22 publications, 4.39%
Applied Sciences (Switzerland): 13 publications, 2.59%
Lecture Notes in Networks and Systems: 12 publications, 2.4%
Scientific Reports: 10 publications, 2%
Communications in Computer and Information Science: 10 publications, 2%
Lecture Notes in Computer Science: 9 publications, 1.8%
Electronics (Switzerland): 6 publications, 1.2%
Procedia Computer Science: 6 publications, 1.2%
Mathematics: 4 publications, 0.8%
PLoS ONE: 4 publications, 0.8%
Engineering Applications of Artificial Intelligence: 4 publications, 0.8%
Expert Systems with Applications: 4 publications, 0.8%
Sensors: 4 publications, 0.8%
Remote Sensing: 4 publications, 0.8%
Computers and Industrial Engineering: 4 publications, 0.8%
Journal of Personalized Medicine: 3 publications, 0.6%
Information (Switzerland): 3 publications, 0.6%
Artificial Intelligence in Data and Big Data Processing: 3 publications, 0.6%
Knowledge and Information Systems: 3 publications, 0.6%
Neurocomputing: 3 publications, 0.6%
Atmosphere: 3 publications, 0.6%
Forests: 2 publications, 0.4%
Education and Information Technologies: 2 publications, 0.4%
Information Sciences: 2 publications, 0.4%
Diagnostics: 2 publications, 0.4%
Advances in Computational Intelligence and Robotics: 2 publications, 0.4%
Engineering Reports: 2 publications, 0.4%
Applied Intelligence: 2 publications, 0.4%
Multimedia Tools and Applications: 2 publications, 0.4%

Publishers

Institute of Electrical and Electronics Engineers (IEEE): 163 publications, 32.53%
Springer Nature: 112 publications, 22.36%
MDPI: 68 publications, 13.57%
Elsevier: 61 publications, 12.18%
Association for Computing Machinery (ACM): 10 publications, 2%
Taylor & Francis: 10 publications, 2%
Wiley: 8 publications, 1.6%
SAGE: 6 publications, 1.2%
Public Library of Science (PLoS): 5 publications, 1%
American Chemical Society (ACS): 5 publications, 1%
Cold Spring Harbor Laboratory: 5 publications, 1%
IGI Global: 4 publications, 0.8%
IOP Publishing: 3 publications, 0.6%
American Society of Civil Engineers (ASCE): 3 publications, 0.6%
Emerald: 3 publications, 0.6%
JMIR Publications: 3 publications, 0.6%
SPIE-Intl Soc Optical Eng: 3 publications, 0.6%
Frontiers Media S.A.: 2 publications, 0.4%
Hindawi Limited: 2 publications, 0.4%
Institution of Engineering and Technology (IET): 2 publications, 0.4%
Oxford University Press: 2 publications, 0.4%
EDP Sciences: 2 publications, 0.4%
OAE Publishing Inc.: 2 publications, 0.4%
Mary Ann Liebert: 1 publication, 0.2%
World Scientific: 1 publication, 0.2%
The Royal Society: 1 publication, 0.2%
Onkoloski Institut Ljubljana/Institute of Oncology Ljubljana: 1 publication, 0.2%
AIP Publishing: 1 publication, 0.2%
Research Square Platform LLC: 1 publication, 0.2%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics
502