Volume 57, issue 4, pages 710-716

Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds

Publication type: Journal Article
Publication date: 2017-04-10
Scimago: Q1
WoS: Q1
SJR: 1.467
CiteScore: 9.8
Impact factor: 5.3
ISSN: 1549-9596, 1549-960X
General Chemistry
Computer Science Applications
General Chemical Engineering
Library and Information Sciences
Abstract
Support vector machine (SVM) modeling is one of the most popular machine learning approaches in chemoinformatics and drug design. The influence of training set composition and size on predictions is currently an underinvestigated issue in SVM modeling. In this study, we have derived SVM classification and ranking models for a variety of compound activity classes under systematic variation of the number of positive and negative training examples. With increasing numbers of negative training compounds, SVM classification calculations became increasingly accurate and stable. However, this was only the case if a required threshold of positive training examples was also reached. In addition, consideration of class weights and optimization of cost factors substantially aided in balancing the calculations for increasing numbers of negative training examples. Taken together, the results of our analysis have practical implications for SVM learning and the prediction of active compounds. For all compound classes under study, top recall performance and independence of compound recall from training set composition were achieved when 250–500 active and 500–1000 randomly selected inactive training instances were used. However, as long as ∼50 known active compounds were available for training, increasing the number of randomly selected negative training examples to 500–1000 significantly improved model performance and gave very similar results for different training sets.
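The abstract does not state how class weights were computed or how recall was measured, so the following Python sketch only illustrates the general idea under stated assumptions. It uses the common "balanced" heuristic (each class weighted inversely to its frequency), a simple recall-at-k measure for ranked compounds, and a small grid of training set compositions. The function names, the label convention (1 = active, 0 = inactive), and the grid itself are hypothetical; only the counts 50, 250, 500, and 1000 are taken from the ranges quoted in the abstract.

```python
from itertools import product

# Training set sizes drawn from the ranges quoted in the abstract.
# Combining them into a full grid is an assumption made for illustration.
ACTIVE_COUNTS = (50, 250, 500)
INACTIVE_COUNTS = (500, 1000)


def balanced_class_weights(n_active, n_inactive):
    """Weight each class inversely to its frequency.

    weight(class) = n_total / (2 * n_class), so the summed weight of all
    actives equals the summed weight of all inactives. This is a common
    heuristic, not necessarily the scheme used in the paper.
    """
    if n_active <= 0 or n_inactive <= 0:
        raise ValueError("both classes need at least one training example")
    n_total = n_active + n_inactive
    return {1: n_total / (2 * n_active), 0: n_total / (2 * n_inactive)}


def recall_at_k(ranked_labels, k):
    """Fraction of all actives (label 1) found among the top-k ranked compounds."""
    n_actives = sum(ranked_labels)
    if n_actives == 0:
        raise ValueError("ranking contains no active compounds")
    return sum(ranked_labels[:k]) / n_actives


def composition_grid(active_counts=ACTIVE_COUNTS, inactive_counts=INACTIVE_COUNTS):
    """Enumerate (n_active, n_inactive, class_weights) for every combination."""
    return [
        (n_act, n_inact, balanced_class_weights(n_act, n_inact))
        for n_act, n_inact in product(active_counts, inactive_counts)
    ]


if __name__ == "__main__":
    for n_act, n_inact, weights in composition_grid():
        print(f"{n_act:4d} actives, {n_inact:4d} inactives -> "
              f"w_active={weights[1]:.3f}, w_inactive={weights[0]:.3f}")
```

With 50 actives and 1000 inactives, the heuristic assigns each active twenty times the weight of each inactive, which is one way to keep a growing negative class from dominating the margin. A weight dictionary of this shape could, for example, be passed to the `class_weight` parameter of scikit-learn's `SVC`, with the cost factor `C` tuned separately; whether that matches the study's setup is not stated in the abstract.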

Top-30

Journals

ACS Omega: 6 publications, 16.22%
Journal of Chemical Information and Modeling: 3 publications, 8.11%
Applied Sciences (Switzerland): 1 publication, 2.7%
International Journal of Molecular Sciences: 1 publication, 2.7%
Journal of Computer-Aided Molecular Design: 1 publication, 2.7%
Journal of Soils and Sediments: 1 publication, 2.7%
Acta Neurochirurgica: 1 publication, 2.7%
Cell Reports Physical Science: 1 publication, 2.7%
Energy: 1 publication, 2.7%
International Journal of Human Computer Studies: 1 publication, 2.7%
Ecotoxicology and Environmental Safety: 1 publication, 2.7%
Drug Discovery Today: 1 publication, 2.7%
Artificial Intelligence in the Life Sciences: 1 publication, 2.7%
Chemical Biology and Drug Design: 1 publication, 2.7%
Journal of Medicinal Chemistry: 1 publication, 2.7%
Chemical Reviews: 1 publication, 2.7%
Journal of Proteome Research: 1 publication, 2.7%
Expert Opinion on Drug Discovery: 1 publication, 2.7%
Artificial Intelligence Chemistry: 1 publication, 2.7%
Chemical Research in Toxicology: 1 publication, 2.7%
International Journal of Applied Earth Observation and Geoinformation: 1 publication, 2.7%
Frontiers in Nuclear Engineering: 1 publication, 2.7%
BMC Psychiatry: 1 publication, 2.7%
Journal of Organic Chemistry: 1 publication, 2.7%
Lecture Notes in Networks and Systems: 1 publication, 2.7%
Journal of Pharmaceutical and Biomedical Analysis: 1 publication, 2.7%

Publishers

American Chemical Society (ACS): 14 publications, 37.84%
Elsevier: 9 publications, 24.32%
Springer Nature: 5 publications, 13.51%
MDPI: 2 publications, 5.41%
Cold Spring Harbor Laboratory: 2 publications, 5.41%
Wiley: 1 publication, 2.7%
Taylor & Francis: 1 publication, 2.7%
Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 2.7%
Frontiers Media S.A.: 1 publication, 2.7%
  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics
37
Cite this
GOST
Rodríguez Pérez R., Vogt M., Bajorath J. Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds // Journal of Chemical Information and Modeling. 2017. Vol. 57. No. 4. pp. 710-716.
RIS
TY - JOUR
DO - 10.1021/acs.jcim.7b00088
UR - https://doi.org/10.1021/acs.jcim.7b00088
TI - Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds
T2 - Journal of Chemical Information and Modeling
AU - Rodríguez Pérez, Raquel
AU - Vogt, Martin
AU - Bajorath, Jürgen
PY - 2017
DA - 2017/04/10
PB - American Chemical Society (ACS)
SP - 710
EP - 716
IS - 4
VL - 57
PMID - 28376613
SN - 1549-9596
SN - 1549-960X
ER -
BibTeX
@article{RodriguezPerez2017,
author = {Rodríguez Pérez, Raquel and Vogt, Martin and Bajorath, Jürgen},
title = {Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds},
journal = {Journal of Chemical Information and Modeling},
year = {2017},
volume = {57},
publisher = {American Chemical Society (ACS)},
month = {apr},
url = {https://doi.org/10.1021/acs.jcim.7b00088},
number = {4},
pages = {710--716},
doi = {10.1021/acs.jcim.7b00088}
}
MLA
Rodríguez Pérez, Raquel, et al. “Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds.” Journal of Chemical Information and Modeling, vol. 57, no. 4, Apr. 2017, pp. 710-716. https://doi.org/10.1021/acs.jcim.7b00088.