Open Access
Scientific Reports, volume 7, issue 1, publication number 46710

Performance of machine-learning scoring functions in structure-based virtual screening

Maciej Wójcikowski 1
Pedro J Ballester 2, 3, 4, 5
Pawel Siedlecki 1, 6
1 Institute of Biochemistry and Biophysics PAS, Warsaw, Poland
2 Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
3 CNRS, UMR7258, Marseille, F-13009, France
4 Institut Paoli-Calmettes, Marseille, F-13009, France
Publication type: Journal Article
Publication date: 2017-04-25
Quartile SCImago: Q1
Quartile WOS: Q2
Impact factor: 4.6
ISSN: 2045-2322
Multidisciplinary
Abstract
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: the top 1% of molecules ranked by RF-Score-VS gives a 55.6% hit rate, whereas the top 1% ranked by Vina gives only 16.2%. For smaller fractions the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate, versus 27.5% for Vina. In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as a ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
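
The two metrics quoted in the abstract, hit rate among the top-ranked fraction of a screened library and Pearson correlation between predicted and measured affinity, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `hit_rate_at_top` and `pearson`, the toy scores and the toy labels are all assumptions made for illustration, and the sketch assumes that a higher score means a stronger predicted binder.

```python
import math


def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top-scoring `fraction` of molecules.

    Assumes higher score = predicted stronger binder (hypothetical convention).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    # Keep at least one molecule so tiny fractions stay well defined.
    n_top = max(1, math.ceil(fraction * len(ranked)))
    top = ranked[:n_top]
    return sum(1 for _, active in top if active) / n_top


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy data (made up): ten docked molecules, three of them active.
toy_scores = [9.1, 8.7, 7.9, 6.5, 6.0, 5.2, 4.8, 4.1, 3.3, 2.0]
toy_labels = [True, True, False, True, False, False, False, False, False, False]

print(hit_rate_at_top(toy_scores, toy_labels, 0.2))  # top 2 of 10 molecules
print(hit_rate_at_top(toy_scores, toy_labels, 0.5))  # top 5 of 10 molecules
```

Under this definition, a figure such as "top 1% provides 55.6% hit rate" reads as: of the molecules in the best-scored 1%, 55.6% are true actives.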

Top-30

Journals

Journal of Chemical Information and Modeling
34 publications, 13.23%
Briefings in Bioinformatics
10 publications, 3.89%
Expert Opinion on Drug Discovery
10 publications, 3.89%
Molecules
9 publications, 3.5%
Wiley Interdisciplinary Reviews: Computational Molecular Science
8 publications, 3.11%
Drug Discovery Today
6 publications, 2.33%
International Journal of Molecular Sciences
5 publications, 1.95%
Journal of Cheminformatics
5 publications, 1.95%
Frontiers in Pharmacology
4 publications, 1.56%
Journal of Computer-Aided Molecular Design
4 publications, 1.56%
ACS Omega
4 publications, 1.56%
Bioinformatics
4 publications, 1.56%
Molecular Diversity
3 publications, 1.17%
PLoS ONE
3 publications, 1.17%
Computational and Structural Biotechnology Journal
3 publications, 1.17%
Chemical Reviews
3 publications, 1.17%
Journal of Biomolecular Structure and Dynamics
3 publications, 1.17%
Methods in Molecular Biology
3 publications, 1.17%
Mathematical Biology and Bioinformatics
3 publications, 1.17%
Current Topics in Medicinal Chemistry
2 publications, 0.78%
Frontiers in Chemistry
2 publications, 0.78%
Scientific Reports
2 publications, 0.78%
Journal of Advanced Research
2 publications, 0.78%
Journal of Molecular Graphics and Modelling
2 publications, 0.78%
Medical Oncology
2 publications, 0.78%
Journal of Chemical Theory and Computation
2 publications, 0.78%
Drugs and Drug Candidates
2 publications, 0.78%
Protein Science
2 publications, 0.78%
Journal of Medicinal Chemistry
2 publications, 0.78%

Publishers

American Chemical Society (ACS)
49 publications, 19.07%
Elsevier
44 publications, 17.12%
Springer Nature
29 publications, 11.28%
MDPI
23 publications, 8.95%
Wiley
21 publications, 8.17%
Cold Spring Harbor Laboratory
17 publications, 6.61%
Oxford University Press
15 publications, 5.84%
Taylor & Francis
14 publications, 5.45%
Frontiers Media S.A.
7 publications, 2.72%
Bentham Science Publishers Ltd.
5 publications, 1.95%
Public Library of Science (PLoS)
4 publications, 1.56%
IntechOpen
4 publications, 1.56%
Institute of Mathematical Problems of Biology of RAS (IMPB RAS)
3 publications, 1.17%
IGI Global
2 publications, 0.78%
Institute of Electrical and Electronics Engineers (IEEE)
2 publications, 0.78%
Royal Society of Chemistry (RSC)
2 publications, 0.78%
Proceedings of the National Academy of Sciences (PNAS)
2 publications, 0.78%
PeerJ
1 publication, 0.39%
World Scientific
1 publication, 0.39%
1 publication, 0.39%
SAGE
1 publication, 0.39%
Cairo University
1 publication, 0.39%
Hindawi Limited
1 publication, 0.39%
1 publication, 0.39%
Autonomous Non-profit Organization Editorial Board of the journal Uspekhi Khimii
1 publication, 0.39%
  • We do not take into account publications without a DOI.
  • Statistics recalculated only for publications connected to researchers, organizations and labs registered on the platform.
  • Statistics recalculated weekly.

GOST
Wójcikowski M. et al. Performance of machine-learning scoring functions in structure-based virtual screening // Scientific Reports. 2017. Vol. 7. No. 1. 46710
GOST, all authors (up to 50)
Wójcikowski M., Ballester P. J., Siedlecki P. Performance of machine-learning scoring functions in structure-based virtual screening // Scientific Reports. 2017. Vol. 7. No. 1. 46710
RIS
TY - JOUR
DO - 10.1038/srep46710
UR - https://doi.org/10.1038/srep46710
TI - Performance of machine-learning scoring functions in structure-based virtual screening
T2 - Scientific Reports
AU - Wójcikowski, Maciej
AU - Ballester, Pedro J
AU - Siedlecki, Pawel
PY - 2017
DA - 2017/04/25
PB - Springer Nature
IS - 1
VL - 7
SN - 2045-2322
ER -
BibTeX
@article{Wojcikowski2017,
  author    = {Maciej Wójcikowski and Pedro J Ballester and Pawel Siedlecki},
  title     = {Performance of machine-learning scoring functions in structure-based virtual screening},
  journal   = {Scientific Reports},
  year      = {2017},
  month     = apr,
  volume    = {7},
  number    = {1},
  pages     = {46710},
  publisher = {Springer Nature},
  doi       = {10.1038/srep46710},
  url       = {https://doi.org/10.1038/srep46710}
}