Volume 60, Issue 3, Pages 1122-1136

Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?

Minyi Su 1, 2
Guoqin Feng 1, 2
Zhihai Liu 1
Yan Li 1, 3
Renxiao Wang 1, 3, 4
Publication type: Journal Article
Publication date: 2020-02-21
scimago Q1
wos Q1
БС (White List): 1
SJR: 1.467
CiteScore: 9.8
Impact factor: 5.3
ISSN: 15499596, 1549960X
General Chemistry
Computer Science Applications
General Chemical Engineering
Library and Information Sciences
Abstract
In recent years, protein-ligand interaction scoring functions derived through machine learning have repeatedly been reported to outperform conventional scoring functions. However, several published studies have raised the concern that the superior performance of machine-learning scoring functions depends on the overlap between the training set and the test set. To examine the true power of machine-learning algorithms in scoring function formulation, we conducted a systematic study of six off-the-shelf machine-learning algorithms: Bayesian Ridge Regression (BRR), Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Linear Support Vector Regression (L-SVR), and Random Forest (RF). Model scoring functions were derived with these algorithms on various training sets selected from over 3700 protein-ligand complexes in the PDBbind refined set (version 2016). All resulting scoring functions were then applied to the CASF-2016 test set to validate their scoring power. In our first series of trials, the size of the training set was fixed, while the overall similarity between the training set and the test set was varied systematically. In our second series of trials, the overall similarity between the training set and the test set was fixed, while the size of the training set was varied. Our results indicate that the performance of these machine-learning models depends, to varying degrees, on the contents or the size of the training set, with the RF model demonstrating the best learning capability. In contrast, the performance of three conventional scoring functions (ChemScore, ASP, and X-Score) is essentially insensitive to the choice of training set. Therefore, one has to consider not only hard overlap but also soft overlap between the training set and the test set when evaluating machine-learning scoring functions.
In this spirit, we have compiled data sets based on the PDBbind refined set by removing redundant samples under several similarity thresholds. Scoring function developers are encouraged to employ them as standard training sets if they want to evaluate their new models on the CASF-2016 benchmark.
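The first series of trials described in the abstract (fixed training-set size, varied similarity to the test set) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which library was used, so scikit-learn regressors with mostly default settings stand in for the six algorithms; the descriptors and affinities are synthetic; and `max_similarity_to_test` is a hypothetical cosine-similarity stand-in for the protein-ligand similarity measures used in the paper. Scoring power is measured as the Pearson correlation on the test set.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 600 training and 100 test "complexes",
# each described by 10 numeric descriptors.
n_train, n_test, n_feat = 600, 100, 10
X_train = rng.normal(size=(n_train, n_feat))
X_test = rng.normal(size=(n_test, n_feat))
w = rng.normal(size=n_feat)


def affinity(X):
    """Synthetic binding affinity: mostly linear, plus a small nonlinearity and noise."""
    return X @ w + 0.5 * np.sin(X[:, 0]) + rng.normal(scale=0.3, size=len(X))


y_train, y_test = affinity(X_train), affinity(X_test)


def max_similarity_to_test(X_tr, X_te):
    """Hypothetical 'soft overlap': highest cosine similarity of each
    training sample to any test sample."""
    a = X_tr / np.linalg.norm(X_tr, axis=1, keepdims=True)
    b = X_te / np.linalg.norm(X_te, axis=1, keepdims=True)
    return (a @ b.T).max(axis=1)


sim = max_similarity_to_test(X_train, X_test)


def build_models():
    """Fresh instances of the six algorithm families named in the abstract."""
    return {
        "BRR": BayesianRidge(),
        "DT": DecisionTreeRegressor(random_state=0),
        "KNN": KNeighborsRegressor(),
        "MLP": make_pipeline(StandardScaler(),
                             MLPRegressor(max_iter=2000, random_state=0)),
        "L-SVR": make_pipeline(StandardScaler(),
                               LinearSVR(max_iter=10000, random_state=0)),
        "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    }


def scoring_power(threshold, size):
    """Train on `size` samples whose similarity to the test set is at most
    `threshold`; return the test-set Pearson R for each model."""
    allowed = np.flatnonzero(sim <= threshold)
    chosen = rng.choice(allowed, size=min(size, len(allowed)), replace=False)
    results = {}
    for name, model in build_models().items():
        model.fit(X_train[chosen], y_train[chosen])
        r, _ = pearsonr(y_test, model.predict(X_test))
        results[name] = float(r)
    return results


# Fixed training-set size, progressively looser similarity cutoffs.
for threshold in np.quantile(sim, [0.5, 0.75, 1.0]):
    scores = scoring_power(threshold, size=200)
    print(round(float(threshold), 3), {k: round(v, 3) for k, v in scores.items()})
```

Quantile-based cutoffs are used so that every filtered pool holds at least 200 candidates, which keeps the training-set size constant across thresholds. The second series of trials would instead hold `threshold` fixed and vary `size`.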

Top 30

Journals

Journal of Chemical Information and Modeling: 15 publications, 21.13%
Briefings in Bioinformatics: 5 publications, 7.04%
Scientific Reports: 4 publications, 5.63%
Journal of Cheminformatics: 3 publications, 4.23%
ACS Omega: 3 publications, 4.23%
Drug Discovery Today: 2 publications, 2.82%
Physical Chemistry Chemical Physics: 2 publications, 2.82%
International Journal of Molecular Sciences: 1 publication, 1.41%
Molecular Informatics: 1 publication, 1.41%
Molecules: 1 publication, 1.41%
Frontiers in Molecular Biosciences: 1 publication, 1.41%
Frontiers in Bioinformatics: 1 publication, 1.41%
Computers: 1 publication, 1.41%
BMC Bioinformatics: 1 publication, 1.41%
Chemical Physics Letters: 1 publication, 1.41%
Analytica Chimica Acta: 1 publication, 1.41%
Journal of Molecular Graphics and Modelling: 1 publication, 1.41%
Journal of Medicinal Chemistry: 1 publication, 1.41%
Expert Opinion on Drug Discovery: 1 publication, 1.41%
Saudi Dental Journal: 1 publication, 1.41%
Chemical Science: 1 publication, 1.41%
Analytical Chemistry: 1 publication, 1.41%
Proteins: Structure, Function and Genetics: 1 publication, 1.41%
Machine Learning: Science and Technology: 1 publication, 1.41%
Journal of Physical Chemistry B: 1 publication, 1.41%
Mendeleev Communications: 1 publication, 1.41%
Digital Discovery: 1 publication, 1.41%
Wiley Interdisciplinary Reviews: Computational Molecular Science: 1 publication, 1.41%
Nature Machine Intelligence: 1 publication, 1.41%

Publishers

American Chemical Society (ACS): 21 publications, 29.58%
Springer Nature: 13 publications, 18.31%
Elsevier: 6 publications, 8.45%
Oxford University Press: 6 publications, 8.45%
Royal Society of Chemistry (RSC): 5 publications, 7.04%
Wiley: 4 publications, 5.63%
MDPI: 3 publications, 4.23%
Cold Spring Harbor Laboratory: 3 publications, 4.23%
Frontiers Media S.A.: 2 publications, 2.82%
Taylor & Francis: 1 publication, 1.41%
King Saud University: 1 publication, 1.41%
IOP Publishing: 1 publication, 1.41%
OOO Zhurnal "Mendeleevskie Soobshcheniya": 1 publication, 1.41%
IntechOpen: 1 publication, 1.41%
Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 1.41%
International Press of Boston: 1 publication, 1.41%
  • Publications without a DOI are not counted.
  • Publication statistics are updated weekly.

Metrics: 71
Cite
GOST
Su M. et al. Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set? // Journal of Chemical Information and Modeling. 2020. Vol. 60. No. 3. pp. 1122-1136.
GOST with all authors (up to 50)
Su M., Feng G., Liu Z., Li Y., Wang R. Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set? // Journal of Chemical Information and Modeling. 2020. Vol. 60. No. 3. pp. 1122-1136.
RIS
TY  - JOUR
DO  - 10.1021/acs.jcim.9b00714
UR  - https://doi.org/10.1021/acs.jcim.9b00714
TI  - Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?
T2  - Journal of Chemical Information and Modeling
AU  - Su, Minyi
AU  - Feng, Guoqin
AU  - Liu, Zhihai
AU  - Li, Yan
AU  - Wang, Renxiao
PY  - 2020
DA  - 2020/02/21
PB  - American Chemical Society (ACS)
SP  - 1122
EP  - 1136
IS  - 3
VL  - 60
PMID  - 32085675
SN  - 1549-9596
SN  - 1549-960X
ER  - 
BibTeX (up to 50 authors)
@article{2020_Su,
author = {Minyi Su and Guoqin Feng and Zhihai Liu and Yan Li and Renxiao Wang},
title = {Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?},
journal = {Journal of Chemical Information and Modeling},
year = {2020},
volume = {60},
publisher = {American Chemical Society (ACS)},
month = {feb},
url = {https://doi.org/10.1021/acs.jcim.9b00714},
number = {3},
pages = {1122--1136},
doi = {10.1021/acs.jcim.9b00714}
}
MLA
Su, Minyi, et al. “Tapping on the Black Box: How Is the Scoring Power of a Machine-Learning Scoring Function Dependent on the Training Set?” Journal of Chemical Information and Modeling, vol. 60, no. 3, Feb. 2020, pp. 1122-1136. https://doi.org/10.1021/acs.jcim.9b00714.