Journal of Chemical Information and Modeling, Volume 53, Issue 12, Pages 3244–3261

Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods

Publication type: Journal Article
Publication date: 2013-12-11
Scimago: Q1
WoS: Q1
SJR: 1.467
CiteScore: 9.8
Impact factor: 5.3
ISSN: 1549-9596, 1549-960X
PubMed ID: 24279462
Subject areas: General Chemistry; Computer Science Applications; General Chemical Engineering; Library and Information Sciences
Abstract
There are thousands of environmental chemicals subject to regulatory decisions for endocrine-disrupting potential. The ToxCast and Tox21 programs have tested ∼8200 chemicals in a broad panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure–activity relationship (QSAR) models using machine learning (ML) methods and a novel approach to manage the imbalanced data distribution. Training compounds from the ToxCast project were categorized into active and inactive (binding and nonbinding) classes based on a composite ER Interaction Score derived from a collection of 13 in vitro ER assays. A total of 1537 chemicals from ToxCast were used to derive and optimize the binary classification models, while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to validate model performance externally. To handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy to minimize information loss and increase predictive performance, and compared it with three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models relating the molecular structures of chemicals to their ER activity were built using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM), with 51 molecular descriptors from QikProp and 4328 bits of structural fingerprints as explanatory variables. A random forest (RF) feature selection method was employed to extract the structural features most relevant to ER activity.
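The abstract names the cluster-selection strategy but does not describe its mechanics. The sketch below contrasts the three baseline techniques it lists (cost-sensitive weights, random oversampling, random undersampling) with one generic form of cluster-based undersampling: k-means on the majority class, keeping the member nearest each centroid. That clustering variant, and every function name here, is a hypothetical illustration and not the authors' procedure; labels 0 and 1 stand for inactive and active, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def class_weights(labels):
    """Cost-sensitive learning: weight each class inversely to its frequency,
    so misclassifying a rare (active) compound costs more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def random_undersample(X, y, majority, rng):
    """Discard randomly chosen majority-class samples until both classes match."""
    maj = [i for i, c in enumerate(y) if c == majority]
    mino = [i for i, c in enumerate(y) if c != majority]
    keep = rng.sample(maj, len(mino)) + mino
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, rng):
    """Duplicate randomly chosen minority-class samples until both classes match."""
    mino = [i for i, c in enumerate(y) if c == minority]
    maj = [i for i, c in enumerate(y) if c != minority]
    extra = [rng.choice(mino) for _ in range(len(maj) - len(mino))]
    keep = maj + mino + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns (cluster index per point, centroids)."""
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


def cluster_undersample(X, y, majority, rng):
    """Hypothetical cluster-based undersampling: split the majority class into
    as many k-means clusters as there are minority samples and keep only the
    member nearest each centroid, so the retained subset still spans the
    majority class instead of being a random draw from it."""
    maj = [i for i, c in enumerate(y) if c == majority]
    mino = [i for i, c in enumerate(y) if c != majority]
    assign, centers = kmeans([X[i] for i in maj], len(mino), rng)
    keep = list(mino)
    for j, center in enumerate(centers):
        members = [i for i, a in zip(maj, assign) if a == j]
        if members:
            keep.append(min(members, key=lambda i: math.dist(X[i], center)))
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 12 inactive (0) and 3 active (1) compounds, 2 descriptors each.
    X = [[rng.random(), rng.random()] for _ in range(15)]
    y = [0] * 12 + [1] * 3
    print(class_weights(y))  # {0: 0.625, 1: 2.5}
    print(Counter(random_undersample(X, y, 0, rng)[1]))
    print(Counter(random_oversample(X, y, 1, rng)[1]))
    print(Counter(cluster_undersample(X, y, 0, rng)[1]))
```

The design point the sketch tries to make concrete: random undersampling may throw away whole regions of descriptor space, whereas picking one representative per cluster keeps the reduced majority set spread across it, which is one plausible way to "minimize information loss".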
The best model was obtained using SVM with a subset of descriptors selected from the full set by the RF algorithm. It classified active and inactive compounds with accuracies of 76.1% and 82.8%, respectively, for a total accuracy of 81.6% on the internal test set, and reached a total accuracy of 70.8% on the external test set. These results demonstrate that combining high-quality experimental data with ML methods can yield robust models with high predictive accuracy, which are potentially useful for the virtual screening of chemicals in environmental risk assessment.
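The three internal-test figures are linked: total accuracy is a prevalence-weighted mean of the accuracy on actives (sensitivity) and on inactives (specificity). The sketch below computes these rates from a confusion matrix and inverts the weighted mean to estimate the active fraction the reported numbers imply. The function names are hypothetical, balanced accuracy is an added standard metric not reported in the abstract, and the inversion assumes 81.6% is plain accuracy over the same test set.

```python
def classification_rates(tp, fn, tn, fp):
    """Per-class and overall accuracy from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of actives recognized
    specificity = tn / (tn + fp)   # fraction of inactives recognized
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced = (sensitivity + specificity) / 2  # insensitive to class ratio
    return sensitivity, specificity, accuracy, balanced


def implied_active_fraction(sens, spec, acc):
    """Total accuracy is a prevalence-weighted mean of the per-class rates,
    acc = p*sens + (1 - p)*spec, so the active fraction is
    p = (spec - acc) / (spec - sens)."""
    return (spec - acc) / (spec - sens)


# Rates reported for the internal test set (rounded to 0.1%).
sens, spec, acc = 0.761, 0.828, 0.816
print(f"implied active fraction: {implied_active_fraction(sens, spec, acc):.3f}")
print(f"balanced accuracy: {(sens + spec) / 2:.4f}")
```

With the rounded figures, p ≈ 0.012/0.067 ≈ 0.179, roughly one active per five or six test compounds. Because the abstract does not state the test-set composition and the inputs are rounded, treat this only as a consistency check. The balanced accuracy, (76.1% + 82.8%)/2 = 79.45%, sits below the total accuracy because the better-recognized inactive class dominates the test set.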
Citing publications (top 30)

Journals
Science of the Total Environment: 4 publications (7.69%)
Journal of Cheminformatics: 3 publications (5.77%)
Chemical Research in Toxicology: 3 publications (5.77%)
Journal of Chemical Information and Modeling: 3 publications (5.77%)
Environmental Health Perspectives: 2 publications (3.85%)
Chemosphere: 2 publications (3.85%)
Environmental Toxicology and Pharmacology: 2 publications (3.85%)
Journal of Applied Toxicology: 2 publications (3.85%)
Environmental Toxicology and Chemistry: 2 publications (3.85%)
Environmental Science & Technology: 2 publications (3.85%)
Sensors: 1 publication (1.92%)
Crystals: 1 publication (1.92%)
Frontiers in Environmental Science: 1 publication (1.92%)
Frontiers in Bioengineering and Biotechnology: 1 publication (1.92%)
Scientific Reports: 1 publication (1.92%)
Environmental Health: A Global Access Science Source: 1 publication (1.92%)
Journal of Computer-Aided Molecular Design: 1 publication (1.92%)
Environment International: 1 publication (1.92%)
Trends in Food Science and Technology: 1 publication (1.92%)
Trends in Plant Science: 1 publication (1.92%)
Aquatic Toxicology: 1 publication (1.92%)
Chemical Physics Letters: 1 publication (1.92%)
Chemometrics and Intelligent Laboratory Systems: 1 publication (1.92%)
ACS Sustainable Chemistry and Engineering: 1 publication (1.92%)
RSC Advances: 1 publication (1.92%)
Molecular BioSystems: 1 publication (1.92%)
MedChemComm: 1 publication (1.92%)
Journal of Drug Targeting: 1 publication (1.92%)
Methods in Molecular Biology: 1 publication (1.92%)

Publishers
Elsevier: 15 publications (28.85%)
American Chemical Society (ACS): 9 publications (17.31%)
Springer Nature: 8 publications (15.38%)
MDPI: 3 publications (5.77%)
Wiley: 3 publications (5.77%)
Royal Society of Chemistry (RSC): 3 publications (5.77%)
Environmental Health Perspectives: 2 publications (3.85%)
Frontiers Media S.A.: 2 publications (3.85%)
Taylor & Francis: 1 publication (1.92%)
Hindawi Limited: 1 publication (1.92%)
Association for Computing Machinery (ACM): 1 publication (1.92%)
Institute of Electrical and Electronics Engineers (IEEE): 1 publication (1.92%)
Hans Publishers: 1 publication (1.92%)
Oxford University Press: 1 publication (1.92%)

Note: publications without a DOI are not counted; statistics are recalculated weekly.

Citations: 52
Cite this

GOST
Zang Q., Rotroff D. M. Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods // Journal of Chemical Information and Modeling. 2013. Vol. 53. No. 12. pp. 3244-3261.

RIS
TY  - JOUR
DO  - 10.1021/ci400527b
UR  - https://doi.org/10.1021/ci400527b
TI  - Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods
T2  - Journal of Chemical Information and Modeling
AU  - Zang, Qingda
AU  - Rotroff, Daniel M.
PY  - 2013
DA  - 2013/12/11
PB  - American Chemical Society (ACS)
SP  - 3244
EP  - 3261
IS  - 12
VL  - 53
PMID - 24279462
SN  - 1549-9596
SN  - 1549-960X
ER  -

BibTeX
@article{2013_Zang,
author = {Qingda Zang and Daniel M. Rotroff},
title = {Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure--Activity Relationship and Machine Learning Methods},
journal = {Journal of Chemical Information and Modeling},
year = {2013},
volume = {53},
publisher = {American Chemical Society (ACS)},
month = dec,
url = {https://doi.org/10.1021/ci400527b},
number = {12},
pages = {3244--3261},
doi = {10.1021/ci400527b}
}

MLA
Zang, Qingda, et al. “Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods.” Journal of Chemical Information and Modeling, vol. 53, no. 12, Dec. 2013, pp. 3244-3261. https://doi.org/10.1021/ci400527b.