Open Access
Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
Gurjit S Randhawa (1), Maximillian PM Soltysiak (2), Hadi El Roz (2), Camila P E de Souza (3), Kathleen A. Hill (2), Lila Kari (4)
Publication type: Journal Article
Publication date: 2020-04-24
scimago Q1
wos Q2
SJR: 0.803
CiteScore: 5.4
Impact factor: 2.6
ISSN: 19326203
PubMed ID: 32330208
Multidisciplinary
Abstract
The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such major viral outbreaks demand early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 virus genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020. Our results support a hypothesis of a bat origin and classify the COVID-19 virus as Sarbecovirus, within Betacoronavirus. Our method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
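The alignment-free idea in the abstract (raw sequence in, class label out, with no alignment or annotation) can be illustrated with a minimal sketch. The abstract does not state the numeric encoding, the distance measure, or the classifier, so everything specific below is an assumption chosen for illustration: a purine/pyrimidine mapping (`PP_MAP`), a naive discrete Fourier transform, a Pearson-correlation-based distance, and a nearest-neighbour rule. All function names are hypothetical and this is not the authors' implementation.

```python
import cmath
import math

# Assumed numeric encoding (purine = -1, pyrimidine = +1); illustrative only.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def to_signal(seq, length):
    """Map a DNA string to numbers, then truncate or zero-pad to `length`."""
    values = [PP_MAP.get(base, 0.0) for base in seq.upper()[:length]]
    return values + [0.0] * (length - len(values))


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 for a constant vector."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def distance(u, v):
    """Assumed dissimilarity: 0 for perfectly correlated spectra, 1 for anti-correlated."""
    return (1.0 - pearson(u, v)) / 2.0


def classify(query, labelled, length=64):
    """Return the label of the training sequence whose spectrum is closest to the query's."""
    q = magnitude_spectrum(to_signal(query, length))
    best = min(
        labelled,
        key=lambda item: distance(q, magnitude_spectrum(to_signal(item[0], length))),
    )
    return best[1]


# Toy training set (synthetic strings, not real genomes).
TRAINING = [("AC" * 32, "period-2"), ("AACC" * 16, "period-4")]
label = classify("AT" * 32, TRAINING)
```

In this toy setup, `"AT" * 32` encodes to the same purine/pyrimidine signal as `"AC" * 32`, so the query is assigned the `period-2` label. Real genomes would call for an FFT rather than the quadratic transform used here, and for a classifier trained on many labelled sequences.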
Top-30
Journals
Journal | Publications | Share
BMJ Open | 18 | 2.06%
bioRxiv | 14 | 1.61%
medRxiv: the preprint server for health sciences | 13 | 1.49%
PLoS ONE | 12 | 1.38%
Scientific Reports | 9 | 1.03%
IEEE Access | 8 | 0.92%
Computers in Biology and Medicine | 5 | 0.57%
Briefings in Bioinformatics | 5 | 0.57%
Expert Systems with Applications | 4 | 0.46%
Journal of Biomedical Informatics | 3 | 0.34%
BMC Bioinformatics | 3 | 0.34%
Lecture Notes in Networks and Systems | 3 | 0.34%
Journal of Medical Internet Research | 2 | 0.23%
Current Medical Imaging Reviews | 2 | 0.23%
PeerJ | 2 | 0.23%
Multimedia Tools and Applications | 2 | 0.23%
Frontiers in Public Health | 2 | 0.23%
Sensors | 2 | 0.23%
Journal of Personalized Medicine | 2 | 0.23%
Healthcare | 2 | 0.23%
Applied Network Science | 2 | 0.23%
Applied Intelligence | 2 | 0.23%
Archives of Computational Methods in Engineering | 2 | 0.23%
Infection, Genetics and Evolution | 2 | 0.23%
Computational and Structural Biotechnology Journal | 2 | 0.23%
Informatics in Medicine Unlocked | 2 | 0.23%
Chaos, Solitons and Fractals | 2 | 0.23%
BMJ Global Health | 2 | 0.23%
Advances in Industrial Internet of Things, Engineering and Management | 2 | 0.23%
Publishers
Publisher | Publications | Share
Cold Spring Harbor Laboratory | 543 | 62.27%
Springer Nature | 70 | 8.03%
Elsevier | 50 | 5.73%
Institute of Electrical and Electronics Engineers (IEEE) | 37 | 4.24%
BMJ | 23 | 2.64%
Public Library of Science (PLoS) | 13 | 1.49%
MDPI | 11 | 1.26%
Frontiers Media S.A. | 9 | 1.03%
Wiley | 7 | 0.8%
Oxford University Press | 7 | 0.8%
Taylor & Francis | 6 | 0.69%
JMIR Publications | 5 | 0.57%
Pleiades Publishing | 5 | 0.57%
Bentham Science Publishers Ltd. | 4 | 0.46%
SAGE | 4 | 0.46%
PeerJ | 3 | 0.34%
Association for Computing Machinery (ACM) | 2 | 0.23%
World Scientific | 2 | 0.23%
Tech Science Press | 2 | 0.23%
American Chemical Society (ACS) | 2 | 0.23%
Hindawi Limited | 2 | 0.23%
Emerald | 2 | 0.23%
IGI Global | 2 | 0.23%
Society for Neuroscience | 2 | 0.23%
British Psychological Society | 2 | 0.23%
American Geophysical Union | 1 | 0.11%
Institution of Engineering and Technology (IET) | 1 | 0.11%
S. Karger AG | 1 | 0.11%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.
Metrics
Total citations: 872
Citations from 2024: 175 (20.06%)
Cite this
GOST
Randhawa G. S. et al. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study // PLoS ONE. 2020. Vol. 15. No. 4. p. e0232391.
GOST all authors (up to 50)
Randhawa G. S., Soltysiak M. P., El Roz H., de Souza C. P. E., Hill K. A., Kari L. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study // PLoS ONE. 2020. Vol. 15. No. 4. p. e0232391.
RIS
TY - JOUR
DO - 10.1371/journal.pone.0232391
UR - https://doi.org/10.1371/journal.pone.0232391
TI - Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
T2 - PLoS ONE
AU - Randhawa, Gurjit S
AU - Soltysiak, Maximillian PM
AU - El Roz, Hadi
AU - de Souza, Camila P E
AU - Hill, Kathleen A.
AU - Kari, Lila
PY - 2020
DA - 2020/04/24
PB - Public Library of Science (PLoS)
SP - e0232391
IS - 4
VL - 15
PMID - 32330208
SN - 1932-6203
ER -
BibTeX (up to 50 authors)
@article{2020_Randhawa,
author = {Gurjit S Randhawa and Maximillian PM Soltysiak and Hadi El Roz and Camila P E de Souza and Kathleen A. Hill and Lila Kari},
title = {Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study},
journal = {PLoS ONE},
year = {2020},
volume = {15},
publisher = {Public Library of Science (PLoS)},
month = {apr},
url = {https://doi.org/10.1371/journal.pone.0232391},
number = {4},
pages = {e0232391},
doi = {10.1371/journal.pone.0232391}
}
MLA
Randhawa, Gurjit S., et al. “Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study.” PLoS ONE, vol. 15, no. 4, Apr. 2020, p. e0232391. https://doi.org/10.1371/journal.pone.0232391.