BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters
Background
Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs).
Results
Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration.
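To illustrate why a Euclidean representation avoids all-versus-all comparison, here is a minimal sketch in Python. It is not BiG-SLiCE's implementation: it swaps in a simple single-pass "leader" clustering, and the function names (`leader_cluster`, `query`), the toy two-dimensional vectors, and the `threshold` value are hypothetical, chosen only for illustration. Each BGC is assumed to be already encoded as a fixed-length numeric vector. Each vector is compared against the current family centroids rather than against every other BGC, and the same centroid lookup gives a simple analogue of a query mode.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(vector, centroids, threshold):
    """Index of the closest centroid within `threshold`, or None if none is close enough."""
    best, best_d = None, threshold
    for i, centroid in enumerate(centroids):
        d = euclidean(vector, centroid)
        if d <= best_d:
            best, best_d = i, d
    return best


def leader_cluster(vectors, threshold):
    """Single-pass clustering (illustrative stand-in, not the tool's algorithm).

    Each vector joins the nearest existing centroid within `threshold`,
    otherwise it founds a new cluster. Centroids are running means.
    """
    centroids, counts, labels = [], [], []
    for v in vectors:
        idx = nearest_centroid(v, centroids, threshold)
        if idx is None:
            # No family is close enough: start a new one at this vector.
            centroids.append(list(v))
            counts.append(1)
            idx = len(centroids) - 1
        else:
            # Update the running mean of the matched centroid.
            counts[idx] += 1
            n = counts[idx]
            centroids[idx] = [c + (x - c) / n for c, x in zip(centroids[idx], v)]
        labels.append(idx)
    return labels, centroids


def query(vector, centroids, threshold):
    """Place a new vector into a previously computed cluster, if any is within range."""
    return nearest_centroid(vector, centroids, threshold)


if __name__ == "__main__":
    # Toy feature vectors standing in for encoded BGCs (hypothetical data).
    bgc_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (0.0, 0.1)]
    labels, centroids = leader_cluster(bgc_vectors, threshold=1.0)
    print(labels)
    print(query((4.9, 5.0), centroids, threshold=1.0))
```

In this sketch the work per BGC scales with the number of centroids rather than with the number of BGCs, which is the intuition behind non-pairwise grouping. Whether the overall cost stays near-linear depends on how the centroids are indexed, which this example does not attempt to model.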
Conclusions
BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
Top-30
Journals
- Natural Product Reports: 10 publications, 5.99%
- mSystems: 7 publications, 4.19%
- Microbial Genomics: 7 publications, 4.19%
- Marine Drugs: 7 publications, 4.19%
- Nucleic Acids Research: 6 publications, 3.59%
- Frontiers in Microbiology: 4 publications, 2.4%
- Nature Communications: 3 publications, 1.8%
- Microbiome: 3 publications, 1.8%
- Angewandte Chemie - International Edition: 3 publications, 1.8%
- Angewandte Chemie: 3 publications, 1.8%
- Biotechnology Advances: 3 publications, 1.8%
- bioRxiv: 3 publications, 1.8%
- PLoS Computational Biology: 2 publications, 1.2%
- Microbiology: 2 publications, 1.2%
- Nature: 2 publications, 1.2%
- Chem: 2 publications, 1.2%
- Journal of Industrial Microbiology and Biotechnology: 2 publications, 1.2%
- Methods in Enzymology: 2 publications, 1.2%
- Proceedings of the National Academy of Sciences of the United States of America: 2 publications, 1.2%
- TrAC - Trends in Analytical Chemistry: 2 publications, 1.2%
- Current Biotechnology: 2 publications, 1.2%
- Current Opinion in Microbiology: 2 publications, 1.2%
- Synthetic and Systems Biotechnology: 2 publications, 1.2%
- mBio: 2 publications, 1.2%
- Bioresource Technology: 2 publications, 1.2%
- Applied and Environmental Microbiology: 2 publications, 1.2%
- Cell Genomics: 1 publication, 0.6%
- Essays in Biochemistry: 1 publication, 0.6%
- Nature Chemical Biology: 1 publication, 0.6%
Publishers
- Elsevier: 31 publications, 18.56%
- Cold Spring Harbor Laboratory: 25 publications, 14.97%
- Springer Nature: 24 publications, 14.37%
- American Society for Microbiology: 11 publications, 6.59%
- MDPI: 11 publications, 6.59%
- Microbiology Society: 10 publications, 5.99%
- Royal Society of Chemistry (RSC): 10 publications, 5.99%
- Oxford University Press: 9 publications, 5.39%
- Wiley: 8 publications, 4.79%
- American Chemical Society (ACS): 6 publications, 3.59%
- Frontiers Media S.A.: 5 publications, 2.99%
- Taylor & Francis: 4 publications, 2.4%
- Public Library of Science (PLoS): 2 publications, 1.2%
- Proceedings of the National Academy of Sciences (PNAS): 2 publications, 1.2%
- Bentham Science Publishers Ltd.: 2 publications, 1.2%
- Portland Press: 1 publication, 0.6%
- Scientific Societies: 1 publication, 0.6%
- Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 0.6%
- Annual Reviews: 1 publication, 0.6%
- American Association for the Advancement of Science (AAAS): 1 publication, 0.6%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.