Open Access
Volume 10, Issue 1

BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters

Publication type: Journal Article
Publication date: 2021-01-19
SCImago quartile: Q1
WoS quartile: Q1
SJR: 5.314
CiteScore: 20.0
Impact factor: 3.9
ISSN: 2047-217X
Subject areas: Computer Science Applications; Health Informatics
Abstract
Background

Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The more than 200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a computational bottleneck: the expensive, network-based approach they use to group these BGCs into gene cluster families (GCFs).
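To put the bottleneck in numbers: a network in which every BGC is scored against every other BGC needs n(n-1)/2 comparisons, which grows quadratically with the number of BGCs n. The short Python sketch below is illustrative arithmetic only, not code from BiG-SLiCE; `pairwise_comparisons` and `centroid_comparisons` are hypothetical helper names, and the centroid scheme (each BGC scored only against k family representatives) is an assumed simplification of a non-pairwise approach.

```python
def pairwise_comparisons(n: int) -> int:
    """Edges in a complete all-vs-all similarity network of n BGCs: n(n-1)/2."""
    return n * (n - 1) // 2


def centroid_comparisons(n: int, k: int) -> int:
    """Comparisons if each of n BGCs is scored only against k family
    representatives (hypothetical non-pairwise scheme)."""
    return n * k


if __name__ == "__main__":
    n_bgcs = 1_225_071  # BGC count reported in the Results
    print(f"{pairwise_comparisons(n_bgcs):,}")  # 750,398,864,985
```

For the 1,225,071 BGCs analysed in this study, an all-vs-all network would need roughly 7.5 × 10^11 comparisons, whereas a scheme that scores each BGC against k representatives scales as n·k.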

Results

Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing BGCs as vectors in Euclidean space, BiG-SLiCE can group them into GCFs in a non-pairwise, near-linear fashion. Using BiG-SLiCE, we analyzed 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that efficiently places newly sequenced BGCs into previously computed GCFs, as well as a powerful output visualization engine that facilitates user-friendly data exploration.
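The idea of grouping BGCs in Euclidean space without an all-vs-all comparison, and of placing new BGCs into existing families in a “query mode”, can be illustrated with a minimal Python sketch. It assumes each BGC has already been converted into a fixed-length numeric feature vector and uses a simple single-pass threshold (“leader”) clustering with running-mean centroids. This is a swapped-in teaching example, not BiG-SLiCE's actual feature extraction or clustering algorithm, and `ThresholdClusterer`, `threshold`, `fit` and `query` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ThresholdClusterer:
    """Hypothetical single-pass clustering of BGC feature vectors.

    A vector joins the nearest existing family if its distance to that
    family's centroid is <= threshold; otherwise it founds a new family.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []  # one centroid per family (GCF analogue)
        self.sizes: List[int] = []  # number of members per family

    def _nearest(self, v: Vector) -> Tuple[int, float]:
        # Linear scan over family centroids only: no BGC-vs-BGC comparisons.
        best_idx, best_dist = -1, math.inf
        for i, c in enumerate(self.centroids):
            d = euclidean(v, c)
            if d < best_dist:
                best_idx, best_dist = i, d
        return best_idx, best_dist

    def fit_one(self, v: Vector) -> int:
        """Assign one vector, creating or updating a family; returns its index."""
        idx, dist = self._nearest(v)
        if idx == -1 or dist > self.threshold:
            self.centroids.append(list(v))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Incremental running-mean update of the matched centroid.
        n = self.sizes[idx] + 1
        self.centroids[idx] = [c + (x - c) / n for c, x in zip(self.centroids[idx], v)]
        self.sizes[idx] = n
        return idx

    def fit(self, vectors: Iterable[Vector]) -> List[int]:
        """Cluster vectors in a single pass; returns one family index per vector."""
        return [self.fit_one(v) for v in vectors]

    def query(self, v: Vector) -> Tuple[int, float]:
        """'Query mode' analogue: place v without modifying any family.

        Returns (family index, distance); the index is -1 when no family
        centroid lies within the threshold.
        """
        idx, dist = self._nearest(v)
        if idx == -1 or dist > self.threshold:
            return -1, dist
        return idx, dist


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
    clusterer = ThresholdClusterer(threshold=1.0)
    print(clusterer.fit(vectors))  # [0, 0, 1, 1]
    print(clusterer.query([4.8, 5.0])[0])  # 1
```

Each incoming vector is compared only with the current family centroids, so the cost is proportional to n·k (n BGCs, k families) rather than n². The trade-offs of this simplified scheme are that results depend on input order and that the linear scan over centroids still grows with the number of families; tree-based methods such as BIRCH (`sklearn.cluster.Birch` in scikit-learn) are one way to avoid that scan.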

Conclusions

BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.


Top-30

Journals

Natural Product Reports: 10 publications, 5.99%
mSystems: 7 publications, 4.19%
Microbial Genomics: 7 publications, 4.19%
Marine Drugs: 7 publications, 4.19%
Nucleic Acids Research: 6 publications, 3.59%
Frontiers in Microbiology: 4 publications, 2.4%
Nature Communications: 3 publications, 1.8%
Microbiome: 3 publications, 1.8%
Angewandte Chemie - International Edition: 3 publications, 1.8%
Angewandte Chemie: 3 publications, 1.8%
Biotechnology Advances: 3 publications, 1.8%
bioRxiv: 3 publications, 1.8%
PLoS Computational Biology: 2 publications, 1.2%
Microbiology: 2 publications, 1.2%
Nature: 2 publications, 1.2%
Chem: 2 publications, 1.2%
Journal of Industrial Microbiology and Biotechnology: 2 publications, 1.2%
Methods in Enzymology: 2 publications, 1.2%
Proceedings of the National Academy of Sciences of the United States of America: 2 publications, 1.2%
TrAC - Trends in Analytical Chemistry: 2 publications, 1.2%
Current Biotechnology: 2 publications, 1.2%
Current Opinion in Microbiology: 2 publications, 1.2%
Synthetic and Systems Biotechnology: 2 publications, 1.2%
mBio: 2 publications, 1.2%
Bioresource Technology: 2 publications, 1.2%
Applied and Environmental Microbiology: 2 publications, 1.2%
Cell Genomics: 1 publication, 0.6%
Essays in Biochemistry: 1 publication, 0.6%
Nature Chemical Biology: 1 publication, 0.6%

Publishers

Elsevier: 31 publications, 18.56%
Cold Spring Harbor Laboratory: 25 publications, 14.97%
Springer Nature: 24 publications, 14.37%
American Society for Microbiology: 11 publications, 6.59%
MDPI: 11 publications, 6.59%
Microbiology Society: 10 publications, 5.99%
Royal Society of Chemistry (RSC): 10 publications, 5.99%
Oxford University Press: 9 publications, 5.39%
Wiley: 8 publications, 4.79%
American Chemical Society (ACS): 6 publications, 3.59%
Frontiers Media S.A.: 5 publications, 2.99%
Taylor & Francis: 4 publications, 2.4%
Public Library of Science (PLoS): 2 publications, 1.2%
Proceedings of the National Academy of Sciences (PNAS): 2 publications, 1.2%
Bentham Science Publishers Ltd.: 2 publications, 1.2%
Portland Press: 1 publication, 0.6%
Scientific Societies: 1 publication, 0.6%
Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 0.6%
Annual Reviews: 1 publication, 0.6%
American Association for the Advancement of Science (AAAS): 1 publication, 0.6%

  • We do not take into account publications without a DOI.
  • Statistics recalculated weekly.

Metrics: 167 citing publications
Cite this

GOST:
Kautsar S. et al. BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters // GigaScience. 2021. Vol. 10. No. 1.
GOST (all authors, up to 50):
Kautsar S., van der Hooft J. J. J., de Ridder D., Medema M. H. BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters // GigaScience. 2021. Vol. 10. No. 1.
RIS:
TY  - JOUR
DO  - 10.1093/gigascience/giaa154
UR  - https://doi.org/10.1093/gigascience/giaa154
TI  - BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters
T2  - GigaScience
AU  - Kautsar, Satria
AU  - van der Hooft, Justin J. J.
AU  - de Ridder, Dick
AU  - Medema, Marnix H.
PY  - 2021
DA  - 2021/01/19
PB  - Oxford University Press
IS  - 1
VL  - 10
PMID  - 33438731
SN  - 2047-217X
ER  -
BibTeX (up to 50 authors):
@article{2021_Kautsar,
author = {Satria Kautsar and Justin J. J. van der Hooft and Dick de Ridder and Marnix H. Medema},
title = {BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters},
journal = {GigaScience},
year = {2021},
volume = {10},
publisher = {Oxford University Press},
month = {jan},
url = {https://doi.org/10.1093/gigascience/giaa154},
number = {1},
doi = {10.1093/gigascience/giaa154}
}