Open Access
Nucleic Acids Research, volume 51, issue W1, pages W46-W50

antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation

Kai Blin 1
Simon Shaw 1
Friederike Biermann 3, 4, 5
Artem Fetter 3, 6
Barbara R Terlouw 3
William Metcalf 7, 8
Eric J N Helfrich 4, 5
Gilles P. van Wezel 2
Marnix H. Medema 3
Tilmann Weber 1
Publication type: Journal Article
Publication date: 2023-05-04
scimago Q1
SJR: 7.048
CiteScore: 27.1
Impact factor: 16.6
ISSN: 0305-1048, 1362-4962
Genetics
Abstract

Microorganisms produce small bioactive compounds as part of their secondary or specialised metabolism. Often, such metabolites have antimicrobial, anticancer, antifungal, antiviral or other bio-activities and thus play an important role for applications in medicine and agriculture. In the past decade, genome mining has become a widely-used method to explore, access, and analyse the available biodiversity of these compounds. Since 2011, the ‘antibiotics and secondary metabolite analysis shell—antiSMASH’ (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free to use web server and as a standalone tool under an OSI-approved open source licence. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in archaea, bacteria, and fungi. Here, we present the updated version 7 of antiSMASH. antiSMASH 7 increases the number of supported cluster types from 71 to 81, as well as containing improvements in the areas of chemical structure prediction, enzymatic assembly-line visualisation and gene cluster regulation.

Pascal Andreu V., Augustijn H.E., Chen L., Zhernakova A., Fu J., Fischbach M.A., Dodd D., Medema M.H.
Nature Biotechnology scimago Q1 wos Q1
2023-02-13 citations by CoLab: 59
The gut microbiota produce hundreds of small molecules, many of which modulate host physiology. Although efforts have been made to identify biosynthetic genes for secondary metabolites, the chemical output of the gut microbiome consists predominantly of primary metabolites. Here we introduce the gutSMASH algorithm for identification of primary metabolic gene clusters, and we used it to systematically profile gut microbiome metabolism, identifying 19,890 gene clusters in 4,240 high-quality microbial genomes. We found marked differences in pathway distribution among phyla, reflecting distinct strategies for energy capture. These data explain taxonomic differences in short-chain fatty acid production and suggest a characteristic metabolic niche for each taxon. Analysis of 1,135 individuals from a Dutch population-based cohort shows that the level of microbiome-derived metabolites in plasma and feces is almost completely uncorrelated with the metagenomic abundance of corresponding metabolic genes, indicating a crucial role for pathway-specific gene regulation and metabolite flux. This work is a starting point for understanding differences in how bacterial taxa contribute to the chemistry of the microbiome. Taxon-specific primary metabolic pathways are identified using profile hidden Markov models.
Reitz Z.L., Butler A., Medema M.H.
2022-12-16 citations by CoLab: 8
Microbial competition for trace metals shapes their communities and interactions with humans and plants. Many bacteria scavenge trace metals with metallophores, small molecules that chelate environmental metal ions and transport them back into the cell. Our incomplete knowledge of metallophore diversity stymies our ability to fight infectious diseases and harness beneficial microbiome interactions. The majority of known metallophores are non-ribosomal peptides (NRPs), which feature metal-chelating moieties rarely found in other classes of natural products. NRP metallophore production may be predicted by genome mining, where genomes are scanned for homologs of known biosynthetic gene clusters (BGCs). However, accurately detecting NRP metallophore biosynthesis currently requires expert manual inspection. Here, we introduce automated identification of NRP metallophore BGCs through a comprehensive detection algorithm, newly implemented in antiSMASH. Custom-designed profile hidden Markov models detect genes encoding the biosynthesis of most known NRP metallophore chelating moieties (2,3-dihydroxybenzoate, hydroxamates, salicylate, β-hydroxyamino acids, graminine, Dmaq, and the pyoverdine chromophore), achieving 97% precision and 78% recall against manual curation. We leveraged the algorithm, in combination with transporter gene detection, to detect NRP metallophore BGCs in 15,562 representative bacterial genomes and predict that 25% of all non-ribosomal peptide synthetases encode metallophore production. BiG-SCAPE clustering of 2,562 NRP metallophore BGCs revealed that significant diversity remains unexplored, including new combinations of chelating groups. Additionally, we find that Cyanobacteria are severely understudied and should be the focus of more metallophore isolation efforts. The inclusion of NRP metallophore detection in antiSMASH version 7 will aid non-expert researchers and facilitate large-scale investigations into metallophore biology.
Terlouw B.R., Blin K., Navarro-Muñoz J.C., Avalon N.E., Chevrette M.G., Egbert S., Lee S., Meijer D., Recchia M.J., Reitz Z., van Santen J., Selem-Mojica N., Tørring T., Zaroubi L., Alanjary M., et. al.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2022-11-18 citations by CoLab: 266
With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Klau L.J., Podell S., Creamer K.E., Demko A.M., Singh H.W., Allen E.E., Moore B.S., Ziemert N., Letzel A.C., Jensen P.R.
Journal of Biological Chemistry scimago Q1 wos Q2 Open Access
2022-10-01 citations by CoLab: 45
The Natural Product Domain Seeker (NaPDoS) webtool detects and classifies ketosynthase (KS) and condensation domains from genomic, metagenomic, and amplicon sequence data. Unlike other tools, a phylogeny-based classification scheme is used to make broader predictions about the polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) genes in which these domains are found. NaPDoS is particularly useful for the analysis of incomplete biosynthetic genes or gene clusters, as are often observed in poorly assembled genomes and metagenomes, or when loci are not clustered, as in eukaryotic genomes. To help support the growing interest in sequence-based analyses of natural product biosynthetic diversity, here we introduce version 2 of the webtool, NaPDoS2, available at http://napdos.ucsd.edu/napdos2. This update includes the addition of 1417 KS sequences, representing a major expansion of the taxonomic and functional diversity represented in the webtool database. The phylogeny-based KS classification scheme now recognizes 41 class and subclass assignments, including new type II PKS subclasses. Workflow modifications accelerate run times, allowing larger datasets to be analyzed. In addition, default parameters were established using statistical validation tests to maximize KS detection and classification accuracy while minimizing false positives. We further demonstrate the applications of NaPDoS2 to assess PKS biosynthetic potential using genomic, metagenomic, and PCR amplicon datasets. These examples illustrate how NaPDoS2 can be used to predict biosynthetic potential and detect genes involved in the biosynthesis of specific structure classes or new biosynthetic mechanisms.
Castro-Mondragon J.A., Riudavets-Puig R., Rauluseviciute I., Berhanu Lemma R., Turchi L., Blanc-Mathieu R., Lucas J., Boddie P., Khan A., Manosalva Pérez N., Fornes O., Leung T., Aguirre A., Hammal F., Schmelter D., et. al.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2021-11-30 citations by CoLab: 1392
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
Sparks T.C., Bryant R.J.
Pest Management Science scimago Q1 wos Q1
2021-10-11 citations by CoLab: 63
Natural products (NPs) have long been an important source of, and inspiration for, developing novel compounds to control weeds, pathogens and insect pests. In this review, we use a dataset of 800 historic, current and emerging crop protection compounds to explore the influence of NPs on the introduction of new crop protection compounds (fungicides, herbicides, insecticides) as a function of time. NPs, their semisynthetic derivatives (NPDs) and compounds inspired by NPs (NP mimics, NPMs) account for 17% of all crop protection compounds. NPs, NPDs, and NPMs have been a fairly constant source of new agrochemicals over the past 70 years. NP synthetic equivalents (NPSEs) is a fourth group of NP-related crop protection compounds composed of synthetic compounds which by chance also happen to have an NP model (but are not involved in the discovery). If NPSE compounds are also included, then 50% of all crop protection compounds hypothetically could have had a NP origin. Similar trends also hold true for the impact of NPs on the discovery of new modes of action (MoA) or innovation in crop protection compounds as measured by the number of first-in-class compounds. NPs have had the largest impact on the numbers and global sales (2018 USD) of insecticides compared to fungicides and herbicides. The present analysis highlights NPs as a long-standing and continuing source of new chemistry, new MoAs and innovation in crop protection compound discovery. © 2021 Society of Chemical Industry.
Pascal Andreu V., Roel-Touris J., Dodd D., Fischbach M., Medema M.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2021-05-21 citations by CoLab: 42
Anaerobic bacteria from the human microbiome produce a wide array of molecules at high concentrations that can directly or indirectly affect the host. The production of these molecules, mostly derived from their primary metabolism, is frequently encoded in metabolic gene clusters (MGCs). However, despite the importance of microbiome-derived primary metabolites, no tool existed to predict the gene clusters responsible for their production. For this reason, we recently introduced gutSMASH. gutSMASH can predict 41 different known pathways, including MGCs involved in bioenergetics, but also putative ones that are candidates for novel pathway discovery. To make the tool more user-friendly and accessible, we here present the gutSMASH web server, hosted at https://gutsmash.bioinformatics.nl/. The user can either input the GenBank assembly accession or upload a genome file in FASTA or GenBank format. Optionally, the user can enable additional analyses to obtain further insights into the predicted MGCs. An interactive HTML output (viewable online or downloadable for offline use) provides a user-friendly way to browse functional gene annotations and sequence comparisons with reference gene clusters as well as gene clusters predicted in other genomes. Thus, this web server provides the community with a streamlined and user-friendly interface to analyze the metabolic potential of gut microbiomes.
Blin K., Shaw S., Kloosterman A.M., Charlop-Powers Z., van Wezel G.P., Medema M., Weber T.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2021-05-12 citations by CoLab: 1937
Many microorganisms produce natural products that form the basis of antimicrobials, antivirals, and other drugs. Genome mining is routinely used to complement screening-based workflows to discover novel natural products. Since 2011, the "antibiotics and secondary metabolite analysis shell—antiSMASH" (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free-to-use web server and as a standalone tool under an OSI-approved open-source license. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in bacteria and fungi. Here, we present the updated version 6 of antiSMASH. antiSMASH 6 increases the number of supported cluster types from 58 to 71, displays the modular structure of multi-modular BGCs, adds a new BGC comparison algorithm, allows for the integration of results from other prediction tools, and more effectively detects tailoring enzymes in RiPP clusters.
Kautsar S.A., van der Hooft J.J., de Ridder D., Medema M.H.
GigaScience scimago Q1 wos Q1 Open Access
2021-01-19 citations by CoLab: 139
Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
Blin K., Shaw S., Kautsar S.A., Medema M.H., Weber T.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2020-11-05 citations by CoLab: 119
Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.
Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G., Sonnhammer E.L., Tosatto S.C., Paladin L., Raj S., Richardson L.J., Finn R.D., Bateman A.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2020-10-30 citations by CoLab: 4367
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Letunic I., Khedkar S., Bork P.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2020-10-26 citations by CoLab: 1318
SMART (Simple Modular Architecture Research Tool) is a web resource (https://smart.embl.de) for the identification and annotation of protein domains and the analysis of protein domain architectures. SMART version 9 contains manually curated models for more than 1300 protein domains, with a topical set of 68 new models added since our last update article (1). All the new models are for diverse recombinase families and subfamilies and as a set they provide a comprehensive overview of mobile element recombinases namely transposase, integrase, relaxase, resolvase, cas1 casposase and Xer like cellular recombinase. Further updates include the synchronization of the underlying protein databases with UniProt (2), Ensembl (3) and STRING (4), greatly increasing the total number of annotated domains and other protein features available in architecture analysis mode. Furthermore, SMART’s vector-based protein display engine has been extended and updated to use the latest web technologies and the domain architecture analysis components have been optimized to handle the increased number of protein features available.
Kautsar S.A., Blin K., Shaw S., Weber T., Medema M.H.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2020-10-03 citations by CoLab: 171
Computational analysis of biosynthetic gene clusters (BGCs) has revolutionized natural product discovery by enabling the rapid investigation of secondary metabolic potential within microbial genome sequences. Grouping homologous BGCs into Gene Cluster Families (GCFs) facilitates mapping their architectural and taxonomic diversity and provides insights into the novelty of putative BGCs, through dereplication with BGCs of known function. While multiple databases exist for exploring BGCs from publicly available data, no public resources exist that focus on GCF relationships. Here, we present BiG-FAM, a database of 29,955 GCFs capturing the global diversity of 1,225,071 BGCs predicted from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs). The database offers rich functionalities, such as multi-criterion GCF searches, direct links to BGC databases such as antiSMASH-DB, and rapid GCF annotation of user-supplied BGCs from antiSMASH results. BiG-FAM can be accessed online at https://bigfam.bioinformatics.nl.
Blin K., Shaw S., Tong Y., Weber T.
2020-06-13 citations by CoLab: 25
CRISPR/Cas9 systems are an established tool in genome engineering. As double strand breaks caused by the standard Cas9-based knock-out techniques can be problematic in some organisms, new systems were developed that can efficiently create knock-outs without causing double strand breaks to elegantly sidestep these issues. The recently published CRISPR-BEST base editor system for actinobacteria is built around a C to T or A to G base exchange. These base editing systems however require additional constraints to be considered for designing the sgRNAs. Here, we present an updated version of the interactive CRISPy-web single guide RNA design tool https://crispy.secondarymetabolites.org/ that was built to support "classical" CRISPR and now also CRISPR-BEST workflows.
Ziemert N., Medema M., Weber T., Blin K., Alanjary M., Mungan M.D.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2020-05-19 citations by CoLab: 142
Multi-drug resistant pathogens have become a major threat to human health and new antibiotics are urgently needed. Most antibiotics are derived from secondary metabolites produced by bacteria. In order to avoid suicide, these bacteria usually encode resistance genes, in some cases within the biosynthetic gene cluster (BGC) of the respective antibiotic compound. Modern genome mining tools enable researchers to computationally detect and predict BGCs that encode the biosynthesis of secondary metabolites. The major challenge now is the prioritization of the most promising BGCs encoding antibiotics with novel modes of action. A recently developed target-directed genome mining approach allows researchers to predict the mode of action of the encoded compound of an uncharacterized BGC based on the presence of resistant target genes. In 2017, we introduced the ‘Antibiotic Resistant Target Seeker’ (ARTS). ARTS allows for specific and efficient genome mining for antibiotics with interesting and novel targets by rapidly linking housekeeping and known resistance genes to BGC proximity, duplication and horizontal gene transfer (HGT) events. Here, we present ARTS 2.0 available at http://arts.ziemertlab.com. ARTS 2.0 now includes options for automated target directed genome mining in all bacterial taxa as well as metagenomic data. Furthermore, it enables comparison of similar BGCs from different genomes and their putative resistance genes.
Mallick T.T., Rahman M.M., Siddique N., Shuvo K.H., Arafat K.Y., Homa S.F., Akter S., Karim M.R., Chandra Das Z., Hoque M.N.
Microbial Pathogenesis scimago Q2 wos Q2
2025-06-01 citations by CoLab: 0
Su C., Tuan N., Li W., Cheng J., Jin Y., Hong S., Lee H., Qader M., Klein L., Shetye G., Pauli G.F., Flanzblau S.G., Cho S., Zhao X., Suh J.
2025-06-01 citations by CoLab: 0
Saticioglu I.B., Ajmi N., Coskuner-Weber O., Alpsoy S., Ay H., Aydin F., Abay S., Karakaya E., Kayman T., Dalyan C., Koca F.D., Tasci G., Yarim D., Morick D., Yibar A., et. al.
2025-05-01 citations by CoLab: 0
MESGUIDA O., COMPANT S., WALLNER A., ANTONIELLI L., LOBINSKI R., GODIN S., LE BECHEC M., TERRASSE M., TAIBI A., DREUX-ZIGHA A., BERTHON J., GUYONEAUD R., REY P., ATTARD E.
Microbiological Research scimago Q1 wos Q1
2025-04-01 citations by CoLab: 0
Dirks A.C., Methven A.S., Miller A.N., Orozco-Quime M., Maurice S., Bonito G., Van Wyk J., Ahrendt S., Kuo A., Andreopoulos W., Riley R., Lipzen A., Chovatia M., Savage E., Barry K., et. al.
2025-04-01 citations by CoLab: 0

[Charts: Top-30 Journals; Top-30 Publishers]
  • We do not take into account publications without a DOI.
  • Statistics recalculated only for publications connected to researchers, organizations and labs registered on the platform.
  • Statistics recalculated weekly.
