Open Access
Chemistry - Methods, volume 2, issue 1

Image2SMILES: Transformer‐Based Molecular Optical Recognition Engine

Publication type: Journal Article
Publication date: 2022-01-11
WoS: Q1
SJR
CiteScore: 7.3
Impact factor: 6.1
ISSN: 2628-9725
Materials Science (miscellaneous)
Abstract

The rise of deep learning in various scientific and technological areas promotes the development of AI‐based tools for information retrieval. Optical recognition of organic structures is a key part of the automated extraction of chemical information. However, this is a challenging task because there is a large variety of representation styles. In this research, we present a Transformer‐based artificial neural network to convert images of organic structures to molecular structures. To train the model, we created a comprehensive data generator that stochastically simulates various drawing styles, functional groups, functional group placeholders (R‐groups), and visual contamination. We demonstrate that the Transformer‐based architecture can gather chemical insights from our generator with almost absolute confidence. That means that, with a Transformer, one can fully concentrate on data simulation to build a good recognition model. A web demo of our optical recognition engine is available online on the Syntelly platform, and the code for dataset generation is available on GitHub.

Clevert D., Le T., Winter R., Montanari F.
Chemical Science scimago Q1 wos Q1 Open Access
2021-09-29 citations by CoLab: 46 PDF Abstract  
The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
Rajan K., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2021-08-17 citations by CoLab: 44 PDF Abstract  
The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. In order to extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools were developed where the best performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods to provide an automated open-source software solution. Various current deep learning approaches were explored to seek a best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of being able to predict SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50–100 million molecules. In this article, the new DECIMER model is presented, a transformer-based network, which can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information.
Lee S., Kim S., Yoon S.H., Dagar A., Kim I.
Journal of Organic Chemistry scimago Q2 wos Q1
2021-08-16 citations by CoLab: 7 Abstract  
A new domino mode of assembly was discovered from the one-pot three-component reactions of pyrrole derivatives, active methylene compounds (malononitrile, methyl cyanoacetate, or ethyl cyanoacetate), and sodium cyanide in the presence of piperidinium acetate in EtOH at room temperature, leading to a novel tricyclic skeleton in excellent yield under mild and eco-friendly conditions. This well-choreographed domino process enabled formation of multiple bonds (three C-C and one C-O) for consecutive construction of two rings (pyrrolidine and dihydrofuran) in a diastereoselective manner.
Krasnov L., Khokhlov I., Fedorov M.V., Sosnin S.
Scientific Reports scimago Q1 wos Q1 Open Access
2021-07-20 citations by CoLab: 26 PDF Abstract  
We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance level of our model is comparable to the rule-based solutions. We proved that the accuracy and speed of computations as well as the robustness of the model allow it to be used in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.
Weir H., Thompson K., Woodward A., Choi B., Braun A., Martínez T.J.
Chemical Science scimago Q1 wos Q1 Open Access
2021-07-03 citations by CoLab: 27 PDF Abstract  
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
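The committee voting described in the ChemPix abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `committee_vote` and the example predictions are hypothetical, and it assumes each network's output has already been normalized to a canonical SMILES string so that identical molecules compare equal.

```python
from collections import Counter


def committee_vote(predictions):
    """Return the most frequent prediction and the fraction of votes it received.

    `predictions` holds one SMILES string per committee member (hypothetical
    input format; assumed to be canonicalized beforehand).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    winner, votes = counts.most_common(1)[0]
    # The share of agreeing votes serves as a simple confidence value.
    return winner, votes / len(predictions)


# Five hypothetical networks, three of which agree.
votes = ["CCO", "CCO", "CCC", "CCO", "C=CO"]
smiles, confidence = committee_vote(votes)
```

A low agreement fraction could be used to flag an image for manual review instead of accepting the prediction.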
Feng X., Liao D., Liu D., Ping A., Li Z., Bian J.
Journal of Medicinal Chemistry scimago Q1 wos Q1
2020-11-20 citations by CoLab: 40 Abstract  
Indoleamine 2,3-dioxygenase 1 (IDO1) has received increasing attention due to its immunosuppressive function in connection with various diseases, including cancer. A recent increase in the understanding of IDO1 has significantly contributed to the discovery of numerous novel inhibitors, but the latest clinical outcomes raised questions and have indicated a future direction of IDO1 inhibition for therapeutic approaches. Herein, we present a comprehensive review of IDO1, discussing the latest advances in understanding the IDO1 structure and mechanism, an overview of recent IDO1 inhibitor discoveries and potential therapeutic applications to provide helpful information for medicinal chemists investigating IDO1 inhibitors.
Bazzaro M., Linder S.
Journal of Medicinal Chemistry scimago Q1 wos Q1
2020-11-04 citations by CoLab: 15 Abstract  
The biological responses to dienone compounds with a 1,5-diaryl-3-oxo-1,4-pentadienyl pharmacophore have been studied extensively. Despite their expected general thiol reactivity, these compounds display considerable degrees of tumor cell selectivity. Here we review in vitro and preclinical studies of dienone compounds including b-AP15, VLX1570, RA-9, RA-190, EF24, HO-3867, and MCB-613. A common property of these compounds is their targeting of the ubiquitin–proteasome system (UPS), known to be essential for the viability of tumor cells. Gene expression profiling experiments have shown induction of responses characteristic of UPS inhibition, and experiments using cellular reporter proteins have shown that proteasome inhibition is associated with cell death. Other mechanisms of action such as reactivation of mutant p53, stimulation of steroid receptor coactivators, and induction of protein cross-linking have also been described. Although unsuitable as biological probes due to widespread reactivity, dienone compounds are cytotoxic to apoptosis-resistant tumor cells and show activity in animal tumor models.
Krenn M., Häse F., Nigam A., Friederich P., Aspuru-Guzik A.
2020-10-28 citations by CoLab: 392 PDF Abstract  
The discovery of novel materials and functional molecules can help to solve some of society’s most urgent challenges, ranging from efficient energy harvesting and storage to uncovering novel pharmaceutical drug candidates. Traditionally, matter engineering (generally denoted as inverse design) was based massively on human intuition and high-throughput virtual screening. The last few years have seen the emergence of significant interest in computer-inspired designs based on evolutionary or deep learning methods. The major challenge here is that the standard string-based molecular representation, SMILES, shows substantial weaknesses in that task because large fractions of strings do not correspond to valid molecules. Here, we solve this problem at a fundamental level and introduce SELFIES (SELF-referencIng Embedded Strings), a string-based representation of molecules which is 100% robust. Every SELFIES string corresponds to a valid molecule, and SELFIES can represent every molecule. SELFIES can be directly applied in arbitrary machine learning models without the adaptation of the models; each of the generated molecule candidates is valid. In our experiments, the model’s internal memory stores two orders of magnitude more diverse molecules than a similar test with SMILES. Furthermore, as all molecules are valid, it allows for explanation and interpretation of the internal working of the generative models.
Rajan K., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2020-10-27 citations by CoLab: 58 PDF Abstract  
The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior over SMILES and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose.
Rajan K., Brinkhaus H.O., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2020-10-07 citations by CoLab: 45 PDF Abstract  
Structural information about chemical compounds is typically conveyed as 2D images of molecular structures in scientific documents. Unfortunately, these depictions are not a machine-readable representation of the molecules. With a backlog of decades of chemical literature in printed form not properly represented in open-access databases, there is a high demand for the translation of graphical molecular depictions into machine-readable formats. This translation process is known as Optical Chemical Structure Recognition (OCSR). Today, we are looking back on nearly three decades of development in this demanding research field. Most OCSR methods follow a rule-based approach where the key step of vectorization of the depiction is followed by the interpretation of vectors and nodes as bonds and atoms. Opposed to that, some of the latest approaches are based on deep neural networks (DNN). This review provides an overview of all methods and tools that have been published in the field of OCSR. Additionally, a small benchmark study was performed with the available open-source OCSR tools in order to examine their performance.
Cui X., Qiao X., Wang H., Huang G.
Journal of Organic Chemistry scimago Q2 wos Q1
2020-09-29 citations by CoLab: 15 Abstract  
A facile and expeditious protocol for the synthesis of 2-arylindoles compounds from readily available N-(2-pyridyl)anilines and commercially available α-Cl ketones through iridium-catalyzed C-H activation and cyclization is reported here. As a complementary approach to the conventional strategies for indole synthesis, the transformation exhibits powerful reactivity, tolerates a large number of functional groups and proceeds in good to excellent yields under mild conditions, providing a straightforward method to access structurally diverse and valuable indole scaffolds. Further, the reaction could be easily scaled up to gram scale.
Pesciullesi G., Schwaller P., Laino T., Reymond J.
Nature Communications scimago Q1 wos Q1 Open Access
2020-09-25 citations by CoLab: 127 PDF Abstract  
Organic synthesis methodology enables the synthesis of complex molecules and materials used in all fields of science and technology and represents a vast body of accumulated knowledge optimally suited for deep learning. While most organic reactions involve distinct functional groups and can readily be learned by deep learning models and chemists alike, regio- and stereoselective transformations are more challenging because their outcome also depends on functional group surroundings. Here, we challenge the Molecular Transformer model to predict reactions on carbohydrates where regio- and stereoselectivity are notoriously difficult to predict. We show that transfer learning of the general patent reaction model with a small set of carbohydrate reactions produces a specialized model returning predictions for carbohydrate reactions with remarkable accuracy. We validate these predictions experimentally with the synthesis of a lipid-linked oligosaccharide involving regioselective protections and stereoselective glycosylations. The transfer learning approach should be applicable to any reaction class of interest. Organic reactions can readily be learned by deep learning models, however, stereochemistry is still a challenge. Here, the authors fine tune a general model using a small dataset, then predict and validate experimentally regio- and stereo-selectivity for various carbohydrates transformations.
Oldenhof M., Arany A., Moreau Y., Simm J.
2020-09-14 citations by CoLab: 41 Abstract  
In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles and patents in chemistry and pharmaceutical sciences have investigated chemical compounds, but in many cases the details of the structure of these chemical compounds are published only as an image. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available and they are mostly derived from optical character recognition. However, our evaluation of the performance of these tools reveals that they often make mistakes in recognizing the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds and charges. Furthermore, this model not only predicts the graph structure of the molecule but also produces all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and can rapidly process thousands of images. Finally, we compare empirically the proposed method to a well-established tool and observe significant error reduction.
Aksenov N.A., Aksenov D.A., Skomorokhov A.A., Prityko L.A., Aksenov A.V., Griaznov G.D., Rubin M.
Journal of Organic Chemistry scimago Q2 wos Q1
2020-09-03 citations by CoLab: 14 Abstract  
An efficient and straightforward Brønsted acid-mediated cascade process was developed, involving cyclization of readily available β-ketonitriles into 2-aminofurans and their subsequent recyclization into 2-(1H-indol-2-yl)acetamides. This synthetic route opens a new avenue for an expeditious assembly of various isotryptamine derivatives for medicinal chemistry.
Tejeneki H.Z., Nikbakht A., Balalaie S., Rominger F.
Journal of Organic Chemistry scimago Q2 wos Q1
2020-06-16 citations by CoLab: 9 Abstract  
An efficient synthesis of diketopiperazinoindolines through an indium-catalyzed intramolecular 5-exo-dig cyclization of ortho-alkynyl diketopiperazines has been reported. The formation of diketopiperazinoindolines proceeds via a regio- and diastereoselective Conia-ene reaction. This synthetic method opens a new door for easy access to functionalized fused diketopiperazinoindolines in high to excellent yields with exclusive Z diastereoselectivity.
Liu P., Tao J., Ren Z.
Nature Machine Intelligence scimago Q1 wos Q1
2025-01-17 citations by CoLab: 0 Abstract  
Deep learning has significantly advanced molecular modelling and design, enabling an efficient understanding and discovery of novel molecules. In particular, large language models introduce a fresh research paradigm to tackle scientific problems from a natural language processing perspective. Large language models significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. However, two key issues remain: how to quantify the match between model and data modalities and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multimodal benchmark, named ChEBI-20-MM, and perform 1,263 experiments to assess the model’s compatibility with data modalities and knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for tasks. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our analysis offers an exploration of the learning mechanism and paves the way for advancing large language models in molecular science. Large language models promise substantial advances in molecular modelling and design. A multimodal benchmark is proposed to analyse performance, and 1,263 experiments are conducted to examine the compatibility of a large language model with data modalities and knowledge acquisition.
Wang R., Ji Y., Li Y., Lee S.
2024-12-31 citations by CoLab: 1
Chen Y., Leung C.T., Huang Y., Sun J., Chen H., Gao H.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2024-12-18 citations by CoLab: 0 PDF Abstract  
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES string stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more detailed extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module for getting the final SMILES output. These modules cooperate to enhance the model’s robustness to diverse styles of molecular images found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81–97%, marking a significant advancement in the domain of molecular structure recognition. Scientific contribution MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.
Jiang J., Chen L., Ke L., Dou B., Zhang C., Feng H., Zhu Y., Qiu H., Zhang B., Wei G.
2024-08-30 citations by CoLab: 3
Ouyang H., Liu W., Tao J., Luo Y., Zhang W., Zhou J., Geng S., Zhang C.
Scientific Reports scimago Q1 wos Q1 Open Access
2024-07-25 citations by CoLab: 0 PDF Abstract  
Chemical molecular structures are a direct and convenient means of expressing chemical knowledge, playing a vital role in academic communication. In chemistry, hand drawing is a common task for students and researchers. If we can convert hand-drawn chemical molecular structures into machine-readable formats, like SMILES encoding, computers can efficiently process and analyze these structures, significantly enhancing the efficiency of chemical research. Furthermore, with the progress of educational technology, automated grading is gaining popularity. When machines automatically recognize chemical molecular structures and assess the correctness of the drawings, it offers great convenience to teachers. We created ChemReco, a tool designed to identify chemical molecular structures involving three atoms: C, H, and O, providing convenience for chemical researchers. Currently, there are limited studies on hand-drawn chemical molecular structures. Therefore, the primary focus of this paper is constructing datasets. We propose a synthetic image method to rapidly generate images resembling hand-drawn chemical molecular structures, enhancing dataset acquisition efficiency. Regarding model selection, the hand-drawn chemical molecule structural recognition model developed in this article achieves a final recognition accuracy of 96.90%. This model employs the encoder-decoder architecture of EfficientNet + Transformer, demonstrating superior performance compared to other encoder-decoder combinations.
Lin F., Li J.
Complex & Intelligent Systems scimago Q1 wos Q1 Open Access
2024-07-22 citations by CoLab: 1 PDF Abstract  
Optical chemical structure recognition (OCSR) is a fundamental and crucial task in the field of chemistry, which aims at transforming intricate chemical structure images into machine-readable formats. Current deep learning-based OCSR methods typically use image feature extractors to extract visual features and employ encoder-decoder architectures for chemical structure recognition. However, the performance of these methods is limited by their image feature extractors and the class imbalance of elements in chemical structure representation. This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as an image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers. This strategy enables the provision of richer visual information for subsequent decoding processes. Furthermore, MPOCSR incorporates CB loss function to rebalance the loss weights among different categories. For training and validation of our method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.
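The abstract above does not say how the class-balanced weights are computed. As a rough illustration only, a widely used formulation weights each class by the inverse of its "effective number of samples", (1 - β^n) / (1 - β), where n is the class count and β is a hyperparameter close to 1. The function name, the token counts, and the choice of β below are assumptions for the sketch, not values from the paper.

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class loss weights from the effective-number-of-samples formula.

    counts: number of training occurrences per class (all must be > 0).
    beta:   hyperparameter in [0, 1); larger values reweight more strongly.
    """
    # Inverse effective number: (1 - beta) / (1 - beta ** n) for each class.
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    # Normalize so that the weights sum to the number of classes.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical token counts: a frequent symbol versus a rare one.
weights = class_balanced_weights([1000, 10])
```

The rare class receives the larger weight, so errors on infrequent symbols contribute more to the loss than errors on common ones.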
Rajan K., Brinkhaus H.O., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2024-07-05 citations by CoLab: 1 PDF Abstract  
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches. Scientific contribution The new DECIMER model presented here refines our previous research efforts and is currently the only open-source model tailored specifically for the recognition of hand-drawn chemical structures. The enhanced model performs better in handling variations in handwriting styles, line thicknesses, and background noise, making it suitable for real-world applications. The DECIMER hand-drawn structure recognition model and its source code have been made available as an open-source package under a permissive license.
Zhang D., Zhao D., Wang Z., Li J., Li J.
RSC Advances scimago Q1 wos Q2 Open Access
2024-06-06 citations by CoLab: 0 PDF Abstract  
In the growing body of scientific literature, the structure and information of drugs are usually represented in two-dimensional vector graphics.
Luong K., Singh A.
2024-05-30 citations by CoLab: 6
Liu H., Zhang H., Wang J., Dou J., Guo R., Li G., Liang Y., Yu J.
Energy scimago Q1 wos Q1
2024-05-01 citations by CoLab: 4 Abstract  
The construction of macromolecular models for the amorphous structure of coal can help reveal its physicochemical properties from a microscopic perspective and provide insight into its reaction mechanisms, leading to the development of cleaner coal technologies. However, this process requires careful consideration of characterization information. Researchers often need to intervene manually, which makes the task time-consuming. In this study, we proposed a multi-modal deep learning technique, namely ClipIRMol (contrastive language-image pre-training for infrared-molecule), for predicting coal molecular fragments based on the reverse molecular design method. On this basis, a structure evolution algorithm was developed to transform these fragments into a complex molecular structure model. Our approach takes elemental analysis, IR spectrum, and 13C NMR data as inputs. It is capable of constructing highly accurate molecular models of any different types of coal with atom count ranging from tens to thousands in just a few minutes. These spectra were simulated by quantum chemical calculations to show alignment with their experimental data. The introduced 3D molecular models grounded in topological structures overcome the limitation of traditional nearly-planar structures. This offers a new direction for macromolecular modeling of amorphous organic macromolecules.
Li D., Xu X., Pan J., Gao W., Zhang S.
2024-02-15 citations by CoLab: 1
Deagen M.E., Dalle-Cort B., Rebello N.J., Lin T., Walsh D.J., Olsen B.D.
Macromolecules scimago Q1 wos Q1
2023-12-18 citations by CoLab: 5
Wu Z., Zhang Z., Ding Z., Rodrigues J.J.
2023-12-04 citations by CoLab: 0
Morin L., Danelljan M., Agea M.I., Nassar A., Weber V., Meijer I., Staar P., Yu F.
2023-10-01 citations by CoLab: 4
He R., Gu S., Xu J., Li X., Chen H., Shao Z., Wang H., Shao J., Yin W., Qian L., Wei Z., Li Z.
2023-09-01 citations by CoLab: 4 Abstract  
Siderophores, a highly diverse family of secondary metabolites, play a crucial role in facilitating the acquisition of the essential iron. However, the current discovery of siderophores relies largely on manual approaches. In this work, we introduced SIDERTE, a digitized siderophore information database containing 822 siderophore records with 649 unique structures. Leveraging this digitalized dataset, we gained a systematic overview of siderophores by their clustering patterns in the chemical space. Building upon this, we developed a ligand-based method for predicting new iron-binding molecules. Applying this method to a commercial library, we experimentally confirmed that 40 out of the 48 molecules predicted as siderophore candidates possessed iron-binding abilities. Expanding our approach to the COCONUT natural product database, we predicted a staggering 3,199 siderophore candidates, showcasing remarkable structural diversity that is largely unexplored. Our study provides a valuable resource for accelerating the discovery of novel iron-binding molecules and advancing our understanding towards siderophores.

