Image2SMILES: Transformer‐Based Molecular Optical Recognition Engine
The rise of deep learning in various scientific and technological areas promotes the development of AI‐based tools for information retrieval. Optical recognition of organic structures is a key part of the automated extraction of chemical information. However, this is a challenging task because there is a large variety of representation styles. In this research, we present a Transformer‐based artificial neural network to convert images of organic structures to molecular structures. To train the model, we created a comprehensive data generator that stochastically simulates various drawing styles, functional groups, functional group placeholders (R‐groups), and visual contamination. We demonstrate that the Transformer‐based architecture can gather chemical insights from our generator with almost absolute confidence. That means that, with a Transformer, one can fully concentrate on data simulation to build a good recognition model. A web demo of our optical recognition engine is available online on the Syntelly platform, and the code for dataset generation is available on GitHub.
Top-30
Journals
- Journal of Cheminformatics: 4 publications, 12.5%
- Journal of Chemical Information and Modeling: 3 publications, 9.38%
- Briefings in Bioinformatics: 1 publication, 3.13%
- Molecular Informatics: 1 publication, 3.13%
- Lecture Notes in Computer Science: 1 publication, 3.13%
- 28th International Conference on Intelligent User Interfaces: 1 publication, 3.13%
- npj Computational Materials: 1 publication, 3.13%
- Nature Communications: 1 publication, 3.13%
- Macromolecules: 1 publication, 3.13%
- Energy: 1 publication, 3.13%
- RSC Advances: 1 publication, 3.13%
- Complex & Intelligent Systems: 1 publication, 3.13%
- Scientific Reports: 1 publication, 3.13%
- Journal of Pharmaceutical Analysis: 1 publication, 3.13%
- Journal of Physical Chemistry Letters: 1 publication, 3.13%
- Nature Machine Intelligence: 1 publication, 3.13%
- Journal of Supercomputing: 1 publication, 3.13%
- Chemical Reviews: 1 publication, 3.13%
- Chemical Society Reviews: 1 publication, 3.13%
- Environmental Science and Technology Letters: 1 publication, 3.13%
- Plants: 1 publication, 3.13%
Publishers
- Springer Nature: 11 publications, 34.38%
- American Chemical Society (ACS): 7 publications, 21.88%
- Institute of Electrical and Electronics Engineers (IEEE): 4 publications, 12.5%
- Elsevier: 2 publications, 6.25%
- Royal Society of Chemistry (RSC): 2 publications, 6.25%
- Oxford University Press: 1 publication, 3.13%
- Wiley: 1 publication, 3.13%
- Association for Computing Machinery (ACM): 1 publication, 3.13%
- MDPI: 1 publication, 3.13%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.