Open Access
Chemical Science, volume 12, issue 42, pages 14174-14181

Img2Mol – accurate SMILES recognition from molecular graphical depictions

Djork-Arné Clevert 1
Tuan Le 1
Robin Winter 1
Floriane Montanari 1
1 Machine Learning Research, Bayer AG, Berlin, Germany
Publication type: Journal Article
Publication date: 2021-09-29
Journal: Chemical Science
SCImago quartile: Q1
WoS quartile: Q1
SJR: 2.333
CiteScore: 14.4
Impact factor: 7.6
ISSN: 2041-6520, 2041-6539
PubMed ID: 34760202
General Chemistry
Abstract
The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
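
The two-stage design described in the abstract (an image encoder that produces a fixed-size latent vector, followed by a pre-trained decoder that turns that vector into a SMILES string) can be sketched roughly as below. This is only an illustration of the composition, not the authors' implementation: `make_recognizer`, `exact_match_rate`, `toy_encoder`, and `toy_decoder` are hypothetical names, the toy functions stand in for the trained networks, and plain string equality is an assumed simplification of whatever matching criterion the paper's evaluation uses.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]  # rows of grayscale pixels (assumed input format)
Latent = List[float]               # fixed-size latent representation


def make_recognizer(
    encoder: Callable[[Image], Latent],
    decoder: Callable[[Latent], str],
    latent_size: int,
) -> Callable[[Image], str]:
    """Compose an image encoder with a latent-to-SMILES decoder."""

    def recognize(image: Image) -> str:
        latent = encoder(image)
        # The decoder only accepts latent vectors of one fixed size.
        if len(latent) != latent_size:
            raise ValueError(
                f"expected {latent_size} latent values, got {len(latent)}"
            )
        return decoder(latent)

    return recognize


def exact_match_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predicted strings identical to their reference string."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one example")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy stand-ins, NOT the trained networks: the "encoder" averages each half
# of the pixels, and the "decoder" picks one of two fixed SMILES strings.
def toy_encoder(image: Image) -> Latent:
    pixels = [value for row in image for value in row]
    half = len(pixels) // 2
    return [sum(pixels[:half]) / half, sum(pixels[half:]) / (len(pixels) - half)]


def toy_decoder(latent: Latent) -> str:
    return "CCO" if latent[0] > latent[1] else "c1ccccc1"


recognize = make_recognizer(toy_encoder, toy_decoder, latent_size=2)
images = [[[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]]]
predictions = [recognize(image) for image in images]
```

In practice, two different SMILES strings can denote the same molecule, so a realistic accuracy measurement would compare canonicalized structures with a cheminformatics toolkit rather than raw strings.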

