Open Access
Chemical Science, volume 12, issue 42, pages 14174-14181
Img2Mol – accurate SMILES recognition from molecular graphical depictions
Djork-Arné Clevert ¹, Tuan Le ¹, Robin Winter ¹, Floriane Montanari ¹
¹ Machine Learning Research, Bayer AG, Berlin, Germany
Publication type: Journal Article
Publication date: 2021-09-29
Journal: Chemical Science
scimago Q1
wos Q1
SJR: 2.333
CiteScore: 14.4
Impact factor: 7.6
ISSN: 20416520, 20416539
PubMed ID: 34760202
General Chemistry
Abstract
The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
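The two-stage design described in the abstract (an image encoder producing a fixed-size latent vector, followed by a decoder that emits SMILES) and the reported translation rate can be illustrated with a minimal sketch. This is not the Img2Mol code: the names `make_img2smiles` and `exact_match_accuracy`, the type aliases, and the toy encoder and decoder are all hypothetical. The abstract does not say how a "correct" translation is scored, so the sketch assumes an exact string match after canonicalization. The default `canonicalize` only strips whitespace; a real evaluation would need a cheminformatics toolkit to canonicalize SMILES.

```python
from typing import Callable, Sequence

# Hypothetical type aliases, for illustration only.
Image = Sequence[Sequence[float]]   # grayscale pixel rows
Latent = Sequence[float]            # fixed-size latent representation


def make_img2smiles(
    encoder: Callable[[Image], Latent],
    decoder: Callable[[Latent], str],
) -> Callable[[Image], str]:
    """Compose an image encoder and a latent-to-SMILES decoder."""
    def predict(image: Image) -> str:
        # Stage 1: image -> latent vector; stage 2: latent vector -> SMILES.
        return decoder(encoder(image))
    return predict


def exact_match_accuracy(
    predicted: Sequence[str],
    reference: Sequence[str],
    canonicalize: Callable[[str], str] = lambda s: s.strip(),
) -> float:
    """Fraction of predictions equal to their reference after canonicalization.

    Assumption: exact string match is the correctness criterion.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one example is required")
    hits = sum(
        canonicalize(p) == canonicalize(r)
        for p, r in zip(predicted, reference)
    )
    return hits / len(reference)


# Toy stand-ins (not trained models): the "encoder" counts pixel rows and
# the "decoder" writes that many carbon atoms.
toy_predict = make_img2smiles(
    lambda img: [float(len(img))],
    lambda z: "C" * int(z[0]),
)
```

In a real system the encoder would be a trained convolutional network and the decoder a pre-trained sequence model; the composition and the scoring function would keep the same shape.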