Open Access
Journal of Cheminformatics, volume 12, issue 1, publication number 65

DECIMER: towards deep learning for chemical image recognition

Publication type: Journal Article
Publication date: 2020-10-27
scimago Q1
wos Q1
SJR: 1.745
CiteScore: 14.1
Impact factor: 7.1
ISSN: 1758-2946
Physical and Theoretical Chemistry
Computer Science Applications
Library and Information Sciences
Computer Graphics and Computer-Aided Design
Abstract
The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES string. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior to SMILES, and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose.
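As a quick illustration of the three line notations compared above, the open-source deepsmiles and selfies Python packages can interconvert them. The following is a minimal sketch (not taken from the DECIMER paper), assuming both packages are installed:

```python
# Minimal sketch (not from the DECIMER paper), assuming the third-party
# packages are installed:  pip install deepsmiles selfies
import deepsmiles
import selfies as sf

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin, Kekule-form SMILES

# DeepSMILES rewrites ring closures and branches into a flatter syntax
converter = deepsmiles.Converter(rings=True, branches=True)
deep = converter.encode(smiles)
print("DeepSMILES:", deep)
print("decoded:   ", converter.decode(deep))

# SELFIES: every syntactically valid string decodes to a valid molecule
selfies_str = sf.encoder(smiles)
print("SELFIES:   ", selfies_str)
print("decoded:   ", sf.decoder(selfies_str))
```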
Krenn M., Häse F., Nigam A., Friederich P., Aspuru-Guzik A.
2020-10-28 citations by CoLab: 392
The discovery of novel materials and functional molecules can help to solve some of society's most urgent challenges, ranging from efficient energy harvesting and storage to uncovering novel pharmaceutical drug candidates. Traditionally, matter engineering (generally denoted as inverse design) was based largely on human intuition and high-throughput virtual screening. The last few years have seen the emergence of significant interest in computer-inspired designs based on evolutionary or deep learning methods. The major challenge here is that the standard string-based molecular representation, SMILES, shows substantial weaknesses in that task because large fractions of strings do not correspond to valid molecules. Here, we solve this problem at a fundamental level and introduce SELFIES (SELF-referencIng Embedded Strings), a string-based representation of molecules which is 100% robust. Every SELFIES string corresponds to a valid molecule, and SELFIES can represent every molecule. SELFIES can be directly applied in arbitrary machine learning models without the adaptation of the models; each of the generated molecule candidates is valid. In our experiments, the model's internal memory stores two orders of magnitude more diverse molecules than a similar test with SMILES. Furthermore, as all molecules are valid, it allows for explanation and interpretation of the internal working of the generative models.
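The robustness claim can be demonstrated directly with the open-source selfies package: random token sequences drawn from its semantically constrained alphabet always decode to valid SMILES. A minimal sketch, assuming the selfies package is installed:

```python
# Sketch of SELFIES robustness, assuming the "selfies" package:
# random token sequences always decode to syntactically valid SMILES.
import random
import selfies as sf

alphabet = sorted(sf.get_semantic_robust_alphabet())  # constraint-respecting tokens
random.seed(0)

for _ in range(3):
    rand_selfies = "".join(random.choices(alphabet, k=15))
    print(sf.decoder(rand_selfies))  # a valid (if uninteresting) molecule
```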
Oldenhof M., Arany A., Moreau Y., Simm J.
2020-09-14 citations by CoLab: 41
In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles and patents in chemistry and pharmaceutical sciences have investigated chemical compounds, but in many cases the details of the structure of these chemical compounds are published only as an image. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available and they are mostly derived from optical character recognition. However, our evaluation of the performance of these tools reveals that they often make mistakes in recognizing the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds and charges. Furthermore, this model not only predicts the graph structure of the molecule but also produces all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and can rapidly process thousands of images. Finally, we empirically compare the proposed method to a well-established tool and observe significant error reduction.
Kwon O., Kim D., Kim C., Sun J., Sim C.J., Oh D., Lee S.K., Oh K., Shin J.
Marine Drugs scimago Q1 wos Q1 Open Access
2020-05-13 citations by CoLab: 12
Twelve new sesterterpenes along with eight known sesterterpenes were isolated from the marine sponge Hyrtios erectus collected off the coast of Chuuk Island, the Federated States of Micronesia. Based upon a combination of spectroscopic and computational analyses, these compounds were determined to be eight glycine-bearing scalaranes (1–8), a 3-keto scalarane (9), two oxidized-furan-bearing scalaranes (10 and 11), and a salmahyrtisane (12). Several of these compounds exhibited weak antiproliferative activity against diverse cancer cell lines as well as moderate anti-angiogenesis activities. The antiproliferative activity of new compound 4 was found to be associated with G0/G1 arrest in the cell cycle.
Staker J., Marshall K., Abel R., McQuaw C.M.
2019-02-13 citations by CoLab: 67
Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.
Kim S., Chen J., Cheng T., Gindulyte A., He J., He S., Li Q., Shoemaker B.A., Thiessen P.A., Yu B., Zaslavsky L., Zhang J., Bolton E.E.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2018-10-29 citations by CoLab: 2385
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.
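For readers who want to query the resource programmatically, a hedged sketch of the public PUG-REST and PUG-View services mentioned above follows (standard-library Python only; property and field names reflect the interface at the time of writing and may evolve):

```python
# Hedged sketch of PubChem's PUG-REST and PUG-View services
# (standard library only; property names may change over time).
import json
import urllib.request

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest"

# PUG-REST: canonical SMILES for a compound looked up by name
url = f"{BASE}/pug/compound/name/aspirin/property/CanonicalSMILES/JSON"
with urllib.request.urlopen(url) as resp:
    prop = json.load(resp)["PropertyTable"]["Properties"][0]
print(prop["CID"], prop["CanonicalSMILES"])

# PUG-View: full annotation record for CID 2244 (aspirin)
view_url = f"{BASE}/pug_view/data/compound/2244/JSON"
with urllib.request.urlopen(view_url) as resp:
    record = json.load(resp)["Record"]
print(record["RecordTitle"])
```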
Silver D., Schrittwieser J., Simonyan K., Antonoglou I., Huang A., Guez A., Hubert T., Baker L., Lai M., Bolton A., Chen Y., Lillicrap T., Hui F., Sifre L., van den Driessche G., et. al.
Nature scimago Q1 wos Q1
2017-10-17 citations by CoLab: 5545
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo. Starting from zero knowledge and without human data, AlphaGo Zero was able to teach itself to play Go and to develop novel strategies that provide new insights into the oldest of games. To beat world champions at the game of Go, the computer program AlphaGo has relied largely on supervised learning from millions of human expert moves. David Silver and colleagues have now produced a system called AlphaGo Zero, which is based purely on reinforcement learning and learns solely from self-play. Starting from random moves, it can reach superhuman level in just a couple of days of training and five million games of self-play, and can now beat all previous versions of AlphaGo. Because the machine independently discovers the same fundamental principles of the game that took humans millennia to conceptualize, the work suggests that such principles have some universal character, beyond human bias.
Willighagen E.L., Mayfield J.W., Alvarsson J., Berg A., Carlsson L., Jeliazkova N., Kuhn S., Pluskal T., Rojas-Chertó M., Spjuth O., Torrance G., Evelo C.T., Guha R., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2017-06-06 citations by CoLab: 301
The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.
Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.
2016-06-01 citations by CoLab: 17642
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim to utilize the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
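Pretrained Inception-v3 weights are bundled with common deep learning frameworks; a minimal sketch using the Keras application wrapper follows (the image path is a placeholder):

```python
# Minimal sketch, assuming TensorFlow/Keras: load Inception-v3 with
# ImageNet weights and classify one image (the path is a placeholder).
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, decode_predictions, preprocess_input)

model = InceptionV3(weights="imagenet")          # <25M parameters, 299x299 input

img = tf.keras.utils.load_img("example.jpg", target_size=(299, 299))
x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=5)[0])       # (class_id, label, score) tuples
```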
Filippov I.V., Nicklaus M.C.
2009-02-17 citations by CoLab: 126
Until recently most scientific and patent documents dealing with chemistry have described molecular structures either with systematic names or with graphical images of Kekulé structures. The latter method poses inherent problems in the automated processing that is needed when the number of documents ranges in the hundreds of thousands or even millions since graphical representations cannot be directly interpreted by a computer. To recover this structural information, which is otherwise all but lost, we have built an optical structure recognition application based on modern advances in image processing implemented in open source tools, OSRA. OSRA can read documents in over 90 graphical formats including GIF, JPEG, PNG, TIFF, PDF, and PS, automatically recognizes and extracts the graphical information representing chemical structures in such documents, and generates the SMILES or SD representation of the encountered molecular structure images.
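A hedged usage sketch for OSRA follows, assuming the osra executable is installed on PATH and accepts the -f output-format option described in its documentation (the image path is a placeholder):

```python
# Hedged sketch of calling OSRA from Python, assuming the "osra"
# executable is on PATH and supports the "-f" output-format option
# (e.g. smi, can, sdf) described in its documentation.
import subprocess

result = subprocess.run(
    ["osra", "-f", "smi", "structure.png"],   # structure.png: placeholder scan
    capture_output=True, text=True, check=True)
print(result.stdout.strip())                  # one SMILES per recognized structure
```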
Park J., Rosania G.R., Shedden K.A., Nguyen M., Lyu N., Saitou K.
2009-02-05 citations by CoLab: 60
Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper aims to provide critical reviews of these systems and also to report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles.
Algorri M., Zimmermann M., Friedrich C.M., Akle S., Hofmann-Apitius M.
2007-08-01 citations by CoLab: 9
Singh P.K., Sachan K., Khandelwal V., Singh S., Singh S.
2025-03-01 citations by CoLab: 1
Traditional drug discovery methods such as wet-lab testing, validations, and synthetic techniques are time-consuming and expensive. Artificial Intelligence (AI) approaches have progressed to the point where they can have a significant impact on the drug discovery process. Using massive volumes of open data, artificial intelligence methods are revolutionizing the pharmaceutical industry. In the last few decades, many AI-based models have been developed and implemented in many areas of the drug development process. These models have been used as a supplement to conventional research to uncover superior pharmaceuticals expeditiously. Initially, AI's involvement in the pharmaceutical industry was mostly limited to reverse engineering of existing patents and the invention of new synthesis pathways; it now extends across drug research and development, drug repurposing, and productivity gains realized through clinical trials. AI is studied in this article for its numerous potential uses. We have discussed how AI can be put to use in the pharmaceutical sector, specifically for predicting a drug's toxicity, bioactivity, and physicochemical characteristics, among other things. In this review article, we have discussed its application to a variety of problems, including de novo drug discovery, target structure prediction, interaction prediction, and binding affinity prediction. AI for predicting drug interactions and for nanomedicines was also considered.
Rajan K., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2024-12-27 citations by CoLab: 0
Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate translation of chemical compounds from SMILES notation into their corresponding IUPAC names is crucial, as it can significantly streamline the laborious process of naming chemical structures. Here, we present STOUT (SMILES-TO-IUPAC-name translator) V2, which addresses this challenge by introducing a transformer-based model that translates string representations of chemical structures into IUPAC names. Trained on a dataset of nearly 1 billion SMILES strings and their corresponding IUPAC names, STOUT V2 demonstrates exceptional accuracy in generating IUPAC names, even for complex chemical structures. The model's ability to capture intricate patterns and relationships within chemical structures enables it to generate precise and standardised IUPAC names. While established deterministic algorithms remain the gold standard for systematic chemical naming, our work, enabled by access to OpenEye's Lexichem software through an academic license, demonstrates the potential of neural approaches to complement existing tools in chemical nomenclature. Scientific contribution: STOUT V2, built upon transformer-based models, is a significant advancement from our previous work. The web application enhances its accessibility and utility. By making the model and source code fully open and well-documented, we aim to promote unrestricted use and encourage further development.
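A hedged usage sketch, assuming the published STOUT package (STOUT-pypi) exposes the translate_forward and translate_reverse functions documented by the authors:

```python
# Hedged sketch, assuming the STOUT package (pip install STOUT-pypi)
# and its documented translate_forward / translate_reverse functions.
from STOUT import translate_forward, translate_reverse

smiles = "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"    # caffeine
iupac_name = translate_forward(smiles)      # SMILES -> IUPAC name
print(iupac_name)
print(translate_reverse(iupac_name))        # IUPAC name -> SMILES
```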
Chen Y., Leung C.T., Huang Y., Sun J., Chen H., Gao H.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2024-12-18 citations by CoLab: 0
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES strings stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that fuses the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more detailed extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module for getting the final SMILES output. These modules cooperate to enhance the model's robustness to diverse styles of molecular images found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81–97%, marking a significant advancement in the domain of molecular structure recognition. Scientific contribution: MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.
Zdouc M., Blin K., Louwen N.L., Navarro J., Loureiro C., Bader C., Bailey C., Barra L., Booth T., Bozhüyük K.J., Cediel-Becerra J.D., Charlop-Powers Z., Chevrette M., Chooi Y.H., D’Agostino P., et. al.
Nucleic Acids Research scimago Q1 wos Q1 Open Access
2024-12-09 citations by CoLab: 10
Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/.
Ouyang H., Liu W., Tao J., Luo Y., Zhang W., Zhou J., Geng S., Zhang C.
Scientific Reports scimago Q1 wos Q1 Open Access
2024-07-25 citations by CoLab: 0
Chemical molecular structures are a direct and convenient means of expressing chemical knowledge, playing a vital role in academic communication. In chemistry, hand drawing is a common task for students and researchers. If we can convert hand-drawn chemical molecular structures into machine-readable formats, like SMILES encoding, computers can efficiently process and analyze these structures, significantly enhancing the efficiency of chemical research. Furthermore, with the progress of educational technology, automated grading is gaining popularity. When machines automatically recognize chemical molecular structures and assess the correctness of the drawings, it offers great convenience to teachers. We created ChemReco, a tool designed to identify chemical molecular structures involving three atom types: C, H, and O, providing convenience for chemical researchers. Currently, there are limited studies on hand-drawn chemical molecular structures. Therefore, the primary focus of this paper is constructing datasets. We propose a synthetic image method to rapidly generate images resembling hand-drawn chemical molecular structures, enhancing dataset acquisition efficiency. Regarding model selection, the hand-drawn chemical structure recognition model developed in this article achieves a final recognition accuracy of 96.90%. This model employs the encoder-decoder architecture of EfficientNet + Transformer, demonstrating superior performance compared to other encoder-decoder combinations.
Lin F., Li J.
Complex & Intelligent Systems scimago Q1 wos Q1 Open Access
2024-07-22 citations by CoLab: 1
Optical chemical structure recognition (OCSR) is a fundamental and crucial task in the field of chemistry, which aims at transforming intricate chemical structure images into machine-readable formats. Current deep learning-based OCSR methods typically use image feature extractors to extract visual features and employ encoder-decoder architectures for chemical structure recognition. However, the performance of these methods is limited by their image feature extractors and the class imbalance of elements in chemical structure representation. This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as an image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers. This strategy enables the provision of richer visual information for subsequent decoding processes. Furthermore, MPOCSR incorporates the CB loss function to rebalance the loss weights among different categories. For training and validation of our method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.
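The class-balanced (CB) loss mentioned above reweights classes by the inverse of their "effective number" of samples. The sketch below shows that weighting idea in its commonly used formulation, not necessarily MPOCSR's exact implementation, and the counts are invented:

```python
# Sketch of class-balanced weighting: weight_c is proportional to
# (1 - beta) / (1 - beta**n_c), the inverse of the "effective number"
# of samples of class c. The counts below are invented.
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()  # normalize to mean 1

# e.g. frequent bond/atom tokens vs. rare charges or isotope labels
print(class_balanced_weights([100000, 5000, 120, 8]))
```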
Borup R.M., Ree N., Jensen J.H.
2024-07-16 citations by CoLab: 0
Determining the pKa values of various C–H sites in organic molecules offers valuable insights for synthetic chemists in predicting reaction sites. As molecular complexity increases, this task becomes more challenging. This paper introduces pKalculator, a quantum chemistry (QM)-based workflow for automatic computations of C–H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C–H pKa values in DMSO. The ML model is trained on a diverse dataset of 775 molecules with 3910 C–H sites. Our ML model predicts C–H pKa values with a mean absolute error (MAE) and a root mean squared error (RMSE) of 1.24 and 2.15 pKa units, respectively. Furthermore, we employ our model on 1043 pKa-dependent reactions (aldol, Claisen, and Michael) and successfully indicate the reaction sites with a Matthew’s correlation coefficient (MCC) of 0.82.
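For reference, the reported metrics (MAE and RMSE in pKa units for the regression, MCC for reaction-site classification) can be computed as follows; the values are placeholders, not the paper's data:

```python
# Sketch of the quoted evaluation metrics using scikit-learn;
# the values below are placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             matthews_corrcoef)

pka_true = np.array([12.5, 18.0, 25.3, 30.1])
pka_pred = np.array([13.1, 17.2, 26.8, 29.0])
mae = mean_absolute_error(pka_true, pka_pred)
rmse = np.sqrt(mean_squared_error(pka_true, pka_pred))

site_true = [1, 0, 1, 1, 0, 0]   # is this C-H position the reaction site?
site_pred = [1, 0, 1, 0, 0, 0]
mcc = matthews_corrcoef(site_true, site_pred)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MCC={mcc:.2f}")
```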
Rajan K., Brinkhaus H.O., Zielesny A., Steinbeck C.
Journal of Cheminformatics scimago Q1 wos Q1 Open Access
2024-07-05 citations by CoLab: 1
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches. Scientific contribution: The new DECIMER model presented here refines our previous research efforts and is currently the only open-source model tailored specifically for the recognition of hand-drawn chemical structures. The enhanced model performs better in handling variations in handwriting styles, line thicknesses, and background noise, making it suitable for real-world applications. The DECIMER hand-drawn structure recognition model and its source code have been made available as an open-source package under a permissive license.
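A hedged usage sketch, assuming the open-source DECIMER package and its documented predict_SMILES entry point (the image path is a placeholder):

```python
# Hedged usage sketch, assuming the open-source DECIMER package
# (pip install decimer) and its documented predict_SMILES entry point.
from DECIMER import predict_SMILES

smiles = predict_SMILES("hand_drawn_structure.png")  # placeholder image path
print(smiles)
```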
Shah A.K., Amador B., Dey A., Creekmore M., Ocampo B., Denmark S., Zanibbi R.
2024-07-05 citations by CoLab: 0
Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization. We use the parser to annotate raster images and then train a new multi-task neural network for recognizing molecules in raster images. We evaluate our parsers using SMILES and standard benchmarks, along with a novel evaluation protocol comparing molecular graphs directly that supports automatic error compilation and reveals errors missed by SMILES-based evaluation. On the synthetic USPTO benchmark, our born-digital parser obtains a recognition rate of 98.4% (1% higher than previous models) and our relatively simple neural parser for raster images obtains a rate of 85% using less training data than existing neural approaches (thousands vs. millions of molecules).
Su Y., Wang X., Ye Y., Xie Y., Xu Y., Jiang Y., Wang C.
Chemical Science scimago Q1 wos Q1 Open Access
2024-06-26 citations by CoLab: 11
AI and automation are revolutionizing catalyst discovery, shifting from manual methods to high-throughput digital approaches, enhanced by large language models.
Moayedpour S., Broadbent J., Riahi S., Bailey M., Vu Thu H., Dobchev D., Balsubramani A., Nascimento Dos Santos R., Kogler-Anele L., Corrochano-Navarro A., Li S., Ulloa Montoya F., Agarwal V., Bar-Joseph Z., Jager S.
Bioinformatics scimago Q1 wos Q1 Open Access
2024-05-29 citations by CoLab: 3
Motivation: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency. Results: To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method uses the lipid Simplified Molecular-Input Line-Entry System (SMILES) as input to a large language model. Large language model-generated embeddings are then used by a downstream gradient-boosting classifier. As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs. Availability and implementation: Code and data links available at: https://github.com/Sanofi-Public/LipoBART.
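The modelling pipeline described (language-model embeddings of lipid SMILES feeding a gradient-boosting classifier) can be sketched as follows; embed_smiles is a hypothetical placeholder, not the authors' model, and the data are invented:

```python
# Illustrative sketch only: embed each lipid SMILES with some pretrained
# language model, then train a gradient-boosting classifier on the
# embeddings. embed_smiles() is a hypothetical placeholder and the
# SMILES/labels are invented toy data, not the authors' pipeline.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def embed_smiles(smiles_list):
    """Stand-in for a language-model embedding of each SMILES string."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(smiles_list), 128))

smiles = ["CCCCCCCCCC(=O)OCC", "CCO", "CCCCCCCCC=CCCCCCCCC(=O)O", "CCN(CC)CC"]
labels = [1, 0, 1, 0]   # toy high/low transfection-efficiency labels

X = embed_smiles(smiles)
clf = GradientBoostingClassifier(random_state=0).fit(X, labels)
print(clf.predict(X))   # in practice: evaluate on held-out lipids
```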
Krasnov A., Barnabas S.J., Boehme T., Boyer S.K., Weber L.
Digital Discovery scimago Q1 wos Q1 Open Access
2024-03-07 citations by CoLab: 1
The extraction of chemical information from images, also known as Optical Chemical Structure Recognition (OCSR) has recently gained new attention.

Top-30

Journals (chart)

Publishers (chart)