Open Access

Chemical Science, volume 12, issue 31, pages 10622-10633

ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Hayley Weir ¹,²

Keiran C Thompson ¹,²

Amelia Woodward ¹

Benjamin Choi ³

Augustin Braun ¹

Todd J. Martinez ¹,²

¹ Department of Chemistry, Stanford University, Stanford, CA 94305, USA

² SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025, USA

³ Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA

Publication type: Journal Article

Publication date: 2021-07-03

Royal Society of Chemistry (RSC)

Journal: Chemical Science

scimago Q1

wos Q1

SJR: 2.333

CiteScore: 14.4

Impact factor: 7.6

ISSN: 2041-6520 (print), 2041-6539 (electronic)

DOI: 10.1039/D1SC02957F

PubMed ID: 34447555

General Chemistry

Abstract

Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
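The committee step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `committee_vote`, the five example predictions, and the choice to compare SMILES as plain strings (which assumes every network emits the same canonical form) are all assumptions made for illustration. Each network contributes one predicted SMILES string, candidates are ranked by vote count, and the confidence is taken as the fraction of networks that agree.

```python
from collections import Counter


def committee_vote(predictions, top_k=3):
    """Rank candidate SMILES strings by the number of votes they receive.

    predictions: one predicted SMILES string per network in the committee.
    Returns up to top_k (smiles, confidence) pairs, where confidence is the
    fraction of networks that voted for that string.
    Assumption: identical molecules are written as identical strings.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    # most_common sorts by vote count; ties keep first-seen order.
    ranked = counts.most_common(top_k)
    return [(smiles, votes / len(predictions)) for smiles, votes in ranked]


# Hypothetical outputs of a five-network committee for a single image.
votes = ["C1CCCCC1", "C1CCCCC1", "C1CCCC1", "C1CCCCC1", "CCCCCC"]
ranked = committee_vote(votes)
best_smiles, confidence = ranked[0]
print(best_smiles, confidence)
```

Keeping the top three ranked candidates rather than only the first mirrors the top-3 evaluation mentioned in the abstract: a prediction would count as correct if the true structure appears anywhere in the returned list.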



