Robust Molecular Image Recognition: A Graph Generation Approach

Barzilay, Regina; Coley, Connor W.; Guo, Jiang; Qian, Yujie; Tu, Zhengkai

Publication date: 27 May 2022

Abstract

Molecular image recognition is a fundamental task in information extraction from chemistry literature. Previous data-driven models formulate it as an image-to-sequence task: they generate a sequential representation of the molecule (e.g., a SMILES string) from its graphical representation. Although they perform adequately on certain benchmarks, these models are not robust in real-world situations, where molecular images differ in style, quality, and chemical patterns. In this paper, we propose a novel graph generation approach that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular graph. We develop data augmentation strategies for molecules and images to increase the robustness of our model against domain shifts. Our model can flexibly incorporate chemistry constraints and produces more interpretable predictions than SMILES. In experiments on both synthetic and realistic molecular images, our model significantly outperforms previous models, achieving 84-93% accuracy on five benchmarks. We also conduct a human evaluation and show that our model reduces the time for a chemist to extract molecular structures from images by roughly 50%.

Comment: 16 pages, 8 figures
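As a rough illustration of the output format the abstract describes (atoms with 2D positions plus bonds between them, rather than a single SMILES string), the following is a minimal sketch in Python. It is not the authors' implementation: the class and function names (Atom, MolGraph, over_valent_atoms), the MAX_VALENCE table, and the choice of a simple valence check as the "chemistry constraint" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Atom:
    symbol: str  # element label, e.g. "C"
    x: float     # hypothetical predicted position in image coordinates
    y: float


@dataclass
class MolGraph:
    """Hypothetical container for a predicted molecular graph."""
    atoms: List[Atom] = field(default_factory=list)
    # Maps an atom-index pair (i, j) with i < j to a bond order (1, 2, 3).
    bonds: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def add_atom(self, symbol: str, x: float, y: float) -> int:
        """Append an atom and return its index."""
        self.atoms.append(Atom(symbol, x, y))
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        """Record an undirected bond between two distinct, existing atoms."""
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("bond endpoints must be two distinct atom indices")
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> List[int]:
        """Indices of atoms bonded to atom i, in ascending order."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))

    def valence(self, i: int) -> int:
        """Sum of bond orders over all bonds touching atom i."""
        return sum(order for (a, b), order in self.bonds.items() if i in (a, b))


# Assumed, deliberately tiny valence table; real chemistry needs charges,
# aromaticity, and many more elements.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def over_valent_atoms(graph: MolGraph) -> List[int]:
    """Return indices of atoms whose bond-order sum exceeds the assumed limit."""
    return [
        i for i, atom in enumerate(graph.atoms)
        if atom.symbol in MAX_VALENCE and graph.valence(i) > MAX_VALENCE[atom.symbol]
    ]


# Usage: the heavy-atom skeleton of ethanol (C-C-O) with made-up coordinates.
g = MolGraph()
c1 = g.add_atom("C", 0.0, 0.0)
c2 = g.add_atom("C", 1.0, 0.5)
o = g.add_atom("O", 2.0, 0.0)
g.add_bond(c1, c2)
g.add_bond(c2, o)
```

Because every atom and bond is an explicit, positioned object, a check such as over_valent_atoms can point to the specific atom that looks wrong. This is one plausible reading of why a graph output is easier to inspect and constrain than a generated string.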

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2205.14311

Last time updated on 14/08/2022