MolRec at CLEF 2012 — Overview and Analysis of Results
Abstract. We present the results and analysis of our chemical structure recognition system, MolRec, in the CLEF 2012 chemical structure recognition task. MolRec analyses a diagram image, extracts vectorised components from the image and applies a rule-based system to construct an internal representation of the chemical structure. This internal representation can then be exported to MOL or SMILES format. The task assigned in CLEF was to analyse two sets of chemical diagram images clipped from patent documents. The first set consists of 865 diagram images, the results of which could be evaluated automatically using OpenBabel. The second set is a more challenging collection of 95 images which include elements not supported by OpenBabel and which therefore have to be evaluated manually. On the first set, MolRec achieved recognition rates of between 94.91 % and 96.18 % over four runs with slightly different parameters. On the more exacting second set, MolRec's recognition rate was between 46.32 % and 58.95 %. Overall, the results testify to high performance on a large sample of quite complex diagrams, but also to the challenges posed by the more difficult images that appear in real patent documents.
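The percentages reported for the manually evaluated set can be read back as whole image counts. The short Python sketch below is illustrative only and is not part of MolRec; the helper name `implied_correct` and the rounding tolerance are assumptions introduced here. It takes a reported rate and the set size of 95 and returns the number of correctly recognised images that rate implies, rejecting any rate that no whole count could produce.

```python
def implied_correct(rate_percent: float, total: int) -> int:
    """Return the whole number of correct images implied by a rate
    that was reported to two decimal places (hypothetical helper)."""
    count = round(rate_percent / 100 * total)
    # The recomputed rate must agree with the reported one to within
    # the rounding error of a two-decimal percentage.
    if abs(100 * count / total - rate_percent) > 0.005 + 1e-9:
        raise ValueError(f"{rate_percent} % is not a whole count out of {total}")
    return count


MANUAL_SET_SIZE = 95  # size of the manually evaluated set, from the abstract

low = implied_correct(46.32, MANUAL_SET_SIZE)
high = implied_correct(58.95, MANUAL_SET_SIZE)
print(low, high)  # 44 56
```

Read this way, the four runs on the manual set span 44 to 56 correctly recognised images out of 95, so the gap between the weakest and strongest run is 12 images.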