
    Image-based Automated Chemical Database Annotation with Ensemble of Machine-Vision Classifiers

    This paper presents an image-based strategy for the automated annotation of chemical databases. The strategy combines three components: a machine-vision classifier that extracts 2D chemical structure diagrams from research articles and converts them into standard chemical file formats; a virtual "Chemical Expert" system that screens the converted structures according to their estimated conversion accuracy; and a fragment-based measure for calculating intermolecular similarity. To overcome the limited accuracy of any individual machine-vision classifier, the approach uses an ensemble of machine-vision classifiers, inspired by ensemble methods in machine learning. For annotation, the calculated chemical similarity between the converted structures and entries in a virtual small-molecule database is used to establish the links. An annotation test linking 121 journal articles to entries in the PubChem database demonstrates that the ensemble approach increases annotation coverage while keeping annotation quality (e.g., recall and precision rates) comparable to that of a single machine-vision classifier.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/87266/4/Saitou55.pd
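The linking step described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the paper's method: structures are reduced to sets of fragment strings, the Tanimoto coefficient stands in for the unspecified fragment-based measure, the 0.8 threshold is arbitrary, and the "Chemical Expert" screening stage is not modeled. The names `tanimoto`, `best_match`, and `annotate` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Fragments = FrozenSet[str]  # a structure reduced to a set of fragment labels


def tanimoto(a: Fragments, b: Fragments) -> float:
    """Tanimoto coefficient: shared fragments divided by all distinct fragments."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query: Fragments, database: Dict[str, Fragments],
               threshold: float) -> Optional[Tuple[str, float]]:
    """Return the most similar database entry scoring at or above the threshold."""
    best: Optional[Tuple[str, float]] = None
    for entry_id, frags in database.items():
        score = tanimoto(query, frags)
        if score >= threshold and (best is None or score > best[1]):
            best = (entry_id, score)
    return best


def annotate(converted: Iterable[Fragments], database: Dict[str, Fragments],
             threshold: float = 0.8) -> Dict[str, float]:
    """Link an article's converted structures to database entries.

    Passing the pooled outputs of several classifiers gives the ensemble case.
    """
    links: Dict[str, float] = {}
    for frags in converted:
        match = best_match(frags, database, threshold)
        if match is not None:
            entry_id, score = match
            links[entry_id] = max(score, links.get(entry_id, 0.0))
    return links


# Toy data (hypothetical): two database entries and two classifiers.
database = {
    "entry_A": frozenset({"c1ccccc1", "C=O", "O"}),
    "entry_B": frozenset({"N", "C=O", "CC"}),
}
# Classifier 1 converts the first diagram correctly but misreads the second.
classifier_1 = [frozenset({"c1ccccc1", "C=O", "O"}), frozenset({"N", "Cl"})]
# Classifier 2 recovers the second diagram.
classifier_2 = [frozenset({"N", "C=O", "CC"})]

single_links = annotate(classifier_1, database)
ensemble_links = annotate(classifier_1 + classifier_2, database)
```

In this toy case the single classifier links the article only to `entry_A`, while the pooled outputs also reach `entry_B`, which mirrors the coverage gain the abstract reports for the ensemble.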

    Molecular access to multi-dimensionally encoded information

    Polymer scientists have only recently realized that information storage at the molecular level is not restricted to DNA-based systems. Similar encoding and decoding of data have been demonstrated on synthetic polymers, which could overcome some of the drawbacks associated with DNA, for example by making use of a larger monomer alphabet. This feature article describes some of the recent data storage strategies that have been investigated, ranging from writing information on linear sequence-defined macromolecules to layer-by-layer cast surfaces and QR codes. In addition, some strategies to increase storage density are elaborated, and some trends from the literature regarding future perspectives on molecular data storage are critically evaluated. The article ends by highlighting the demand for new strategies that provide reliable solutions for future data management technologies.
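The remark about a larger monomer alphabet can be made concrete with a short calculation: a chain position drawn from an alphabet of k distinguishable monomers carries at most log2(k) bits. The sketch below is an idealized upper bound that ignores synthesis, readout, and error-correction overhead. The alphabet sizes other than DNA's four bases are illustrative assumptions rather than values from the article, and the function names are hypothetical.

```python
import math


def bits_per_monomer(alphabet_size: int) -> float:
    """Ideal information content of one chain position, in bits."""
    if alphabet_size < 2:
        raise ValueError("an alphabet needs at least two distinct monomers")
    return math.log2(alphabet_size)


def monomers_needed(n_bits: int, alphabet_size: int) -> int:
    """Smallest chain length whose number of sequences covers 2**n_bits values."""
    if alphabet_size < 2:
        raise ValueError("an alphabet needs at least two distinct monomers")
    length, capacity = 0, 1
    while capacity < 2 ** n_bits:
        capacity *= alphabet_size  # one more position multiplies the sequence count
        length += 1
    return length


# Chain length required to store one byte under different alphabet sizes.
byte_lengths = {k: monomers_needed(8, k) for k in (2, 4, 16)}
```

Under these assumptions one byte needs eight positions with a binary alphabet, four with a four-letter alphabet such as DNA's, and two with a sixteen-monomer alphabet.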