12,112 research outputs found

    Normalization And Matching Of Chemical Compound Names

    We have developed ChemHits (http://sabio.h-its.org/chemHits/), an application which detects and matches synonymic names of chemical compounds. The tool is based on natural language processing (NLP) methods and applies rules to systematically normalize chemical compound names. Subsequently, matching of synonymous names is achieved by comparison of the normalized name forms. The tool is capable of normalizing a given name of a chemical compound and matching it against names in (bio-)chemical databases, like SABIO-RK, PubChem, ChEBI or KEGG, even when there is no exact name-to-name-match

    Communication and re-use of chemical information in bioscience.

    The current methods of publishing chemical information in bioscience articles are analysed. Using 3 papers as use-cases, it is shown that conventional methods using human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds are often ambiguous. Valuable experimental data such as spectra and computational results are almost always omitted. We describe an Open XML architecture at proof-of-concept which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, then the quality and quantity of chemical information available to bioscientists will increase and the authors, publishers and readers will find the process cost-effective. An article submitted to BiomedCentral Bioinformatics, created on request with their Publicon system. The transformed manuscript is archived as PDF. Although it has been through the publisher's system this is purely automatic and the contents are those of a pre-refereed preprint. The formatting is provided by the system and tables and figures appear at the end. An accompanying submission, http://www.dspace.cam.ac.uk/handle/1810/34580, describes the rationale and cultural aspects of publishing, abstracting and aggregating chemical information. BMC is an Open Access publisher and we emphasize that all content is re-usable under Creative Commons Licens

    Chemistry in Bioinformatics

    A preprint of an invited submission to BioMedCentral Bioinformatics. This short manuscript is an overview of the current problems and opportunities in publishing chemical information. Full details of technology are given in the sibling manuscript http://www.dspace.cam.ac.uk/handle/1810/34579. The manuscript is the authors' preprint although it has been automatically transformed into this archived PDF by the submission system. The authors are not responsible for the formatting. Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is Openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) Free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols

    Annotation guidelines for labeling English-Dutch cognate pairs (version 1.0)


    RegenBase: a knowledge base of spinal cord injury biology for translational research.

    Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org

    Representation and use of chemistry in the global electronic age.

    We present an overview of the current state of public semantic chemistry and propose new approaches at a strategic and a detailed level. We show by example how a model for a Chemical Semantic Web can be constructed using machine-processed data and information from journal articles. This manuscript addresses questions of robotic access to data and its automatic re-use, including the role of Open Access archival of data. This is a pre-refereed preprint allowed by the publisher's (Royal Soc. Chemistry) Green policy. The author's preferred manuscript is an HTML hyperdocument with ca. 20 links to images, some of which are JPEGs and some of which are SVG (scalable vector graphics) including animations. There are also links to molecules in CML, for which the Jmol viewer is recommended. We suggest that readers who wish to see the full glory of the manuscript download the Zipped version and unpack on their machine. We also supply a PDF and DOC (Word) version which obviously cannot show the animations, but which may be the best place to start, particularly for those more interested in the text

    TechMiner: Extracting Technologies from Academic Publications

    In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision

    Engineering polymer informatics: Towards the computer-aided design of polymers

    The computer-aided design of polymers is one of the holy grails of modern chemical informatics and of significant interest for a number of communities in polymer science. The paper outlines a vision for the in silico design of polymers and presents an information model for polymers based on modern semantic web technologies, thus laying the foundations for achieving the vision