
    Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

    Bioinformatics and computer-aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists as plain text, which must be annotated more precisely before software methods can use it. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing and a simplified user interface designed to help scientists curate their data with minimum effort. We based this work on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype with which bioassay text within the domain of the training set can be annotated very quickly. Well-trained annotations require only single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface and subsequently used to improve the underlying models. By drastically reducing the time scientists need to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
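    The hybrid workflow described above (a model proposes terms, the user confirms them with one click, and confirmed annotations are fed back as training data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a simple naive Bayes-style scorer over word tokens, and the class name TermSuggester, the toy assay texts and the term labels (e.g. "luminescence readout") are hypothetical stand-ins rather than actual BAO terms.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a placeholder for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermSuggester:
    """Hypothetical naive Bayes-style ranker of ontology terms for assay text."""

    def __init__(self) -> None:
        self.n_docs = 0
        self.doc_freq = Counter()                # term -> number of assays annotated with it
        self.word_counts = defaultdict(Counter)  # term -> word counts in those assays
        self.background = Counter()              # word counts over all assays
        self.vocab = set()

    def train(self, text: str, terms: list[str]) -> None:
        """Add one curated assay (its text plus user-approved terms) to the model."""
        words = tokenize(text)
        self.n_docs += 1
        self.background.update(words)
        self.vocab.update(words)
        for term in terms:
            self.doc_freq[term] += 1
            self.word_counts[term].update(words)

    def score(self, term: str, words: list[str]) -> float:
        """Log prior of the term plus summed log-likelihood ratios vs. the background."""
        v = len(self.vocab)
        counts = self.word_counts[term]
        n_term = sum(counts.values())
        n_bg = sum(self.background.values())
        s = math.log(self.doc_freq[term] / self.n_docs)
        for w in words:
            if w not in self.vocab:  # words never seen in training carry no evidence
                continue
            p_term = (counts[w] + 1) / (n_term + v)  # Laplace smoothing
            p_bg = (self.background[w] + 1) / (n_bg + v)
            s += math.log(p_term / p_bg)
        return s

    def suggest(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k best-scoring terms, to be shown for one-click approval."""
        words = tokenize(text)
        ranked = sorted(((t, self.score(t, words)) for t in self.doc_freq),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:k]


# Toy training data; texts and term labels are invented for illustration.
suggester = TermSuggester()
suggester.train("Firefly luciferase reporter measured by luminescence",
                ["luminescence readout"])
suggester.train("Kinase activity measured by fluorescence intensity",
                ["fluorescence readout", "kinase target"])
suggester.train("Kinase binding detected by luminescence",
                ["luminescence readout", "kinase target"])

top = suggester.suggest("Luciferase luminescence assay", k=1)
print(top)
```

    Calling train() again with a user-approved annotation, including one found through search outside the current training domain, is what lets the suggestions improve over time. Ranking rather than thresholding keeps the interface at one click per accepted term.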

    BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results

    Background: High-throughput screening (HTS) is one of the main strategies to identify novel entry points for the development of small molecule chemical probes and drugs and is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets, and thus to maximizing the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organize, standardize and access HTS data are required to address these challenges. Results: We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper we show important examples of formalizing HTS domain knowledge and point out the advantages of this approach. The ontology is available online at the NCBO BioPortal: http://bioportal.bioontology.org/ontologies/44531. Conclusions: After a large manual curation effort, we loaded BAO-mapped data triples into an RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities where BAO enables the retrieval of inferred search results that are relevant to a given query but are not explicitly defined. BAO thus opens new functionality for annotating, querying, and analyzing HTS datasets, and the potential for discovering new knowledge by means of inference.
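    The kind of inferred retrieval described in the Conclusions can be illustrated with a minimal Python sketch that only follows subclass relationships transitively. This is far weaker than a description logic reasoner running over BAO in an RDF store, and the class labels and assay identifiers (assay_A and so on) are hypothetical, not BAO terms or PubChem records.

```python
from collections import defaultdict

# Hypothetical subclass axioms (child, parent); illustrative labels, not BAO IRIs.
SUBCLASS_OF = [
    ("luciferase reporter assay", "reporter gene assay"),
    ("beta-lactamase reporter assay", "reporter gene assay"),
    ("reporter gene assay", "cell-based assay"),
]

# Hypothetical asserted annotations: assay identifier -> most specific class.
ASSAY_TYPE = {
    "assay_A": "luciferase reporter assay",
    "assay_B": "beta-lactamase reporter assay",
    "assay_C": "enzyme activity assay",
}


def build_parents(pairs):
    """Index the subclass axioms as child -> set of direct parents."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def ancestors(cls, parents):
    """All superclasses of cls, following subClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def query(target, assay_type, pairs):
    """Assays whose asserted class is target or any subclass of it."""
    parents = build_parents(pairs)
    return sorted(assay for assay, cls in assay_type.items()
                  if cls == target or target in ancestors(cls, parents))


print(query("cell-based assay", ASSAY_TYPE, SUBCLASS_OF))  # ['assay_A', 'assay_B']
```

    A query for "cell-based assay" returns assay_A and assay_B even though neither is annotated with that class directly; the match is inferred from the hierarchy rather than stated in the data.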

    TIN-X: target importance and novelty explorer

    Motivation: The increasing number of peer-reviewed manuscripts requires the development of specific mining tools to facilitate the visual exploration of evidence linking diseases and proteins. Results: We developed TIN-X, the Target Importance and Novelty eXplorer, to visualize the association between proteins and diseases based on text-mining data processed from the scientific literature. In the current implementation, TIN-X supports exploration of data for G-protein coupled receptors, kinases, ion channels, and nuclear receptors. TIN-X supports browsing and navigating across proteins and diseases based on ontology classes, and displays a scatter plot with two proposed new bibliometric statistics: Importance and Novelty. Availability and implementation: http://www.newdrugtargets.org

    CLO: The cell line ontology

    Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper-level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions. Construction and content: Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is based on the hierarchy of in vivo cell types defined in CL and of the tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms. Utility and discussion: The CLO has been applied in different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies, combined with the use of ontological reasoners, will support sophisticated inferencing to advance translational informatics development.
    http://deepblue.lib.umich.edu/bitstream/2027.42/109554/1/13326_2013_Article_185.pd

    Knowledge-based Characterization of Similarity Relationships in the Human Protein-Tyrosine Phosphatase Family for Rational Inhibitor Design

    Tyrosine phosphorylation, controlled by the coordinated action of protein-tyrosine kinases (PTKs) and protein-tyrosine phosphatases (PTPs), is a fundamental regulatory mechanism of numerous physiological processes. PTPs are implicated in a number of human diseases, and their potential as drug targets is increasingly being recognized. Despite their biological importance, no comprehensive overview describing how all members of the human PTP family are related has been reported until now. Here we review the entire human PTP family and present a systematic knowledge-based characterization of global and local similarity relationships that are relevant for the development of small molecule inhibitors. We use parallel homology modeling to expand the current PTP structure space and analyze the human PTPs based on local three-dimensional catalytic sites and domain sequences. Furthermore, we demonstrate the importance of binding site similarities in understanding cross-reactivity and inhibitor selectivity in the design of small molecule inhibitors.
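    As a much simpler, sequence-level proxy for the binding-site comparisons described above (the study itself used homology models and three-dimensional catalytic sites), the following Python sketch computes pairwise percent identity over pre-aligned binding-site residues and reports each family member's nearest neighbour. The protein names (PTP_A and so on) and residue strings are invented for illustration and are not real PTP sequences.

```python
from itertools import combinations

# Hypothetical, pre-aligned binding-site residues ('-' marks a gap); invented strings.
SITES = {
    "PTP_A": "HCSAGVGRT",
    "PTP_B": "HCSAGIGRS",
    "PTP_C": "HCKNGVGR-",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions (ignoring gaps) with identical residues."""
    if len(a) != len(b):
        raise ValueError("binding sites must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similarity_matrix(sites: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise identities for every unordered pair, keyed in sorted name order."""
    return {(p, q): percent_identity(sites[p], sites[q])
            for p, q in combinations(sorted(sites), 2)}


def nearest_neighbour(name: str, sites: dict[str, str]) -> str:
    """Family member whose binding site is most similar to that of name."""
    others = [n for n in sites if n != name]
    return max(others, key=lambda n: percent_identity(sites[name], sites[n]))


matrix = similarity_matrix(SITES)
for pair, identity in matrix.items():
    print(pair, round(identity, 1))
print(nearest_neighbour("PTP_C", SITES))
```

    Pairs with high binding-site identity are the ones where an inhibitor designed for one member is most likely to cross-react with the other, so a matrix like this is one possible starting point for flagging selectivity risks.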

    A versatile synthesis of substituted tetrahydropyridines

    A short and efficient synthesis of highly substituted tetrahydropyridines is achieved by combining enyne cross metathesis with an aza-Diels-Alder reaction under high pressure. The reaction sequence is atom-economical and compatible with a variety of functionalities, which are introduced through three simple building blocks: a monosubstituted alkyne, a terminal alkene, and an imine.