Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation
Bioinformatics and computer-aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype with which bioassay text within the domain of the training set can be annotated very quickly. Well-trained annotations require only single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful search and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
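The workflow described above (a model proposes annotations, the scientist confirms them with one click, and anything the model cannot handle is located by search) can be sketched as follows. This is a minimal illustration, assuming a simplified naive Bayes scorer over bag-of-words tokens; the class `TermSuggester`, the `threshold` value of 0.8 and the term labels are hypothetical and are not taken from the paper or from BAO. Terms scoring at or above the threshold are offered for single-click approval; the rest are left for the user to find through search.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Set of lower-case alphanumeric tokens (a crude stand-in for real NLP)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TermSuggester:
    """Hypothetical per-term naive Bayes scorer for ontology annotations.

    Each term is an independent yes/no label; only tokens present in the
    query text contribute evidence (a common simplification).
    """

    def __init__(self):
        self.n_docs = 0
        self.token_docs = Counter()   # token -> training texts containing it
        self.term_docs = Counter()    # term -> training texts annotated with it
        self.term_token_docs = {}     # term -> Counter(token -> co-occurring texts)

    def train(self, text, terms):
        tokens = tokenize(text)
        self.n_docs += 1
        self.token_docs.update(tokens)
        for term in set(terms):
            self.term_docs[term] += 1
            self.term_token_docs.setdefault(term, Counter()).update(tokens)

    def probability(self, text, term):
        n_pos = self.term_docs[term]
        n_neg = self.n_docs - n_pos
        log_odds = math.log((n_pos + 1) / (n_neg + 1))  # smoothed prior odds
        pos_counts = self.term_token_docs.get(term, Counter())
        for token in tokenize(text):
            if token not in self.token_docs:
                continue  # tokens never seen in training carry no evidence
            pos = pos_counts[token]
            neg = self.token_docs[token] - pos
            # Laplace-smoothed likelihood ratio P(token | term) / P(token | not term)
            log_odds += math.log((pos + 1) / (n_pos + 2)) - math.log((neg + 1) / (n_neg + 2))
        return sigmoid(log_odds)

    def suggest(self, text, threshold=0.8):
        """(term, probability, one_click) tuples, most probable first."""
        scored = sorted(((t, self.probability(text, t)) for t in self.term_docs),
                        key=lambda pair: pair[1], reverse=True)
        return [(t, p, p >= threshold) for t, p in scored]


suggester = TermSuggester()
suggester.train("firefly luciferase reporter luminescence measured", ["luminescence readout"])
suggester.train("luciferase activity luminescence signal", ["luminescence readout"])
suggester.train("fluorescent dye fluorescence intensity measured", ["fluorescence readout"])
suggester.train("green fluorescent protein fluorescence signal", ["fluorescence readout"])
for term, p, one_click in suggester.suggest("luciferase luminescence signal"):
    print(term, round(p, 2), "one-click" if one_click else "needs search")
# luminescence readout 0.9 one-click
# fluorescence readout 0.1 needs search
```

Annotations the user confirms can be passed back through `train`, which mirrors the feedback loop the abstract describes for improving the underlying models.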
BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results
Abstract
Background
High-throughput screening (HTS) is one of the main strategies to identify novel entry points for the development of small molecule chemical probes and drugs and is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets, and thus to maximizing the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organize, standardize, and access HTS data are required to address these challenges.
Results
We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper we show important examples of formalizing HTS domain knowledge and point out the advantages of this approach. The ontology is available online at the NCBO BioPortal: http://bioportal.bioontology.org/ontologies/44531.
Conclusions
After a large manual curation effort, we loaded BAO-mapped data triples into an RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities where BAO enables the retrieval of inferred search results that are relevant to a given query but are not explicitly defined. BAO thus opens new functionality for annotating, querying, and analyzing HTS datasets and the potential for discovering new knowledge by means of inference.
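The kind of inferred retrieval described above can be illustrated with a toy triple set. The sketch below, assuming only two RDFS entailment rules (subclass transitivity, rdfs11, and type propagation along subclasses, rdfs9), returns assays that are never explicitly typed with the queried class. The `ex:` identifiers are invented for illustration and are not BAO terms; a description-logic reasoner over the real ontology performs far richer inference than this.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def ancestors(cls, parents):
    """cls plus every class reachable through subClassOf links (transitive, rdfs11)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(query_cls, triples):
    """Subjects typed with query_cls directly or via any subclass (rdfs9)."""
    parents, types = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
        elif p == RDF_TYPE:
            types[s].add(o)
    return sorted(
        subject for subject, classes in types.items()
        if any(query_cls in ancestors(c, parents) for c in classes)
    )


# Toy graph; the ex: identifiers are illustrative, not real BAO terms.
triples = [
    ("ex:LuciferaseReporterAssay", SUBCLASS_OF, "ex:ReporterGeneAssay"),
    ("ex:ReporterGeneAssay", SUBCLASS_OF, "ex:CellBasedAssay"),
    ("ex:assay1", RDF_TYPE, "ex:LuciferaseReporterAssay"),
    ("ex:assay2", RDF_TYPE, "ex:ReporterGeneAssay"),
    ("ex:assay3", RDF_TYPE, "ex:BiochemicalAssay"),
]
print(instances_of("ex:CellBasedAssay", triples))  # ['ex:assay1', 'ex:assay2']
```

Neither `ex:assay1` nor `ex:assay2` is asserted to be a cell-based assay; both are returned only because of the subclass chain.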
TIN-X: target importance and novelty explorer
Abstract
Motivation
The increasing number of peer-reviewed manuscripts requires the development of dedicated mining tools to facilitate the visual exploration of evidence linking diseases and proteins.
Results
We developed TIN-X, the Target Importance and Novelty eXplorer, to visualize the association between proteins and diseases, based on text mining data processed from scientific literature. In the current implementation, TIN-X supports exploration of data for G-protein coupled receptors, kinases, ion channels, and nuclear receptors. TIN-X supports browsing and navigating across proteins and diseases based on ontology classes, and displays a scatter plot with two newly proposed bibliometric statistics: Importance and Novelty.
Availability and Implementation
http://www.newdrugtargets.org
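To make the two statistics concrete, here is a sketch based on fractional publication counting, assuming that each abstract spreads one unit of credit evenly over the proteins it mentions (novelty is the inverse of a protein's accumulated credit) and over the disease-protein pairs it mentions (importance is a pair's accumulated credit). These definitions are an assumption for illustration; the exact formulas used by TIN-X may differ. The function name `tinx_style_scores` and the protein and disease names are hypothetical.

```python
from collections import defaultdict


def tinx_style_scores(abstracts):
    """abstracts: iterable of (proteins, diseases) sets mentioned in each abstract.

    Returns (novelty, importance): novelty maps protein -> 1 / fractional
    publication count; importance maps (disease, protein) -> fractional count
    of abstracts mentioning both.
    """
    credit = defaultdict(float)
    importance = defaultdict(float)
    for proteins, diseases in abstracts:
        if not proteins:
            continue
        for p in proteins:
            credit[p] += 1.0 / len(proteins)
        if diseases:
            pair_share = 1.0 / (len(proteins) * len(diseases))
            for d in diseases:
                for p in proteins:
                    importance[(d, p)] += pair_share
    novelty = {p: 1.0 / c for p, c in credit.items()}
    return novelty, dict(importance)


abstracts = [
    ({"PROT_A"}, {"disease X"}),
    ({"PROT_A", "PROT_B"}, {"disease X"}),
    ({"PROT_A"}, set()),
]
novelty, importance = tinx_style_scores(abstracts)
print(novelty)      # PROT_A: 1 / 2.5 = 0.4, PROT_B: 1 / 0.5 = 2.0
print(importance)   # (disease X, PROT_A): 1.5, (disease X, PROT_B): 0.5
```

Under these assumptions a heavily studied protein accumulates credit and therefore receives a low novelty score, while a rarely mentioned protein keeps a high one.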
CLO: The cell line ontology
Abstract
Background
Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper-level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions.
Construction and content
Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms.
Utility and discussion
The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.
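The idea that a hierarchy built on source cell types and tissues eases querying and automated classification can be sketched as follows: each cell line cell is indexed under every ancestor of the cell type and of the tissue it derives from, so a query for a broad cell type also returns lines derived from its subtypes. The tiny hierarchies, the `classify` and `lineage` helpers and the cell line names are hypothetical stand-ins for CL and UBERON, and the sketch assumes each term has a single parent, which real ontologies do not guarantee.

```python
from collections import defaultdict


def lineage(term, parent):
    """term and its ancestors, following single-parent links (toy tree, assumed acyclic)."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent.get(term)
    return chain


def classify(cell_lines, cell_parent, tissue_parent):
    """Index each cell line cell under every ancestor of its source cell type and tissue."""
    groups = defaultdict(set)
    for name, (cell_type, tissue) in cell_lines.items():
        for ct in lineage(cell_type, cell_parent):
            groups[("derives from cell type", ct)].add(name)
        for ts in lineage(tissue, tissue_parent):
            groups[("derives from tissue", ts)].add(name)
    return groups


# Hypothetical stand-ins for CL (cell types) and UBERON (anatomy) hierarchies.
cell_parent = {"hepatocyte": "epithelial cell", "keratinocyte": "epithelial cell",
               "epithelial cell": "cell"}
tissue_parent = {"liver": "organ", "skin": "organ"}
cell_lines = {"line A cell": ("hepatocyte", "liver"),
              "line B cell": ("keratinocyte", "skin")}

groups = classify(cell_lines, cell_parent, tissue_parent)
print(sorted(groups[("derives from cell type", "epithelial cell")]))  # ['line A cell', 'line B cell']
print(sorted(groups[("derives from tissue", "liver")]))  # ['line A cell']
```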
Recommended from our members
Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays
Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), as part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models could be useful for evaluating cell-based screening results in general (for example, from reporter assays) and could help identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols that take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we achieved percentage correct classification rates between 70% and 85% on the prediction set, though the accuracy dropped significantly when the models were applied to in vivo data. In this context, we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity correlates with, and can potentially be predicted from, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset with the training set of the models to decide whether the new dataset can be reliably predicted.
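One common protocol for extremely imbalanced screening data is random undersampling: every training subset keeps all minority-class compounds plus an equal-sized random draw of majority-class compounds, one model (for example, a tree) is trained per subset, and predictions are combined by vote. The sketch below only builds the subsets and scores predictions; it assumes binary labels (1 = toxic, a hypothetical encoding) and is not necessarily the protocol used in the report. Balanced accuracy is included as one metric that, unlike raw percent correct, is not inflated by the majority class.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Index lists for n_subsets training sets, each holding every minority-class
    example plus an equal-sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        subset = minority + rng.sample(majority, len(minority))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so the majority class cannot dominate."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:
            recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


labels = [1] * 3 + [0] * 20          # e.g. 3 toxic vs 20 non-toxic compounds
for subset in balanced_subsets(labels, n_subsets=2):
    print(sorted(subset))            # 3 minority + 3 sampled majority indices
# Predicting "non-toxic" for everything scores 0.75 raw accuracy here, but:
print(balanced_accuracy([1, 0, 0, 0], [0, 0, 0, 0]))  # 0.5
```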
Knowledge-based Characterization of Similarity Relationships in the Human Protein-Tyrosine Phosphatase Family for Rational Inhibitor Design
Tyrosine phosphorylation, controlled by the coordinated action of protein-tyrosine kinases (PTKs) and protein-tyrosine phosphatases (PTPs), is a fundamental regulatory mechanism of numerous physiological processes. PTPs are implicated in a number of human diseases, and their potential as prospective drug targets is increasingly being recognized. Despite their biological importance, no comprehensive overview describing how all members of the human PTP family are related has been reported until now. Here we review the entire human PTP family and present a systematic knowledge-based characterization of global and local similarity relationships, which are relevant for the development of small molecule inhibitors. We use parallel homology modeling to expand the current PTP structure space and analyze the human PTPs based on local three-dimensional catalytic sites and domain sequences. Furthermore, we demonstrate the importance of binding site similarities in understanding cross-reactivity and inhibitor selectivity in the design of small molecule inhibitors.
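As a minimal illustration of the difference between global and local similarity, the sketch below computes percent identity over pre-aligned sequences: passing whole aligned domains gives a global comparison, while passing only the aligned catalytic-site residues gives a local one. The sequence fragments and names are invented, the alignment is assumed to exist already, and the structure-based comparison of three-dimensional sites described in the review is not captured here.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def identity_matrix(aligned):
    """All-vs-all identities for a dict of name -> aligned sequence."""
    names = sorted(aligned)
    return {(m, n): percent_identity(aligned[m], aligned[n]) for m in names for n in names}


# Toy aligned fragments; names and residues are invented for illustration.
aligned = {"PTP_X": "HCSAG-GRSG", "PTP_Y": "HCKAGAGRTG", "PTP_Z": "HCSAGVGRSG"}
for (m, n), pid in sorted(identity_matrix(aligned).items()):
    if m < n:  # print each unordered pair once
        print(m, n, round(pid, 1))
```

Clustering the resulting matrix (for example, hierarchically) is one way to group family members by similarity.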
A versatile synthesis of substituted tetrahydropyridines
A short and efficient synthesis of highly substituted tetrahydropyridines is achieved by a combination of enyne cross metathesis and aza-Diels-Alder reaction under high pressure. The reaction sequence shows atom economy and is compatible with a variety of functionalities being introduced by three building blocks: a monosubstituted alkyne, a terminal alkene, and an imine.
Highly substituted tetrahydropyridines are available from three simple building blocks by an efficient combination of enyne cross metathesis and high-pressure aza-Diels-Alder reaction. The entire process shows atom economy.
Sequences of Yne-Ene Cross Metathesis and Diels-Alder Cycloaddition Reactions - Modular Solid Phase Synthesis of Substituted Octahydrobenzazepinones
The Clinical Kinase Index (CKI): A user-friendly application to prioritize kinases as prospective cancer drug targets
Kinases are among the most established druggable proteins, with over 50 kinase inhibitor drugs currently approved, most of them for cancer. However, these drugs target only a small subset of the human kinome, and many kinases remain “dark” or “understudied” (Essegian et al. 2020 [1], Oprea et al. 2018 [2]). To make it easier to evaluate the clinical relevance of kinases, including their potential as novel prospective cancer drug targets, we developed the Clinical Kinase Index (CKI). CKI is an interactive web application with harmonized datasets extracted from several resources that allows researchers and clinicians to prioritize and evaluate the clinical relevance of kinases as cancer drug targets across solid tumors.
•CKI is a novel method to prioritize kinases as prospective cancer drug targets.
•The CKI App allows users to rank kinases as prospective cancer drug targets and to explore, visualize, and download the results.
•In addition to the most common cancer drug targets, CKI also ranks understudied and “dark” kinases, for which no small molecules exist and whose biological roles and functions are poorly characterized.
•Expression of understudied kinases in tumors is prognostic of poor outcomes.
•Dark kinases are likely clinically relevant cancer targets.
•Cancer cell dependency correlates with tumor pathology and survival.
Ruthenium-Catalyzed Yne-Ene Cross Metathesis Binding to Solid Support and Cleavage by Pd0-Catalysis
- …