    Mining the biomedical literature to predict shared drug targets in DrugBank

    The current drug development pipelines are characterised by long processes with high attrition rates and elevated costs. More than 80% of new compounds fail in the later stages of testing due to severe side-effects caused by unknown biomolecular targets of the compounds. In this work, we present a measure that can predict shared targets for drugs in DrugBank through large-scale analysis of the biomedical literature. We show that MeSH ontology terms can accurately describe the drugs and that appropriate use of the MeSH ontological structure can determine pairwise drug similarity. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
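
    The abstract does not spell out the similarity measure. As a minimal sketch of how the MeSH tree structure can feed a pairwise drug similarity, the Python below assumes each drug is annotated with a set of MeSH tree numbers, expands every tree number to all of its ancestors, and takes the Jaccard index of the expanded sets. The functions `ancestors`, `expand_profile` and `mesh_similarity`, and the tree numbers in the example, are hypothetical illustrations, not the authors' measure.

```python
from __future__ import annotations


def ancestors(tree_number: str) -> set[str]:
    """Return a MeSH tree number plus all of its ancestors.

    'D02.455.326' -> {'D02', 'D02.455', 'D02.455.326'}
    """
    parts = tree_number.split(".")
    return {".".join(parts[: i + 1]) for i in range(len(parts))}


def expand_profile(tree_numbers) -> set[str]:
    """Union of the ancestor sets of every tree number annotated to a drug."""
    expanded: set[str] = set()
    for tn in tree_numbers:
        expanded |= ancestors(tn)
    return expanded


def mesh_similarity(drug_a, drug_b) -> float:
    """Jaccard similarity (0.0 to 1.0) of two ancestor-expanded MeSH profiles.

    Expanding to ancestors lets drugs annotated with different but related
    descriptors (siblings in the tree) still share part of their profile.
    """
    a, b = expand_profile(drug_a), expand_profile(drug_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

    For example, with hypothetical profiles `["D02.455.326", "C14.907"]` and `["D02.455.849", "C14.907"]`, the two drugs share four of six expanded nodes, so `mesh_similarity` returns 2/3 even though only one raw descriptor matches.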

    Sitagliptin: a potential drug for the treatment of COVID-19?

    Recently, an outbreak of a fatal coronavirus, SARS-CoV-2, has emerged from China and is rapidly spreading worldwide. Possible interaction of SARS-CoV-2 with DPP4 peptidase may partly contribute to the viral pathogenesis. We adopted an integrative bioinformatics approach: we first mined the biomedical literature for high-confidence DPP4-protein/gene associations, then performed functional analysis using network analysis and pathway enrichment. The results indicate that the identified DPP4 networks are highly enriched in viral processes required for viral entry and infection; as a result, we propose DPP4 as an important putative target for the treatment of COVID-19. Additionally, our protein-chemical interaction networks identified important interactions between DPP4 and sitagliptin. We conclude that sitagliptin may be beneficial for the treatment of COVID-19, either as monotherapy or in combination with other therapies, especially for diabetic patients and patients with pre-existing cardiovascular conditions, who are already at higher risk of COVID-19 mortality.
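
    Pathway enrichment of a gene network is commonly scored with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The abstract does not say which test the authors used, so the following is only a minimal sketch under that assumption; `enrichment_p_value` is a hypothetical helper and any counts passed to it here are made up.

```python
from math import comb


def enrichment_p_value(overlap: int, pathway_size: int,
                       network_size: int, universe_size: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    overlap       -- network genes that are also in the pathway
    pathway_size  -- genes annotated to the pathway
    network_size  -- genes in the network (e.g. a literature-derived DPP4 network)
    universe_size -- all genes that could have been drawn
    """
    upper = min(pathway_size, network_size)
    total = comb(universe_size, network_size)
    # Sum the probability of seeing k or more pathway genes in the network.
    hits = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, network_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total
```

    A small p-value means the network contains more pathway genes than random sampling from the universe would be expected to give; in practice the p-values across many pathways would also be corrected for multiple testing.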

    Text Analytics for Android Project

    The most advanced text analytics and text mining tasks include text classification, text clustering, ontology building, concept/entity extraction, deriving patterns within structured data, production of granular taxonomies, sentiment and emotion analysis, document summarization, entity relation modelling, and interpretation of the output. Existing text analytics and text mining tools cannot develop alternative versions of text material (perform a multivariant design), perform multiple criteria analysis, automatically select the most effective variant according to different aspects (citation index of papers (Scopus, ScienceDirect, Google Scholar) and authors (Scopus, ScienceDirect, Google Scholar), Top 25 papers, impact factor of journals, supporting phrases, document name and contents, density of keywords), or calculate utility degree and market value. The Text Analytics for Android Project, however, can perform these functions. To the best of our knowledge, these functions have not been implemented before, so this is the first attempt to do so. The Text Analytics for Android Project is briefly described in this article.
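
    The abstract mentions multiple criteria analysis and a "utility degree" without defining them. A minimal sketch, assuming a COPRAS-style calculation in which every criterion is to be maximized: each criterion column is normalized by its sum and weighted, the weighted values are summed per alternative, and the utility degree is each alternative's score as a percentage of the best score. The function name, the example criteria (such as citation count or journal impact factor) and the weights are hypothetical.

```python
def utility_degrees(matrix, weights):
    """Utility degree (%) of each alternative relative to the best one.

    matrix[j][i] -- value of criterion i for alternative j (higher is better)
    weights[i]   -- significance of criterion i
    Assumes every criterion is to be maximized and every column sum is non-zero.
    """
    n_criteria = len(weights)
    column_sums = [sum(row[i] for row in matrix) for i in range(n_criteria)]
    # Weighted, sum-normalized score for each alternative.
    scores = [
        sum(weights[i] * row[i] / column_sums[i] for i in range(n_criteria))
        for row in matrix
    ]
    best = max(scores)
    return [100.0 * s / best for s in scores]
```

    With two hypothetical papers scored on two equally weighted criteria, `utility_degrees([[10, 2], [5, 2]], [0.5, 0.5])` ranks the first at 100% and the second at about 71.4%. Full COPRAS also handles criteria to be minimized, which this sketch omits.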

    Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy

    Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. This paper presents background on data and text mining, as well as on knowledge discovery in databases (KDD) and in text (KDT), then briefly reviews Swanson's ideas and discusses recent approaches to hypothesis discovery and testing. "Testing" in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks address (a) the limits of current strategies for evaluating hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is also mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006 Bekhuis; licensee BioMed Central Ltd.
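
    Swanson's closed discovery model links a starting concept A to a target concept C through intermediate B concepts that co-occur with A in one set of papers and with C in another. The Python below is a minimal sketch of that idea over documents represented as sets of terms; the ranking rule (the smaller of the two co-occurrence counts), the function name `linking_terms`, and the toy documents, loosely modeled on Swanson's fish oil and Raynaud's disease example, are illustrative assumptions rather than any published system.

```python
from collections import Counter


def linking_terms(documents, a_term, c_term):
    """Rank candidate B terms that co-occur with both A and C.

    documents -- iterable of term collections (one per abstract)
    Returns B terms sorted by min(co-occurrences with A, co-occurrences with C),
    highest first, ties broken alphabetically.
    """
    with_a, with_c = Counter(), Counter()
    for doc in documents:
        terms = set(doc)
        others = terms - {a_term, c_term}
        if a_term in terms:
            with_a.update(others)
        if c_term in terms:
            with_c.update(others)
    shared = set(with_a) & set(with_c)
    return sorted(shared, key=lambda b: (-min(with_a[b], with_c[b]), b))


# Toy corpus: A and C never appear together, but share two intermediates.
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "omega-3"},
    {"raynaud", "blood viscosity"},
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
]
print(linking_terms(docs, "fish oil", "raynaud"))
```

    Here "omega-3" is dropped because it never co-occurs with the C term, leaving "blood viscosity" and "platelet aggregation" as the bridging hypotheses that the "testing" step described above would then look for literature evidence to support.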

    Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

    Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. To overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
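
    One standard way to let word similarity learned from unlabelled text re-weight bag-of-words features is a semantic smoothing kernel, K(x, y) = x^T S y, where S is a word-similarity matrix. The abstract does not specify the authors' semantic language model or kernel transformation, so this sketch assumes window-based co-occurrence vectors and cosine similarity; `cooccurrence_vectors` and `semantic_kernel` are hypothetical names, and the code requires NumPy.

```python
import numpy as np


def cooccurrence_vectors(sentences, vocab, window=2):
    """Count how often each vocabulary word occurs within `window` tokens of another."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if word not in index:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    counts[index[word], index[tokens[j]]] += 1
    return counts


def semantic_kernel(X, word_vectors):
    """Gram matrix X S X^T, with S the cosine similarity between word vectors.

    S = E E^T for row-normalized E, so S is positive semidefinite and the
    result is a valid kernel matrix for an SVM or similar classifier.
    """
    norms = np.linalg.norm(word_vectors, axis=1, keepdims=True)
    E = word_vectors / np.where(norms == 0, 1.0, norms)
    S = E @ E.T
    return X @ S @ X.T
```

    For example, if "binds" and "interacts" occur in the same contexts in the unlabelled corpus, two labelled sentences that share neither word under a plain linear kernel (similarity 0) become similar under the semantic kernel, which is the kind of feature re-weighting the abstract describes.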

    Classification of protein interaction sentences via Gaussian processes

    The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier that has rarely been applied to text: Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.
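
    For readers unfamiliar with GP classification, the sketch below implements a binary GP classifier with a logistic likelihood and the standard Laplace approximation to the posterior over latent values. The abstract does not state which approximation, kernel or features the authors used; the RBF kernel, its fixed `length_scale` (normally fitted by maximizing the marginal likelihood), and the class name `LaplaceGPClassifier` are assumptions. It requires NumPy.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class LaplaceGPClassifier:
    """Binary GP classifier (labels 0/1), logistic likelihood, Laplace approximation."""

    def __init__(self, length_scale=1.0, max_iter=50, tol=1e-8):
        self.length_scale, self.max_iter, self.tol = length_scale, max_iter, tol

    def _mode_terms(self, K, f):
        # Likelihood curvature W = pi(1 - pi) and Cholesky of I + W^1/2 K W^1/2.
        pi = 1.0 / (1.0 + np.exp(-f))
        sqrt_w = np.sqrt(pi * (1.0 - pi))
        B = np.eye(len(f)) + sqrt_w[:, None] * K * sqrt_w[None, :]
        return pi, sqrt_w, np.linalg.cholesky(B)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        t = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X_, self.X_, self.length_scale)
        f = np.zeros(len(t))
        for _ in range(self.max_iter):  # Newton iterations towards the posterior mode
            pi, sqrt_w, L = self._mode_terms(K, f)
            b = sqrt_w**2 * f + (t - pi)
            v = np.linalg.solve(L, sqrt_w * (K @ b))
            f_new = K @ (b - sqrt_w * np.linalg.solve(L.T, v))
            converged = np.max(np.abs(f_new - f)) < self.tol
            f = f_new
            if converged:
                break
        pi, self.sqrt_w_, self.L_ = self._mode_terms(K, f)
        self.grad_ = t - pi  # gradient of the log-likelihood at the mode
        return self

    def predict_proba(self, X_new):
        K_star = rbf_kernel(self.X_, np.asarray(X_new, dtype=float), self.length_scale)
        mean = K_star.T @ self.grad_
        v = np.linalg.solve(self.L_, self.sqrt_w_[:, None] * K_star)
        var = np.maximum(1.0 - np.sum(v**2, axis=0), 0.0)  # k(x, x) = 1 for RBF
        # Approximate the sigmoid averaged over the Gaussian latent distribution.
        return 1.0 / (1.0 + np.exp(-mean / np.sqrt(1.0 + np.pi * var / 8.0)))
```

    The output is a class probability rather than a margin, and there is no SVM-style regularization constant C to tune, which is the property the abstract highlights; kernel hyperparameters still need to be chosen.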

    Extraction of Transcript Diversity from Scientific Literature

    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/
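
    The composite text-mining method is not detailed in the abstract. As one small, hypothetical component of such a pipeline, the sketch below uses a regular expression to pull stated isoform or transcript counts out of abstract text. The pattern, the number-word table and `isoform_mentions` are illustrative assumptions and would miss many phrasings that a real system would need to handle.

```python
import re

# Hypothetical number-word table; a real pipeline would cover far more forms.
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

PATTERN = re.compile(
    r"\b(\d+|two|three|four|five|six)\s+"
    r"(?:alternatively\s+spliced\s+)?"
    r"(isoforms?|splice\s+variants?|transcripts?)\b",
    re.IGNORECASE,
)


def isoform_mentions(text):
    """Return (count, noun) pairs such as (2, 'isoforms') found in `text`."""
    results = []
    for match in PATTERN.finditer(text):
        raw = match.group(1).lower()
        count = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
        noun = " ".join(match.group(2).lower().split())  # normalize whitespace
        results.append((count, noun))
    return results
```

    On a sentence such as "Two isoforms of the receptor are expressed in brain, and 3 alternatively spliced transcripts were detected by RT-PCR.", the function yields `[(2, 'isoforms'), (3, 'transcripts')]`; tissue and detection-method fields like those in LSAT would need separate extractors.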

    Data Management Roles for Librarians

    In this Chapter:
    ● Looking at data through different lenses
    ● Exploring the range of data use and data support
    ● Using data as the basis for informed decision making
    ● Treating data as a legitimate scholarly research product