
    Global text mining and development of pharmacogenomic knowledge resource for precision medicine

    Understanding patients' genomic variations and their effect in protecting or predisposing them to drug response phenotypes is important for providing personalized healthcare. Several studies have manually curated such genotype-phenotype relationships into organized databases from clinical trial data or published literature. However, there are no text mining tools available to extract high-accuracy information from such existing knowledge. In this work, we used a semiautomated text mining approach to build a comprehensive pharmacogenomic (PGx) resource integrating disease-drug-gene-polymorphism relationships, providing a global perspective to ease therapeutic approaches. We used an R package, pubmed.mineR, to automatically retrieve PGx-related literature. We identified 1,753 disease types and 666 drugs, associated with 4,132 genes and 33,942 polymorphisms, collated from 180,088 publications. With further manual curation, we obtained a total of 2,304 PGx relationships. We evaluated our approach's performance (precision = 0.806) against benchmark datasets such as the Pharmacogenomics Knowledgebase (PharmGKB) (0.904), Online Mendelian Inheritance in Man (OMIM) (0.600), and the Comparative Toxicogenomics Database (CTD) (0.729). We validated our study by comparing our results with 362 US Food and Drug Administration (FDA)-approved drug-labeling biomarkers in commercial use. Of the 2,304 PGx relationships identified, 127 belonged to the FDA list of 362 approved pharmacogenomic markers, indicating that our semiautomated text mining approach may reveal significant PGx information with markers for drug response prediction. In addition, it is a scalable and state-of-the-art approach to curation for PGx clinical utility.
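The benchmark evaluation described above can be sketched as a set comparison: a mined relationship counts as a true positive if it also appears in the reference database. The following is a minimal, hedged illustration (the tuples and function name are hypothetical, not from the paper):

```python
# Hypothetical sketch: precision of text-mined PGx relationships against a
# benchmark such as PharmGKB. Tuples are (drug, gene, polymorphism) and are
# illustrative examples, not claims from the study.
def precision_against_benchmark(extracted, benchmark):
    """Fraction of extracted relationships confirmed by the benchmark."""
    extracted = set(extracted)
    true_positives = extracted & set(benchmark)
    return len(true_positives) / len(extracted) if extracted else 0.0

mined = {("warfarin", "VKORC1", "rs9923231"),
         ("clopidogrel", "CYP2C19", "rs4244285"),
         ("imatinib", "ABCB1", "rs1045642")}
gold = {("warfarin", "VKORC1", "rs9923231"),
        ("clopidogrel", "CYP2C19", "rs4244285")}

print(round(precision_against_benchmark(mined, gold), 3))  # 2 of 3 confirmed
```

A separate gold set per benchmark (PharmGKB, OMIM, CTD) would yield the per-database precision values the abstract reports.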

    Precision medicine and future of cancer treatment

    Over the last few decades, there has been a deluge in the production of large-scale biological data mainly due to the advances in high-throughput technology. This initiated a paradigm shift on the focus in medical research. Ability to investigate molecular changes over the whole genome provided a unique opportunity in the field of translational research. This also gave rise to the concept of precision medicine which provided a strong hope for the development of better diagnostic and therapeutic tools. This is especially relevant to cancer as its incidence is increasing throughout the world. The purpose of this article is to review tools and applications of precision medicine in cancer

    BMC Bioinformatics

    Background: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.
    Data and methods: For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
    Results: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.
    Conclusion: In summary, we automatically constructed a large-scale knowledge base of higher-level drug phenotype relationships, which has great potential in computational drug discovery.
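The precision/recall trade-off in these results is captured by the F1 score, the harmonic mean of precision and recall. A minimal sketch (note that the abstract reports averages over experiments, so its averaged F1 need not equal the harmonic mean of the averaged precision and recall):

```python
# F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# SVM baseline figures from the abstract: high recall with low precision
# drags the harmonic mean down toward the weaker of the two numbers.
print(round(f1(0.135, 0.900), 3))
```

This is why the KD approach's balanced precision and recall outscores the SVM baseline despite the baseline's much higher recall.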

    Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals

    Reviewing radiology reports in emergency departments is an essential but laborious task. Timely follow-up of patients with abnormal cases in their radiology reports may dramatically affect the patient's outcome, especially if they have been discharged with a different initial diagnosis. Machine learning approaches have been devised to expedite the process and detect the cases that demand instant follow-up. However, these approaches require a large amount of labeled data to train reliable predictive models. Preparing such a large dataset, which needs to be manually annotated by health professionals, is costly and time-consuming. This paper investigates a semi-supervised learning framework for radiology report classification across three hospitals. The main goal is to leverage clinical unlabeled data in order to augment the learning process where limited labeled data is available. To further improve the classification performance, we also integrate a transfer learning technique into the semi-supervised learning pipeline. Our experimental findings show that (1) convolutional neural networks (CNNs), while being independent of any problem-specific feature engineering, achieve significantly higher effectiveness compared to conventional supervised learning approaches, (2) leveraging unlabeled data in training a CNN-based classifier reduces the dependency on labeled data by more than 50% to reach the same performance as a fully supervised CNN, and (3) transferring the knowledge gained from available labeled data in an external source hospital significantly improves the performance of a semi-supervised CNN model over its fully supervised counterpart in a target hospital.
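One common way to leverage unlabeled data, as the abstract describes, is self-training with pseudo-labels: train on the labeled set, confidently label some unlabeled points, and retrain. The sketch below is purely illustrative and is not the paper's CNN pipeline; a toy nearest-centroid classifier over 1-D "report scores" stands in for the network:

```python
# Illustrative self-training (pseudo-labelling) sketch; names and data are
# hypothetical. A nearest-centroid classifier stands in for the paper's CNN.

def centroids(points, labels):
    """Mean position of the points carrying each label."""
    by_label = {}
    for x, y in zip(points, labels):
        by_label.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Label of the nearest centroid."""
    return min(model, key=lambda y: abs(x - model[y]))

def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        model = centroids(xs, ys)
        if not pool:
            break
        # Pseudo-label the unlabeled point closest to a centroid
        # (a crude stand-in for "most confident prediction").
        best = min(pool, key=lambda x: min(abs(x - c) for c in model.values()))
        pool.remove(best)
        xs.append(best)
        ys.append(predict(model, best))
    return centroids(xs, ys)

model = self_train([0.0, 1.0], ["normal", "abnormal"], [0.1, 0.9, 0.45])
print(predict(model, 0.2))  # prints "normal"
```

The transfer-learning component would correspond to initialising the model from a source hospital's labeled data before running this loop on the target hospital's data.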

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Traditional Chinese Medicine and biodiversity.
    Comment: Manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables.
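At their simplest, the knowledge graphs the survey covers are sets of subject-predicate-object triples that can be queried for downstream data science tasks. A toy illustration (the facts and function are illustrative, not from the survey):

```python
# Minimal knowledge-graph sketch: facts as (subject, predicate, object)
# triples, with a simple query over them. Content is illustrative only.
triples = {
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "targets", "AMPK"),
    ("AMPK", "participates_in", "glucose metabolism"),
}

def objects(graph, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(objects(triples, "metformin", "targets"))  # prints {'AMPK'}
```

Production systems use the same idea at scale, typically in RDF triple stores or property-graph databases, with machine learning models consuming the graph as features or embeddings.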

    The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome

    Over the last decade, biomedical science has been transformed by the epigenome and spatial genome, but the discipline of pharmacogenomics, the study of the genetic underpinnings of pharmacological phenotypes like drug response and adverse events, has not. Scientists have begun to use omics atlases of increasing depth, and inferences relating to the bidirectional causal relationship between the spatial epigenome and gene expression, as a foundational underpinning for genetics research. The epigenome and spatial genome are increasingly used to discover causative regulatory variants in the significance regions of genome-wide association studies, for the discovery of the biological mechanisms underlying these phenotypes and the design of genetic tests to predict them. Such variants often have more predictive power than coding variants, but in the area of pharmacogenomics, such advances have been radically underapplied. The majority of pharmacogenomics tests are designed manually on the basis of mechanistic work with coding variants in candidate genes, and where genome-wide approaches are used, they are typically not interpreted with the epigenome. This work describes a series of analyses of pharmacogenomics association studies with the tools and datasets of the epigenome and spatial genome, undertaken with the intent of discovering causative regulatory variants to enable new genetic tests. It describes the potent regulatory variants discovered thereby, which have a putative causative and predictive role in a number of medically important phenotypes, including analgesia and the treatment of depression, bipolar disorder, and traumatic brain injury with opiates, anxiolytics, antidepressants, lithium, and valproate, and in particular the tendency for such variants to cluster into spatially interacting, conceptually unified pathways which offer mechanistic insight into these phenotypes.
It describes the Pharmacoepigenomics Informatics Pipeline (PIP), an integrative multiple omics variant discovery pipeline designed to make this kind of analysis easier and cheaper to perform, more reproducible, and amenable to the addition of advanced features. It describes the successes of the PIP in rediscovering manually discovered gene networks for lithium response, as well as discovering a previously unknown genetic basis for warfarin response in anticoagulation therapy. It describes the H-GREEN Hi-C compiler, which was designed to analyze spatial genome data and discover the distant target genes of such regulatory variants, and its success in discovering spatial contacts not detectable by preceding methods and using them to build spatial contact networks that unite disparate TADs with phenotypic relationships. It describes a potential featureset of a future pipeline, using the latest epigenome research and the lessons of the previous pipeline. It describes my thinking about how to use the output of a multiple omics variant pipeline to design genetic tests that also incorporate clinical data. And it concludes by describing a long-term vision for a comprehensive pharmacophenomic atlas, to be constructed by applying a variant pipeline and machine learning test design system, such as is described, to thousands of phenotypes in parallel. Scientists struggled to assay genotypes for the better part of a century, and in the last twenty years, succeeded. The struggle to predict phenotypes on the basis of the genotypes we assay remains ongoing. The use of multiple omics variant pipelines and machine learning models with omics atlases, genetic association, and medical records data will be an increasingly significant part of that struggle for the foreseeable future.

PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145835/1/ariallyn_1.pd

    Strategies for the intelligent integration of genetic variance information in multiscale models of neurodegenerative diseases

    A more complete understanding of the genetic architecture of complex traits and diseases can maximize the utility of human genetics in disease screening, diagnosis, prognosis, and therapy. Undoubtedly, the identification of genetic variants linked to polygenic and complex diseases is of supreme interest for clinicians, geneticists, patients, and the public. Furthermore, determining how genetic variants affect an individual’s health and transmuting this knowledge into the development of new medicine can revolutionize the treatment of most common deleterious diseases. However, this requires the correlation of genetic variants with specific diseases, and accurate functional assessment of genetic variation in human DNA sequencing studies is still a nontrivial challenge in clinical genomics. Assigning functional consequences and clinical significance to genetic variants is an important step in human genome interpretation. The translation of the genetic variants into functional molecular mechanisms is essential in disease pathogenesis and, eventually, in therapy design. Although various statistical methods are helpful to short-list the genetic variants for fine-mapping investigation, demonstrating their role in a molecular mechanism requires knowledge of functional consequences. This undoubtedly requires comprehensive investigation. Experimental interpretation of all the observed genetic variants is still impractical. Thus, the prediction of functional and regulatory consequences of the genetic variants using in-silico approaches is an important step in the discovery of clinically actionable knowledge. Since the interactions between phenotypes and genotypes are multi-layered and biologically complex, such associations present several challenges and simultaneously offer many opportunities to design new protocols for in-silico variant evaluation strategies.
This thesis presents a comprehensive protocol based on a causal reasoning algorithm that harvests and integrates multifaceted genetic and biomedical knowledge with various types of entities from several resources and repositories to understand how genetic variants perturb molecular interactions and initiate a disease mechanism. Firstly, as a case study of genetic susceptibility loci of Alzheimer’s disease, I reviewed and summarized all the existing methodologies for Genome-Wide Association Study (GWAS) interpretation, currently available algorithms, and computable modelling approaches. In addition, I formulated a new approach for modelling and simulation of genetic regulatory networks as an extension of the syntax of the Biological Expression Language (OpenBEL). This could allow the representation of genetic variation information in cause-and-effect models to predict the functional consequences of disease-associated genetic variants. Secondly, by using the new syntax of OpenBEL, I generated an OpenBEL model for Alzheimer’s disease (AD) together with genetic variants, including their DNA, RNA, or protein position, variant type, and associated allele. To better understand the role of genetic variants in a disease context, I subsequently tried to predict the consequences of genetic variation based on the functional context provided by the network model. I further explained how genetic variation information could help to identify candidate molecular mechanisms for aetiologically complex diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD). Since integration of genetic variation information can enhance the evidence base for shared pathophysiology pathways in complex diseases, I have addressed one of the key questions, namely the role of shared genetic variants in initiating shared molecular mechanisms between neurodegenerative diseases.
I systematically analysed shared genetic variation information of AD and PD and mapped them to find shared molecular aetiology between neurodegenerative diseases. My methodology highlighted that a comprehensive understanding of genetic variation needs integration and analysis of all omics data, in order to build a joint model that captures all datasets concurrently. Moreover, genomic loci, rather than individual genetic variants, should be considered when investigating the effects of GWAS variants, whose roles are hard to predict in a biologically complex molecular mechanism, particularly when investigating shared pathology.
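The shared-variant comparison described above amounts, at its core, to intersecting disease-annotated variant sets to nominate shared loci. A hedged sketch (the rsIDs are illustrative placeholders, not findings from the thesis):

```python
# Illustrative sketch of the AD/PD shared-variant comparison: intersect
# variant sets annotated to each disease. The rsIDs are hypothetical
# placeholders, not claims from this work.
ad_variants = {"rs429358", "rs75932628", "rs3851179"}
pd_variants = {"rs356182", "rs3851179", "rs76763715"}

shared = ad_variants & pd_variants
print(sorted(shared))  # variants annotated to both diseases
```

In practice, each shared variant (or its locus) would then be mapped into the cause-and-effect network model to ask whether it perturbs a mechanism common to both diseases.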

    Artificial Intelligence in Oncology Drug Discovery and Development

    There exists a profound conflict at the heart of oncology drug development. The efficiency of the drug development process is falling, leading to higher costs per approved drug, while at the same time personalised medicine limits the target market of each new medicine. Even as the global economic burden of cancer increases, the current paradigm in drug development is unsustainable. In this book, we discuss the development of techniques in machine learning for improving the efficiency of oncology drug development and delivering cost-effective precision treatment. We consider how to structure data for drug repurposing and target identification, how to improve clinical trials, and how patients may view artificial intelligence.