243 research outputs found

    A toxicogenomic data space for system-level understanding and prediction of EDC-induced toxicity.

    Endocrine disrupting compounds (EDCs) are a persistent threat to humans and wildlife due to their ability to interfere with endocrine signaling pathways. Inspired by previous work to improve chemical hazard identification through the use of toxicogenomics data, we developed a genomic-oriented data space for profiling the molecular activity of EDCs in an in silico manner, and for creating predictive models that identify and prioritize EDCs. Predictive models of EDCs, derived from gene expression data from rats (in vivo and in vitro primary hepatocytes) and humans (in vitro primary hepatocytes and HepG2), achieve testing accuracy greater than 90%. Negative test sets indicate that known safer chemicals are not predicted as EDCs. The rat in vivo-based classifiers achieve accuracy greater than 75% when tested for in vitro to in vivo extrapolation. This study reveals key metabolic pathways and genes affected by EDCs, together with a set of predictive models that utilize these pathways to prioritize EDCs in a dose- and time-dependent manner and to predict EDC-evoked metabolic diseases.

    Use of Text Data in Identifying and Prioritizing Potential Drug Repositioning Candidates

    New drug development costs between 500 million and 2 billion dollars and takes 10-15 years, with a success rate of less than 10%. Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. In the period 2007-2009, drug repurposing led to the launching of 30-40% of new drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates with significantly lower cost. A variety of resources have been utilized to identify novel drug repurposing candidates such as biomedical literature, clinical notes, and genetic data. In this dissertation, we focused on using text data in identifying and prioritizing drug repositioning candidates and conducted five studies. In the first study, we aimed to assess the feasibility of using patient reviews from social media to identify potential candidates for drug repurposing. We retrieved patient reviews of 180 medications from an online forum, WebMD. Using dictionary-based and machine learning approaches, we identified disease names in the reviews. Several publicly available resources were used to exclude comments containing known indications and adverse drug effects. After manually reviewing some of the remaining comments, we implemented a rule-based system to identify beneficial effects. The dictionary-based system and machine learning system identified 2178 and 6171 disease names respectively in 64,616 patient comments. We provided a list of 10 common patterns that patients used to report any beneficial effects or uses of medication. After manually reviewing the comments tagged by our rule-based system, we identified five potential drug repurposing candidates. 
To our knowledge, this was the first study to consider using social media data to identify drug-repurposing candidates. We found that even a rule-based system, with a limited number of rules, could identify beneficial effect mentions in the comments of patients. Our preliminary study shows that social media has the potential to be used in drug repurposing. In the second study, we investigated the significance of extracting information from multiple sentences specifically in the context of drug-disease relation discovery. We used multiple resources such as Semantic Medline, a literature-based resource, and Medline search (for filtering spurious results) and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur in a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among the 537, nearly 75% (407) of the drug-disease pairs occur in multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the significance of the need for discourse-level analysis in extracting the relations from biomedical literature. In the third and fourth study, we focused on prioritizing drug repositioning candidates extracted from biomedical literature which we refer to as Literature-Based Discovery (LBD). In the third study, we used drug-gene and gene-disease semantic predications extracted from Medline abstracts to generate a list of potential drug-disease pairs. We further ranked the generated pairs, by assigning scores based on the predicates that qualify drug-gene and gene-disease relationships. 
On comparing the top-ranked drug-disease pairs against the Comparative Toxicogenomics Database, we found that a significant percentage of top-ranked pairs appeared in CTD. Co-occurrence of these high-ranked pairs in Medline abstracts is then used to improve the rankings of the inferred drug-disease relations. Finally, manual evaluation of the top-ten pairs ranked by our approach revealed that nine of them have good potential for biological significance based on expert judgment. In the fourth study, we proposed a method, utilizing information surrounding causal findings, to prioritize discoveries generated by LBD systems. We focused on discovering drug-disease relations, which have the potential to identify drug repositioning candidates or adverse drug reactions. Our LBD system used drug-gene and gene-disease semantic predication in SemMedDB as causal findings and Swanson’s ABC model to generate potential drug-disease relations. Using sentences, as a source of causal findings, our ranking method trained a binary classifier to classify generated drug-disease relations into desired classes. We trained and tested our classifier for three different purposes: a) drug repositioning b) adverse drug-event detection and c) drug-disease relation detection. The classifier obtained 0.78, 0.86, and 0.83 F-measures respectively for these tasks. The number of causal findings of each hypothesis, which were classified as positive by the classifier, is the main metric for ranking hypotheses in the proposed method. To evaluate the ranking method, we counted and compared the number of true relations in the top 100 pairs, ranked by our method and one of the previous methods. Out of 181 true relations in the test dataset, the proposed method ranked 20 of them in the top 100 relations while this number was 13 for the other method. 
In the last study, we used biomedical literature and clinical trials in ranking potential drug repositioning candidates identified by Phenome-Wide Association Studies (PheWAS). Unlike previous approaches, in this study, we did not limit our method to LBD. First, we generated a list of potential drug repositioning candidates using PheWAS. We retrieved 212,851 gene-disease associations from the PheWAS catalog and 14,169 gene-drug relationships from DrugBank. Following Swanson’s model, we generated 52,966 potential drug repositioning candidates. Then, we developed an information retrieval system to retrieve any evidence of those candidates co-occurring in the biomedical literature and clinical trials. We identified nearly 14,800 drug-disease pairs with some evidence of support. In addition, we identified more than 38,000 novel candidates for repurposing, encompassing hundreds of different disease states and over 1,000 individual medications. We anticipate that these results will be highly useful for hypothesis generation in the field of drug repurposing.
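
The candidate-generation step described in this abstract (linking drugs to diseases through shared genes, following Swanson's ABC model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `abc_candidates`, the toy drug, gene, and disease identifiers, and the choice to order candidates by the number of linking genes are hypothetical and are not taken from the dissertation, which ranks candidates with literature and clinical-trial evidence.

```python
from collections import defaultdict


def abc_candidates(drug_gene, gene_disease, known=frozenset()):
    """Join drug-gene (A-B) and gene-disease (B-C) links on the shared gene.

    Returns a list of ((drug, disease), linking_genes) tuples, with pairs in
    `known` (e.g. existing indications) excluded. Ordering by the number of
    linking genes is an illustrative assumption, not the source's method.
    """
    # Index diseases by gene so each drug-gene link can be extended in one lookup.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # Collect the intermediate genes (the "B" terms) supporting each A-C pair.
    support = defaultdict(set)
    for drug, gene in drug_gene:
        for disease in diseases_by_gene.get(gene, ()):
            if (drug, disease) not in known:
                support[(drug, disease)].add(gene)

    # Most-supported pairs first; ties broken alphabetically for determinism.
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from DrugBank or the PheWAS catalog.
    toy_drug_gene = [("drugX", "G1"), ("drugX", "G2"), ("drugY", "G2")]
    toy_gene_disease = [("G1", "diseaseP"), ("G2", "diseaseP"), ("G2", "diseaseQ")]
    toy_known = {("drugY", "diseaseQ")}
    for pair, genes in abc_candidates(toy_drug_gene, toy_gene_disease, toy_known):
        print(pair, sorted(genes))
```

In a pipeline like the one the abstract outlines, the resulting pairs would then be checked against literature and clinical-trial records for supporting evidence; that retrieval step is not sketched here.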

    Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

    Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, the TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing together with a plethora of different methods that are made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how the benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathways (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.

    Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and their Roles in Complex Disease

    Vast amounts of biomedical associations are easily accessible in public resources, spanning gene-disease associations, tissue-specific gene expression, gene function and pathway annotations, and many other data types. Despite this mass of data, information most relevant to the study of a particular disease remains loosely coupled and difficult to incorporate into ongoing research. Current public databases are difficult to navigate and do not interoperate well due to the plethora of interfaces and varying biomedical concept identifiers used. Because no coherent display of data within a specific problem domain is available, finding the latent relationships associated with a disease of interest is impractical. This research describes a method for extracting the contextual relationships embedded within associations relevant to a disease of interest. After applying the method to a small test data set, a large-scale integrated association network is constructed for application of a network propagation technique that helps uncover more distant latent relationships. Together these methods are adept at uncovering highly relevant relationships without any a priori knowledge of the disease of interest. The combined contextual search and relevance methods power a tool which makes pertinent biomedical associations easier to find, easier to assimilate into ongoing work, and more prominent than in currently available databases. Increasing the accessibility of current information is an important component of understanding high-throughput experimental results and surviving the data deluge.

    Gene2DisCo: gene to disease using disease commonalities

    OBJECTIVE: Finding the human genes co-causing complex diseases, also known as "disease-genes", is one of the emerging and challenging tasks in biomedicine. This process, termed gene prioritization (GP), is characterized by a scarcity of known disease-genes for most diseases, and by a vast amount of heterogeneous data, usually encoded into networks describing different types of functional relationships between genes. In addition, different diseases may share common profiles (e.g. genetic or therapeutic profiles), and exploiting disease commonalities may significantly enhance the performance of GP methods. This work aims to provide a systematic comparison of several disease similarity measures, and to embed disease similarities and heterogeneous data into a flexible framework for gene prioritization which specifically handles the lack of known disease-genes. METHODS: We present a novel network-based method, Gene2DisCo, based on generalized linear models (GLMs) to effectively prioritize genes by exploiting data regarding disease-genes, gene interaction networks and disease similarities. The scarcity of disease-genes is addressed by applying an efficient negative selection procedure, together with imbalance-aware GLMs. Gene2DisCo is a flexible framework, in the sense that it does not depend on specific types of data or on specific disease ontologies. RESULTS: On a benchmark dataset composed of nine human networks and 708 medical subject headings (MeSH) diseases, Gene2DisCo largely outperformed the best benchmark algorithm, kernelized score functions, in terms of both area under the ROC curve (0.94 against 0.86) and precision at given recall levels (for recall levels from 0.1 to 1 in steps of 0.1). Furthermore, we enriched and extended the benchmark data to the whole human genome and provided the top-ranked unannotated candidate genes even for MeSH disease terms without known annotations.

    Recent advances in biocuration: Meeting report from the Fifth International Biocuration Conference

    The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas about their work and to discuss issues relevant to the International Society for Biocuration’s (ISB) mission. Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB’s goal to support exchanges among members of the biocuration community. Next year’s conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society’s activities (http://biocurator.org), as well as related events of interest.