565 research outputs found

    Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach

    Prostate cancer is a cancer of the prostate, a gland in the male reproductive system. Because prostate cancer cells may spread to other parts of the body and can influence human reproduction, understanding the mechanisms underlying this disease is critical for designing effective treatments. Identifying as many genes and chemicals related to prostate cancer as possible will enhance our understanding of this disease. In this study, we proposed a computational method that identifies new candidate genes and chemicals from currently known prostate cancer-related genes and chemicals by applying a shortest path approach in a hybrid network. The hybrid network was constructed from information on chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions. Many of the obtained genes and chemicals are associated with prostate cancer.
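
    The shortest path idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an unweighted, undirected toy network, and the node names (`G1`, `GX`, `C1`, `CY`) and both function names are hypothetical. Nodes that lie on shortest paths between pairs of known (seed) genes and chemicals are counted and ranked as candidates.

```python
from collections import deque
from itertools import combinations


def shortest_paths(graph, source, target):
    """Return every shortest path from source to target (unweighted graph)."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another equally short route into nbr.
                preds[nbr].append(node)
    if target not in dist:
        return []
    # Backtrack from the target through the recorded predecessors.
    paths, stack = [], [[target]]
    while stack:
        path = stack.pop()
        if path[-1] == source:
            paths.append(path[::-1])
        else:
            stack.extend(path + [p] for p in preds[path[-1]])
    return paths


def candidate_nodes(graph, seeds):
    """Count how often each non-seed node lies inside a seed-to-seed shortest path."""
    counts = {}
    for a, b in combinations(sorted(seeds), 2):
        for path in shortest_paths(graph, a, b):
            for node in path[1:-1]:
                if node not in seeds:
                    counts[node] = counts.get(node, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hybrid network: G* are genes/proteins, C* are chemicals.
toy_graph = {
    "G1": ["GX", "CY"],
    "GX": ["G1", "G2", "C1"],
    "G2": ["GX"],
    "C1": ["GX", "CY"],
    "CY": ["G1", "C1"],
}
toy_seeds = {"G1", "G2", "C1"}
print(candidate_nodes(toy_graph, toy_seeds))  # [('GX', 3), ('CY', 1)]
```

    A real hybrid network would carry interaction confidence scores as edge weights, in which case a weighted search such as Dijkstra's algorithm would replace the breadth-first search used here.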

    Network Analysis of Microarray Data

    DNA microarrays are widely used to investigate gene expression. Although the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. Peer reviewed.

    Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery

    Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation in areas such as target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed and disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment and anti-correlate with the signatures observed in the disease of interest. The paths that match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance than other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets. DDF, YG, AP, CWD, BBM, DH, JR, and VC have been funded by Enveda Biosciences. This work has been funded by Enveda Biosciences (https://www.envedabio.com/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. SM and DRB received no specific funding for this work. Peer reviewed. Postprint (author's final draft).
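
    The path-filtering step described in this abstract can be sketched roughly as below. This is one possible reading of the abstract, not the published RPath code: the edge signs (+1 for activation, -1 for inhibition), the requirement that every intermediate node be measured in both signatures, the `max_edges` cutoff, and all names are assumptions made for illustration.

```python
def causal_paths(kg, drug, disease, max_edges=4):
    """Yield (nodes, signs) for each simple directed path from drug to disease."""
    stack = [([drug], [])]
    while stack:
        nodes, signs = stack.pop()
        if nodes[-1] == disease:
            yield nodes, signs
            continue
        if len(signs) == max_edges:
            continue
        for nxt, sign in kg.get(nodes[-1], ()):
            if nxt not in nodes:  # keep paths simple (no repeated nodes)
                stack.append((nodes + [nxt], signs + [sign]))


def concordant_paths(kg, drug, disease, drug_sig, disease_sig, max_edges=4):
    """Keep paths whose predicted effects match the drug signature
    and oppose the disease signature at every intermediate node."""
    kept = []
    for nodes, signs in causal_paths(kg, drug, disease, max_edges):
        effect, ok = 1, len(nodes) > 2
        # signs[i] is the sign of the edge entering nodes[i + 1].
        for node, sign in zip(nodes[1:-1], signs[:-1]):
            effect *= sign  # predicted direction of change under the drug
            if drug_sig.get(node) != effect or disease_sig.get(node) != -effect:
                ok = False
                break
        if ok:
            kept.append(nodes)
    return kept


# Hypothetical signed knowledge graph and signatures (+1 up, -1 down).
toy_kg = {
    "drugX": [("A", -1), ("B", +1)],
    "A": [("C", +1)],
    "B": [("C", +1)],
    "C": [("diseaseY", +1)],
}
toy_drug_sig = {"A": -1, "B": -1, "C": -1}
toy_disease_sig = {"A": +1, "B": +1, "C": +1}
print(concordant_paths(toy_kg, "drugX", "diseaseY", toy_drug_sig, toy_disease_sig))
# [['drugX', 'A', 'C', 'diseaseY']]
```

    In this toy case the path through `B` is rejected because the graph predicts that the drug raises `B`, while the assumed drug signature shows it going down.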

    Text and Network Mining for Literature-Based Scientific Discovery in Biomedicine.

    Most of the new and important findings in biomedicine are only available in the text of the published scientific articles. The first goal of this thesis is to design methods based on natural language processing and machine learning to extract information about genes, proteins, and their interactions from text. We introduce a dependency tree kernel based relation extraction method to identify the interacting protein pairs in a sentence. We propose two kernel functions based on cosine similarity and edit distance among the dependency tree paths connecting the protein names. Using these kernel functions with supervised and semi-supervised machine learning methods, we report significant improvement (59.96% F-Measure performance over the AIMED data set) compared to the previous results in the literature. We also address the problem of distinguishing factual information from speculative information. Unlike previous methods that formulate the problem as a sentence classification task, we propose a two-step method to identify the speculative fragments of sentences. First, we use supervised classification to identify the speculation keywords using a diverse set of linguistic features that represent their contexts. Next, we use the syntactic structures of the sentences to resolve their linguistic scopes. Our results show that the method is effective in identifying speculative portions of sentences. The speculation keyword identification results are close to the upper bound of human inter-annotator agreement. The second goal of this thesis is to generate new scientific hypotheses using the literature-mined protein/gene interactions. We propose a literature-based discovery approach, where we start with a set of genes known to be related to a given concept and integrate text mining with network centrality analysis to predict novel concept-related genes. 
We present the application of the proposed approach to two different problems, namely predicting gene-disease associations and predicting genes that are important for vaccine development. Our results provide new insights and hypotheses worth future investigation in these domains and show the effectiveness of the proposed approach for literature-based discovery. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/78956/1/ozgur_1.pd
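
    The two similarity measures named in this abstract, cosine similarity and edit distance over dependency tree paths, can be illustrated with a short sketch. It assumes each path is a list of tokens and dependency labels; the example paths, the `gamma` parameter, and the exponential mapping from distance to similarity are illustrative assumptions rather than details taken from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(path_a, path_b):
    """Cosine similarity between bag-of-token vectors of two paths."""
    ca, cb = Counter(path_a), Counter(path_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def edit_distance(path_a, path_b):
    """Token-level Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(path_b) + 1))
    for i, ta in enumerate(path_a, 1):
        cur = [i]
        for j, tb in enumerate(path_b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ta != tb)))
        prev = cur
    return prev[-1]


def edit_similarity(path_a, path_b, gamma=1.0):
    """Map edit distance to a similarity in (0, 1]; gamma is a free parameter."""
    return math.exp(-gamma * edit_distance(path_a, path_b))


# Hypothetical dependency paths between two protein mentions.
p1 = ["PROT1", "nsubj", "interacts", "prep_with", "PROT2"]
p2 = ["PROT1", "nsubj", "binds", "prep_to", "PROT2"]
print(edit_distance(p1, p2))                 # 2
print(round(cosine_similarity(p1, p2), 2))   # 0.6
```

    Note that a similarity derived from edit distance is not guaranteed to be a valid (positive semi-definite) kernel, so using it inside a support vector machine is a practical choice rather than a theoretically safe one.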

    Drug repurposing using biological networks

    Drug repositioning is a strategy for identifying new uses for existing, approved, or research drugs outside the scope of their original medical indication. Drug repurposing rests, among other things, on the fact that one drug can act on multiple targets and that two diseases can share molecular similarities. Thanks to the rapid advancement of high-throughput technologies, a massive amount of biological and biomedical data is now being generated. This allows computational methods and models based on biological networks to open new possibilities for drug repurposing. Here, we provide an in-depth review of the main applications of drug repositioning that have been carried out using biological network models. The goal of this review is to show the usefulness of these computational methods for predicting associations and finding candidate drugs for repositioning in new indications.

    Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Discovering disease-associated genes is a critical step toward realizing personalized medicine. However, empirical and clinical validation of disease-associated genes is time-consuming and expensive. In silico discovery of disease-associated genes from the literature is therefore becoming an essential first step in biomarker discovery, supporting hypothesis formulation and decision making. The completion of the Human Genome Project and the advent of high-throughput technology have produced a tremendous amount of data, resulting in exponential growth of the biomedical knowledge deposited in literature databases. The sheer quantity of unexplored information causes information overload for biomedical researchers and poses a major challenge for informatics researchers seeking to address users' information extraction needs. This thesis focused on mining disease-associated genes from the PubMed literature database using machine learning and graph-theory-based information extraction (IE) methods. Mining disease-associated genes is not trivial and requires pipelines of information extraction steps and methods. Beginning with named entity recognition (NER), the author introduced semantic concept types into the feature space for conditional random fields and demonstrated the effectiveness of the concept feature for disease NER. The effects of domain-specific POS tagging, domain-specific dictionaries, and the named entity encoding scheme on NER performance were also explored. Experimental results show that combining a knowledge base with the concept feature space significantly improves overall disease NER performance. The thesis also shows that shallow linguistic features of global and local word sequence context can be used with a string-kernel-based support vector machine (SVM) for efficient disease-gene relation extraction. 
Lastly, the disease-associated gene network was constructed from a concept co-occurrence matrix computed over a disease-focused document collection and subjected to systematic topology analysis. The gene network was then merged with a seed-gene expanded network to form a heterogeneous disease-gene network. The author identified and prioritized disease-associated genes by graph centrality measurements. This novel approach provides a new means of extracting disease-associated genes from large corpora. Ph.D., Information Studies -- Drexel University, 201
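
    The last step of this abstract, building a gene network from co-occurrence counts and ranking genes by centrality, can be sketched as follows. The abstract does not say which centrality measures were used, so weighted degree is an assumption here, and the gene identifiers and documents are placeholders rather than real data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents):
    """Count, for each gene pair, the number of documents mentioning both."""
    edges = Counter()
    for genes in documents:
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)] += 1
    return edges


def rank_by_weighted_degree(edges):
    """Rank genes by the summed weight of their co-occurrence edges."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder gene mentions extracted from four hypothetical abstracts.
toy_documents = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_C"],
    ["GENE_B", "GENE_D"],
]
toy_edges = cooccurrence_network(toy_documents)
print(rank_by_weighted_degree(toy_edges))
# [('GENE_A', 4), ('GENE_B', 4), ('GENE_C', 3), ('GENE_D', 1)]
```

    Other centrality measures, such as betweenness or eigenvector centrality, could be substituted in the ranking step without changing how the network is built.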

    Selected Works in Bioinformatics

    This book consists of nine chapters covering a variety of bioinformatics subjects: database resources for protein allergens, unravelling genetic determinants of complex disorders, characterization and prediction of regulatory motifs, computational methods for identifying the best classifiers and key disease genes in large-scale transcriptomic and proteomic experiments, functional characterization of inherently unfolded proteins/regions, protein interaction networks, and flexible protein-protein docking. The computational algorithms are in general presented in a way that is accessible to advanced undergraduate students, graduate students and researchers in molecular biology and genetics. The book should also serve as a stepping stone for mathematicians, biostatisticians, and computational scientists to cross their academic boundaries into the dynamic and ever-expanding field of bioinformatics.

    Systems Toxicology: Mining chemical-toxicity signaling paths to enable network medicine

    Systems toxicology, a branch of toxicology that studies chemical effects on biological systems, presents exciting knowledge discovery challenges for the information researcher. The exponential increase in the availability of genomic and proteomic data in this domain needs to be matched with increasingly sophisticated network analysis approaches. An improved ability to mine complex gene and protein interaction networks may eventually lead to the discovery of drugs that target biological sub-networks (‘network medicine’) instead of individual proteins. In this thesis, we have proposed and investigated the use of a maximal edge centrality criterion to discover drug-toxicity signaling paths (DTSPs) inside a human protein interaction network. The signaling path detection approach utilizes drug and toxicity information along with two novel edge weighting measures, one based on edge centrality for detected paths and another using differential gene expression between tissues treated with toxicity-inducing drugs and a control set. Drugs known to induce non-immune neutropenia were analyzed as a test case, and common path proteins on discovered signaling paths were evaluated for toxicological significance. In addition to investigating the value of topological connectivity for the identification of toxicity biomarkers, the gene expression-based measure led to the identification of a proposed biomarker panel for screening new drug candidates. Comparative evaluation of findings from the DTSP approach against a standard microarray analysis method showed clear improvements in various performance measures, including true positive rate, positive predictive value, negative predictive value and overall accuracy. Comparison of non-immune neutropenia signaling paths with those discovered for a control set showed increased transcript-level activation of discovered signaling paths for toxicity-inducing drugs. 
We have demonstrated the scientific value of a systems-based approach for identifying toxicity-related proteins inside complex biological networks. The algorithm should be useful for biomarker identification for any toxicity, provided that relevant drug and drug-induced toxicity information is available. Ph.D., Information Studies -- Drexel University, 201

    Systems approaches to drug repositioning

    PhD Thesis. Drug discovery has overall become less fruitful and more costly, despite vastly increased biomedical knowledge and evolving approaches to Research and Development (R&D). One complementary approach to drug discovery is drug repositioning, which focusses on identifying novel uses for existing drugs. By focussing on existing drugs that have already reached the market, drug repositioning has the potential to reduce both the time and the cost of getting a disease treatment to those who need it. Many marketed examples of repositioned drugs have been found via serendipitous or rational observations, highlighting the need for more systematic methodologies. Systems approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks are mined manually: a skilled person identifies portions of the graph that are indicative of relationships between drugs and highlights possible repositioning opportunities. However, this approach is not scalable. Automated procedures are required to mine integrated networks systematically for these subgraphs and bring them to the attention of the user. The aim of this project was the development of novel computational methods to identify new therapeutic uses for existing drugs (with particular focus on active small molecules) using data integration. A framework for integrating disparate data relevant to drug repositioning, the Drug Repositioning Network Integration Framework (DReNInF), was developed as part of this work. 
This framework includes a high-level ontology, Drug Repositioning Network Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite of parsers; and a generic semantic graph integration platform. This framework enables the production of integrated networks maintaining strict semantics that are important in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description Framework (RDF) dataset. A Web-based front end was developed, which includes a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this dataset. To automate the mining of drug repositioning datasets, a formal framework for the definition of semantic subgraphs was established and a method for Drug Repositioning Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated. 9,643,061 putative D-T interactions were identified and ranked, with a strong correlation between highly scored associations and those supported by literature observed. The 20 top ranked associations were analysed in more detail with 14 found to be novel and six found to be supported by the literature. It was also shown that this approach better prioritises known D-T interactions, than other state-of-the-art methodologies. The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also investigated. As target-based approaches are utilised heavily in the field of drug discovery, it is necessary to have a systematic method to rank Gene-Disease (G-D) associations. 
Although methods already exist to collect, integrate and score these associations, these scores are often not a reliable reflection of expert knowledge. Therefore, an integrated data-driven approach to drug repositioning was developed using a Bayesian statistics approach and applied to rank 309,885 G-D associations using existing knowledge. Ranked associations were then integrated with other biological data to produce a semantically-rich drug discovery network. Using this network, it was shown that diseases of the central nervous system (CNS) provide an area of interest. The network was then systematically mined for semantic subgraphs that capture novel Dr-D relations. 275,934 Dr-D associations were identified and ranked, with those more likely to be side-effects filtered out. Work presented here includes novel tools and algorithms to enable research within the field of drug repositioning. DReNIn, for example, includes data that previous comparable datasets relevant to drug repositioning have neglected, such as clinical trial data and drug indications. Furthermore, the dataset may be easily extended using DReNInF to include future data as and when it becomes available, such as G-D association directionality (i.e. whether the mutation is loss-of-function or gain-of-function). Unlike other algorithms and approaches developed for drug repositioning, DReSMin can be used to infer any type of association captured in the target semantic network. Moreover, the approaches presented here should be more generically applicable to other fields that require algorithms for the integration and mining of semantically rich networks. Engineering and Physical Sciences Research Council (EPSRC) and GS
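
    The idea of mining a typed network for a semantic subgraph can be shown with a deliberately small sketch. It is not DReSMin: it hard-codes a single pattern (Drug -binds-> Target -associated_with-> Disease, with no existing Drug -treats-> Disease edge), and the node types, edge labels, and function name are hypothetical rather than taken from DReNInO.

```python
def infer_indications(triples, node_type):
    """Find Drug-Target-Disease chains that suggest a new drug-disease link."""
    # Index the typed edges that take part in the pattern.
    targets_of = {}
    diseases_of = {}
    known = set()
    for subj, pred, obj in triples:
        if pred == "binds" and node_type[subj] == "Drug" and node_type[obj] == "Target":
            targets_of.setdefault(subj, set()).add(obj)
        elif pred == "associated_with" and node_type[subj] == "Target" and node_type[obj] == "Disease":
            diseases_of.setdefault(subj, set()).add(obj)
        elif pred == "treats":
            known.add((subj, obj))
    # Join the two edge sets and drop indications that are already recorded.
    matches = []
    for drug, targets in targets_of.items():
        for target in targets:
            for disease in diseases_of.get(target, ()):
                if (drug, disease) not in known:
                    matches.append((drug, target, disease))
    return sorted(matches)


# Hypothetical typed network expressed as subject-predicate-object triples.
toy_types = {
    "drug1": "Drug", "drug2": "Drug",
    "target1": "Target", "disease1": "Disease",
}
toy_triples = [
    ("drug1", "binds", "target1"),
    ("drug2", "binds", "target1"),
    ("target1", "associated_with", "disease1"),
    ("drug2", "treats", "disease1"),
]
print(infer_indications(toy_triples, toy_types))
# [('drug1', 'target1', 'disease1')]
```

    A general miner would accept the pattern itself as input and match arbitrary typed subgraphs; the fixed join above only shows what one match looks like.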