565 research outputs found
Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach
Prostate cancer is a type of cancer that occurs in the prostate, a gland in the male reproductive system. Because prostate cancer cells may spread to other parts of the body and can influence human reproduction, understanding the mechanisms underlying this disease is critical for designing effective treatments. Identifying as many genes and chemicals related to prostate cancer as possible will enhance our understanding of this disease. In this study, we propose a computational method that identifies new candidate genes and chemicals from currently known prostate cancer-related genes and chemicals by applying a shortest path approach in a hybrid network. The hybrid network was constructed from information on chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions. Many of the obtained genes and chemicals are associated with prostate cancer.
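The shortest path idea described above can be illustrated with a minimal sketch. It assumes an unweighted, undirected network and treats every interior node on a shortest path between two known (seed) nodes as a candidate; the abstract does not say whether the authors weight edges or filter candidates further. The node names and the helper `candidates_on_shortest_paths` are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path from source to target using BFS, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def candidates_on_shortest_paths(graph, seeds):
    """Collect interior nodes of shortest paths between every pair of seeds."""
    found = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(seeds)


# Hypothetical hybrid network mixing gene (protein) and chemical nodes.
edges = [
    ("gene_A", "chem_X"),   # chemical-protein interaction
    ("chem_X", "gene_B"),   # chemical-protein interaction
    ("gene_B", "gene_C"),   # protein-protein interaction
    ("gene_C", "chem_Y"),   # chemical-protein interaction
    ("gene_A", "gene_D"),   # protein-protein interaction
]
hybrid = {}
for u, v in edges:
    hybrid.setdefault(u, set()).add(v)
    hybrid.setdefault(v, set()).add(u)

seeds = {"gene_A", "gene_C"}  # assumed known disease-related nodes
candidates = candidates_on_shortest_paths(hybrid, seeds)
print(sorted(candidates))
```

In this toy network the only shortest path between the two seeds runs through `chem_X` and `gene_B`, so those two nodes are reported as candidates.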
Network Analysis of Microarray Data
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns among the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification.
Peer reviewed
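As a rough illustration of what a coexpression network is, the sketch below connects two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The expression values, gene names and the 0.9 cut-off are made-up assumptions, and real analyses typically use dedicated packages and more robust measures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return gene pairs whose |correlation| is at least the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression matrix: gene -> values across four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with g1
    "g3": [4.0, 1.0, 3.0, 2.0],   # weakly anti-correlated with g1 and g2
}
edges = coexpression_edges(expression)
print(edges)
```

A differential coexpression analysis would, under the same assumptions, build one such edge set per condition and compare them, for example with a set difference.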
Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery
Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed and disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment and anti-correlate with signatures observed in the disease of interest. The paths that match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance than other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
DDF, YG, AP, CWD, BBM, DH, JR, and VC have been funded by Enveda Biosciences. This work has been funded by Enveda Biosciences (https://www.envedabio.com/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. SM and DRB received no specific funding for this work.
Peer Reviewed. Postprint (author's final draft)
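The two reasoning steps described above can be sketched as follows. This is one plausible reading of the abstract, not the RPath implementation: it assumes a directed KG whose edges carry a sign (+1 for activation, -1 for inhibition), propagates the sign along each drug-to-disease path, and keeps a path only if every intermediate node agrees with the drug signature and is opposite to the disease signature. All node names, signatures and function names are hypothetical.

```python
def simple_paths(adjacency, source, target, visited=()):
    """Yield every cycle-free directed path from source to target."""
    path = visited + (source,)
    if source == target:
        yield list(path)
        return
    for neighbour in adjacency.get(source, ()):
        if neighbour not in path:
            yield from simple_paths(adjacency, neighbour, target, path)


def predicted_states(path, signs):
    """Propagate +1/-1 edge signs along the path, starting from the drug."""
    state, states = 1, {}
    for u, v in zip(path, path[1:]):
        state *= signs[(u, v)]
        states[v] = state
    return states


def is_concordant(path, signs, drug_signature, disease_signature):
    """True if each intermediate node matches the drug signature and is
    opposite to the disease signature (unmeasured nodes count as a mismatch)."""
    states = predicted_states(path, signs)
    return all(
        states[node] == drug_signature.get(node)
        and states[node] == -disease_signature.get(node, 0)
        for node in path[1:-1]
    )


# Hypothetical signed knowledge graph: (source, target) -> edge sign.
signs = {
    ("drug", "P1"): -1,
    ("P1", "P2"): 1,
    ("P2", "disease"): 1,
    ("drug", "P3"): 1,
    ("P3", "disease"): 1,
}
adjacency = {}
for u, v in signs:
    adjacency.setdefault(u, []).append(v)

# Hypothetical signatures: +1 = up-regulated, -1 = down-regulated.
drug_signature = {"P1": -1, "P2": -1, "P3": -1}
disease_signature = {"P1": 1, "P2": 1, "P3": 1}

concordant = [
    p for p in simple_paths(adjacency, "drug", "disease")
    if is_concordant(p, signs, drug_signature, disease_signature)
]
print(concordant)
```

Only the path through `P1` and `P2` survives here, because the path through `P3` predicts up-regulation while the assumed drug signature shows down-regulation.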
Text and Network Mining for Literature-Based Scientific Discovery in Biomedicine.
Most of the new and important findings in biomedicine are only available in the
text of the published scientific articles. The first goal of this thesis is to design
methods based on natural language processing and machine learning to extract information about genes, proteins, and their interactions from text. We introduce a
dependency tree kernel based relation extraction method to identify the interacting
protein pairs in a sentence. We propose two kernel functions based on cosine similarity and edit distance among the dependency tree paths connecting the protein names.
Using these kernel functions with supervised and semi-supervised machine learning
methods, we report significant improvement (59.96% F-Measure performance over
the AIMED data set) compared to the previous results in the literature. We also
address the problem of distinguishing factual information from speculative information. Unlike previous methods that formulate the problem as a sentence classification
task, we propose a two-step method to identify the speculative fragments of sentences.
First, we use supervised classification to identify the speculation keywords using a
diverse set of linguistic features that represent their contexts. Next, we use the syntactic structures of the sentences to resolve their linguistic scopes. Our results show
that the method is effective in identifying speculative portions of sentences. The
speculation keyword identification results are close to the upper bound of human
inter-annotator agreement.
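The two similarity measures named above for comparing dependency tree paths can be sketched in a few lines. This is only an illustration under stated assumptions: paths are represented as token lists, cosine similarity is computed on bags of tokens, and the edit distance is turned into a similarity with an exponential decay whose parameter `gamma` is an assumption. The thesis may define its kernels differently, and the example paths are invented.

```python
from collections import Counter
from math import exp, sqrt


def cosine_similarity(path_a, path_b):
    """Cosine similarity between the token-count vectors of two paths."""
    counts_a, counts_b = Counter(path_a), Counter(path_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm = sqrt(sum(v * v for v in counts_a.values())) * sqrt(
        sum(v * v for v in counts_b.values())
    )
    return dot / norm if norm else 0.0


def edit_distance(path_a, path_b):
    """Levenshtein distance over path tokens (insert, delete, substitute)."""
    previous = list(range(len(path_b) + 1))
    for i, a in enumerate(path_a, 1):
        current = [i]
        for j, b in enumerate(path_b, 1):
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (a != b))
            )
        previous = current
    return previous[-1]


def edit_similarity(path_a, path_b, gamma=1.0):
    """Map the length-normalised edit distance into (0, 1]; gamma is assumed."""
    longest = max(len(path_a), len(path_b), 1)
    return exp(-gamma * edit_distance(path_a, path_b) / longest)


# Hypothetical dependency paths connecting two protein mentions.
p1 = ["PROT1", "nsubj", "interacts", "prep_with", "PROT2"]
p2 = ["PROT1", "nsubj", "binds", "dobj", "PROT2"]
print(cosine_similarity(p1, p2), edit_distance(p1, p2), edit_similarity(p1, p2))
```

Either similarity could then be supplied to a kernel-based classifier as a precomputed similarity matrix, although the exponential form above is not guaranteed to be a valid (positive semi-definite) kernel.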
The second goal of this thesis is to generate new scientific hypotheses using the
literature-mined protein/gene interactions. We propose a literature-based discovery
approach, where we start with a set of genes known to be related to a given concept
and integrate text mining with network centrality analysis to predict novel concept-related genes. We present the application of the proposed approach to two different
problems, namely predicting gene-disease associations and predicting genes that are
important for vaccine development. Our results provide new insights and hypotheses worth future investigations in these domains and show the effectiveness of the
proposed approach for literature-based discovery.
Ph.D. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/78956/1/ozgur_1.pd
Drug repurposing using biological networks
Drug repositioning is a strategy to identify new uses for existing, approved, or research drugs that are outside the scope of their original medical indication. Drug repurposing is based on the fact that one drug can act on multiple targets or that two diseases can have molecular similarities, among others. Currently, thanks to the rapid advancement of high-performance technologies, a massive amount of biological and biomedical data is being generated. This allows the use of computational methods and models based on biological networks to develop new possibilities for drug repurposing. Here, we therefore provide an in-depth review of the main applications of drug repositioning that have been carried out using biological network models. The goal of this review is to show the usefulness of these computational methods for predicting associations and finding candidate drugs for repositioning in new indications of certain diseases.
Biomedical Information Extraction: Mining Disease Associated Genes from Literature
Disease-associated gene discovery is a critical step towards realizing the future of personalized medicine. However, empirical and clinical validation of disease-associated genes is time consuming and expensive. In silico discovery of disease-associated genes from the literature is therefore becoming the first essential step in biomarker discovery, supporting hypothesis formulation and decision making. The completion of the Human Genome Project and the advent of high-throughput technology have produced a tremendous amount of data, which has led to exponential growth of the biomedical knowledge deposited in literature databases. The sheer quantity of unexplored information causes information overload for biomedical researchers and poses a big challenge for informatics researchers trying to address users' information extraction needs. This thesis focused on mining disease-associated genes from the PubMed literature database using machine learning and graph theory based information extraction (IE) methods. Mining disease-associated genes is not trivial and requires pipelines of information extraction steps and methods. Beginning with named entity recognition (NER), the author introduced the semantic concept type into the feature space for conditional random fields machine learning and demonstrated the effectiveness of the concept feature for disease NER. The effects of domain-specific POS tagging, domain-specific dictionaries, and the named entity encoding scheme on NER performance were also explored. Experimental results show that combining a knowledge base with the concept feature space can significantly improve overall disease NER performance. It was also shown that shallow linguistic features of global and local word sequence context can be used with a string kernel based support vector machine (SVM) for efficient disease-gene relation extraction.
Lastly, the disease-associated gene network was constructed using a concept co-occurrence matrix computed from a disease-focused document collection, and subjected to systematic topology analysis. The gene network was then merged with a seed-gene expanded network to form a heterogeneous disease-gene network. The author identified and prioritized disease-associated genes by graph centrality measurements. This novel approach provides a new means of extracting disease-associated genes from large corpora.
Ph.D., Information Studies -- Drexel University, 201
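Prioritizing genes by graph centrality, as described above, can be illustrated with the simplest such measure. The sketch below ranks non-seed genes by degree centrality in a small undirected network. The abstract does not state which centrality measures the thesis used, so degree centrality is an assumption, and the network and gene names are hypothetical.

```python
def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def rank_candidates(graph, seeds):
    """Rank non-seed nodes by decreasing centrality, ties broken by name."""
    centrality = degree_centrality(graph)
    candidates = [node for node in graph if node not in seeds]
    return sorted(candidates, key=lambda node: (-centrality[node], node))


# Hypothetical disease-gene network built from co-occurrence evidence.
edges = [
    ("seed_1", "cand_A"),
    ("seed_2", "cand_A"),
    ("seed_1", "cand_B"),
    ("cand_A", "cand_B"),
    ("cand_A", "cand_C"),
]
network = {}
for u, v in edges:
    network.setdefault(u, set()).add(v)
    network.setdefault(v, set()).add(u)

ranking = rank_candidates(network, seeds={"seed_1", "seed_2"})
print(ranking)
```

Other measures such as closeness, betweenness or eigenvector centrality could be substituted for `degree_centrality` without changing the ranking step.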
Selected Works in Bioinformatics
This book consists of nine chapters covering a variety of bioinformatics subjects, ranging from database resources for protein allergens, unravelling genetic determinants of complex disorders, characterization and prediction of regulatory motifs, computational methods for identifying the best classifiers and key disease genes in large-scale transcriptomic and proteomic experiments, and functional characterization of inherently unfolded proteins/regions, to protein interaction networks and flexible protein-protein docking. The computational algorithms are in general presented in a way that is accessible to advanced undergraduate students, graduate students and researchers in molecular biology and genetics. The book should also serve as a stepping stone for mathematicians, biostatisticians, and computational scientists to cross their academic boundaries into the dynamic and ever-expanding field of bioinformatics.
An integrated proteomic and metabolomic approach to investigate cerebral ischemic preconditioning
The molecular mechanisms that lead to ischemic preconditioning, and hence to ischemic tolerance, are not completely understood, although it is clear that multiple effectors and pathways contribute to the establishment of this neuroprotective profile. To study the mechanisms and pathways involved in ischemic tolerance, brain proteins, plasma proteins and plasma metabolites were analyzed in the preconditioning stimulus (7' Middle Cerebral Artery occlusion, or 7'MCAo), in severe stroke (permanent Middle Cerebral Artery occlusion, or pMCAo) and in the preconditioned (7'MCAo/pMCAo) mouse model.
A conventional 2-DE approach was used to study technical replicates of pooled brain proteins, revealing an involvement of energy metabolism, mitochondrial electron transport, synaptic vesicle transport and antioxidant processes. Moreover, network analysis suggested an involvement of the androgen receptor, which was validated on technical replicates of pooled brain proteins by western blot analysis, revealing increased expression in preconditioning stimulus animals (7'MCAo).
Plasma proteins were analyzed using a 1-DE LC-MS/MS approach on technical replicates of pooled plasma proteins, revealing decreased levels of epidermal growth factor receptor (EGFR) and increased levels of insulin-like growth factor acid labile subunit (IGFALS), whose expression was paralleled by an increased insulin-like growth factor 1 (IGF1) plasma concentration, as validated by ELISA on biological replicates, in preconditioning stimulus animals (7'MCAo).
Finally, an untargeted metabolomics analysis was applied to technical replicates of pooled plasma proteins, revealing fatty acid oxidation and branched-chain amino acid metabolism as the main biological processes modulated in ischemic tolerance and highlighting an involvement of the amino acid leucine, carnitine esters and adenosine.
The results described in this thesis represent the first application of both proteomic and metabolomic approaches in cerebral ischemic sets, highlighting the androgen receptor as an important mediator between proteins and metabolites and adding new evidence to the current knowledge on ischemic preconditioning. They may represent a starting point for future experiments investigating candidate pathways that relate to the androgen receptor.
Systems Toxicology: Mining chemical-toxicity signaling paths to enable network medicine
Systems toxicology, a branch of toxicology that studies chemical effects on biological systems, presents exciting knowledge discovery challenges for the information researcher. The exponential increase in availability of genomic and proteomic data in this domain needs to be matched with increasingly sophisticated network analysis approaches. Improved ability to mine complex gene and protein interaction networks may eventually lead to discovery of drugs that target biological sub-networks (‘network medicine’) instead of individual proteins. In this thesis, we have proposed and investigated the use of a maximal edge centrality criterion to discover drug-toxicity signaling paths (DTSPs) inside a human protein interaction network. The signaling path detection approach utilizes drug and toxicity information along with two novel edge weighting measures, one based on edge centrality for detected paths and another using differential gene expression between tissues treated with toxicity-inducing drugs and a control set. Drugs known to induce non-immune neutropenia were analyzed as a test case, and common path proteins on discovered signaling paths were evaluated for toxicological significance. In addition to investigating the value of topological connectivity for identification of toxicity biomarkers, the gene expression-based measure led to identification of a proposed biomarker panel for screening new drug candidates. Comparative evaluation of findings from the DTSP approach with a standard microarray analysis method showed clear improvements in various performance measures, including true positive rate, positive predictive value, negative predictive value and overall accuracy. Comparison of non-immune neutropenia signaling paths with those discovered for a control set showed increased transcript-level activation of discovered signaling paths for toxicity-inducing drugs.
We have demonstrated the scientific value of a systems-based approach for identifying toxicity-related proteins inside complex biological networks. The algorithm should be useful for biomarker identification for any toxicity, assuming availability of relevant drug and drug-induced toxicity information.
Ph.D., Information Studies -- Drexel University, 201
Systems approaches to drug repositioning
PhD Thesis
Drug discovery has overall become less fruitful and more costly, despite vastly increased
biomedical knowledge and evolving approaches to Research and Development (R&D).
One complementary approach to drug discovery is that of drug repositioning which
focusses on identifying novel uses for existing drugs. By focussing on existing drugs
that have already reached the market, drug repositioning has the potential to both
reduce the timeframe and cost of getting a disease treatment to those that need it.
Many marketed examples of repositioned drugs have been found via serendipitous or
rational observations, highlighting the need for more systematic methodologies.
Systems approaches have the potential to enable the development of novel methods to
understand the action of therapeutic compounds, but require an integrative approach
to biological data. Integrated networks can facilitate systems-level analyses by combining
multiple sources of evidence to provide a rich description of drugs, their targets and
their interactions. Classically, such networks can be mined manually where a skilled
person can identify portions of the graph that are indicative of relationships between
drugs and highlight possible repositioning opportunities. However, this approach is
not scalable. Automated procedures are required to mine integrated networks systematically
for these subgraphs and bring them to the attention of the user. The aim
of this project was the development of novel computational methods to identify new
therapeutic uses for existing drugs (with particular focus on active small molecules)
using data integration.
A framework for integrating disparate data relevant to drug repositioning, Drug Repositioning
Network Integration Framework (DReNInF) was developed as part of this
work. This framework includes a high-level ontology, Drug Repositioning Network
Integration Ontology (DReNInO), to aid integration and subsequent mining; a suite
of parsers; and a generic semantic graph integration platform. This framework enables
the production of integrated networks maintaining strict semantics that are important
in, but not exclusive to, drug repositioning. The DReNInF is then used to create Drug Repositioning Network Integration (DReNIn), a semantically-rich Resource Description
Framework (RDF) dataset. A Web-based front end was developed, which includes
a SPARQL Protocol and RDF Query Language (SPARQL) endpoint for querying this
dataset.
To automate the mining of drug repositioning datasets, a formal framework for the
definition of semantic subgraphs was established and a method for Drug Repositioning
Semantic Mining (DReSMin) was developed. DReSMin is an algorithm for mining
semantically-rich networks for occurrences of a given semantic subgraph. This algorithm
allows instances of complex semantic subgraphs that contain data about putative
drug repositioning opportunities to be identified in a computationally tractable
fashion, scaling close to linearly with network data.
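To make the notion of a semantic subgraph concrete, the sketch below searches a typed network for instances of one fixed pattern, Drug -binds-> Target -involved_in-> Disease. It is not DReSMin: the pattern, the predicate names, the node types and the function `mine_drug_target_disease` are hypothetical, and a general algorithm would accept arbitrary patterns. Indexing the triples by subject keeps the second matching step from rescanning the whole network.

```python
from collections import defaultdict


def mine_drug_target_disease(triples, node_types):
    """Find (drug, target, disease) instances of the assumed two-edge pattern."""
    # Index outgoing edges by subject so each target is looked up directly.
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))

    matches = []
    for drug, predicate, target in triples:
        first_edge = (predicate, node_types.get(drug), node_types.get(target))
        if first_edge != ("binds", "Drug", "Target"):
            continue
        for next_predicate, disease in by_subject[target]:
            if next_predicate == "involved_in" and node_types.get(disease) == "Disease":
                matches.append((drug, target, disease))
    return sorted(matches)


# Hypothetical semantically typed network.
node_types = {
    "drug_1": "Drug",
    "drug_2": "Drug",
    "target_1": "Target",
    "target_2": "Target",
    "disease_1": "Disease",
}
triples = [
    ("drug_1", "binds", "target_1"),
    ("drug_2", "binds", "target_2"),
    ("target_1", "involved_in", "disease_1"),
    ("target_2", "interacts_with", "target_1"),
]
matches = mine_drug_target_disease(triples, node_types)
print(matches)
```

Only `drug_1` completes the pattern in this toy network. `drug_2` binds a target, but that target has no `involved_in` edge to a disease.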
The ability of DReSMin to identify novel Drug-Target (D-T) associations was investigated.
9,643,061 putative D-T interactions were identified and ranked, with a strong
correlation between highly scored associations and those supported by literature observed.
The 20 top ranked associations were analysed in more detail with 14 found
to be novel and six found to be supported by the literature. It was also shown that
this approach better prioritises known D-T interactions than other state-of-the-art
methodologies.
The ability of DReSMin to identify novel Drug-Disease (Dr-D) indications was also
investigated. As target-based approaches are utilised heavily in the field of drug discovery,
it is necessary to have a systematic method to rank Gene-Disease (G-D) associations.
Although methods already exist to collect, integrate and score these associations,
these scores are often not a reliable reflection of expert knowledge. Therefore, an
integrated data-driven approach to drug repositioning was developed using a Bayesian
statistics approach and applied to rank 309,885 G-D associations using existing knowledge.
Ranked associations were then integrated with other biological data to produce
a semantically-rich drug discovery network. Using this network it was shown that
diseases of the central nervous system (CNS) provide an area of interest. The network
was then systematically mined for semantic subgraphs that capture novel Dr-D relations.
275,934 Dr-D associations were identified and ranked, with those more likely to
be side-effects filtered out.
Work presented here includes novel tools and algorithms to enable research within
the field of drug repositioning. DReNIn, for example, includes data that previous
comparable datasets relevant to drug repositioning have neglected, such as clinical
trial data and drug indications. Furthermore, the dataset may be easily extended
using DReNInF to include future data as and when it becomes available, such as G-D
association directionality (i.e. is the mutation a loss-of-function or gain-of-function).
Unlike other algorithms and approaches developed for drug repositioning, DReSMin
can be used to infer any types of associations captured in the target semantic network.
Moreover, the approaches presented here should be more generically applicable to
other fields that require algorithms for the integration and mining of semantically rich
networks.
Engineering and Physical Sciences Research Council (EPSRC) and GS