183 research outputs found

    Large–scale data–driven network analysis of human–plasmodium falciparum interactome: extracting essential targets and processes for malaria drug discovery

    Background: Plasmodium falciparum malaria is an infectious disease with a major impact on public health because of its high mortality rates, especially in sub-Saharan Africa. Drug-resistant falciparum strains in Africa, notably those resistant to chloroquine and sulfadoxine-pyrimethamine, are traced mainly to Southeast Asia, where the artemisinin resistance rate is increasing. Although careful surveillance to monitor the emergence and spread of artemisinin-resistant parasite strains in Africa is ongoing, research into new drugs, particularly for African populations, is critical since there is not yet a replacement for artemisinin combination therapies (ACTs). Objective: The overall objective of this study is to identify potential protein targets through host-pathogen protein-protein functional interaction network analysis, in order to understand the underlying mechanisms of drug failure and to identify essential targets that can help predict potential drug candidates specific to African populations through a protein-based approach of both host and Plasmodium falciparum genomic analysis. Methods: We leveraged malaria-specific genome-wide association study summary statistics obtained from Gambia, Kenya and Malawi populations, Plasmodium falciparum selective pressure variants and functional datasets (protein sequences, interologs, host-pathogen intra-organism and host-pathogen inter-organism protein-protein interactions (PPIs)) from various sources (STRING, Reactome, HPID, UniProt, IntAct and literature) to construct overlapping functional networks for both host and pathogen. Custom-developed algorithms and a large-scale data-driven computational framework were used to analyze the datasets and the constructed networks and to identify densely connected subnetworks or hubs essential for network stability and integrity.
The host-pathogen network was analyzed to elucidate the influence of parasite candidate key proteins within the network and to predict possible resistance pathways arising from host-pathogen candidate key protein interactions. We performed biological and pathway enrichment analysis on the critical proteins identified to elucidate their functions. To leverage disease-target-drug relationships and identify already approved drug candidates that could be repurposed to treat malaria, pharmaceutical datasets from DrugBank were explored using a semantic similarity approach based on target-associated biological processes. Results: About 600,000 significant SNPs (p-value < 0.05) from the summary statistics data were mapped to their associated genes, and we identified 79 human-associated malaria genes. The assembled parasite network comprised 8 clusters containing 799 functional interactions between 155 reviewed proteins, of which 5 clusters contained 43 key proteins (selective variants) and 2 clusters contained 2 candidate key proteins (key proteins characterized by a high centrality measure), C6KTB7 and C6KTD2. The human network comprised 32 clusters containing 4,133,136 interactions between 20,329 unique reviewed proteins, of which 7 clusters contained 760 key proteins and 2 clusters contained 6 significant human malaria-associated candidate key proteins or genes: P22301 (IL10), P05362 (ICAM1), P01375 (TNF), P30480 (HLA-B), P16284 (PECAM1) and O00206 (TLR4). The generated host-pathogen network comprised 31,512 functional interactions between 8,023 host and pathogen proteins. We also explored the association of the pfk13 gene within the host-pathogen network. We observed that pfk13 clusters with host kelch-like proteins and other regulatory genes but has no direct association with our identified host candidate key malaria targets.
We implemented a semantic-similarity-based approach, complemented by Kappa and Jaccard statistical measures, to identify 115 malaria-similar diseases and 26 potential repurposable drug hits that can be evaluated experimentally for malaria treatment. Conclusion: In this study, we reviewed existing antimalarial drugs and the resistance-associated variants contributing to the diminished sensitivity of antimalarials, especially chloroquine, sulfadoxine-pyrimethamine and artemisinin combination therapy, within the African population. We also described various computational techniques used to predict drug targets and leads in drug research. In our data analysis, we showed that possible mechanisms of resistance to artemisinin in Africa may arise from the combinatorial effects of many genes conferring resistance to chloroquine and sulfadoxine-pyrimethamine. We investigated the role of pfk13 within the host-pathogen network. We predicted key targets that have been proposed to be essential for malaria drug and vaccine development through structural and functional analysis of host and pathogen functional networks. Based on our analysis, we propose these targets as essential co-targets for combinatorial malaria drug discovery.
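
The abstract above mentions Jaccard and Kappa measures for comparing diseases by their target-associated biological processes, without giving details. A minimal sketch of how two such measures could be computed on sets of process annotations follows; the term names, the universe of terms and the function names are hypothetical, and this is not the authors' pipeline.

```python
# Illustrative sketch only: term names and function names are hypothetical,
# not taken from the thesis described above.

def jaccard(a, b):
    """Jaccard index: shared terms divided by all terms seen in either set."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty annotation sets
    return len(a & b) / len(union)


def cohen_kappa(a, b, universe):
    """Cohen's kappa for two binary annotation vectors over a shared universe."""
    universe = set(universe)
    a, b = set(a) & universe, set(b) & universe
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")
    both = len(a & b)            # terms annotated to both diseases
    neither = n - len(a | b)     # terms annotated to neither disease
    observed = (both + neither) / n
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both vectors are constant and identical
    return (observed - expected) / (1.0 - expected)


# Hypothetical biological-process annotations for two diseases.
universe = {"immune response", "inflammatory response", "cell adhesion",
            "apoptosis", "glycolysis", "DNA repair"}
malaria = {"immune response", "inflammatory response", "cell adhesion"}
other = {"immune response", "cell adhesion", "apoptosis"}

print(jaccard(malaria, other))                           # 0.5
print(round(cohen_kappa(malaria, other, universe), 3))   # 0.333
```

Unlike the Jaccard index, kappa corrects for the agreement expected by chance given how many terms each disease carries, which is one plausible reason to report both.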

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. In contrast, repeated failures of high-profile drugs to combat Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns (embedded with novel candidates) across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's disease, Parkinson's disease and epilepsy.

    Bi-(N-) cluster editing and its biomedical applications

    The extremely fast advances in wet-lab techniques have led to an exponential growth of heterogeneous and unstructured biological data, posing a great challenge to data integration in present-day systems biology. The traditional clustering approach, although widely used to divide data into groups sharing common features, is less powerful in the analysis of heterogeneous data from n different sources (n > 2). The co-clustering approach has been widely used for combined analyses of multiple networks to address the challenge of heterogeneity. In this thesis, novel methods for the co-clustering of large-scale heterogeneous data sets are presented in the software package n-CluE: one exact algorithm and two heuristic algorithms based on the model of bi-/n-cluster editing, which model the input as n-partite graphs and solve the clustering problem with various strategies. In the first part of the thesis, the complexity and the fixed-parameter tractability of the extended bicluster editing model with relaxed constraints, namely the ?-bicluster editing model, are investigated, and its NP-hardness is proven. Based on the results of this analysis, three strategies within the n-CluE software package are then established and discussed, together with performance evaluations and systematic comparisons against other algorithms of the same type for solving the bi-/n-cluster editing problem. To demonstrate the practical impact, three real-world analyses using n-CluE are performed: (a) prediction of novel genotype-phenotype associations by clustering data from genome-wide association studies; (b) comparison between n-CluE and eight other biclustering tools on Gene Expression Omnibus (GEO) microarray data sets; (c) drug repositioning predictions by co-clustering on drug, gene and disease networks.
The outstanding performance of n-CluE in these real-world applications shows its strength and flexibility in integrating heterogeneous data and extracting biologically relevant information in bioinformatic analyses. The enormous advances in laboratory technology have recently led to an exponentially growing volume of heterogeneous and unstructured data. This poses a great challenge for systems biology research, in which these data are consolidated through data integration and data mining and analyzed in combination. Traditional clustering is a widely used method for grouping ("clustering") entities within large data sets by the similarity of particular attributes. However, traditional clustering methods show weaknesses when clustering heterogeneous data from n (n > 2) different sources. In such cases, co-clustering methods offer the advantage of partitioning data sets simultaneously. In this dissertation I present new clustering methods, which are brought together in the n-CluE software. These new methods were developed from bi-/n-cluster editing and solve the underlying clustering problem with various strategies by transforming the input data sets into n-partite graphs. The dissertation is divided into two parts. The first part deals in depth with the complexity analysis of several extended bicluster editing models, the so-called ?-bicluster editing models, and proves their NP-hardness. Building on these theoretical considerations, the second part presents three different algorithms, one exact algorithm and two heuristics, and demonstrates their performance and robustness in comparison with other algorithmic approaches.
The strengths of n-CluE are substantiated by three real-world applications: (a) the prediction of novel genotype-phenotype associations through biclustering analysis of data from genome-wide association studies (GWAS); (b) a comparison between n-CluE and eight other software packages based on bicluster analyses of microarray data from the Gene Expression Omnibus (GEO); (c) the prediction of drug repositioning through integrated analysis of drug, gene and disease networks. The results impressively demonstrate the strengths of the n-CluE software. The outcome is a powerful, robust and flexibly extensible implementation of the biclustering theorem for integrating large heterogeneous data sets in order to extract biologically relevant results in bioinformatics studies.
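
To make the bicluster editing model mentioned above concrete: in its basic unweighted form, the goal is to turn a bipartite graph into a disjoint union of complete bipartite subgraphs using as few edge insertions and deletions as possible. The sketch below only scores one candidate clustering under that unweighted model; it is not the n-CluE implementation, and the drug and gene identifiers and the function name are hypothetical.

```python
# Illustrative sketch of the unweighted bicluster editing objective.
# Not the n-CluE algorithm; all identifiers below are hypothetical.

def bicluster_editing_cost(edges, left_cluster, right_cluster):
    """Count the edge edits needed so every cluster becomes a complete biclique.

    edges:         iterable of (left_node, right_node) pairs
    left_cluster:  dict mapping each left node to a cluster id
    right_cluster: dict mapping each right node to a cluster id
    """
    edges = set(edges)
    cost = 0
    for u, cu in left_cluster.items():
        for v, cv in right_cluster.items():
            same_cluster = cu == cv
            present = (u, v) in edges
            # A missing edge inside a cluster must be inserted;
            # an existing edge between clusters must be deleted.
            if same_cluster != present:
                cost += 1
    return cost


# Toy drug-gene bipartite graph.
edges = [("d1", "g1"), ("d2", "g1"), ("d2", "g2"), ("d3", "g2")]
left = {"d1": 0, "d2": 0, "d3": 1}
right = {"g1": 0, "g2": 1}

print(bicluster_editing_cost(edges, left, right))  # 1 (delete d2-g2)
```

An exact solver would search over all clusterings for the minimum of this cost, which is where the NP-hardness noted in the abstract becomes relevant; heuristics trade optimality for speed.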

    Prediction and visualization of the carcinogenic potential of chemicals with short-term omics assays

    Drug candidates that induce or promote cancer formation must be identified and eliminated during the preclinical phase of drug development to minimize the risk of adverse, carcinogenic effects in patients. Genotoxic carcinogens can be identified with short-term assays. In contrast, the lifetime rodent cancer bioassay that is used to identify nongenotoxic carcinogenic substances requires a high number of test animals and takes up to five years to complete. In addition, the lifetime rodent cancer bioassay does not provide sufficient data to evaluate the human risk if carcinogenic effects are observed in rodents. This can result in discontinuation of the development of the drug candidate or a black box warning on the drug packaging. The application of high-throughput omics methods such as transcriptomics or proteomics in toxicological studies is a promising approach for the development of short-term alternatives to the lifetime rodent cancer bioassay. However, these omics methods are difficult for life sciences researchers to use, and few specialized visualization tools exist for toxicogenomics data. Furthermore, most existing studies used only a single omics platform to determine the molecular effects of carcinogens. This thesis introduces new approaches that integrate multiple omics platforms for the identification of nongenotoxic carcinogens and presents analysis and visualization tools that were specifically developed for toxicogenomics data. We performed a series of experiments to demonstrate that our multi-omics approach improves prediction performance compared to single-omics approaches. To facilitate access to our analysis and visualization tools, we implemented two web platforms, the ZBIT Bioinformatics Toolbox and MARCARviz. These web platforms enable toxicologists to gain new insights into the mechanisms of nongenotoxic tumor promotion.
Furthermore, we demonstrated that our multi-omics approach can provide the basis of new short-term alternatives to the lifetime rodent cancer bioassay. Drug candidates that promote the formation and growth of tumors must be identified in the preclinical phase of drug development and excluded from further development in order to minimize the risk of dangerous, tumor-promoting side effects for patients. While genotoxic substances can be identified with rapid tests, the current standard test procedure for detecting non-genotoxic carcinogenic substances takes up to five years and requires a large number of test animals. In addition, if tumors are found during testing, the result gives no indication of the mechanism, which can lead to discontinuation of the drug candidate's development or to a black box warning on the packaging. The application of modern high-throughput technologies in toxicological studies, known as toxicogenomics, is a promising approach for developing test procedures that require less time and fewer test animals. However, toxicogenomics methods are often difficult for toxicologists to use. Moreover, most existing studies considered data from only a single omics technology, and only a few specialized visualization tools exist for toxicogenomic data. This thesis presents new analysis and visualization tools developed specifically for toxicogenomic studies, as well as integrative approaches that make it possible to consider data from multiple omics platforms in order to improve the identification of non-genotoxic carcinogens. We describe a series of experiments with a new toxicogenomics data set to demonstrate that our integrative approaches improve the prediction of the carcinogenicity of substances.
Further development of the integrative methods we describe may offer alternatives to the time-consuming procedure currently used to determine carcinogenicity. We also describe new web platforms for the analysis and visualization of toxicogenomic expression data, which we developed to make bioinformatics tools more accessible to toxicologists. With these new web platforms, toxicologists can gain new insights into the mechanisms of action of non-genotoxic carcinogenesis.

    Immune-Mediated Drug Induced Liver Injury: A Multidisciplinary Approach

    This thesis presents an approach to expose relationships between immune mediated drug induced liver injury (IMDILI) and the three-dimensional structural features of toxic drug molecules and their metabolites. The series of analyses test the hypothesis that drugs which produce similar patterns of toxicity interact with targets within common toxicological pathways and that activation of the underlying mechanisms depends on structural similarity among toxic molecules. Spontaneous adverse drug reaction (ADR) reports were used to identify cases of IMDILI. Network map tools were used to compare the known and predicted protein interactions with each of the probe drugs to explore the interactions that are common between the drugs. The IMDILI probe set was then used to develop a pharmacophore model which became the starting point for identifying potential toxicity targets for IMDILI. Pharmacophore screening results demonstrated similarities between the probe IMDILI set of drugs and Toll-Like Receptor 7 (TLR7) agonists, suggesting TLR7 as a potential toxicity target. This thesis highlights the potential for multidisciplinary approaches in the study of complex diseases. Such approaches are particularly helpful for rare diseases where little knowledge is available, and may provide key insights into mechanisms of toxicity that cannot be gleaned from a single disciplinary study

    Global analysis of SNPs, proteins and protein-protein interactions: approaches for the prioritisation of candidate disease genes.

    Understanding the etiology of complex disease remains a challenge in biology. In recent years there has been an explosion in biological data. This study investigates machine learning and network analysis methods as tools to aid candidate disease gene prioritisation, specifically relating to hypertension and cardiovascular disease. This thesis comprises four sets of analyses. Firstly, non-synonymous single nucleotide polymorphisms (nsSNPs) were analysed in terms of sequence- and structure-based properties using a classifier to provide a model for predicting deleterious nsSNPs. The degree of sequence conservation at the nsSNP position was found to be the single best attribute, but other sequence and structural attributes in combination were also useful. Predictions for nsSNPs within Ensembl have been made publicly available. Secondly, the problem of predicting function for proteins that lack experimental data or clear similarity to a sequence of known function was addressed. Protein domain attributes based on physicochemical and predicted structural characteristics of the sequence were used as input to classifiers for predicting membership of large and diverse protein superfamilies from the SCOP database. An enrichment method was investigated that involved adding domains to the training dataset that are currently absent from SCOP. This analysis resulted in improved classifier accuracy: optimised classifiers achieved 66.3% for single-domain proteins and 55.6% when including domains from multi-domain proteins. Domains from superfamilies with low sequence similarity share global sequence properties, enabling applications to be developed which complement profile methods for detecting distant sequence relationships. Thirdly, a topological analysis of the human protein interactome was performed. The results were combined with functional annotation and sequence-based properties to build models for predicting hypertension-associated proteins.
The study found that predicted hypertension-related proteins are not generally associated with network hubs and do not exhibit high clustering coefficients. Despite this, they tend to be closer and better connected to other hypertension proteins on the interaction network than would be expected by chance. Classifiers that combined PPI network, amino acid sequence and functional properties produced a range of precision and recall scores according to the applied weights. Finally, interactome properties of proteins implicated in cardiovascular disease and cancer were studied. The analysis quantified the influential (central) nature of each protein and defined characteristics of the functional modules and pathways in which the disease proteins reside. Such proteins were found to be enriched 2-fold among proteins that are influential (p<0.05) in the interactome. Additionally, they cluster in large, complex, highly connected communities, acting as interfaces between multiple processes more often than expected. An approach to prioritising disease candidates based on this analysis was proposed. Each analysis provides new insights for the effort to identify novel disease-related proteins for cardiovascular disease.
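
The abstract above refers to network hubs and clustering coefficients without defining them. As a minimal sketch, assuming an undirected, unweighted interaction network, degree and the local clustering coefficient can be computed as follows; the protein labels and edges are a hypothetical toy example, not data from the thesis.

```python
# Illustrative sketch: degree and local clustering coefficient on a toy
# undirected network. Node labels and edges are hypothetical.
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of interaction partners; hubs are nodes with unusually high degree."""
    return len(adj.get(node, ()))


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0 by convention
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
adj = build_adjacency(edges)

print(degree(adj, "A"))                  # 3
print(clustering_coefficient(adj, "B"))  # 1.0
```

In this toy graph, A has the highest degree but a low clustering coefficient (only one of its three neighbour pairs is linked), which shows that the two properties measure different things.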

    Comparative Analysis of Small Non-Coding RNA and Messenger RNA Expression in Somatic Cell Nuclear Transfer and In Vitro-Fertilized Bovine Embryos During Early Development Through the Maternal-to-Embryonic Transition

    Cloning animals using somatic cell nuclear transfer (scNT) was first successfully demonstrated with the birth of Dolly the sheep, but the process of cloning remains highly inefficient. By improving our understanding of the errors that may occur during cloned cattle embryo development, we could obtain a greater understanding of how specific molecular events contribute to successful development. The central dogma of biology refers to the process of DNA being transcribed into messenger RNA (mRNA) and the translation of mRNA into proteins, which ultimately carry out the functions encoded by genes. The epigenetic code is defined as the array of chemical modifications, or “marks”, to DNA molecules that do not change the genome sequence but do allow for control of gene expression. During early development, genome reprogramming involves the removal of epigenetic marks from the sperm and egg and re-establishment of marks for the embryonic genome that code for proper gene expression to support embryo development. The point during this process at which the embryo’s genes are turned on is known as embryonic genome activation (EGA). Small non-coding RNAs (sncRNAs), including microRNAs (miRNAs), may also contribute to this process. For example, miRNA molecules do not code for proteins themselves, but rather bind to mRNAs and effectively block their translation into protein. We hypothesized that aberrant expression of sncRNAs in cloned embryos may lead to anomalous abundance of mRNA molecules, thus explaining poor development of cloned embryos. First, we used RNA sequencing to examine the total population of sncRNAs in cattle embryos produced by in vitro fertilization (IVF) and found a dramatic shift in populations at the EGA. Next, we collected both sncRNA and mRNA from scNT cattle embryos, and again performed sequencing of both RNA fractions.
We found that few sncRNAs were abnormally expressed in scNT embryos, with all differences appearing after EGA at the morula developmental stage. However, notable differences in the populations of sncRNAs were evident when comparing embryos by developmental stage. For populations of mRNA, we observed dramatic differences when comparing scNT and IVF cattle embryos, with the highest number of changes occurring at the EGA (8-cell stage) and after (morula stage). While changes in specific miRNA molecules (miR-34a and miR-345) were negatively correlated with some of their predicted target mRNAs, this pattern was not widespread as would be expected if these sncRNAs are functionally binding to all of the predicted mRNA targets. Collectively, our observations suggest that other mechanisms leading to altered expression of mRNA in cloned embryos may be responsible for their relatively poor development
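
The negative correlation between a miRNA and its predicted target mRNAs, mentioned above, can be illustrated with a Pearson correlation across samples. The abstract does not state which correlation measure was used, so this choice is an assumption, and the expression values below are invented for illustration only.

```python
# Illustrative sketch: Pearson correlation between a miRNA and one predicted
# target mRNA across samples. All expression values are made up.
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalised expression across four samples.
mirna_expr = [1.0, 2.0, 3.0, 4.0]   # miRNA rises across samples
target_expr = [8.0, 6.0, 4.0, 2.0]  # predicted target mRNA falls

print(pearson(mirna_expr, target_expr))  # -1.0
```

A strongly negative value is consistent with repression of the target, but correlation alone does not establish functional binding, which matches the cautious conclusion of the abstract.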

    Knowledge Management Approaches for predicting Biomarker and Assessing its Impact on Clinical Trials

    The recent success of companion diagnostics, along with increasing regulatory pressure for better identification of the target population, has created an unprecedented incentive for drug discovery companies to invest in novel strategies for stratified biomarker discovery. In keeping with this trend, trials with stratified biomarkers in drug development have quadrupled in the last decade, but they still represent a small part of all interventional trials, reflecting the multiple co-development challenges of therapeutic compounds and companion diagnostics. To overcome this challenge, varied knowledge management and systems biology approaches are adopted in the clinic to analyse and interpret an ever-increasing collection of omics data. By semi-automatic screening of more than 150,000 trials, we filtered trials with stratified biomarkers to analyse their therapeutic focus and major drivers, and elucidated the impact of stratified biomarker programs on trial duration and completion. The analysis clearly shows that cancer is the major focus of trials with stratified biomarkers. However, targeted therapies in cancer require more accurate stratification of the patient population. This can be supported by a fresh approach that selects a new class of biomolecules, miRNAs, as candidate stratification biomarkers. miRNAs play an important role in tumorigenesis by regulating the expression of oncogenes and tumor suppressors, thus affecting cell proliferation, differentiation, apoptosis, invasion and angiogenesis. miRNAs are potential biomarkers in different cancers. However, the relationship between the response of cancer patients to targeted therapy and the resulting modifications of the miRNA transcriptome in pathway regulation is poorly understood.
Together with ever-growing pathway and miRNA-mRNA interaction databases, freely available mRNA and miRNA expression data for multiple cancer therapies have created an unprecedented opportunity to decipher the role of miRNAs in the early prediction of therapeutic efficacy in diseases. We present a novel algorithm, SMARTmiR, to predict the role of miRNAs as therapeutic biomarkers for treatment with cetuximab, an anti-EGFR monoclonal antibody, in colorectal cancer (CRC). An optimised and fully automated version of the algorithm has the potential to be used as a clinical decision support tool. Moreover, this research will provide the scientific community with a comprehensive and valuable knowledge map of functional biomolecular interactions in colorectal cancer. This research also identified seven miRNAs, hsa-miR-145, hsa-miR-27a, hsa-miR-155, hsa-miR-182, hsa-miR-15a, hsa-miR-96 and hsa-miR-106a, as top stratified biomarker candidates for cetuximab therapy in CRC that had not been reported previously. Finally, a prospective plan for the future of biomarker research in cancer drug development has been drawn up, focusing on reducing the risk of the most expensive phase III drug failures.