
    Herb Target Prediction Based on Representation Learning of Symptom related Heterogeneous Network.

    Traditional Chinese Medicine (TCM) has received increasing attention as a complementary approach or alternative to modern medicine. However, experimental methods for identifying novel targets of TCM herbs rely heavily on the currently available herb-compound-target relationships. In this work, we present an Herb-Target Interaction Network (HTINet) approach, a novel network integration pipeline for herb-target prediction that relies mainly on symptom-related associations. HTINet focuses on capturing low-dimensional feature vectors for both herbs and proteins by network embedding, which incorporate the topological properties of nodes across a multi-layered heterogeneous network, and then performs supervised learning based on these low-dimensional feature representations. HTINet obtains a performance improvement over a well-established random-walk-based herb-target prediction method. Furthermore, we have manually validated several predicted herb-target interactions against the independent literature. These results indicate that HTINet can be used to integrate heterogeneous information to predict novel herb-target interactions.

    Integration of multi-scale protein interactions for biomedical data analysis

    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms, as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cell state, which has motivated the joint mining of multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scales that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data. 
We show that our approach outperforms the existing methods and provides biomedical insights and relevant hypotheses for specific cancer types.
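The abstract above names Non-negative Matrix Factorization (NMF) as the basis of its integration framework but does not give the formulation. As a minimal sketch of plain NMF only (not the thesis's multi-omics framework), the block below factorizes a non-negative matrix V into W and H with the standard multiplicative update rules for the squared Frobenius error. The function names, the toy matrix, the rank, and the iteration count are all illustrative assumptions.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates (hypothetical helper, not from the source)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Random positive initialisation keeps every later update non-negative.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H


# Toy non-negative data matrix (illustrative values only).
V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [2.0, 2.0, 4.0]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Integrative variants typically add further terms to this objective, for example to tie several data matrices to shared factors; how the thesis does so is not stated in the abstract.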

    Following the trail of cellular signatures : computational methods for the analysis of molecular high-throughput profiles

    Over the last three decades, high-throughput techniques, such as next-generation sequencing, microarrays, or mass spectrometry, have revolutionized biomedical research by enabling scientists to generate detailed molecular profiles of biological samples on a large scale. These profiles are usually complex, high-dimensional, and often prone to technical noise, which makes manual inspection practically impossible. Hence, powerful computational methods are required that enable the analysis and exploration of these data sets and thereby help researchers to gain novel insights into the underlying biology. In this thesis, we present a comprehensive collection of algorithms, tools, and databases for the integrative analysis of molecular high-throughput profiles. We developed these tools with two primary goals in mind: the detection of deregulated biological processes in complex diseases, like cancer, and the identification of driving factors within those processes. Our first contribution in this context is a set of major extensions of the GeneTrail web service that make it one of the most comprehensive toolboxes for the analysis of deregulated biological processes and signaling pathways. GeneTrail offers a collection of powerful enrichment and network analysis algorithms that can be used to examine genomic, epigenomic, transcriptomic, miRNomic, and proteomic data sets. In addition to approaches for the analysis of individual -omics types, our framework also provides functionality for the integrative analysis of multi-omics data sets, the investigation of time-resolved expression profiles, and the exploration of single-cell experiments. Besides the analysis of deregulated biological processes, we also focus on the identification of driving factors within those processes, in particular, miRNAs and transcriptional regulators. For miRNAs, we created the miRNA pathway dictionary database miRPathDB, which compiles links between miRNAs, target genes, and target pathways. 
Furthermore, it provides a variety of tools that help to study associations between them. For the analysis of transcriptional regulators, we developed REGGAE, a novel algorithm for the identification of key regulators that have a significant impact on deregulated genes, e.g., genes that show large expression differences in a comparison between disease and control samples. To analyze the influence of transcriptional regulators on deregulated biological processes, we also created the RegulatorTrail web service. In addition to REGGAE, this tool suite compiles a range of powerful algorithms that can be used to identify key regulators in transcriptomic, proteomic, and epigenomic data sets. Moreover, we evaluate the capabilities of our tool suite through several case studies that highlight the versatility and potential of our framework. In particular, we used our tools to conduct a detailed analysis of a Wilms' tumor data set. Here, we could identify a circuitry of regulatory mechanisms, including new potential biomarkers, that might contribute to the blastemal subtype's increased malignancy, which could potentially lead to new therapeutic strategies for Wilms' tumors. In summary, we present and evaluate a comprehensive framework of powerful algorithms, tools, and databases to analyze molecular high-throughput profiles. The provided methods are of broad interest to the scientific community and can help to elucidate complex pathogenic mechanisms.
Nowadays, molecular high-throughput measurement techniques, such as high-throughput sequencing, microarrays, or mass spectrometry, are routinely applied to characterize cells on a large scale and at different molecular levels. The resulting data sets are usually high-dimensional and often noisy. Hence, powerful computational applications are needed to enable their analysis. In this thesis, we present a range of effective algorithms, programs, and databases for the analysis of molecular high-throughput data sets. These approaches were developed to investigate deregulated biological processes and to identify important key molecules within them. In addition, a series of analyses was carried out to evaluate the different methods. To this end, we conducted in particular a Wilms' tumor study in which we were able to identify various regulatory mechanisms and associated biomarkers that might be responsible for the increased malignancy of Wilms' tumors of the blastemal-rich subtype. These findings could lead to improved treatment of these tumors in the future. These results demonstrate impressively that our approaches are able to evaluate different molecular high-throughput measurements and can help to elucidate pathogenic mechanisms associated with cancer or other complex diseases.
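The abstract above refers to enrichment algorithms without describing them. One widely used building block for this kind of analysis is the over-representation test based on the hypergeometric distribution. The block below is a minimal sketch of that generic test; it is not taken from GeneTrail, and the function name, parameter names, and example counts are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(population, category_size, sample_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population     -- number of genes in the background set
    category_size  -- background genes annotated to the pathway or category
    sample_size    -- number of deregulated genes under test
    overlap        -- deregulated genes that are also in the category
    """
    upper = min(category_size, sample_size)
    total = comb(population, sample_size)
    # Sum the probabilities of observing `overlap` or more category members.
    tail = sum(comb(category_size, k)
               * comb(population - category_size, sample_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical counts: 40 of 200 deregulated genes fall into a
# 300-gene pathway, against a background of 20,000 genes.
p = enrichment_p_value(20000, 300, 200, 40)
print(p)
```

In practice such p-values are computed for many categories at once and then adjusted for multiple testing.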

    Falsifiable Network Models. A Network-based Approach to Predict Treatment Efficacy in Ulcerative Colitis

    This work is focused on understanding treatment efficacy in patients with ulcerative colitis (UC) using a network-based approach. UC is one of two forms of inflammatory bowel disease (IBD) along with Crohn’s disease. UC is a debilitating condition characterized by chronic inflammation and ulceration of the colon and rectum. UC symptoms occur gradually rather than abruptly, and the degree of symptoms differs across UC patients. Only around 20% of all UC cases can be explained by known genetic variations, implying a more ambiguous aetiology that is not yet fully understood but is thought to involve a complex interplay between genetic and environmental factors. The available therapy for UC substantially reduces symptoms and achieves long-term remission. However, about one-third of UC patients fail to respond to anti-TNFα therapy and consequently develop long-term side effects due to medication. Non-response to existing antibody-based therapies in subgroups of UC patients is a major challenge and incurs a healthcare burden. Therefore, disease markers for predicting therapy response are needed to assist individualized therapy decisions. To date, no quantitative computational framework is available to predict treatment response in UC. We developed a quantitative framework that uses gene expression data and existing biological background information on signalling pathways to quantify network connectivity from receptors to transcription factors (TF) that are involved in UC pathogenesis. Variations in network connectivity in UC patients can be used to identify responders and non-responders to anti-TNFα and anti-Integrin treatment. Our findings allow us to summarize the effect of small gene expression changes on the overall connectivity of a signalling network and estimate the effect this will have on the individual patients' responses. 
Estimating the network connectivity associated with varied drug responses may provide an understanding of individualized treatment outcomes. Our model could be used to generate testable hypotheses about how individual genes act together in networks to cause inflammation in UC as well as other immune-inflammatory diseases such as psoriasis, asthma, and rheumatoid arthritis.

    Computational techniques for cell signaling

    Cells can be viewed as sophisticated machines that organize their constituent components and molecules to receive, process, and respond to signals. The goal of the scientist is to uncover both the individual operations underlying these processes and the mechanism of the emergent properties of interest that give rise to the various phenomena such as disease, development, recovery or aging. Cell signaling plays a crucial role in all of these areas. The complexity of biological processes coupled with the physical limitations of experiments to observe individual molecular components across small to large scales limits the knowledge that can be gleaned from direct observations. Mathematical modeling can be used to estimate parameters that are hidden or too difficult to observe in experiments, and it can make qualitative predictions that can distinguish between hypotheses of interest. Statistical analysis can be employed to explore the large amounts of data generated by modern experimental techniques such as sequencing and high-throughput screening, and it can integrate the observations from many individual experiments or even separate studies to generate new hypotheses. This dissertation employs mathematical and statistical analyses for three prominent aspects of cell signaling: the physical transfer of signaling molecules between cells, the intracellular protein machinery that organizes into pathways to process these signals, and changes in gene expression in response to cell signaling. Computational biology can be described as an applied discipline in that it aims to further the knowledge of a discipline that is distinct from itself. However, the richness of the problems encountered in biology requires continuous development of better methods equipped to handle the complexity, size, or uncertainty of the data, and to build in constraints motivated by the reality of the underlying biological system. 
In addition, better computational and mathematical methods are also needed to model the emergent behavior that arises from many components. The work presented in this dissertation fulfills both of these roles. We apply known and existing techniques to analyse experimental data and provide biological meaning, and we also develop new statistical and mathematical models that add to the knowledge and practice of computational biology. Much of cell signaling is initiated by signal transduction from the exterior, either by sensing environmental conditions or by the reception of specific signals from other cells. The phenomena of most immediate concern to our species, that of human health and disease, are usually also generated from, and manifest in, our tissues and organs due to the interaction and signaling between cells. A modality of inter-cellular communication that was regarded earlier as an obscure phenomenon but has more recently come to the attention of the scientific community is that of tunneling nanotubes (TNs). TNs have been observed as thin (of the order of 100 nanometers) extensions from a cell to another closely located one. The formation of such structures along with the intercellular exchange of molecules through them, and their interaction with the cytoskeleton, could be involved in many important processes, such as tissue formation and cancer growth. We describe a simple model of passive transport of molecules between cells due to TNs. Building on a few basic assumptions, we derive parametrized, closed-form expressions to describe the concentration of transported molecules as a function of distance from a population of TN-forming cells. Our model predicts how the perfusion of molecules through the TNs is affected by the size of the transferred molecules, the length and stability of nanotube formation, and the differences between membrane-bound and cytosolic proteins. 
To our knowledge, this is the first published mathematical model of intercellular transfer through tunneling nanotubes. We envision that experimental observations will be able to confirm or improve the assumptions made in our model. Furthermore, quantifying the form of inter-cellular communication in the basic scenario envisioned in our model can help suggest ways to measure and investigate cases of possible regulation of either formation of tunneling nanotubes or transport through them. The next problem we focus on is uncovering how the interactions between the genes and proteins in a cell organize into pathways to process cell signals or perform other tasks. The ability to accurately model and deeply understand gene and protein interaction networks of various kinds can be very powerful for prioritizing candidate genes and predicting their role in various signaling pathways and processes. A popular technique for gene prioritization and function prediction is the graph diffusion kernel. We show how the graph diffusion kernel is mathematically similar to the Ising spin graph, a model popular in statistical physics but not usually employed on biological interaction networks. We develop a new method for calculating gene association based on the Ising spin model which is different from the methods common in either bioinformatics or statistical physics. We show that our method performs better than both the graph diffusion kernel and its commonly used equivalent in the Ising model. We present a theoretical argument for understanding its performance based on ideas of phase transitions on networks. We measure its performance by applying our method to link prediction on protein interaction networks. Unlike candidate gene prioritization or function prediction, link prediction does not depend on the existing annotation or characterization of genes for ground truth. It helps us to avoid the confounding noise and uncertainty in the network and annotation data. 
As a purely network analysis problem, it is well suited for comparing network analysis methods. Once we know that we are accurately modeling the interaction network, we can employ our model to solve other problems like gene prioritization using interaction data. We also apply statistical analysis for a specific instance of a cell signaling process: the drought response in Brassica napus, a plant of scientific and economic importance. Important changes in the cell physiology of guard cells are initiated by abscisic acid, an important phytohormone that signals water deficit stress. We analyse RNA-seq reads resulting from the sequencing of mRNA extracted from protoplasts treated with abscisic acid. We employ sequence analysis, statistical modeling, and the integration of cross-species network data to uncover genes, pathways, and interactions important in this process. We confirm what is known from other species and generate new gene and interaction candidates. By associating functional and sequence modification, we are also able to uncover evidence of evolution of gene specialization, a process that is likely widespread in polyploid genomes. This work has developed new computational methods and applied existing tools for understanding cellular signaling and pathways. We have applied statistical analysis to integrate expression, interactome, pathway, regulatory elements, and homology data to infer Brassica napus genes and their roles involved in drought response. For many of these findings, support is found in previous literature from other species based on independent experiments. By relating the changes in regulatory elements, our RNA-seq results and common gene ancestry, we present evidence of its evolution in the context of polyploidy. Our work can provide a scientific basis for the pursuit of certain genes as targets of breeding and genetic engineering efforts for the development of drought tolerant oil crops. 
Building on ideas from statistical physics, we developed a new model of gene associations in networks. Using link prediction as a metric for the accuracy of modeling the underlying structure of a real network, we show that our model shows improved performance on real protein interaction networks. Our model of gene associations can be used to prioritize candidate genes for a disease or phenotype of interest. We also develop a mathematical model for a novel inter-cellular mode of biomolecule transfer. We relate hypotheses about the dynamics of TN formation, stability, and nature of molecular transport to quantitative predictions that may be tested by suitable experiments. In summary, this work demonstrates the application and development of computational analysis of cell signaling at the level of the transcriptome, the interactome, and physical transport.
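The abstract above uses the graph diffusion kernel as its baseline for gene association. As a minimal sketch of that baseline only (not the Ising-based method the dissertation proposes), the block below computes K = exp(-beta * L), where L is the graph Laplacian, using a truncated Taylor series in pure Python. The toy path graph, the value of beta, the number of series terms, and all function names are illustrative assumptions.

```python
def laplacian(n, edges):
    """Unnormalised graph Laplacian L = D - A for an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(L, beta, terms=60):
    """Approximate exp(-beta * L) with a truncated Taylor series.

    Adequate for small graphs and moderate beta; for larger problems a
    dedicated matrix-exponential routine would be the usual choice.
    """
    n = len(L)
    M = [[-beta * x for x in row] for row in L]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]  # holds M^k / k!, starting at the identity
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Toy interaction network: a path 0 - 1 - 2 - 3 (illustrative only).
L = laplacian(4, [(0, 1), (1, 2), (2, 3)])
K = diffusion_kernel(L, beta=0.5)

# Rank the other nodes by their kernel similarity to node 0.
ranking = sorted(range(1, 4), key=lambda j: K[0][j], reverse=True)
print(ranking)
```

Entry K[i][j] can be read as the amount of "heat" that flows from node i to node j in diffusion time beta, so nodes that are closer in the network receive higher scores. For link prediction, unconnected pairs with high kernel values would be the candidate links.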

    Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis : Principles and Recent Advances

    This work was supported in part by the National Research Foundation of Korea Grant funded by the Korean Government (Ministry of Science and ICT) under Grant NRF 2020R1A2B5B02002478, and in part by Sejong University through its Faculty Research Program under Grant 20212023. Peer reviewed.

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017