    Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

    Two graph theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer, and this hand-off can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers, namely breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM), are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as essential for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists to design wet lab experiments for further elucidating the complex mechanism of cancer.
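The first two steps of the abstract above (thresholding Pearson correlations at 0.9, then isolating cliques) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the dictionary input format and the use of Bron–Kerbosch with pivoting to enumerate maximal cliques are all assumptions made for illustration, and the bipartite module construction is not covered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def coexpression_network(expr, threshold=0.9):
    """Link two genes when their expression profiles correlate >= threshold.

    `expr` maps a gene name to its list of expression values (assumed format).
    Returns an adjacency dict: gene -> set of neighbouring genes.
    """
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)
```

Calling `maximal_cliques(coexpression_network(expr))` on a dictionary of expression profiles returns each maximal clique as a sorted list of gene names; genes with no correlated partner appear as cliques of size one. The pairwise loop is quadratic in the number of genes, so a genome-scale run would more likely use a vectorised correlation matrix.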

    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    The present PhD dissertation develops and applies bioinformatic methods and tools to address key current problems in the analysis of human omic data. The dissertation is organised by main objective into four chapters focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks; and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets, exploring all feature differences and possible heterogeneity through the integration of both data dispersion and predictor-response information in a new statistical parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by a systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for characterizing subcellular localization, supported by the human protein-protein interactome.
In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues, assigning each human protein a position along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated the different human proteins are, and we place all of them in a common interaction network. As a general comment, all the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and elucidation of molecular signatures and biological networks using human omic data. Most of these data correspond to sample cohorts generated in recent biomedical studies on specific human diseases.

    Recent advances in clustering methods for protein interaction networks

    The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery at the network level. The resulting challenge is how to analyze such complex interaction data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction networks is an effective approach for identifying protein complexes or functional modules, and this has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks are presented in detail. The prediction of protein functions and interactions based on modules is also covered. Finally, the performance of different clustering methods is compared and directions for future research are discussed.

    Activation of the Notch Signaling Pathway In Vivo Elicits Changes in CSL Nuclear Dynamics.

    A key feature of Notch signaling is that it directs immediate changes in transcription via the DNA-binding factor CSL, switching it from repression to activation. How Notch generates a response that is both sensitive and accurate, in the absence of any amplification step, remains to be elucidated. To address this question, we developed real-time analysis of CSL dynamics, including single-molecule tracking in vivo. In Notch-OFF nuclei, a small proportion of CSL molecules transiently binds DNA, while in Notch-ON conditions CSL recruitment increases dramatically at target loci, where complexes have longer dwell times conferred by the Notch co-activator Mastermind. Surprisingly, recruitment of CSL-related corepressors also increases in Notch-ON conditions, revealing that Notch induces cooperative or "assisted" loading by promoting a local increase in chromatin accessibility. Thus, in vivo Notch activity triggers changes in CSL dwell times and chromatin accessibility, which we propose confer sensitivity to small input changes and facilitate timely shut-down.

    Falsifiable Network Models. A Network-based Approach to Predict Treatment Efficacy in Ulcerative Colitis

    This work is focused on understanding treatment efficacy in patients with ulcerative colitis (UC) using a network-based approach. UC is one of two forms of inflammatory bowel disease (IBD), along with Crohn’s disease. UC is a debilitating condition characterized by chronic inflammation and ulceration of the colon and rectum. UC symptoms occur gradually rather than abruptly, and the degree of symptoms differs across UC patients. Only around 20% of all UC cases can be explained by known genetic variations, implying a more ambiguous aetiology that is not yet fully understood but is thought to involve a complex interplay between genetic and environmental factors. The available therapy for UC substantially reduces symptoms and achieves long-term remission. However, about one-third of UC patients fail to respond to anti-TNFα therapy and consequently develop long-term side effects due to medication. Non-response to existing antibody-based therapies in subgroups of UC patients is a major challenge and incurs a healthcare burden. Therefore, disease markers that predict therapy response are needed to assist individualized therapy decisions. To date, no quantitative computational framework is available to predict treatment response in UC. We developed a quantitative framework that uses gene expression data and existing biological background information on signalling pathways to quantify network connectivity from receptors to transcription factors (TFs) that are involved in UC pathogenesis. Variations in network connectivity in UC patients can be used to identify responders and non-responders to anti-TNFα and anti-Integrin treatment. Our findings allow us to summarize the effect of small gene expression changes on the overall connectivity of a signalling network and estimate the effect this will have on the individual patients' responses.
Estimating the network connectivity associated with varied drug responses may provide an understanding of individualized treatment outcomes. Our model could be used to generate testable hypotheses about how individual genes act together in networks to cause inflammation in UC as well as other immune-inflammatory diseases such as psoriasis, asthma, and rheumatoid arthritis.
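The abstract above does not say how receptor-to-TF connectivity is computed, so the following is only a toy sketch of one way such a score could be defined, not the authors' framework. It assumes a directed signalling graph, treats a gene as active when its expression reaches a cutoff, and reports the fraction of receptor-TF pairs joined by a path of active genes. The function names, the cutoff rule and the input formats are all hypothetical.

```python
from collections import deque


def reachable(adj, source, active):
    """Breadth-first search from `source`, visiting only active genes."""
    if source not in active:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt in active and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def connectivity_score(adj, expression, receptors, tfs, cutoff):
    """Fraction of (receptor, TF) pairs linked by a path of active genes.

    Assumptions for illustration only:
    - `adj` maps each gene to the genes it signals to (directed edges);
    - `expression` maps each gene to one expression value for one patient;
    - a gene is active when its expression is >= `cutoff`.
    """
    active = {gene for gene, value in expression.items() if value >= cutoff}
    n_pairs = len(receptors) * len(tfs)
    if n_pairs == 0:
        return 0.0
    hits = 0
    for receptor in receptors:
        reach = reachable(adj, receptor, active)
        hits += sum(1 for tf in tfs if tf in reach)
    return hits / n_pairs
```

Under this toy definition, lowering the expression of a single intermediate gene below the cutoff can disconnect several receptor-TF pairs at once, which is the kind of effect of small expression changes on overall connectivity that the abstract describes.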

    Knowledge derivation and data mining strategies for probabilistic functional integrated networks

    One of the fundamental goals of systems biology is the experimental verification of the interactome: the entire complement of molecular interactions occurring in the cell. Vast amounts of high-throughput data have been produced to aid this effort. However, these data are incomplete and contain high levels of both false positives and false negatives. To combat these limitations in data quality, computational techniques have been developed to evaluate the datasets and integrate them in a systematic fashion using graph theory. The result is an integrated network which can be analysed using a variety of network analysis techniques to draw new inferences about biological questions and to guide laboratory experiments. Individual research groups are interested in specific biological problems and, consequently, network analyses are normally performed with regard to a specific question. However, the majority of existing data integration techniques are global and do not focus on specific areas of biology. Currently, this issue is addressed by using known annotation data (such as that from the Gene Ontology) to produce process-specific subnetworks. However, this approach discards useful information and is of limited use in poorly annotated areas of the interactome. Therefore, there is a need for network integration techniques that produce process-specific networks without loss of data. The work described here addresses this requirement by extending one of the most powerful integration techniques, probabilistic functional integrated networks (PFINs), to incorporate a concept of biological relevance. Initially, the available functional data for the baker’s yeast Saccharomyces cerevisiae were evaluated to identify areas of bias and specificity which could be exploited during network integration. This information was used to develop an integration technique which emphasises interactions relevant to specific biological questions, using yeast ageing as an exemplar.
The integration method improves performance during network-based protein functional prediction in relation to this process. Further, the process-relevant networks complement classical network integration techniques and significantly improve network analysis in a wide range of biological processes. The method developed has been used to produce novel predictions for 505 Gene Ontology biological processes. Of these predictions, 41,610 are consistent with existing computational annotations, and 906 are consistent with known expert-curated annotations. The approach significantly reduces the hypothesis space for experimental validation of genes hypothesised to be involved in the oxidative stress response. Therefore, incorporation of biological relevance into network integration can significantly improve network analysis with regard to individual biological questions.