1,122 research outputs found

    Detecting Differentially Co-Expressed Gene Modules Via The Edge-Count Test

    Background: Gene expression profiling by microarray has been used to uncover molecular variations in many different diseases. Complementary to conventional differential expression analysis, differential co-expression analysis can identify gene markers from the systematic and granular level. There are three aspects for differential co-expression network analysis, including the network global topological comparison, differential co-expression cluster identification, and differential co-expressed genes and gene pair identification. To date, most of the methods available still rely on Pearson's correlation coefficient despite its insensitivity to nonlinear relationships. Results: Here we present an approach that is robust to nonlinearity by using the edge-count test for differential co-expression analysis. The performance of the new approach was tested with synthetic data and found to have significant results. For real data, we used a human cervical cancer data set prepared from 29 pairs of cervical tumor and matched normal tissue samples. Hierarchical cluster analysis resulted in the identification of clusters containing differentially co-expressed genes associated with the regulation of cervical cancer. Conclusion: The proposed approach targets all different types of differential co-expression and it is sensitive to nonlinear relations. It is easy to implement and can be applied to any sequencing data to identify gene co-expression differences between multiple conditions.
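
The abstract above names the edge-count test but does not describe how it is constructed. As a rough illustration only, the sketch below computes the statistic of the classical two-sample edge-count idea: pool the observations from both conditions, build a minimum spanning tree on the pooled points, and count the tree edges that join observations from different conditions. Few cross-condition edges suggest that the two conditions differ. The function names, the Euclidean distance, and the choice of a minimum spanning tree as the similarity graph are assumptions made for this example and are not taken from the paper; a full test would also compare the count against a permutation or analytic null distribution.

```python
import math


def mst_edges(points):
    """Return the edges (i, j) of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection of each vertex to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest connections through the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_group_edge_count(group_a, group_b):
    """Count MST edges on the pooled sample that connect the two groups."""
    pooled = list(group_a) + list(group_b)
    labels = [0] * len(group_a) + [1] * len(group_b)
    return sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])
```

With two well-separated groups only a single tree edge bridges them, whereas fully interleaved groups produce cross-group edges almost everywhere in the tree.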

    On the design of advanced filters for biological networks using graph theoretic properties

    Network modeling of biological systems is a powerful tool for analysis of high-throughput datasets by computational systems biologists. Integration of networks to form a heterogeneous model requires that each network be as noise-free as possible while still containing relevant biological information. In earlier work, we have shown that the graph theoretic properties of gene correlation networks can be used to highlight and maintain important structures such as high degree nodes, clusters, and critical links between sparse network branches while reducing noise. In this paper, we propose the design of advanced network filters using structurally related graph theoretic properties. While spanning trees and chordal subgraphs provide filters with special advantages, we hypothesize that a hybrid subgraph sampling method will allow for the design of a more effective filter preserving key properties in biological networks. The proposed approach allows us to optimize a number of parameters associated with the filtering process, which in turn improves the identification of essential genes in mouse aging networks.

    A network-based variable selection approach for identification of modules and biomarker genes associated with end-stage kidney disease

    Aims: Intervention for end-stage kidney disease (ESKD), which is associated with adverse prognoses and major economic burdens, is challenging due to its complex pathogenesis. The study was performed to identify biomarker genes and molecular mechanisms for ESKD by a bioinformatics approach. Methods: Using the Gene Expression Omnibus dataset GSE37171, this study identified pathways and genomic biomarkers associated with ESKD via a multi-stage knowledge discovery process, including identification of modules of genes by weighted gene co-expression network analysis, discovery of important involved pathways by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses, selection of differentially expressed genes by the empirical Bayes method, and screening biomarker genes by the least absolute shrinkage and selection operator (Lasso) logistic regression. The results were validated using GSE70528, an independent testing dataset. Results: Three clinically important gene modules associated with ESKD were identified by weighted gene co-expression network analysis. Within these modules, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses revealed important biological pathways involved in ESKD, including transforming growth factor-β and Wnt signalling, RNA-splicing, autophagy and chromatin and histone modification. Furthermore, Lasso logistic regression was conducted to identify five final genes, namely, CNOT8, MST4, PPP2CB, PCSK7 and RBBP4, that are differentially expressed and associated with ESKD. The accuracy of the final model in distinguishing the ESKD cases and controls was 96.8% and 91.7% in the training and validation datasets, respectively. Conclusion: Network-based variable selection approaches can identify biological pathways and biomarker genes associated with ESKD. The findings may inform more in-depth follow-up research and effective therapy. Summary at a glance: This gene–gene network analysis to identify genes associated with end-stage renal disease is an important step, albeit early, towards the discovery of biomarkers using peripheral blood cells. The findings also provide insight on disease pathophysiology at the molecular level, and hence therapeutic targets for future research. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/162799/2/nep13655.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/162799/1/nep13655_am.pd

    Community detection for correlation matrices

    A challenging problem in the study of complex systems is that of resolving, without prior information, the emergent, mesoscopic organization determined by groups of units whose dynamical activity is more strongly correlated internally than with the rest of the system. The existing techniques to filter correlations are not explicitly oriented towards identifying such modules and can suffer from an unavoidable information loss. A promising alternative is that of employing community detection techniques developed in network theory. Unfortunately, this approach has focused predominantly on replacing network data with correlation matrices, a procedure that tends to be intrinsically biased due to its inconsistency with the null hypotheses underlying the existing algorithms. Here we introduce, via a consistent redefinition of null models based on random matrix theory, the appropriate correlation-based counterparts of the most popular community detection techniques. Our methods can filter out both unit-specific noise and system-wide dependencies, and the resulting communities are internally correlated and mutually anti-correlated. We also implement multiresolution and multifrequency approaches revealing hierarchically nested sub-communities with 'hard' cores and 'soft' peripheries. We apply our techniques to several financial time series and identify mesoscopic groups of stocks which are irreducible to a standard, sectorial taxonomy, detect 'soft stocks' that alternate between communities, and discuss implications for portfolio optimization and risk management. Comment: Final version, accepted for publication on PR
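
The abstract above refers to null models based on random matrix theory without giving formulas. A commonly used reference point in that setting is the Marchenko-Pastur law: for N uncorrelated, unit-variance series observed at T time points, the eigenvalues of the sample correlation matrix fall approximately within [(1 - sqrt(N/T))^2, (1 + sqrt(N/T))^2] when N and T are large. The sketch below computes these bounds and flags eigenvalues above the upper edge as candidates for genuine structure. Treating this law as the relevant noise benchmark, and the function names used here, are assumptions for illustration rather than the paper's stated procedure.

```python
import math


def marchenko_pastur_bounds(n_series, n_observations):
    """Edges of the Marchenko-Pastur eigenvalue band for a pure-noise correlation matrix."""
    if n_series <= 0 or n_observations <= 0:
        raise ValueError("both counts must be positive")
    q = n_series / n_observations
    lower = (1.0 - math.sqrt(q)) ** 2
    upper = (1.0 + math.sqrt(q)) ** 2
    return lower, upper


def eigenvalues_above_noise(eigenvalues, n_series, n_observations):
    """Keep only the eigenvalues that exceed the upper edge of the noise band."""
    _, upper = marchenko_pastur_bounds(n_series, n_observations)
    return [ev for ev in eigenvalues if ev > upper]
```

In financial correlation matrices the largest eigenvalue usually reflects a system-wide mode, which is why the abstract distinguishes unit-specific noise from system-wide dependencies; this sketch only separates the noise band from everything above it.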

    AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

    A structure-preserving hybrid-chordal filter for sampling in correlation networks

    Biological networks are fast becoming a popular tool for modeling high-throughput data, especially due to the ability of the network model to readily identify structures with biological function. However, many networks are fraught with noise or coincidental edges, resulting in signal corruption. Previous work has found that the implementation of network filters can reduce network noise and size while revealing significant network structures, even enhancing the ability to identify these structures by exaggerating their inherent qualities. In this study, we implement a hybrid network filter that combines features from a spanning tree and near-chordal subgraph identification to show how a filter that incorporates multiple graph theoretic concepts can improve upon network filtering. We use three different clustering methods to highlight the ability of the filter to maintain network clusters, and find evidence that suggests the clusters maintained are of high importance in the original unfiltered network due to high-degree and biological relevance (essentiality). Our filter highlights the advantages of integration of graph theoretic concepts into biological network analysis

    Graph theory applied to neuroimaging data reveals key functional connectivity alterations in brain of behavioral variant Frontotemporal Dementia subjects

    Brain functional architecture and anatomical structure have been intensively studied to generate efficient models of its complex mechanisms. Functional alterations and cognitive impairments are the most investigated aspects in recent clinical research as distinctive traits of neurodegeneration. Although specific behaviours are clearly associated with neurodegeneration, the breakdown of information flow within the brain functional network, which deeply affects cognitive skills, remains incompletely understood. Behavioural variant Frontotemporal Dementia (bvFTD) is the most common type of Frontotemporal degeneration, marked by behavioural disturbances, social instabilities and impairment of executive functions. Mathematical modelling offers effective tools to inspect deviations from physiological cognitive functions and connectivity alterations. As a popular recent methodology, graph theoretical approaches applied to imaging data have expanded our knowledge of neurodegenerative disorders, although the need for unbiased metrics is still an open issue. In this thesis, we propose an integrated analysis of functional features among brain areas in bvFTD patients, to assess global connectivity and topological network alterations with respect to the healthy condition, applying a minimum spanning tree (MST)-based model to resting-state functional MRI (rs-fMRI) data. Contrary to several graph theoretical approaches, which depend on arbitrary criteria (e.g., correlation thresholds, network density or a priori distribution), MST represents an unambiguous modelling solution, ensuring full reproducibility and robustness in different conditions. Our MSTs were obtained from wavelet correlation matrices derived from mean time series intensities, extracted from 116 regions of interest (ROIs) of 41 bvFTD patients and 39 healthy controls (HC), who underwent rs-fMRI. 
The resulting graphs were tested for global connectivity and topological differences between the two groups, by applying a Wilcoxon rank sum test with a significance level at 0.05 (nonparametric median difference estimates with 95% confidence interval). The same test was applied for methodological comparison between MST and other common graph theory methods. After methodological comparisons, our MST model achieved the best bvFTD/HC separation performance, without a priori assumptions. Direct MST comparison between bvFTD and healthy controls revealed key brain functional architecture differences. Diseased subjects showed a linear-shape network configuration tendency, with high distance between nodes, low centrality parameter values, and a low information exchange capacity (i.e., low network integration) in MST parameters. Moreover, edge-level and node-level features (i.e., superhighways, and node degree and betweenness centrality) indicated a more complex scenario, showing some of the key bvFTD dysfunctions observed in large scale resting-state functional networks (default-mode (DMN), salience (SN), and executive (EN) networks), suggesting an underlying involvement of the limbic system in the observed functional deterioration. Functional isolation has been observed as a generalized process affecting the entire bvFTD network, showing isolation of brain macro-regions, with homogeneous functional distribution of brain areas, longer distances between hubs, and longer within-lobe superhighways. Conversely, the HC network showed marked functional integration, where superhighways serve as shortcuts to connect areas from different brain macro-regions. 
The combination of this theoretical model with rs-fMRI data constitutes an effective method to generate a clear picture of the functional divergence between bvFTD and HCs, providing possible insights on the effects of frontotemporal neurodegeneration and compensatory mechanisms underlying characteristic bvFTD cognitive, social, and executive impairments
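
To make the threshold-free idea in the abstract above concrete, the following sketch builds a spanning tree from a symmetric correlation matrix with Kruskal's algorithm and then computes the share of nodes that are leaves, one simple way to tell a chain-like tree from a star-like one. The dissimilarity 1 - |r|, the function names, and the leaf-share summary are assumptions chosen for this example; the thesis itself works with wavelet correlation matrices and its exact edge weighting and metrics are not given in the abstract.

```python
def correlation_mst(corr):
    """Spanning tree that keeps the strongest correlations (Kruskal's algorithm)."""
    n = len(corr)
    # Candidate edges ordered by dissimilarity 1 - |r|, strongest correlations first.
    candidates = sorted(
        (1.0 - abs(corr[i][j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge joins two components, so no cycle
            parent[root_i] = root_j
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree


def leaf_share(tree, n_nodes):
    """Fraction of nodes with exactly one tree edge."""
    degree = [0] * n_nodes
    for i, j in tree:
        degree[i] += 1
        degree[j] += 1
    return sum(1 for d in degree if d == 1) / n_nodes
```

A chain of n nodes has only two leaves, so a low leaf share corresponds to the linear-shape configuration described for the patient group, while a star-like tree has a leaf share close to one.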

    Analysis of High-Throughput Data - Protein-Protein Interactions, Protein Complexes and RNA Half-life

    The development of high-throughput techniques has led to a paradigm change in biology from the small-scale analysis of individual genes and proteins to a genome-scale analysis of biological systems. Proteins and genes can now be studied in their interaction with each other and the cooperation within multi-subunit protein complexes can be investigated. Moreover, time-dependent dynamics and regulation of these processes and associations can now be explored by monitoring mRNA changes and turnover. The in-depth analysis of these large and complex data sets would not be possible without sophisticated algorithms for integrating different data sources, identifying interesting patterns in the data and addressing the high variability and error rates in biological measurements. In this thesis, we developed such methods for the investigation of protein interactions and complexes and the corresponding regulatory processes. In the first part, we analyze networks of physical protein-protein interactions measured in large-scale experiments. We show that the topology of the complete interactomes can be confidently extrapolated despite high numbers of missing and wrong interactions from only partial measurements of interaction networks. Furthermore, we find that the structure and stability of protein interaction networks is not only influenced by the degree distribution of the network but also considerably by the suppression or propagation of interactions between highly connected proteins. As analysis of network topology is generally focused on large eukaryotic networks, we developed new methods to analyze smaller networks of intraviral and virus-host interactions. By comparing interactomes of related herpesviral species, we could detect a conserved core of protein interactions and could address the low coverage of the yeast two-hybrid system. In addition, common strategies in the interaction of the viruses with the host cell were identified. 
New affinity purification methods now make it possible to directly study associations of proteins in complexes. Due to experimental errors, the individual protein complexes have to be predicted with computational methods from these purification results. As previously published methods relied more or less heavily on existing knowledge on complexes, we developed an unsupervised prediction algorithm which is independent of such additional data. Using this approach, high-quality protein complexes can be identified from the raw purification data alone for any species for which purification experiments are performed. To identify the direct, physical interactions within these predicted complexes and their subcomponent structure, we describe a new approach to extract the highest scoring subnetwork connecting the complex and interactions not explained by alternative paths of indirect interactions. In this way, important interactions within the complexes can be identified and their substructure can be resolved in a straightforward way. To explore the regulation of proteins and complexes, we analyzed microarray measurements of mRNA abundance, de novo transcription and decay. Based on the relationship between newly transcribed, pre-existing and total RNA, transcript half-life can be estimated for individual genes using a new microarray normalization method and a quality control can be applied. We show that precise measurements of RNA half-life can be obtained from de novo transcription which are of superior accuracy to previously published results from RNA decay. Using such precise measurements, we studied RNA half-lives in human B-cells and mouse fibroblasts to identify conserved patterns governing RNA turnover. Our results show that transcript half-lives are strongly conserved and specifically correlated to gene function. 
Although transcript half-life is highly similar in protein complexes and families, individual proteins may deviate significantly from the remaining complex subunits or family members to efficiently support the regulation of protein complexes or to create non-redundant roles of functionally similar proteins. These results illustrate several of the many ways in which high-throughput measurements lead to a better understanding of biological systems. By studying large-scale measurements in this thesis, the structure of protein interaction networks and protein complexes could be better characterized, important interactions and conserved strategies for herpesviral infection could be identified and interesting insights could be gained into the regulation of important biological processes and protein complexes. This was made possible by the development of novel algorithms and analysis approaches which will also be valuable for further research on these topics.
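
The abstract above states that transcript half-life can be estimated from the relationship between newly transcribed, pre-existing and total RNA, but gives no formula. Under two simplifying assumptions that are made here for illustration and are not confirmed by the abstract (total transcript level is at steady state, and decay is first order), the pre-existing fraction after a labeling time t is 1 - R = exp(-k t), where R is the ratio of newly transcribed to total RNA, which gives a half-life of -t ln(2) / ln(1 - R). The function name and parameters below are hypothetical.

```python
import math


def half_life_from_new_to_total(ratio_new_to_total, labeling_time):
    """Half-life implied by the newly transcribed / total RNA ratio.

    Assumes steady-state total RNA and first-order decay (illustrative
    assumptions, not taken from the thesis). The result is in the same
    time unit as labeling_time.
    """
    if not 0.0 < ratio_new_to_total < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if labeling_time <= 0:
        raise ValueError("labeling time must be positive")
    # The pre-existing fraction 1 - R decays as exp(-k * t); solve for ln(2) / k.
    return -labeling_time * math.log(2) / math.log(1.0 - ratio_new_to_total)
```

For example, if half of a transcript's total RNA is newly transcribed after 60 minutes of labeling, the implied half-life equals the labeling time; a larger ratio implies faster turnover and a shorter half-life.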