872 research outputs found

    Generalized adjacency and the conservation of gene clusters in genetic networks defined by synthetic lethals

    Background: Given genetic networks derived from two genomes, it may be difficult to decide whether their local structures are similar enough in both genomes to infer some ancestral configuration or some conserved functional relationships. Current methods all depend on searching for identical substructures. Methods: We explore a generalized vertex proximity criterion, and present analytic and probability results for the comparison of random lattice networks. Results: We apply this criterion to the comparison of the genetic networks of two evolutionarily divergent yeasts, Saccharomyces cerevisiae and Schizosaccharomyces pombe, derived using the Synthetic Genetic Array screen. We show that the overlapping parts of the networks of the two yeasts share a common structure beyond the shared edges. This may be due to their conservation of redundant pathways containing many synthetic lethal pairs of genes. Conclusions: Detecting the shared generalized adjacency clusters in the genetic networks of the two yeasts shows that this analytical construct can be a useful tool in probing conserved network structure across divergent genomes.

    Measures for the exceptionality of gene order in conserved genomic regions

    We propose in this article three measures for quantifying the exceptionality of gene order in conserved genomic regions found by the reference region approach. The three measures are based on the transposition distance in the permutation group. We obtain analytic expressions for their distribution in the case of a random uniform permutation, i.e. under the null hypothesis of random gene order. Our results can be used to increase the power of the significance tests for gene clusters which take into account only the proximity of the orthologous genes and not their order.

    Data Representation for Learning and Information Fusion in Bioinformatics

    This thesis deals with the rigorous application of nonlinear dimension reduction and data organization techniques to biomedical data analysis. The Laplacian Eigenmaps algorithm is representative of these methods and has been widely applied in manifold learning and related areas. While the asymptotic manifold recovery behavior of these methods has been well characterized, the clustering properties of Laplacian embeddings with finite data are largely motivated by heuristic arguments. We develop a precise bound characterizing cluster structure preservation under Laplacian embeddings. From this foundation, we introduce flexible and mathematically well-founded approaches for information fusion and feature representation. These methods are applied to three substantial case studies in bioinformatics, illustrating their capacity to extract scientifically valuable information from complex data.

    Statistical Methods for High Dimensional Networked Data Analysis.

    Networked data are frequently encountered in many scientific disciplines. Two major challenges in the analysis of such data are their high dimensionality and complex dependence. My dissertation consists of three projects. The first project focuses on the development of a sparse multivariate factor analysis regression model to construct the underlying sparse association map between gene expressions and biomarkers. This is motivated by the fact that some associations may be obscured by unknown confounding factors that are not collected in the data. I have shown that accounting for such unobserved confounding factors can increase both sensitivity and specificity for detecting important gene-biomarker associations and thus lead to more interpretable association maps. The second project concerns the reconstruction of the underlying gene regulatory network using directed acyclic graphical models. My project aims to reduce false discoveries by identifying and removing edges resulting from shared confounding factors. I propose sparse structural factor equation models, in which structural equation models are used to capture directed graphs while factor analysis models are used to account for potential latent factors. I have shown that the proposed method enables me to obtain a simpler and more interpretable topology of a gene regulatory network. The third project is devoted to the development of a new regression analysis methodology to analyze electroencephalogram (EEG) neuroimaging data that are correlated among electrodes within an EEG-net. To address analytic challenges pertaining to the integration of network topology into the analysis, I propose hybrid quadratic inference functions that incorporate both prior and data-driven correlations among network nodes into statistical estimation and inference. The proposed method is conceptually simple and computationally fast and, more importantly, has appealing large-sample properties. In a real EEG data analysis, I applied the proposed method to detect a significant association of iron deficiency with event-related potentials measured in two subregions, which was not found using the classical spatial ANOVA random-effects models.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/111595/1/zhouyan_1.pd