    Mining Maximal Cliques from an Uncertain Graph

    We consider mining dense substructures (maximal cliques) from an uncertain graph, which is a probability distribution on a set of deterministic graphs. For parameter 0 < α < 1, we present a precise definition of an α-maximal clique in an uncertain graph. We present matching upper and lower bounds on the number of α-maximal cliques possible within an uncertain graph. We present an algorithm to enumerate α-maximal cliques in an uncertain graph whose worst-case runtime is near-optimal, and an experimental evaluation showing the practical utility of the algorithm. Comment: ICDE 201

    Statistical data mining for symbol associations in genomic databases

    A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database. Applied to symbol pairs, the thresholded p-values of the test define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, linked to a set of genesets where they can be found. The method can be applied to any database, and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections correspond to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations reveal new candidates for gene or protein interactions, which need further investigation for biological evidence.
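
The graph construction described in the abstract above (threshold pairwise p-values, then read associations off the cliques) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_graph` and `maximal_cliques`, the dictionary layout of the p-values, and the use of the basic Bron-Kerbosch recursion are all assumptions made for the example, and the abstract's statistical test itself is not reproduced.

```python
def build_graph(pvalues, threshold):
    """Link two symbols when their pairwise p-value is at or below threshold.

    `pvalues` maps a (symbol_a, symbol_b) pair to a p-value (hypothetical layout).
    """
    graph = {}
    for (a, b), p in pvalues.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if p <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return found
```

Each returned clique is a candidate symbol association; in the method described above it would then be linked back to the genesets in which it occurs.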

    Development and implementation of high-throughput SNP genotyping in barley

    Background: High density genetic maps of plants have, nearly without exception, made use of marker datasets containing missing or questionable genotype calls derived from a variety of genic and non-genic or anonymous markers, and been presented as a single linear order of genetic loci for each linkage group. The consequences of missing or erroneous data include falsely separated markers, expansion of cM distances and incorrect marker order. These imperfections are amplified in consensus maps and problematic when fine resolution is critical including comparative genome analyses and map-based cloning. Here we provide a new paradigm, a high-density consensus genetic map of barley based only on complete and error-free datasets and genic markers, represented accurately by graphs and approximately by a best-fit linear order, and supported by a readily available SNP genotyping resource.
    Results: Approximately 22,000 SNPs were identified from barley ESTs and sequenced amplicons; 4,596 of them were tested for performance in three pilot phase Illumina GoldenGate assays. Data from three barley doubled haploid mapping populations supported the production of an initial consensus map. Over 200 germplasm selections, principally European and US breeding material, were used to estimate minor allele frequency (MAF) for each SNP. We selected 3,072 of these tested SNPs based on technical performance, map location, MAF and biological interest to fill two 1536-SNP "production" assays (BOPA1 and BOPA2), which were made available to the barley genetics community. Data were added using BOPA1 from a fourth mapping population to yield a consensus map containing 2,943 SNP loci in 975 marker bins covering a genetic distance of 1099 cM.
    Conclusion: The unprecedented density of genic markers and marker bins enabled a high resolution comparison of the genomes of barley and rice. Low recombination in pericentric regions is evident from bins containing many more than the average number of markers, meaning that a large number of genes are recombinationally locked into the genetic centromeric regions of several barley chromosomes. Examination of US breeding germplasm illustrated the usefulness of BOPA1 and BOPA2 in that they provide excellent marker density and sensitivity for detection of minor alleles in this genetically narrow material.

    Popularity versus Similarity in Growing Networks

    Popularity is attractive: this is the formula underlying preferential attachment, a popular explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections that nodes have follows power laws observed in many real networks. Preferential attachment has been directly validated for some real networks, including the Internet. Preferential attachment can also be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks, or duplication. Here we show that popularity is just one dimension of attractiveness. Another dimension is similarity. We develop a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which popularity preference emerges from local optimization. As opposed to preferential attachment, the optimization framework accurately describes large-scale evolution of technological (Internet), social (web of trust), and biological (E.coli metabolic) networks, predicting the probability of new links in them with remarkable precision. The developed framework can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
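
For orientation, the baseline that the abstract above argues against, plain preferential attachment, can be simulated in a few lines. This sketch covers only that baseline, not the popularity-similarity optimization framework the abstract proposes; the function name `preferential_attachment`, the parameters `n` and `m`, and the choice of a small complete graph as the seed network are assumptions made for illustration.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed network (assumption): complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

Counting how often each node appears in the returned edge list gives the degree distribution, which is where the heavy tail mentioned in the abstract would be looked for.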

    Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana

    Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation

    Interval graph limits

    We work out the graph limit theory for dense interval graphs. The theory developed departs from the usual description of a graph limit as a symmetric function W(x,y) on the unit square, with x and y uniform on the interval (0,1). Instead, we fix a W and change the underlying distribution of the coordinates x and y. We find choices such that our limits are continuous. Connections to random interval graphs are given, including some examples. We also show a continuity result for the chromatic number and clique number of interval graphs. Some results on uniqueness of the limit description are given for general graph limits. Comment: 28 pages, 4 figure

    Methods and tools to improve performance of plant genome analysis

    Multi-omics data analysis and integration facilitates hypothesis building toward an understanding of gene and pathway responses driven by environments. Methods designed to estimate and analyze gene expression, with regard to treatments or conditions, can be leveraged to understand gene-level responses in the cell. However, genes often interact and signal within larger structures such as pathways and networks. Complex studies aimed at describing dynamic genetic pathways and networks require algorithms or methods designed for inference based on gene interactions and related topologies. Classes of algorithms and methods may be integrated into generalized workflows for comparative genomics studies, as multi-omics data can be standardized between contact points in various software applications. Further, network inference or network comparison algorithmic designs may involve interchangeable operations given the structure of their implementations. Network comparison and inference methods can also guide transfer of knowledge between model organisms and those with a smaller knowledge base.