4,237 research outputs found
Mining Maximal Cliques from an Uncertain Graph
We consider mining dense substructures (maximal cliques) from an uncertain
graph, which is a probability distribution on a set of deterministic graphs.
For parameter 0 < {\alpha} < 1, we present a precise definition of an
{\alpha}-maximal clique in an uncertain graph. We present matching upper and
lower bounds on the number of {\alpha}-maximal cliques possible within an
uncertain graph. We present an algorithm to enumerate {\alpha}-maximal cliques
in an uncertain graph whose worst-case runtime is near-optimal, and an
experimental evaluation showing the practical utility of the algorithm.Comment: ICDE 201
Statistical data mining for symbol associations in genomic databases
A methodology is proposed to automatically detect significant symbol
associations in genomic databases. A new statistical test is proposed to assess
the significance of a group of symbols when found in several genesets of a
given database. Applied to symbol pairs, the thresholded p-values of the test
define a graph structure on the set of symbols. The cliques of that graph are
significant symbol associations, linked to a set of genesets where they can be
found. The method can be applied to any database, and is illustrated MSigDB C2
database. Many of the symbol associations detected in C2 or in non-specific
selections did correspond to already known interactions. On more specific
selections of C2, many previously unkown symbol associations have been
detected. These associations unveal new candidates for gene or protein
interactions, needing further investigation for biological evidence
Development and implementation of high-throughput SNP genotyping in barley
<p>Abstract</p> <p>Background</p> <p>High density genetic maps of plants have, nearly without exception, made use of marker datasets containing missing or questionable genotype calls derived from a variety of genic and non-genic or anonymous markers, and been presented as a single linear order of genetic loci for each linkage group. The consequences of missing or erroneous data include falsely separated markers, expansion of cM distances and incorrect marker order. These imperfections are amplified in consensus maps and problematic when fine resolution is critical including comparative genome analyses and map-based cloning. Here we provide a new paradigm, a high-density consensus genetic map of barley based only on complete and error-free datasets and genic markers, represented accurately by graphs and approximately by a best-fit linear order, and supported by a readily available SNP genotyping resource.</p> <p>Results</p> <p>Approximately 22,000 SNPs were identified from barley ESTs and sequenced amplicons; 4,596 of them were tested for performance in three pilot phase Illumina GoldenGate assays. Data from three barley doubled haploid mapping populations supported the production of an initial consensus map. Over 200 germplasm selections, principally European and US breeding material, were used to estimate minor allele frequency (MAF) for each SNP. We selected 3,072 of these tested SNPs based on technical performance, map location, MAF and biological interest to fill two 1536-SNP "production" assays (BOPA1 and BOPA2), which were made available to the barley genetics community. Data were added using BOPA1 from a fourth mapping population to yield a consensus map containing 2,943 SNP loci in 975 marker bins covering a genetic distance of 1099 cM.</p> <p>Conclusion</p> <p>The unprecedented density of genic markers and marker bins enabled a high resolution comparison of the genomes of barley and rice. Low recombination in pericentric regions is evident from bins containing many more than the average number of markers, meaning that a large number of genes are recombinationally locked into the genetic centromeric regions of several barley chromosomes. Examination of US breeding germplasm illustrated the usefulness of BOPA1 and BOPA2 in that they provide excellent marker density and sensitivity for detection of minor alleles in this genetically narrow material.</p
Popularity versus Similarity in Growing Networks
Popularity is attractive -- this is the formula underlying preferential
attachment, a popular explanation for the emergence of scaling in growing
networks. If new connections are made preferentially to more popular nodes,
then the resulting distribution of the number of connections that nodes have
follows power laws observed in many real networks. Preferential attachment has
been directly validated for some real networks, including the Internet.
Preferential attachment can also be a consequence of different underlying
processes based on node fitness, ranking, optimization, random walks, or
duplication. Here we show that popularity is just one dimension of
attractiveness. Another dimension is similarity. We develop a framework where
new connections, instead of preferring popular nodes, optimize certain
trade-offs between popularity and similarity. The framework admits a geometric
interpretation, in which popularity preference emerges from local optimization.
As opposed to preferential attachment, the optimization framework accurately
describes large-scale evolution of technological (Internet), social (web of
trust), and biological (E.coli metabolic) networks, predicting the probability
of new links in them with a remarkable precision. The developed framework can
thus be used for predicting new links in evolving networks, and provides a
different perspective on preferential attachment as an emergent phenomenon
Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana
Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation
Interval graph limits
We work out the graph limit theory for dense interval graphs. The theory
developed departs from the usual description of a graph limit as a symmetric
function on the unit square, with and uniform on the interval
. Instead, we fix a and change the underlying distribution of the
coordinates and . We find choices such that our limits are continuous.
Connections to random interval graphs are given, including some examples. We
also show a continuity result for the chromatic number and clique number of
interval graphs. Some results on uniqueness of the limit description are given
for general graph limits.Comment: 28 pages, 4 figure
Methods and tools to improve performance of plant genome analysis
Multi -omics data analysis and integration facilitates hypothesis building toward an understanding of genes and pathway responses driven by environments. Methods designed to estimate and analyze gene expression, with regard to treatments or conditions, can be leveraged to understand gene-level responses in the cell. However, genes often interact and signal within larger structures such as pathways and networks. Complex studies guided toward describing dynamic genetic pathways and networks require algorithms or methods designed for inference based on gene interactions and related topologies. Classes of algorithms and methods may be integrated into generalized workflows for comparative genomics studies, as multi -omics data can be standardized between contact points in various software applications. Further, network inference or network comparison algorithmic designs may involve interchangeable operations given the structure of their implementations. Network comparison and inference methods can also guide transfer-of-knowledge between model organisms and those with less knowledge base
- …