Mining Maximal Cliques from an Uncertain Graph

Author: Mukherjee Arko Provo
Tirthapura Srikanta
Xu Pan
Publication date: 22/10/2014

We consider mining dense substructures (maximal cliques) from an uncertain graph, which is a probability distribution on a set of deterministic graphs. For parameter 0 < α < 1, we present a precise definition of an α-maximal clique in an uncertain graph. We present matching upper and lower bounds on the number of α-maximal cliques possible within an uncertain graph. We present an algorithm to enumerate α-maximal cliques in an uncertain graph whose worst-case runtime is near-optimal, and an experimental evaluation showing the practical utility of the algorithm. Comment: ICDE 201
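To make the notion of an α-maximal clique concrete, here is a minimal Python sketch. It assumes, beyond what the abstract states, that each edge exists independently with a given probability, so the probability that a vertex set forms a clique is the product of its pairwise edge probabilities. The function names and the `edge_prob` dictionary layout are hypothetical, and this is a brute-force check, not the enumeration algorithm from the paper.

```python
from itertools import combinations
from math import prod


def clique_probability(vertices, edge_prob):
    """Probability that all pairs in `vertices` are adjacent.

    Assumes independent edges; `edge_prob` maps frozenset({u, v}) to a
    probability, and missing pairs are treated as probability 0.
    """
    return prod(
        edge_prob.get(frozenset(pair), 0.0)
        for pair in combinations(vertices, 2)
    )


def is_alpha_maximal_clique(vertices, all_vertices, edge_prob, alpha):
    """True if `vertices` is a clique with probability >= alpha and no
    single added vertex keeps the probability at or above alpha."""
    vertices = set(vertices)
    if clique_probability(vertices, edge_prob) < alpha:
        return False
    # Adding vertices can only multiply in more factors <= 1, so checking
    # one-vertex extensions is enough to establish maximality.
    return all(
        clique_probability(vertices | {v}, edge_prob) < alpha
        for v in set(all_vertices) - vertices
    )


edge_prob = {
    frozenset("ab"): 0.9,
    frozenset("bc"): 0.8,
    frozenset("ac"): 0.5,
}
print(is_alpha_maximal_clique("ab", "abc", edge_prob, alpha=0.4))
```

With these example probabilities the triangle {a, b, c} has clique probability 0.9 × 0.8 × 0.5 = 0.36, so at α = 0.4 the pair {a, b} cannot be extended, while at α = 0.3 the full triangle qualifies instead.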

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Statistical data mining for symbol associations in genomic databases

Author: Fournié Jean-Jacques
Pont Frédéric
Ycart Bernard
Publication date: 10/09/2013

A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database. Applied to symbol pairs, the thresholded p-values of the test define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, linked to a set of genesets where they can be found. The method can be applied to any database, and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections correspond to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations reveal new candidates for gene or protein interactions, which need further investigation for biological evidence.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Development and implementation of high-throughput SNP genotyping in barley

Author: Bhat Prasanna R
Bozdag Serdar
Chao Shiaoman
Close Timothy J
Condamine Pascal
DeYoung Joseph
Druka Arnis
Fenton Raymond D
Graner Andreas
Hayes Patrick M
Kleinhofs Andris
Lonardi Stefano
Madishetty Kavitha
Marshall David F
Matthews David E
Moscou Matthew J
Muehlbauer Gary J
Ramsay Luke
Roose Mikeal L
Rostoks Nils
Sato Kazuhiro
Stein Nils
Svensson Jan T
Szűcs Péter
Varshney Rajeev K
Wanamaker Steve
Waugh Robbie
Wu Yonghui
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: High density genetic maps of plants have, nearly without exception, made use of marker datasets containing missing or questionable genotype calls derived from a variety of genic and non-genic or anonymous markers, and been presented as a single linear order of genetic loci for each linkage group. The consequences of missing or erroneous data include falsely separated markers, expansion of cM distances and incorrect marker order. These imperfections are amplified in consensus maps and problematic when fine resolution is critical, including comparative genome analyses and map-based cloning. Here we provide a new paradigm, a high-density consensus genetic map of barley based only on complete and error-free datasets and genic markers, represented accurately by graphs and approximately by a best-fit linear order, and supported by a readily available SNP genotyping resource.

Results: Approximately 22,000 SNPs were identified from barley ESTs and sequenced amplicons; 4,596 of them were tested for performance in three pilot phase Illumina GoldenGate assays. Data from three barley doubled haploid mapping populations supported the production of an initial consensus map. Over 200 germplasm selections, principally European and US breeding material, were used to estimate minor allele frequency (MAF) for each SNP. We selected 3,072 of these tested SNPs based on technical performance, map location, MAF and biological interest to fill two 1536-SNP "production" assays (BOPA1 and BOPA2), which were made available to the barley genetics community. Data were added using BOPA1 from a fourth mapping population to yield a consensus map containing 2,943 SNP loci in 975 marker bins covering a genetic distance of 1099 cM.

Conclusion: The unprecedented density of genic markers and marker bins enabled a high resolution comparison of the genomes of barley and rice. Low recombination in pericentric regions is evident from bins containing many more than the average number of markers, meaning that a large number of genes are recombinationally locked into the genetic centromeric regions of several barley chromosomes. Examination of US breeding germplasm illustrated the usefulness of BOPA1 and BOPA2 in that they provide excellent marker density and sensitivity for detection of minor alleles in this genetically narrow material.

ICRISAT Open Access Repository

epublications@Marquette

Directory of Open Access Journals

Okayama University Scientific Achievement Repository

Digital Repository @ Iowa State University (ISU)

Crossref

Springer - Publisher Connector

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

Research Repository

University of Dundee Online Publications

Popularity versus Similarity in Growing Networks

Author: A Clauset
A Vázquez
A-L Barabási
AE Motter
AFJ van Raan
AK Menon
B Bollobás
D Crandall
DJ Watts
Dmitri Krioukov
F Bonahon
F Menczer
Fragkiskos Papadopoulos
G Bianconi
G Caldarelli
H Jeong
K Börner
LA Adamic
M McPherson
M. Ángeles Serrano
Maksim Kitsak
Marián Boguñá
MEJ Newman
O Simşek
PL Krapivsky
R Pastor-Satorras
RM D'Souza
S Fortunato
S Redner
SN Dorogovtsev
Publication venue: Springer Science and Business Media LLC
Publication date: 17/04/2013

Popularity is attractive: this is the formula underlying preferential attachment, a popular explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections that nodes have follows power laws observed in many real networks. Preferential attachment has been directly validated for some real networks, including the Internet. Preferential attachment can also be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks, or duplication. Here we show that popularity is just one dimension of attractiveness. Another dimension is similarity. We develop a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which popularity preference emerges from local optimization. As opposed to preferential attachment, the optimization framework accurately describes large-scale evolution of technological (Internet), social (web of trust), and biological (E. coli metabolic) networks, predicting the probability of new links in them with remarkable precision. The developed framework can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
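The trade-off idea in this abstract can be illustrated with a small growth simulation. The sketch below is only one simple way to express such a trade-off and is an assumption made for illustration: each node gets a random angle on a circle (similarity), its birth order stands in for popularity (older nodes are more popular), and a new node links to the `m` existing nodes that minimize birth order × angular distance. The function names, the cost function, and the parameters `n`, `m`, and `seed` are all hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Shortest distance between two angles on a circle, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes, born one at a time (labels 1..n).

    Each new node t links to the m existing nodes s with the smallest
    cost s * angular_distance(s, t): small s favors old (popular) nodes,
    small angular distance favors similar ones.
    """
    rng = random.Random(seed)
    angles = []   # angles[s - 1] is the angle of node s
    edges = set()
    for t in range(1, n + 1):
        theta = rng.uniform(0, 2 * math.pi)
        ranked = sorted(
            range(1, t),
            key=lambda s: s * angular_distance(angles[s - 1], theta),
        )
        for s in ranked[:m]:
            edges.add((s, t))
        angles.append(theta)
    return angles, edges


angles, edges = grow_network(n=50, m=2)
print(len(edges))
```

Because the cost multiplies the two terms, an old node can win a link even when it is far away in angle, and a young node can win only when it is very close.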

arXiv.org e-Print Archive

Crossref

Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana

Author: Bennett M.
Byrne H. M.
de Smet I.
Hodgman C.
King J. R.
Muraro D.
Voß U.
Wilson M.
Publication date: 01/01/2012

Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small-world and hierarchical properties, and the nodes with a high out-degree may warrant further investigation.

Ghent University Academic Bibliography

Oxford University Research Archive

Interval graph limits

Author: Diaconis Persi
Holmes Susan
Janson Svante
Publication date: 01/01/2011

We work out the graph limit theory for dense interval graphs. The theory developed departs from the usual description of a graph limit as a symmetric function W(x, y) on the unit square, with x and y uniform on the interval (0, 1). Instead, we fix a W and change the underlying distribution of the coordinates x and y. We find choices such that our limits are continuous. Connections to random interval graphs are given, including some examples. We also show a continuity result for the chromatic number and clique number of interval graphs. Some results on uniqueness of the limit description are given for general graph limits. Comment: 28 pages, 4 figures
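For readers unfamiliar with the clique number of an interval graph, the following Python sketch computes it for a finite set of closed intervals by sweeping over endpoints. Interval graphs are perfect, so the same value is also the chromatic number. The function name and input layout are hypothetical, and the sketch illustrates the finite quantity only, not the limit theory in the paper.

```python
def clique_number(intervals):
    """Largest number of closed intervals sharing a common point.

    `intervals` is an iterable of (left, right) pairs with left <= right.
    In the interval graph, vertices are intervals and edges join
    intersecting intervals, so this count is the clique number.
    """
    events = []
    for left, right in intervals:
        events.append((left, 0))   # 0 marks an interval opening
        events.append((right, 1))  # 1 marks an interval closing
    # Sorting puts openings before closings at the same point, so closed
    # intervals that merely touch are counted as intersecting.
    events.sort()
    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


print(clique_number([(0, 2), (1, 3), (2, 4)]))
```

In the example call, all three intervals contain the point 2, so they are pairwise intersecting and form a single clique.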

arXiv.org e-Print Archive

CiteSeerX

Methods and tools to improve performance of plant genome analysis

Author: Ferrell Drew
Publication venue: Scholars Junction
Publication date: 09/08/2022

Multi-omics data analysis and integration facilitates hypothesis building toward an understanding of genes and pathway responses driven by environments. Methods designed to estimate and analyze gene expression, with regard to treatments or conditions, can be leveraged to understand gene-level responses in the cell. However, genes often interact and signal within larger structures such as pathways and networks. Complex studies guided toward describing dynamic genetic pathways and networks require algorithms or methods designed for inference based on gene interactions and related topologies. Classes of algorithms and methods may be integrated into generalized workflows for comparative genomics studies, as multi-omics data can be standardized between contact points in various software applications. Further, network inference or network comparison algorithmic designs may involve interchangeable operations given the structure of their implementations. Network comparison and inference methods can also guide transfer of knowledge between model organisms and those with a smaller knowledge base.

Scholars Junction - Mississippi State University Institutional Repository