On the construction of sparse matrices from expander graphs
We revisit the asymptotic analysis of probabilistic construction of adjacency
matrices of expander graphs proposed in [4]. With better bounds we derived a
new reduced sample complexity for the number of nonzeros per column of these
matrices, precisely ; as opposed to
the standard . This gives insights into
why using a small number of nonzeros per column performed well in numerical experiments involving such
matrices. Furthermore, we derive quantitative sampling theorems for our
constructions which show that our constructions outperform the existing
state of the art. We also used our results to compare the performance of sparse
recovery algorithms where these matrices are used for linear sketching. Comment: 28 pages, 4 figures
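The kind of matrix discussed in this abstract can be illustrated with a minimal sketch. It assumes the simplest probabilistic construction, in which every column receives exactly d nonzeros placed in rows chosen uniformly at random without replacement; the abstract does not spell out the construction of [4], so this is an assumption, and the names `sparse_binary_matrix`, `to_dense`, `n_rows`, `n_cols`, and `d` are hypothetical.

```python
import random


def sparse_binary_matrix(n_rows, n_cols, d, seed=None):
    """Return, for each column, the sorted list of d distinct row indices
    holding a 1. Rows are drawn uniformly at random without replacement
    (an assumed construction, not necessarily the one analysed in [4])."""
    if not 0 < d <= n_rows:
        raise ValueError("d must satisfy 0 < d <= n_rows")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_rows), d)) for _ in range(n_cols)]


def to_dense(supports, n_rows):
    """Expand the column supports into a dense 0/1 matrix (a list of rows)."""
    matrix = [[0] * len(supports) for _ in range(n_rows)]
    for col, rows in enumerate(supports):
        for row in rows:
            matrix[row][col] = 1
    return matrix


# Example: a 20 x 50 matrix with d = 3 nonzeros per column.
supports = sparse_binary_matrix(n_rows=20, n_cols=50, d=3, seed=0)
A = to_dense(supports, 20)
```

Viewed as a bipartite graph, each column is a left vertex of degree d and each row is a right vertex. Whether a given draw is actually a good expander is not checked here.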
Probabilistic Inductive Classes of Graphs
Models of complex networks are generally defined as graph stochastic
processes in which edges and vertices are added or deleted over time to
simulate the evolution of networks. Here, we define a unifying framework -
probabilistic inductive classes of graphs - for formalizing and studying
evolution of complex networks. Our definition of probabilistic inductive class
of graphs (PICG) extends the standard notion of inductive class of graphs (ICG)
by imposing a probability space. A PICG is given by: (1) a class B of initial
graphs, the basis of the PICG, (2) a class R of generating rules, each with a
distinguished left element to which the rule is applied to obtain the right
element, (3) a probability distribution specifying how the initial graph is
chosen from class B, (4) a probability distribution specifying how the rules
from class R are applied, and, finally, (5) a probability distribution
specifying how the left elements for every rule in class R are chosen. We point
out that many of the existing models of growing networks can be cast as PICGs.
We show how the well-known model of growing networks - the preferential
attachment model - can be studied as a PICG. As an illustration we present
results regarding the size, order, and degree sequence for PICG models of
connected and 2-connected graphs. Comment: 15 pages, 6 figures
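One way to read preferential attachment in terms of the five ingredients listed in this abstract is sketched below. This is an illustrative interpretation, not the paper's formalization: the basis is assumed to contain a single graph (one edge), there is one generating rule (attach a new pendant vertex to a chosen vertex), and the left element is chosen with probability proportional to its degree. The function names are hypothetical.

```python
import random


def preferential_attachment_tree(n_vertices, seed=None):
    """Grow a tree by repeatedly applying one rule: attach a new vertex to an
    existing vertex chosen with probability proportional to its degree."""
    if n_vertices < 2:
        raise ValueError("need at least the two vertices of the initial edge")
    rng = random.Random(seed)
    edges = [(0, 1)]      # basis: a single edge (assumed initial graph)
    endpoints = [0, 1]    # each vertex appears once per incident edge
    for new in range(2, n_vertices):
        # A uniform draw from the endpoint list is a degree-proportional
        # draw over vertices: this plays the role of distribution (5).
        target = rng.choice(endpoints)
        edges.append((target, new))
        endpoints.extend((target, new))
    return edges


def degree_sequence(edges, n_vertices):
    """Return the vertex degrees sorted in non-increasing order."""
    degrees = [0] * n_vertices
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return sorted(degrees, reverse=True)


edges = preferential_attachment_tree(200, seed=0)
degrees = degree_sequence(edges, 200)
```

With a single initial graph and a single rule, distributions (3) and (4) are trivial, so all the randomness sits in the choice of the left element.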
Query-Driven Sampling for Collective Entity Resolution
Probabilistic databases play a preeminent role in the processing and
management of uncertain data. Recently, many database research efforts have
integrated probabilistic models into databases to support tasks such as
information extraction and labeling. Many of these efforts are based on
batch-oriented inference, which inhibits a real-time workflow. One important task is
entity resolution (ER). ER is the process of determining records (mentions) in
a database that correspond to the same real-world entity. Traditional pairwise
ER methods can lead to inconsistencies and low accuracy due to localized
decisions. Leading ER systems solve this problem by collectively resolving all
records using a probabilistic graphical model and Markov chain Monte Carlo
(MCMC) inference. However, for large datasets this is an extremely expensive
process. One key observation is that such an exhaustive ER process incurs a huge
up-front cost, which is wasteful in practice because most users are interested
in only a small subset of entities. In this paper, we advocate pay-as-you-go
entity resolution by developing a number of query-driven collective ER
techniques. We introduce two classes of SQL queries that involve ER operators
--- selection-driven ER and join-driven ER. We implement novel variations of
the MCMC Metropolis-Hastings algorithm to generate biased samples and
selectivity-based scheduling algorithms to support the two classes of ER
queries. Finally, we show that query-driven ER algorithms can converge and
return results within minutes over a database populated with the extraction
from a newswire dataset containing 71 million mentions.
Critical random graphs: limiting constructions and distributional properties
We consider the Erdos-Renyi random graph G(n,p) inside the critical window,
where p = 1/n + lambda * n^{-4/3} for some lambda in R. We proved in a previous
paper (arXiv:0903.4730) that considering the connected components of G(n,p) as
a sequence of metric spaces with the graph distance rescaled by n^{-1/3} and
letting n go to infinity yields a non-trivial sequence of limit metric spaces C
= (C_1, C_2, ...). These limit metric spaces can be constructed from certain
random real trees with vertex-identifications. For a single such metric space,
we give here two equivalent constructions, both of which are in terms of more
standard probabilistic objects. The first is a global construction using
Dirichlet random variables and Aldous' Brownian continuum random tree. The
second is a recursive construction from an inhomogeneous Poisson point process
on R_+. These constructions allow us to characterize the distributions of the
masses and lengths in the constituent parts of a limit component when it is
decomposed according to its cycle structure. In particular, this strengthens
results of Luczak, Pittel and Wierman by providing precise distributional
convergence for the lengths of paths between kernel vertices and the length of
a shortest cycle, within any fixed limit component. Comment: 30 pages, 4 figures
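The critical-window parametrization in this abstract can be explored numerically with a small simulation. The sketch below samples G(n,p) with p = 1/n + lambda * n^{-4/3} and reports component sizes using a union-find structure; the function names and the choice n = 500 are illustrative assumptions. Dividing sizes by n^{2/3} reflects the standard fact that the largest components in the critical window have order n^{2/3}, which the abstract itself does not restate.

```python
import random
from collections import Counter


def critical_p(n, lam):
    """Edge probability inside the critical window: 1/n + lam * n^(-4/3)."""
    return 1.0 / n + lam * n ** (-4.0 / 3.0)


def component_sizes(n, p, seed=None):
    """Sample G(n, p) and return its component sizes, largest first."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # include edge {u, v} with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv   # merge the two components
    counts = Counter(find(x) for x in range(n))
    return sorted(counts.values(), reverse=True)


n = 500
sizes = component_sizes(n, critical_p(n, lam=0.0), seed=1)
# Sizes of the three largest components, rescaled by n^(2/3).
scaled = [s / n ** (2.0 / 3.0) for s in sizes[:3]]
```

The double loop visits all n(n-1)/2 vertex pairs, so this is only practical for small n. It says nothing about the metric structure (distances rescaled by n^{-1/3}) that the paper studies.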
Critical random graphs: limiting constructions and distributional properties
We consider the Erdos-Renyi random graph G(n, p) inside the critical window, where p = 1/n + lambda * n^{-4/3} for some lambda in R. We proved in Addario-Berry et al. [2009+] that considering the connected components of G(n, p) as a sequence of metric spaces with the graph distance rescaled by n^{-1/3} and letting n -> infinity yields a non-trivial sequence of limit metric spaces C = (C_1, C_2, ...). These limit metric spaces can be constructed from certain random real trees with vertex-identifications. For a single such metric space, we give here two equivalent constructions, both of which are in terms of more standard probabilistic objects. The first is a global construction using Dirichlet random variables and Aldous' Brownian continuum random tree. The second is a recursive construction from an inhomogeneous Poisson point process on R_+. These constructions allow us to characterize the distributions of the masses and lengths in the constituent parts of a limit component when it is decomposed according to its cycle structure. In particular, this strengthens results of Luczak et al. [1994] by providing precise distributional convergence for the lengths of paths between kernel vertices and the length of a shortest cycle, within any fixed limit component.
Random enriched trees with applications to random graphs
We establish limit theorems that describe the asymptotic local and global
geometric behaviour of random enriched trees considered up to symmetry. We
apply these general results to random unlabelled weighted rooted graphs and
uniform random unlabelled k-trees that are rooted at a k-clique of
distinguishable vertices. For both models we establish a Gromov--Hausdorff
scaling limit, a Benjamini--Schramm limit, and a local weak limit that
describes the asymptotic shape near the fixed root.