Uniform Random Sampling of Traces in Very Large Models
This paper presents some first results on how to perform uniform random walks
(where every trace has the same probability of occurring) in very large models. The
models considered here are described in a succinct way as a set of
communicating reactive modules. The method relies upon techniques for counting
and drawing uniformly at random words in regular languages. Each module is
considered as an automaton defining such a language. It is shown how local
uniform drawings of traces can be combined to obtain global uniform random
sampling without constructing the global model.
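The counting-and-drawing step that the abstract above relies on can be sketched for a single automaton. This is a minimal illustration, not the paper's method for combining modules: the function names, the dictionary encoding of the automaton, and the example language (words over {a, b} with no two consecutive b's) are all assumptions made for the example.

```python
import random

def count_table(delta, accepting, n):
    """counts[k][q] = number of length-k words that lead from q to an accepting state."""
    states = list(delta)
    counts = [{q: (1 if q in accepting else 0) for q in states}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[t] for t in delta[q].values()) for q in states})
    return counts

def sample_word(delta, start, accepting, n, rng=random):
    """Draw one accepted word of length n uniformly at random."""
    counts = count_table(delta, accepting, n)
    if counts[n][start] == 0:
        raise ValueError("no accepted word of this length")
    word, q = [], start
    for k in range(n, 0, -1):
        symbols = list(delta[q])
        # Weight each symbol by the number of accepted completions it leaves open.
        weights = [counts[k - 1][delta[q][a]] for a in symbols]
        a = rng.choices(symbols, weights=weights)[0]
        word.append(a)
        q = delta[q][a]
    return "".join(word)

# Hypothetical example automaton: state 1 means "the last symbol was b".
DELTA = {0: {"a": 0, "b": 1}, 1: {"a": 0}}
ACCEPTING = {0, 1}
```

Choosing each symbol in proportion to the number of accepted completions is what makes every accepted word of length n equally likely; for instance, `sample_word(DELTA, 0, ACCEPTING, 3)` returns one of the five words of length 3 that avoid "bb".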
Estimating Graphlet Statistics via Lifting
Exploratory analysis over network data is often limited by the ability to
efficiently calculate graph statistics, which can provide a model-free
understanding of the macroscopic properties of a network. We introduce a
framework for estimating the graphlet count---the number of occurrences of a
small subgraph motif (e.g. a wedge or a triangle) in the network. For massive
graphs, where accessing the whole graph is not possible, the only viable
algorithms are those that make a limited number of vertex neighborhood queries.
We introduce a Monte Carlo sampling technique for graphlet counts, called
Lifting, which can simultaneously sample all graphlets up to an arbitrary
number of vertices. This is the first graphlet sampling method that can
provably sample every graphlet with positive probability and can sample
graphlets of arbitrary size. We outline variants of lifted graphlet counts,
including the ordered, unordered, and shotgun estimators, random walk starts,
and parallel vertex starts. We prove that our graphlet count updates are
unbiased for the true graphlet count and have a controlled variance for all
graphlets. We compare the experimental performance of lifted graphlet counts to
the state-of-the-art graphlet sampling procedures: Waddling and the pairwise
subgraph random walk.
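To make the idea of an unbiased Monte Carlo graphlet count concrete, here is a minimal sketch of a much simpler estimator than Lifting: it samples edges uniformly and counts common neighbors to estimate the number of triangles. It is not the paper's algorithm. The function name and the adjacency-set representation are assumptions, and the sketch enumerates all edges up front, which the limited-access setting described above would not allow.

```python
import random

def estimate_triangles(adj, num_samples, rng=random):
    """Estimate the triangle count from uniformly sampled edges.

    Each triangle contains three edges, so summing the common-neighbor count
    over all edges gives 3 * T. Averaging over sampled edges and rescaling by
    the edge count m therefore yields an unbiased estimate of T.
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    total = 0
    for _ in range(num_samples):
        u, v = rng.choice(edges)
        total += len(adj[u] & adj[v])  # triangles that contain edge (u, v)
    return len(edges) * total / (3 * num_samples)

# Hypothetical example: the complete graph on four vertices has four triangles.
K4 = {i: set(range(4)) - {i} for i in range(4)}
```

On `K4` every edge has exactly two common neighbors, so the estimate is exact for any number of samples; on irregular graphs it varies from run to run.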
Discriminative Link Prediction using Local Links, Node Features and Community Structure
A link prediction (LP) algorithm is given a graph, and has to rank, for each
node, other nodes that are candidates for new linkage. LP is strongly motivated
by social search and recommendation applications. LP techniques often focus on
global properties (graph conductance, hitting or commute times, Katz score) or
local properties (Adamic-Adar and many variations, or node feature vectors),
but rarely combine these signals. Furthermore, neither of these extremes
exploits link densities at the intermediate level of communities. In this paper
we describe a discriminative LP algorithm that exploits two new signals. First,
a co-clustering algorithm provides community level link density estimates,
which are used to qualify observed links with a surprise value. Second, links
in the immediate neighborhood of the link to be predicted are not interpreted
at face value, but through a local model of node feature similarities. These
signals are combined into a discriminative link predictor. We evaluate the new
predictor using five diverse data sets that are standard in the literature. We
report on significant accuracy boosts compared to standard LP methods
(including Adamic-Adar and random walk). Apart from the new predictor, another
contribution is a rigorous protocol for benchmarking and reporting LP
algorithms, which reveals the regions of strengths and weaknesses of all the
predictors studied here, and establishes the new proposal as the most robust.
Comment: 10 pages, 5 figures
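For reference, the Adamic-Adar baseline named in the abstract above can be sketched as follows. This covers only that standard local score, not the discriminative predictor the paper proposes; the function names, the adjacency-set representation, and the toy graph are assumptions for illustration.

```python
import math

def adamic_adar(adj, x, y):
    """Sum of 1 / log(degree) over the common neighbors of x and y."""
    # A common neighbor is adjacent to both x and y, so its degree is at
    # least 2 and the logarithm is strictly positive.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

def rank_candidates(adj, x):
    """Rank the nodes not yet linked to x by decreasing Adamic-Adar score."""
    candidates = [y for y in adj if y != x and y not in adj[x]]
    return sorted(candidates, key=lambda y: adamic_adar(adj, x, y), reverse=True)

# Hypothetical toy graph stored as adjacency sets.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
```

Here `rank_candidates(GRAPH, "a")` places "b" ahead of "e", because "a" and "b" share two neighbors while "a" and "e" share only the higher-degree node "d".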
The Computational Complexity of Estimating Convergence Time
An important problem in the implementation of Markov Chain Monte Carlo
algorithms is to determine the convergence time, or the number of iterations
before the chain is close to stationarity. For many Markov chains used in
practice this time is not known. Even in cases where the convergence time is
known to be polynomial, the theoretical bounds are often too crude to be
practical. Thus, practitioners like to carry out some form of statistical
analysis in order to assess convergence. This has led to the development of a
number of methods known as convergence diagnostics which attempt to diagnose
whether the Markov chain is far from stationarity. We study the problem of
testing convergence in the following settings and prove that the problem is
hard in a computational sense. First, given a Markov chain that mixes rapidly, it is
hard for Statistical Zero Knowledge (SZK-hard) to distinguish whether starting
from a given state, the chain is close to stationarity by time t or far from
stationarity at time ct for a constant c. We show the problem is in AM
intersect coAM. Second, given a Markov chain that mixes rapidly it is coNP-hard
to distinguish whether it is close to stationarity by time t or far from
stationarity at time ct for a constant c. The problem is in coAM. Finally, it
is PSPACE-complete to distinguish whether the Markov chain is close to
stationarity by time t or far from being mixed at time ct for c at least 1.
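The quantity behind "close to stationarity by time t" is the total variation distance between the chain's distribution at time t and its stationary distribution. The sketch below computes it by brute force, assuming the transition matrix and the stationary distribution are given explicitly; that is only feasible for chains small enough to enumerate, and it is not a convergence diagnostic from the paper. All names and the two-state example are assumptions.

```python
def step(dist, P):
    """Advance a distribution (a row vector) by one step of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv_distance(p, q):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def tv_from_state(P, pi, start, t):
    """Distance to stationarity after t steps when starting from state `start`."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(t):
        dist = step(dist, P)
    return tv_distance(dist, pi)

# Hypothetical two-state chain that switches state with probability 1/4.
P = [[0.75, 0.25], [0.25, 0.75]]
PI = [0.5, 0.5]
```

For this chain the distance from state 0 halves at every step: 0.5 at time 0, 0.25 at time 1, 0.125 at time 2, and so on.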
Broad Histogram Method for Continuous Systems: the XY-Model
We propose a way of applying the Broad Histogram Monte Carlo method to
systems with continuous degrees of freedom, and we apply these ideas to
investigate the three-dimensional XY-model with periodic boundary conditions.
We find excellent agreement between our method and traditional
Metropolis results for the energy, the magnetization, the specific heat and the
magnetic susceptibility over a very wide temperature range. For the calculation
of these quantities in the temperature range 0.7<T<4.7 our method took less CPU
time than the Metropolis simulations for 16 temperature points in that
temperature range. Furthermore, it calculates the whole temperature range
1.2<T<4.7 using only 2.2 times more computer effort than the Histogram Monte
Carlo method for the range 2.1<T<2.2. Our treatment is general and can
also be applied to other systems with continuous degrees of freedom.
Comment: 23 pages, 10 Postscript figures, to be published in Int. J. Mod. Phys.
Cost-efficient vaccination protocols for network epidemiology
We investigate methods to vaccinate contact networks -- i.e. removing nodes
in such a way that disease spreading is hindered as much as possible -- with
respect to their cost-efficiency. Any real implementation of such protocols
would come with costs related both to the vaccination itself and to gathering
information about the network. Disregarding this, we argue, would lead to
erroneous evaluation of vaccination protocols. We use the
susceptible-infected-recovered model -- the generic model for diseases making
patients immune upon recovery -- as our disease-spreading scenario, and analyze
outbreaks on both empirical and model networks. For different relative costs,
different protocols dominate. For high vaccination costs and low costs of
gathering information, the so-called acquaintance vaccination is the most
cost-efficient. For other parameter values, protocols designed for query-efficient
identification of the network's largest degrees are most efficient.
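The acquaintance vaccination protocol mentioned above is commonly described as: pick a node uniformly at random, then vaccinate a randomly chosen neighbor of that node. A minimal sketch under that description follows; the function name, the `max_tries` guard, and the star-graph example are assumptions, and the sketch does not model the vaccination or information-gathering costs studied in the paper.

```python
import random

def acquaintance_vaccination(adj, budget, rng=random, max_tries=10_000):
    """Vaccinate up to `budget` nodes, each chosen as a random neighbor of a random node."""
    candidates = [u for u in adj if adj[u]]  # nodes that have at least one neighbor
    vaccinated = set()
    tries = 0
    while candidates and len(vaccinated) < budget and tries < max_tries:
        tries += 1
        u = rng.choice(candidates)        # query a random node
        v = rng.choice(sorted(adj[u]))    # vaccinate one of its acquaintances
        vaccinated.add(v)
    return vaccinated

# Hypothetical star graph: node 0 is the hub, nodes 1 to 5 are leaves.
STAR = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
```

Because a random neighbor is more likely to be a high-degree node than a random node is, the protocol tends to reach hubs without needing global knowledge of the network: on `STAR`, each draw vaccinates the hub with probability 5/6.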