Combinatorial Miller-Hagberg Algorithm for Randomization of Dense Networks
We propose a slightly revised Miller-Hagberg (MH) algorithm that efficiently
generates a random network from a given expected degree sequence. The revision
was to replace the approximated edge probability between a pair of nodes with a
combinatorially calculated edge probability that better captures the
likelihood of edge presence, especially where edges are dense. The computational
complexity of this combinatorial MH algorithm is still in the same order as the
original one. We evaluated the proposed algorithm through several numerical
experiments. The results demonstrated that the proposed algorithm was
particularly good at accurately representing high-degree nodes in dense,
heterogeneous networks. This algorithm may be a useful alternative to other
more established network randomization methods, given that data are
becoming increasingly large and dense in today's network science research.
Comment: 8 pages, 3 figures; to appear in the Proceedings of CompleNet 2018,
in pres
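As a point of reference for the abstract above, the expected-degree model is commonly written with the approximated edge probability min(1, w_i * w_j / S), where w_i is the expected degree of node i and S is the sum of all expected degrees. The sketch below is a naive O(n^2) sampler built on that approximation. It is not the MH algorithm itself, and it does not implement the combinatorial probability the paper proposes, which the abstract does not spell out. The function name `expected_degree_graph` and its parameters are hypothetical.

```python
import random


def expected_degree_graph(weights, seed=None):
    """Naive O(n^2) expected-degree sampler without self-loops.

    Assumption: uses the commonly cited approximation
    p_ij = min(1, w_i * w_j / S), not the paper's combinatorial probability.
    """
    rng = random.Random(seed)
    total = sum(weights)
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if total <= 0:
                continue  # no expected edges at all
            # Approximated edge probability, capped at 1 for dense pairs.
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges
```

The cap at 1 is exactly where the approximation becomes inaccurate for high-degree nodes in dense networks, which is the regime the abstract targets.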
On the impossibility of constructing good population mean estimators in a realistic Respondent Driven Sampling model
Current methods for population mean estimation from data collected by
Respondent Driven Sampling (RDS) are based on the Horvitz-Thompson estimator
together with a set of assumptions on the sampling model under which the
inclusion probabilities can be determined from the information contained in the
data. In this paper, we argue that such a set of assumptions is too simplistic
to be realistic and that under realistic sampling models, the situation is far
more complicated. Specifically, we study a realistic RDS sampling model that is
motivated by a real world RDS dataset. We show that, for this model, the
inclusion probabilities, which are necessary for the application of the
Horvitz-Thompson estimator, cannot be determined by the information in the
sample alone. An implication is that, unless additional information about the
underlying population network is obtained, it is hopeless to conceive of a
general theory of population mean estimation from current RDS data.
Comment: 13 pages, 2 figure
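To illustrate why the inclusion probabilities matter, here is a minimal sketch of the textbook Horvitz-Thompson estimator of a population mean, (1/N) * sum(y_i / pi_i). It assumes the inclusion probabilities pi_i and the population size N are known, which is precisely what the abstract argues cannot be recovered from RDS data alone. The function name and parameter names are hypothetical.

```python
def horvitz_thompson_mean(values, inclusion_probs, population_size):
    """Horvitz-Thompson estimate of a population mean.

    Assumption: every sampled unit i has a known inclusion
    probability pi_i in (0, 1], and the population size N is known.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        # Each observation is weighted by the inverse of its inclusion probability.
        total += y / pi
    return total / population_size
```

Every term divides by pi_i, so any error in the assumed inclusion probabilities carries directly into the estimate.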
Information content of coevolutionary game landscapes
Coevolutionary game dynamics result from players who may change both their
strategies and their network of interaction. For such games, by
interpreting strategies as configurations, strategy-to-payoff maps can be
defined for every interaction network, which makes it possible to derive game
landscapes. This paper analyzes these game landscapes by their
information content. In particular, we study the effect of a
rescaled payoff matrix that generalizes social dilemmas, and the differences between
well-mixed and structured populations.
Negative Examples for Sequential Importance Sampling of Binary Contingency Tables
The sequential importance sampling (SIS) algorithm has gained considerable
popularity for its empirical success. One of its noted applications is to the
binary contingency tables problem, an important problem in statistics, where
the goal is to estimate the number of 0/1 matrices with prescribed row and
column sums. We give a family of examples in which the SIS procedure, if run
for any subexponential number of trials, will underestimate the number of
tables by an exponential factor. This result holds for any of the usual design
choices in the SIS algorithm, namely the ordering of the columns and rows.
These are apparently the first theoretical results on the efficiency of the SIS
algorithm for binary contingency tables. Finally, we present experimental
evidence that the SIS algorithm is efficient for row and column sums that are
regular. Our work is a first step in determining the class of inputs for which
SIS is effective.
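The quantity that SIS estimates can be computed exactly for very small inputs, which helps make the counting problem concrete. The sketch below is a brute-force counter of 0/1 matrices with prescribed row and column sums. It is not the SIS algorithm, its running time is exponential, and the function name `count_binary_tables` is hypothetical.

```python
from itertools import combinations


def count_binary_tables(row_sums, col_sums):
    """Exactly count 0/1 matrices with the given margins (tiny inputs only)."""
    n_cols = len(col_sums)

    def recurse(row, remaining):
        # All rows placed: valid only if every column sum is used up.
        if row == len(row_sums):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        # Choose which columns receive a 1 in the current row.
        for cols in combinations(range(n_cols), row_sums[row]):
            if all(remaining[j] > 0 for j in cols):
                nxt = list(remaining)
                for j in cols:
                    nxt[j] -= 1
                total += recurse(row + 1, nxt)
        return total

    return recurse(0, list(col_sums))
```

For example, `count_binary_tables([1, 1, 1], [1, 1, 1])` counts the 3 x 3 permutation matrices and returns 6.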
A Parallel Algorithm for Generating a Random Graph with a Prescribed Degree Sequence
Random graphs (or networks) have attracted significantly increased interest
owing to their popularity in modeling and simulating many complex real-world
systems. The degree sequence is one of the most important aspects of these systems.
Random graphs with a given degree sequence can capture many characteristics,
such as dependent edges and non-binomial degree distributions, that are absent in
many classical random graph models such as the Erdős-Rényi graph model.
In addition, they have important applications in the uniform sampling of random
graphs, counting the number of graphs having the same degree sequence, as well
as in string theory, random matrix theory, and matching theory. In this paper,
we present an OpenMP-based shared-memory parallel algorithm for generating a
random graph with a prescribed degree sequence, which achieves a speedup of
20.5 with 32 cores. One of the steps in our parallel algorithm requires
checking the Erdős-Gallai characterization, i.e., whether there exists a
graph obeying the given degree sequence, in parallel. This paper presents the
first non-trivial parallel algorithm for checking the Erdős-Gallai
characterization, which achieves a speedup of 23 using 32 cores.
Comment: 10 page
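For reference, the Erdős-Gallai characterization states that a non-increasing sequence d_1 >= ... >= d_n of non-negative integers is the degree sequence of a simple graph if and only if its sum is even and, for every k, d_1 + ... + d_k <= k(k-1) + sum over i > k of min(d_i, k). The sketch below is a straightforward sequential check of that condition. It is not the parallel algorithm described in the abstract, and the function name `is_graphical` is hypothetical.

```python
def is_graphical(degrees):
    """Sequential Erdős-Gallai test (simple O(n^2) version)."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    # Degrees must be non-negative and their sum must be even.
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    prefix = 0
    for k in range(1, n + 1):
        prefix += d[k - 1]
        # Each of the remaining nodes can absorb at most min(d_i, k) edges
        # from the first k nodes.
        tail = sum(min(x, k) for x in d[k:])
        if prefix > k * (k - 1) + tail:
            return False
    return True
```

For instance, `is_graphical([3, 3, 3, 3])` holds (the complete graph on four nodes), while `is_graphical([3, 3, 1, 1])` fails at k = 2.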
A Dynamic Programming Approach for Approximate Uniform Generation of Binary Matrices with Specified Margins
Consider the collection of all binary matrices having a specific sequence of
row and column sums and consider sampling binary matrices uniformly from this
collection. Practical algorithms for exact uniform sampling are not known, but
there are practical algorithms for approximate uniform sampling. Here it is
shown how dynamic programming and recent asymptotic enumeration results can be
used to simplify and improve a certain class of approximate uniform samplers.
The dynamic programming perspective suggests interesting generalizations.
Comment: 27 pages, minor typographic corrections from previous version,
superseded by arXiv:1301.392
Expand and Contract: Sampling graphs with given degrees and other combinatorial families
Sampling from combinatorial families can be difficult. However, complicated
families can often be embedded within larger, simpler ones, for which easy
sampling algorithms are known. We take advantage of such a relationship to
describe a sampling algorithm for the smaller family, via a Markov chain
started at a random sample of the larger family. The utility of the method is
demonstrated via several examples, with particular emphasis on sampling
labelled graphs with given degree sequence, a well-studied problem for which
existing algorithms leave much room for improvement. For graphs with given
degrees, with maximum degree where is the number of edges, we
obtain an asymptotically uniform sample in steps, which substantially
improves upon existing algorithms.
Fast uniform generation of random graphs with given degree sequences
In this paper we provide an algorithm that generates a graph with given
degree sequence uniformly at random. Provided that , where
is the maximal degree and is the number of edges, the algorithm
runs in expected time . Our algorithm significantly improves the
previously most efficient uniform sampler, which runs in expected time
for the same family of degree sequences. Our method uses a
novel ingredient which progressively relaxes restrictions on an object being
generated uniformly at random, and we use this to give fast algorithms for
uniform sampling of graphs with other degree sequences as well. Using the same
method, we also obtain algorithms with expected run time which is (i) linear
for power-law degree sequences in cases where the previous best was
, and (ii) for -regular graphs when , where the previous best was
Efficient importance sampling for binary contingency tables
Importance sampling has been reported to produce algorithms with excellent
empirical performance in counting problems. However, the theoretical support
for its efficiency in these applications has been very limited. In this paper,
we propose a methodology that can be used to design efficient importance
sampling algorithms for counting and test their efficiency rigorously. We apply
our techniques after transforming the problem into a rare-event simulation
problem--thereby connecting complexity analysis of counting problems with
efficiency in the context of rare-event simulation. As an illustration of our
approach, we consider the problem of counting the number of binary tables with
fixed column and row sums, 's and 's, respectively, and total
marginal sums . Assuming that , and the 's are bounded, we show that a suitable importance
sampling algorithm, proposed by Chen et al. [J. Amer. Statist. Assoc. 100
(2005) 109--120], requires operations to
produce an estimate that has -relative error with probability
. In addition, if for some
, the same coverage can be guaranteed with
operations.
Comment: Published in at http://dx.doi.org/10.1214/08-AAP558 the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Characterizing Optimal Sampling of Binary Contingency Tables via the Configuration Model
A binary contingency table is an m x n array of binary entries with
prescribed row sums r=(r_1,...,r_m) and column sums c=(c_1,...,c_n). The
configuration model for uniformly sampling binary contingency tables proceeds
as follows. First, label N=\sum_{i=1}^{m} r_i tokens of type 1, arrange them in
m cells, and let the i-th cell contain r_i tokens. Next, label another set of
tokens of type 2 containing N=\sum_{j=1}^{n}c_j elements arranged in n cells,
and let the j-th cell contain c_j tokens. Finally, pair the type-1 tokens with
the type-2 tokens by generating a random permutation until the total pairing
corresponds to a binary contingency table. Generating one random permutation
takes O(N) time, which is optimal up to constant factors. A fundamental
question is whether a constant number of permutations is sufficient to obtain a
binary contingency table. In the current paper, we solve this problem by
showing a necessary and sufficient condition so that the probability that the
configuration model outputs a binary contingency table remains bounded away
from 0 as N goes to \infty. Our finding shows surprising differences from
recent results for binary symmetric contingency tables
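The pairing procedure described in this abstract can be sketched as a rejection sampler: lay out the type-1 tokens by row, shuffle the type-2 tokens, pair them position by position, and accept only when no (row, column) pair repeats. This is a minimal sketch under that reading of the abstract. The function name, the `max_tries` cutoff, and the `seed` parameter are hypothetical additions for illustration.

```python
import random


def sample_binary_table(row_sums, col_sums, max_tries=1000, seed=None):
    """Configuration-model rejection sampler for binary contingency tables.

    Returns a 0/1 table as a list of lists, or None if no valid pairing
    was found within max_tries permutations (a cutoff assumed here).
    """
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row sums and column sums must have the same total N")
    rng = random.Random(seed)
    # Type-1 tokens: r_i copies of row index i. Type-2: c_j copies of column j.
    row_tokens = [i for i, r in enumerate(row_sums) for _ in range(r)]
    col_tokens = [j for j, c in enumerate(col_sums) for _ in range(c)]
    for _ in range(max_tries):
        rng.shuffle(col_tokens)  # one random permutation, O(N) time
        pairs = set(zip(row_tokens, col_tokens))
        # Accept only if no cell (i, j) was hit more than once.
        if len(pairs) == len(row_tokens):
            table = [[0] * len(col_sums) for _ in row_sums]
            for i, j in pairs:
                table[i][j] = 1
            return table
    return None
```

The number of loop iterations before acceptance is the number of permutations the abstract asks about: the question is when it stays bounded as N grows.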