Estimation of the Probit Model from Anonymized Micro Data
The demand of scientists for confidential micro data from official sources has created discussion of how to anonymize these data in such a way that they can be given to the scientific community. We report results from a German project which exploits various options of anonymization for producing such "scientific-use files". The main concern in the project, however, is whether estimation of stochastic models from these perturbed data is possible and, more importantly, leads to reliable results. In this paper we concentrate on estimation of the probit model under the assumption that only anonymized data are available. In particular we assume that the binary dependent variable has undergone post-randomization (PRAM) and that the set of explanatory variables has been perturbed by addition of noise. We employ a maximum likelihood estimator which is consistent if only the dependent variable has been anonymized by PRAM. The errors-in-variables structure of the regressors then is handled by the simulation extrapolation (SIMEX) estimation procedure where we compare performance of quadratic and nonlinear (rational) extrapolation.
Keywords: anonymization, misclassification, noise addition, post-randomization, SIMEX procedure, statistical disclosure.
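As a rough illustration of the likelihood idea mentioned in the abstract above, the following sketch evaluates a probit log-likelihood when the observed binary outcome has been post-randomized with known retention probabilities. It is a minimal sketch under stated assumptions, not the authors' estimator: the function names, the parameters `p11` and `p00`, and the plain-list data layout are all hypothetical, and the SIMEX step for noisy regressors is not shown.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pram_probit_loglik(beta, X, y_star, p11, p00):
    """Log-likelihood of a probit model with a post-randomized outcome.

    Assumed notation (not taken from the paper):
      p11 = P(y* = 1 | y = 1), p00 = P(y* = 0 | y = 0),
    where y is the true outcome and y* the published one.
    """
    total = 0.0
    for x_row, y_obs in zip(X, y_star):
        index = sum(b * x for b, x in zip(beta, x_row))
        p_true = norm_cdf(index)  # P(y = 1 | x) under the probit model
        # Probability of observing y* = 1 after randomization.
        p_obs = p11 * p_true + (1.0 - p00) * (1.0 - p_true)
        total += math.log(p_obs if y_obs == 1 else 1.0 - p_obs)
    return total
```

With `p11 = p00 = 1` the expression reduces to the ordinary probit log-likelihood, which is a quick sanity check on the formula. Maximizing it over `beta` would need a numerical optimizer, which is left out here.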
Distributed Symmetry Breaking in Hypergraphs
Fundamental local symmetry breaking problems such as Maximal Independent Set
(MIS) and coloring have been recognized as important by the community, and
studied extensively in (standard) graphs. In particular, fast (i.e.,
logarithmic run time) randomized algorithms are well-established for MIS and
-coloring in both the LOCAL and CONGEST distributed computing
models. On the other hand, comparatively much less is known on the complexity
of distributed symmetry breaking in hypergraphs. In particular, a key
question is whether a fast (randomized) algorithm for MIS exists for
hypergraphs.
In this paper, we study the distributed complexity of symmetry breaking in
hypergraphs by presenting distributed randomized algorithms for a variety of
fundamental problems under a natural distributed computing model for
hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can
be solved in rounds ( is the number of nodes of the
hypergraph) in the LOCAL model. We then present a key result of this paper ---
an -round hypergraph MIS algorithm in
the CONGEST model where is the maximum node degree of the hypergraph
and is any arbitrarily small constant.
To demonstrate the usefulness of hypergraph MIS, we present applications of
our hypergraph algorithm to solving problems in (standard) graphs. In
particular, the hypergraph MIS yields fast distributed algorithms for the
balanced minimal dominating set problem (left open in Harris et al. [ICALP
2013]) and the minimal connected dominating set problem. We also present
distributed algorithms for coloring, maximal matching, and maximal clique in
hypergraphs.
Comment: Changes from the previous version: More references added
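To make the object studied in the abstract above concrete: a set of nodes is independent in a hypergraph if it fully contains no hyperedge, and it is maximal if no further node can be added without breaking that property. The sketch below is a sequential greedy baseline under that definition, not the distributed LOCAL or CONGEST algorithm of the paper; the function names are hypothetical.

```python
def greedy_hypergraph_mis(nodes, hyperedges):
    """Sequentially build a maximal independent set of a hypergraph.

    A node is added unless doing so would make some hyperedge
    entirely contained in the chosen set.
    """
    edges = [frozenset(e) for e in hyperedges]
    chosen = set()
    for v in nodes:
        candidate = chosen | {v}
        # Only hyperedges containing v can become fully covered by adding v.
        if not any(e <= candidate for e in edges if v in e):
            chosen.add(v)
    return chosen


def is_independent(node_set, hyperedges):
    """True if no hyperedge lies entirely inside node_set."""
    members = set(node_set)
    return not any(set(e) <= members for e in hyperedges)
```

For example, with nodes `[1, 2, 3, 4]` and hyperedges `{1, 2, 3}` and `{3, 4}`, the greedy pass picks nodes 1, 2 and 4; node 3 is rejected because it would complete the first hyperedge.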
A Faster Distributed Single-Source Shortest Paths Algorithm
We devise new algorithms for the single-source shortest paths (SSSP) problem
with non-negative edge weights in the CONGEST model of distributed computing.
While close-to-optimal solutions, in terms of the number of rounds spent by the
algorithm, have recently been developed for computing SSSP approximately, the
fastest known exact algorithms are still far away from matching the lower bound
of rounds by Peleg and Rubinovich [SIAM
Journal on Computing 2000], where is the number of nodes in the network
and is its diameter. The state of the art is Elkin's randomized algorithm
[STOC 2017] that performs rounds. We
significantly improve upon this upper bound with our two new randomized
algorithms for polynomially bounded integer edge weights, the first performing
rounds and the second performing rounds. Our bounds also compare favorably to the
independent result by Ghaffari and Li [STOC 2018]. As side results, we obtain a
-approximation -round algorithm for directed SSSP and a new work/depth trade-off for exact
SSSP on directed graphs in the PRAM model.
Comment: Presented at the 59th Annual IEEE Symposium on Foundations of
Computer Science (FOCS 2018)
Parallel algorithms and concentration bounds for the Lovász Local Lemma via witness DAGs
The Lovász Local Lemma (LLL) is a cornerstone principle in the
probabilistic method of combinatorics, and a seminal algorithm of Moser &
Tardos (2010) provides an efficient randomized algorithm to implement it. This
can be parallelized to give an algorithm that uses polynomially many processors
and runs in time on an EREW PRAM, stemming from
adaptive computations of a maximal independent set (MIS). Chung et al. (2014)
developed faster local and parallel algorithms, potentially running in time
, but these algorithms require more stringent conditions than the
LLL.
We give a new parallel algorithm that works under essentially the same
conditions as the original algorithm of Moser & Tardos but uses only a single
MIS computation, thus running in time on an EREW PRAM. This can
be derandomized to give an NC algorithm running in time as well,
speeding up a previous NC LLL algorithm of Chandrasekaran et al. (2013).
We also provide improved and tighter bounds on the run-times of the
sequential and parallel resampling-based algorithms originally developed by
Moser & Tardos. These apply to any problem instance in which the tighter
Shearer LLL criterion is satisfied.
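The sequential resampling algorithm of Moser & Tardos referred to in the abstract above can be sketched for CNF satisfiability: start from a uniformly random assignment and, while some clause is violated, resample the variables of one violated clause. This is a minimal sketch under stated assumptions, not the parallel algorithm of the paper; the function name, the DIMACS-style literal encoding, and the `max_resamples` safety cap are illustrative choices.

```python
import random


def moser_tardos_sat(num_vars, clauses, seed=0, max_resamples=100_000):
    """Resampling search for a satisfying assignment.

    clauses: list of lists of nonzero ints; literal v means variable v
    is true, literal -v means variable v is false (variables are 1-based).
    Returns a list of booleans for variables 1..num_vars.
    """
    rng = random.Random(seed)
    # Index 0 is unused so that variable v lives at assignment[v].
    assignment = [rng.random() < 0.5 for _ in range(num_vars + 1)]

    def violated(clause):
        return not any(assignment[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(max_resamples):
        bad = next((c for c in clauses if violated(c)), None)
        if bad is None:
            return assignment[1:]
        # Resample every variable of the chosen violated clause.
        for lit in bad:
            assignment[abs(lit)] = rng.random() < 0.5
    raise RuntimeError("resampling budget exhausted")
```

The cap on resamples is only a guard for instances that do not satisfy an LLL-type condition; the efficiency guarantee of Moser & Tardos applies when such a condition holds.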
Massively Parallel Algorithms for Distance Approximation and Spanners
Over the past decade, there has been increasing interest in
distributed/parallel algorithms for processing large-scale graphs. By now, we
have quite fast algorithms -- usually sublogarithmic-time and often
-time, or even faster -- for a number of fundamental graph
problems in the massively parallel computation (MPC) model. This model is a
widely-adopted theoretical abstraction of MapReduce style settings, where a
number of machines communicate in an all-to-all manner to process large-scale
data. Contributing to this line of work on MPC graph algorithms, we present
round MPC algorithms for computing
-spanners in the strongly sublinear regime of local memory. To
the best of our knowledge, these are the first sublogarithmic-time MPC
algorithms for spanner construction. As primary applications of our spanners,
we get two important implications, as follows:
- For the MPC setting, we get an -round algorithm for
approximation of all pairs shortest paths (APSP) in the
near-linear regime of local memory. To the best of our knowledge, this is the
first sublogarithmic-time MPC algorithm for distance approximations.
- Our result above also extends to the Congested Clique model of distributed
computing, with the same round complexity and approximation guarantee. This
gives the first sublogarithmic algorithm for approximating APSP in weighted
graphs in the Congested Clique model.
Round Compression for Parallel Matching Algorithms
For over a decade now we have been witnessing the success of massive
parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or
Spark. One of the reasons for their success is the fact that these frameworks
are able to accurately capture the nature of large-scale computation. In
particular, compared to the classic distributed algorithms or PRAM models,
these frameworks allow for much more local computation. The fundamental
question that arises in this context, though, is: can we leverage this additional
power to obtain even faster parallel algorithms?
A prominent example here is the maximum matching problem, one of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2-approximate maximum matching in rounds. However, the
exact complexity of this problem in the MPC framework is still far from
understood. Lattanzi et al. showed that if each machine has
memory, this problem can also be solved -approximately in a constant number
of rounds. These techniques, as well as the approaches developed in the follow
up work, seem though to get stuck in a fundamental way at roughly
rounds once we enter the near-linear memory regime. It is thus entirely
possible that in this regime, which captures in particular the case of sparse
graph computations, the best MPC round complexity matches what one can already
get in the PRAM model, without the need to take advantage of the extra local
computation power.
In this paper, we finally refute that perplexing possibility. That is, we
break the above round complexity bound even in the case of
slightly sublinear memory per machine. In fact, our improvement here is
almost exponential: we are able to deliver a -approximation to
maximum matching, for any fixed constant , in
rounds
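The 2-approximation mentioned in the abstract above rests on a standard fact: any maximal matching has at least half as many edges as a maximum matching. The sketch below builds a maximal matching with a single sequential pass over the edges. It only illustrates that fact and is not the PRAM or MPC algorithm discussed in the paper; the function name and the edge-list input format are assumptions.

```python
def greedy_maximal_matching(edges):
    """Return a maximal matching by scanning edges in the given order.

    edges: iterable of (u, v) pairs. An edge is taken whenever both of
    its endpoints are still unmatched; self-loops are skipped.
    """
    matched = set()
    matching = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path a-b-c-d, scanning the middle edge first yields a matching of size 1 while the maximum matching has size 2, so the factor of 2 can actually be attained.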