8,768 research outputs found
Efficient Seeds Computation Revisited
The notion of the cover is a generalization of a period of a string, and
there are linear time algorithms for finding the shortest cover. The seed is a
more complicated generalization of periodicity, it is a cover of a superstring
of a given string, and the shortest seed problem is of much higher algorithmic
difficulty. The problem is not well understood, no linear time algorithm is
known. In the paper we give linear time algorithms for some of its versions ---
computing shortest left-seed array, longest left-seed array and checking for
seeds of a given length. The algorithm for the last problem is used to compute
the seed array of a string (i.e., the shortest seeds for all the prefixes of
the string) in time. We describe also a simpler alternative algorithm
computing efficiently the shortest seeds. As a by-product we obtain an
time algorithm checking if the shortest seed has length at
least and finding the corresponding seed. We also correct some important
details missing in the previously known shortest-seed algorithm (Iliopoulos et
al., 1996).Comment: 14 pages, accepted to CPM 201
Interactive Channel Capacity Revisited
We provide the first capacity approaching coding schemes that robustly
simulate any interactive protocol over an adversarial channel that corrupts any
fraction of the transmitted symbols. Our coding schemes achieve a
communication rate of over any
adversarial channel. This can be improved to for
random, oblivious, and computationally bounded channels, or if parties have
shared randomness unknown to the channel.
Surprisingly, these rates exceed the interactive channel capacity bound
which [Kol and Raz; STOC'13] recently proved for random errors. We conjecture
and to be the optimal rates for their respective settings
and therefore to capture the interactive channel capacity for random and
adversarial errors.
In addition to being very communication efficient, our randomized coding
schemes have multiple other advantages. They are computationally efficient,
extremely natural, and significantly simpler than prior (non-capacity
approaching) schemes. In particular, our protocols do not employ any coding but
allow the original protocol to be performed as-is, interspersed only by short
exchanges of hash values. When hash values do not match, the parties backtrack.
Our approach is, as we feel, by far the simplest and most natural explanation
for why and how robust interactive communication in a noisy environment is
possible
Low-shot learning with large-scale diffusion
This paper considers the problem of inferring image labels from images when
only a few annotated examples are available at training time. This setup is
often referred to as low-shot learning, where a standard approach is to
re-train the last few layers of a convolutional neural network learned on
separate classes for which training examples are abundant. We consider a
semi-supervised setting based on a large collection of images to support label
propagation. This is possible by leveraging the recent advances on large-scale
similarity graph construction.
We show that despite its conceptual simplicity, scaling label propagation up
to hundred millions of images leads to state of the art accuracy in the
low-shot learning regime
Recursive Online Enumeration of All Minimal Unsatisfiable Subsets
In various areas of computer science, we deal with a set of constraints to be
satisfied. If the constraints cannot be satisfied simultaneously, it is
desirable to identify the core problems among them. Such cores are called
minimal unsatisfiable subsets (MUSes). The more MUSes are identified, the more
information about the conflicts among the constraints is obtained. However, a
full enumeration of all MUSes is in general intractable due to the large number
(even exponential) of possible conflicts. Moreover, to identify MUSes
algorithms must test sets of constraints for their simultaneous satisfiabilty.
The type of the test depends on the application domains. The complexity of
tests can be extremely high especially for domains like temporal logics, model
checking, or SMT. In this paper, we propose a recursive algorithm that
identifies MUSes in an online manner (i.e., one by one) and can be terminated
at any time. The key feature of our algorithm is that it minimizes the number
of satisfiability tests and thus speeds up the computation. The algorithm is
applicable to an arbitrary constraint domain and its effectiveness demonstrates
itself especially in domains with expensive satisfiability checks. We benchmark
our algorithm against state of the art algorithm on Boolean and SMT constraint
domains and demonstrate that our algorithm really requires less satisfiability
tests and consequently finds more MUSes in given time limits
Reuse It Or Lose It: More Efficient Secure Computation Through Reuse of Encrypted Values
Two-party secure function evaluation (SFE) has become significantly more
feasible, even on resource-constrained devices, because of advances in
server-aided computation systems. However, there are still bottlenecks,
particularly in the input validation stage of a computation. Moreover, SFE
research has not yet devoted sufficient attention to the important problem of
retaining state after a computation has been performed so that expensive
processing does not have to be repeated if a similar computation is done again.
This paper presents PartialGC, an SFE system that allows the reuse of encrypted
values generated during a garbled-circuit computation. We show that using
PartialGC can reduce computation time by as much as 96% and bandwidth by as
much as 98% in comparison with previous outsourcing schemes for secure
computation. We demonstrate the feasibility of our approach with two sets of
experiments, one in which the garbled circuit is evaluated on a mobile device
and one in which it is evaluated on a server. We also use PartialGC to build a
privacy-preserving "friend finder" application for Android. The reuse of
previous inputs to allow stateful evaluation represents a new way of looking at
SFE and further reduces computational barriers.Comment: 20 pages, shorter conference version published in Proceedings of the
2014 ACM SIGSAC Conference on Computer and Communications Security, Pages
582-596, ACM New York, NY, US
Pseudorandom number generators revisited
Statistical Methods;mathematische statistiek
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
Propagation of contagion through networks is a fundamental process. It is
used to model the spread of information, influence, or a viral infection.
Diffusion patterns can be specified by a probabilistic model, such as
Independent Cascade (IC), or captured by a set of representative traces.
Basic computational problems in the study of diffusion are influence queries
(determining the potency of a specified seed set of nodes) and Influence
Maximization (identifying the most influential seed set of a given size).
Answering each influence query involves many edge traversals, and does not
scale when there are many queries on very large graphs. The gold standard for
Influence Maximization is the greedy algorithm, which iteratively adds to the
seed set a node maximizing the marginal gain in influence. Greedy has a
guaranteed approximation ratio of at least (1-1/e) and actually produces a
sequence of nodes, with each prefix having approximation guarantee with respect
to the same-size optimum. Since Greedy does not scale well beyond a few million
edges, for larger inputs one must currently use either heuristics or
alternative algorithms designed for a pre-specified small seed set size.
We develop a novel sketch-based design for influence computation. Our greedy
Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with
billions of edges, with one to two orders of magnitude speedup over the best
greedy methods. It still has a guaranteed approximation ratio, and in practice
its quality nearly matches that of exact greedy. We also present influence
oracles, which use linear-time preprocessing to generate a small sketch for
each node, allowing the influence of any seed set to be quickly answered from
the sketches of its nodes.Comment: 10 pages, 5 figures. Appeared at the 23rd Conference on Information
and Knowledge Management (CIKM 2014) in Shanghai, Chin
Optimizing spread dynamics on graphs by message passing
Cascade processes are responsible for many important phenomena in natural and
social sciences. Simple models of irreversible dynamics on graphs, in which
nodes activate depending on the state of their neighbors, have been
successfully applied to describe cascades in a large variety of contexts. Over
the last decades, many efforts have been devoted to understand the typical
behaviour of the cascades arising from initial conditions extracted at random
from some given ensemble. However, the problem of optimizing the trajectory of
the system, i.e. of identifying appropriate initial conditions to maximize (or
minimize) the final number of active nodes, is still considered to be
practically intractable, with the only exception of models that satisfy a sort
of diminishing returns property called submodularity. Submodular models can be
approximately solved by means of greedy strategies, but by definition they lack
cooperative characteristics which are fundamental in many real systems. Here we
introduce an efficient algorithm based on statistical physics for the
optimization of trajectories in cascade processes on graphs. We show that for a
wide class of irreversible dynamics, even in the absence of submodularity, the
spread optimization problem can be solved efficiently on large networks.
Analytic and algorithmic results on random graphs are complemented by the
solution of the spread maximization problem on a real-world network (the
Epinions consumer reviews network).Comment: Replacement for "The Spread Optimization Problem
- …