Exponential Space Improvement for minwise Based Algorithms
In this paper we introduce a general framework that exponentially improves the space, the degree of independence, and the time needed by min-wise based algorithms. In SODA 2011, we introduced an exponential time improvement for min-wise based algorithms by defining and constructing an almost k-min-wise independent family of hash functions. Here we develop an alternative approach that achieves both an exponential time and an exponential space improvement. The new approach relaxes the need for approximately min-wise hash functions, and hence gets around the Omega(log(1/epsilon)) independence lower bound of [Patrascu 2010]. This is done by defining and constructing a d-k-min-wise independent family of hash functions. Surprisingly, for most cases only 8-wise independence is needed for the additional improvement. Moreover, as the degree of independence is a small constant, our functions can be implemented efficiently.
Informally, under this definition, all subsets of size d of any fixed set X have an equal probability of having their hash values among the minimal k values in X, where the probability is over the random choice of hash function from the family. This property measures the randomness of the family, as a truly random function obviously satisfies the definition for d=k=|X|. We define and give an efficient-time and efficient-space construction of an approximately d-k-min-wise independent family of hash functions for the case d=2, as this is sufficient for the additional exponential improvement.
We discuss how this construction can be used to improve many min-wise based algorithms. To our knowledge, such definitions for hash functions have never been studied, and no construction was given before.
As an example we show how to apply it to similarity and rarity estimation over data streams. Other min-wise based algorithms can be adjusted in the same way.
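As a concrete illustration of the similarity-estimation use case, here is a minimal min-wise hashing (MinHash) sketch in Python. The hash family (random affine maps modulo a prime) and all names are illustrative; this is the classical construction, not the paper's d-k-min-wise family:

```python
import random

P = 2_147_483_647  # Mersenne prime used by the illustrative hash family

def make_hashes(k, seed=0):
    """k random affine hash functions x -> (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

def minhash(items, hashes):
    """Signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % P for x in items) for (a, b) in hashes]

def jaccard_estimate(sig_a, sig_b):
    """Fraction of coordinates where the two signatures agree; this is an
    unbiased estimate of the Jaccard similarity under min-wise hashing."""
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)

hashes = make_hashes(512)
A, B = set(range(100)), set(range(50, 150))   # true Jaccard = 50/150 = 1/3
est = jaccard_estimate(minhash(A, hashes), minhash(B, hashes))
```

With 512 hash functions the standard deviation of the estimate is about 0.02, so the estimate lands close to the true value 1/3.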
Positive Semidefinite Metric Learning Using Boosting-like Algorithms
The success of many machine learning and pattern recognition methods relies
heavily upon the identification of an appropriate distance metric on the input
data. It is often beneficial to learn such a metric from the input training
data, instead of using a default one such as the Euclidean distance. In this
work, we propose a boosting-based technique, termed BoostMetric, for learning a
quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance
metric requires enforcing the constraint that the matrix parameter to the
metric remains positive semidefinite. Semidefinite programming is often used to
enforce this constraint, but it does not scale well and is not easy to implement.
BoostMetric is instead based on the observation that any positive semidefinite
matrix can be decomposed into a linear combination of trace-one rank-one
matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak
learners within an efficient and scalable boosting-based learning process. The
resulting methods are easy to implement, efficient, and can accommodate various
types of constraints. We extend traditional boosting algorithms in that our
weak learner is a positive semidefinite matrix with trace and rank being one
rather than a classifier or regressor. Experiments on various datasets
demonstrate that the proposed algorithms compare favorably to those
state-of-the-art methods in terms of classification accuracy and running time.
Comment: 30 pages, appearing in Journal of Machine Learning Research
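The observation underpinning BoostMetric, that any positive semidefinite matrix is a nonnegative combination of trace-one rank-one matrices, can be checked numerically via an eigendecomposition (a sketch of the decomposition only, not the boosting procedure itself):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T                                  # a positive semidefinite matrix

w, V = np.linalg.eigh(M)                     # M = sum_i w_i * v_i v_i^T, w_i >= 0
components = [np.outer(v, v) for v in V.T]   # rank-one, trace-one "weak learners"

for Z in components:
    assert abs(np.trace(Z) - 1.0) < 1e-9     # unit trace (eigenvectors are unit norm)
recon = sum(wi * Z for wi, Z in zip(w, components))
assert np.allclose(recon, M)                 # nonnegative combination recovers M
```

Boosting then builds such a combination greedily, one rank-one matrix per round, instead of solving one large semidefinite program.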
Invariant Generation through Strategy Iteration in Succinctly Represented Control Flow Graphs
We consider the problem of computing numerical invariants of programs, for
instance bounds on the values of numerical program variables. More
specifically, we study the problem of performing static analysis by abstract
interpretation using template linear constraint domains. Such invariants can be
obtained by Kleene iterations that are, in order to guarantee termination,
accelerated by widening operators. In many cases, however, applying this form
of extrapolation leads to invariants that are weaker than the strongest
inductive invariant that can be expressed within the abstract domain in use.
Another well-known source of imprecision of traditional abstract interpretation
techniques stems from their use of join operators at merge nodes in the control
flow graph. The mentioned weaknesses may prevent these methods from proving
safety properties. The technique we develop in this article addresses both of
these issues: contrary to Kleene iterations accelerated by widening operators,
it is guaranteed to yield the strongest inductive invariant that can be
expressed within the template linear constraint domain in use. It also eschews
join operators by distinguishing all paths of loop-free code segments. Formally
speaking, our technique computes the least fixpoint within a given template
linear constraint domain of a transition relation that is succinctly expressed
as an existentially quantified linear real arithmetic formula. In contrast to
previously published techniques that rely on quantifier elimination, our
algorithm is proved to have optimal complexity: we prove that the decision
problem associated with our fixpoint problem is in the second level of the
polynomial-time hierarchy.
Comment: 35 pages, conference version published at ESOP 2011, this version is
a CoRR version of our submission to Logical Methods in Computer Science
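The loss of precision that widening can cause, mentioned in the abstract, is easy to see in the interval domain on a toy loop. This sketch (a hypothetical illustration, not the paper's strategy-iteration algorithm) analyzes `i = 0; while i < 100: i += 1`:

```python
INF = float('inf')

def join(a, b):            # least upper bound of two intervals
    return (min(a[0], b[0]), max(a[1], b[1]))

def body(iv):              # abstract effect of "assume i < 100; i += 1"
    lo, hi = iv
    hi = min(hi, 99)       # guard i < 100
    if lo > hi:
        return None        # loop body unreachable
    return (lo + 1, hi + 1)

def widen(old, new):       # jump any unstable bound to infinity
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

entry = (0, 0)             # i = 0 on loop entry

# Plain Kleene iteration: converges (in ~100 steps) to the strongest invariant.
x = entry
while True:
    nxt = body(x)
    y = join(x, nxt) if nxt else x
    if y == x:
        break
    x = y
exact = x                  # (0, 100)

# Kleene iteration accelerated by widening: terminates fast but is weaker.
x = entry
while True:
    nxt = body(x)
    y = widen(x, join(x, nxt) if nxt else x)
    if y == x:
        break
    x = y
widened = x                # (0, inf)
```

The widened invariant (0, +inf) is strictly weaker than the strongest inductive interval invariant (0, 100); the article's strategy-iteration technique is guaranteed to find the latter.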
Poisson noise reduction with non-local PCA
Photon-limited imaging arises when the number of photons collected by a
sensor array is small relative to the number of detector elements. Photon
limitations are an important concern for many applications such as spectral
imaging, night vision, nuclear medicine, and astronomy. Typically a Poisson
distribution is used to model these observations, and the inherent
heteroscedasticity of the data combined with standard noise removal methods
yields significant artifacts. This paper introduces a novel denoising algorithm
for photon-limited images which combines elements of dictionary learning and
sparse patch-based representations of images. The method employs both an
adaptation of Principal Component Analysis (PCA) for Poisson noise and recently
developed sparsity-regularized convex optimization algorithms for
photon-limited images. A comprehensive empirical evaluation of the proposed
method helps characterize the performance of this approach relative to other
state-of-the-art denoising methods. The results reveal that, despite its
conceptual simplicity, Poisson PCA-based denoising appears to be highly
competitive in very low light regimes.
Comment: erratum: the image "man" is wrongly named "pepper" in the journal version
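The heteroscedasticity the abstract refers to, Poisson variance scaling with intensity, can be seen in a few lines. The Anscombe transform used below is the classical variance stabilizer, shown only to make the contrast concrete; it is not the Poisson PCA method of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
for lam in (10.0, 100.0, 1000.0):
    x = rng.poisson(lam, 200_000)
    # Raw Poisson data: the variance tracks the mean (heteroscedastic),
    # which is why standard (Gaussian-noise) denoisers leave artifacts.
    assert abs(x.var() / lam - 1.0) < 0.05
    # Classical Anscombe transform: 2*sqrt(x + 3/8) has variance ~ 1
    # once lam is moderately large (approximately homoscedastic).
    y = 2.0 * np.sqrt(x + 3.0 / 8.0)
    assert abs(y.var() - 1.0) < 0.1
```

In the very low light regimes the paper targets, such variance-stabilizing transforms break down, which motivates adapting PCA to the Poisson likelihood directly.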
Non-oblivious Strategy Improvement
We study strategy improvement algorithms for mean-payoff and parity games. We
describe a structural property of these games, and we show that these
structures can affect the behaviour of strategy improvement. We show how
awareness of these structures can be used to accelerate strategy improvement
algorithms. We call our algorithms non-oblivious because they remember
properties of the game that they have discovered in previous iterations. We
show that non-oblivious strategy improvement algorithms perform well on
examples that are known to be hard for oblivious strategy improvement. Hence,
we argue that previous strategy improvement algorithms fail because they ignore
the structural properties of the game that they are solving.
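The generic strategy-improvement loop, evaluate the current strategy and then switch every edge that looks profitable, can be sketched on a toy one-player maximization game. This hypothetical acyclic example is far simpler than the mean-payoff and parity games studied here, and it shows the oblivious baseline the paper improves on:

```python
# One-player max game on a DAG: the player picks one successor per node;
# sinks carry payoffs, and the player maximizes the payoff reached.
succ = {'a': ['b', 'c'], 'b': ['d', 'e'], 'c': ['e'], 'd': [], 'e': []}
payoff = {'d': 1.0, 'e': 5.0}

def evaluate(strategy):
    """Value of every node under a fixed strategy (follow chosen edges)."""
    val = {}
    def v(n):
        if n not in val:
            val[n] = payoff[n] if not succ[n] else v(strategy[n])
        return val[n]
    for n in succ:
        v(n)
    return val

def improve(strategy):
    """Oblivious strategy improvement: switch all profitable edges,
    re-evaluate, repeat until no switch is profitable."""
    while True:
        val = evaluate(strategy)
        switched = False
        for n, outs in succ.items():
            if outs:
                best = max(outs, key=lambda m: val[m])
                if val[best] > val[strategy[n]]:
                    strategy[n] = best
                    switched = True
        if not switched:
            return strategy, val

strategy, val = improve({'a': 'b', 'b': 'd', 'c': 'e'})
```

The non-oblivious algorithms of the paper additionally carry structural information discovered in earlier iterations across rounds, rather than treating each evaluation in isolation.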
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of people, each represented by a bit-string of length d
corresponding to the setting of d binary attributes. A k-way marginal
query is specified by a subset of k attributes and a k-dimensional
binary vector specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector, restricted
to the chosen subset, agrees with the given vector.
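Concretely, evaluating a k-way marginal on a toy database of bit-strings looks like this (illustrative code for the query semantics only, not part of the private release mechanism):

```python
# Database: each row is one person's d binary attributes (here d = 4).
db = [(1, 0, 1, 0),
      (1, 1, 1, 0),
      (0, 0, 1, 1)]

def marginal(db, attrs, values):
    """Count rows whose attributes at positions `attrs` equal `values`."""
    return sum(all(row[a] == v for a, v in zip(attrs, values)) for row in db)

# 2-way marginal: attributes (0, 2) set to (1, 1) -> 2 people match.
count = marginal(db, (0, 2), (1, 1))
```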
Privately releasing approximate answers to a set of k-way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well understood: nearly matching lower and upper bounds on the per-query
additive error are known. However, no polynomial-time algorithm with error
complexity as low as the information-theoretic upper bound is known for small
k. In this work we present a polynomial-time algorithm that, for any
distribution on marginal queries, achieves average error matching the best
known information-theoretic upper bounds for small k. This bound improves over
previous work on efficiently releasing marginals when k is small and when low
error is desirable. Using private boosting we are also able to give nearly
matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis.
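A minimal Frank-Wolfe loop over the probability simplex, where the linear minimization oracle is just a coordinate argmin, illustrates the constrained-minimization workhorse the abstract mentions; the quadratic objective and all names are illustrative, not the paper's relaxation:

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=300):
    """Generic Frank-Wolfe: move toward the vertex returned by the
    linear minimization oracle (lmo), with step size 2/(t+2)."""
    x = x0
    for t in range(steps):
        s = lmo(grad(x))
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s   # stays in the feasible set
    return x

# Toy objective: f(x) = 0.5 * ||x - c||^2 over the probability simplex.
c = np.array([0.2, 0.3, 0.5])               # c lies in the simplex, so f* = 0
grad = lambda x: x - c

def simplex_lmo(g):
    """Minimize <g, s> over the simplex: pick the vertex e_i, i = argmin g_i."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x = frank_wolfe(grad, simplex_lmo, np.array([1.0, 0.0, 0.0]))
```

Because every iterate is a convex combination of simplex vertices, feasibility is maintained without any projection step, which is the property that makes Frank-Wolfe attractive for the relaxations described above.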
Social-Aware Forwarding Improves Routing Performance in Pocket Switched Networks
Several social-aware forwarding strategies have been recently introduced in
opportunistic networks, and proved effective in considerably increasing
routing performance in extensive simulation studies based on real-world
data. However, this performance improvement comes at the expense of storing a
considerable amount of state information (e.g., the history of past encounters)
at the nodes. Hence, it is not clear whether the benefits to routing performance
come directly from the social-aware forwarding mechanism, or indirectly from the
fact that state information is exploited. Thus, the question of whether
social-aware forwarding by itself is effective in improving opportunistic
network routing performance has remained unaddressed so far. In this paper, we
give a first, positive answer to the above question by investigating the
expected message delivery time as the size of the network grows larger.