Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics
Minimal-interval semantics associates with each query over a document a set
of intervals, called witnesses, that are incomparable with respect to inclusion
(i.e., they form an antichain): witnesses define the minimal regions of the
document satisfying the query. Minimal-interval semantics makes it easy to
define and compute several sophisticated proximity operators, provides snippets
for user presentation, and can be used to rank documents. In this paper we
provide algorithms for computing conjunction and disjunction that are linear in
the number of intervals and logarithmic in the number of operands; for
additional operators, such as ordered conjunction and Brouwerian difference, we
provide linear algorithms. In all cases, space is linear in the number of
operands. More importantly, we define a formal notion of optimal laziness, and
either prove it, or prove its impossibility, for each algorithm. We cast our
results in a general framework of antichains of intervals on total orders,
making our algorithms directly applicable to other domains.

Comment: 24 pages, 4 figures. A preliminary (now outdated) version was presented at SPIRE 200
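To make the interval machinery concrete, here is a minimal sketch (not the paper's optimally lazy algorithm) of computing the witnesses of a conjunction in the simplest case, where each operand contributes a sorted list of single positions: a pointer sweep generates candidate intervals spanning one position per term, and a backward pass keeps only the antichain of minimal ones. The function name and interface are illustrative.

```python
from typing import List, Tuple

def minimal_witnesses(posting_lists: List[List[int]]) -> List[Tuple[int, int]]:
    """Return the antichain of minimal intervals [l, r] containing at least
    one occurrence of every term, given one sorted position list per term.

    A simple pointer sweep: the current candidate is [min, max] of the
    pointed-at positions, and the pointer holding the minimum is advanced.
    """
    idx = [0] * len(posting_lists)
    candidates = []
    while True:
        current = [posting_lists[t][idx[t]] for t in range(len(posting_lists))]
        l, r = min(current), max(current)
        candidates.append((l, r))
        # advance the pointer holding the leftmost position
        t_min = current.index(l)
        idx[t_min] += 1
        if idx[t_min] == len(posting_lists[t_min]):
            break
    # left endpoints are non-decreasing, so a candidate is minimal
    # (contains no later candidate) iff its right endpoint is strictly
    # smaller than every later right endpoint
    witnesses = []
    best_r = float("inf")
    for l, r in reversed(candidates):
        if r < best_r:
            witnesses.append((l, r))
            best_r = r
    witnesses.reverse()
    return witnesses
```

For positions `[[1, 5], [3, 9]]` this yields `[(1, 3), (3, 5), (5, 9)]`: three incomparable minimal regions, as minimal-interval semantics prescribes.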
Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the value
$\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the
smallest positive gap $\Delta$. We propose a new randomized policy that attains
a regret {\em uniformly bounded over time} in this setting. We also prove
several lower bounds, which show in particular that bounded regret is not
possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is
not possible if one only knows $\mu^{(\star)}$.
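The following toy simulation illustrates the setting, not the paper's randomized policy: when $\mu^{(\star)}$ is known, a simple elimination rule can discard an arm for good once its upper confidence bound falls below the known optimum, so suboptimal arms stop contributing to regret after finitely many pulls. The policy and all parameter choices here are hypothetical.

```python
import math
import random

def run_known_mu_star(means, mu_star, horizon, seed=0):
    """Toy elimination policy for Bernoulli bandits when the optimal mean
    mu_star is known (an illustration of the setting only, not the
    randomized policy from the paper).

    Plays round-robin over the surviving arms; an arm is permanently
    eliminated once its upper confidence bound drops below mu_star,
    so the cumulative regret stops growing in this simulation.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))
    regret = 0.0
    for t in range(1, horizon + 1):
        # round-robin among surviving arms
        arm = sorted(active)[t % len(active)]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]
        # eliminate arms whose UCB falls below the known optimum
        for i in list(active):
            if len(active) > 1 and counts[i] > 0:
                ucb = sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
                if ucb < mu_star:
                    active.discard(i)
    return active, regret
```

Running this with arms `[0.9, 0.5, 0.4]` and `mu_star = 0.9` over a long horizon, the two suboptimal arms are typically eliminated early and the regret plateaus well below the linear worst case.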
Reducing statistical time-series problems to binary classification
We show how binary classification methods developed to work on i.i.d. data
can be used for solving statistical problems that are seemingly unrelated to
classification and concern highly-dependent time series. Specifically, the
problems of time-series clustering, homogeneity testing and the three-sample
problem are addressed. The algorithms that we construct for solving these
problems are based on a new metric between time-series distributions, which can
be evaluated using binary classification methods. Universal consistency of the
proposed algorithms is proven under most general assumptions. The theoretical
results are illustrated with experiments on synthetic and real-world data.

Comment: In proceedings of NIPS 2012, pp. 2069-207
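The reduction idea can be sketched as follows: a distance between two time series is estimated by how well a classifier separates their windows, with chance-level accuracy meaning the series look identically distributed. This sketch uses a nearest-centroid classifier and an every-other-window split purely for illustration; the paper's metric and choice of classifier are different.

```python
import random

def windows(series, w):
    """All contiguous length-w windows of a series, as tuples."""
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]

def classification_distance(x, y, w=3):
    """Estimate a distance between two time series by how well a simple
    classifier separates their length-w windows: train on every other
    window, then report how far held-out accuracy exceeds chance
    (0 means the classifier cannot tell the series apart).
    """
    xs, ys = windows(x, w), windows(y, w)
    train = [(v, 0) for v in xs[0::2]] + [(v, 1) for v in ys[0::2]]
    test = [(v, 0) for v in xs[1::2]] + [(v, 1) for v in ys[1::2]]
    # nearest-centroid classifier in the window space
    centroid = {}
    for label in (0, 1):
        vs = [v for v, l in train if l == label]
        centroid[label] = [sum(col) / len(vs) for col in zip(*vs)]
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    correct = sum(
        1 for v, l in test
        if min((0, 1), key=lambda lab: dist2(v, centroid[lab])) == l
    )
    accuracy = correct / len(test)
    return max(0.0, 2 * accuracy - 1)  # 0 at chance level, 1 at perfect separation
```

Two series drawn from the same distribution yield a distance near 0, while series with different means are separated almost perfectly, giving a distance near 1.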
Quantifying Homology Classes
We develop a method for measuring homology classes. This involves three
problems. First, we define the size of a homology class, using ideas from
relative homology. Second, we define an optimal basis of a homology group to be
the basis whose elements' size have the minimal sum. We provide a greedy
algorithm to compute the optimal basis and measure classes in it. The algorithm
runs in time polynomial in $n$ and $\beta$, where $n$ is the size of the
simplicial complex and $\beta$ is the Betti number of the homology group. Third, we
discuss different ways of localizing homology classes and prove some hardness
results.
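As background for measuring homology, here is a small sketch of computing the Betti numbers of a 2-dimensional simplicial complex over GF(2) via ranks of boundary matrices; it illustrates the objects being measured, not the paper's greedy optimal-basis algorithm. Rows are packed into Python integer bitmasks for the GF(2) elimination.

```python
def gf2_rank(rows):
    """Rank of a binary matrix over GF(2), each row packed into an int
    bitmask, via incremental XOR-basis Gaussian elimination."""
    rank = 0
    pivots = []
    for row in rows:
        for p in pivots:
            row = min(row, row ^ p)  # clear the shared leading bit if any
        if row:
            pivots.append(row)
            rank += 1
    return rank

def betti_numbers(n_vertices, edges, triangles):
    """Betti numbers (beta_0, beta_1) of a 2-dimensional simplicial
    complex over GF(2), using beta_k = dim ker(d_k) - rank(d_{k+1})."""
    # d_1: each edge maps to the sum of its two endpoint vertices
    d1 = [(1 << u) | (1 << v) for u, v in edges]
    # d_2: each triangle maps to the sum of its three boundary edges
    edge_index = {frozenset(e): i for i, e in enumerate(edges)}
    d2 = []
    for a, b, c in triangles:
        mask = 0
        for face in ((a, b), (b, c), (a, c)):
            mask |= 1 << edge_index[frozenset(face)]
        d2.append(mask)
    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    beta0 = n_vertices - r1        # d_0 is the zero map
    beta1 = len(edges) - r1 - r2
    return beta0, beta1
```

For the hollow triangle (three vertices, three edges, no 2-simplex) this gives one connected component and one 1-dimensional hole; filling in the triangle kills the hole.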