Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Data-driven algorithm design, that is, choosing the best algorithm for a
specific application, is a crucial problem in modern data science.
Practitioners often optimize over a parameterized algorithm family, tuning
parameters based on problems from their domain. These procedures have
historically come with no guarantees, though a recent line of work studies
algorithm selection from a theoretical perspective. We advance the foundations
of this field in several directions: we analyze online algorithm selection,
where problems arrive one-by-one and the goal is to minimize regret, and
private algorithm selection, where the goal is to find good parameters over a
set of problems without revealing sensitive information contained therein. We
study important algorithm families, including SDP-rounding schemes for problems
formulated as integer quadratic programs, and greedy techniques for canonical
subset selection problems. In these cases, the algorithm's performance is a
volatile and piecewise Lipschitz function of its parameters, since tweaking the
parameters can completely change the algorithm's behavior. We give a sufficient
and general condition, dispersion, defining a family of piecewise Lipschitz
functions that can be optimized online and privately, which includes the
functions measuring the performance of the algorithms we study. Intuitively, a
set of piecewise Lipschitz functions is dispersed if no small region contains
many of the functions' discontinuities. We present general techniques for
online and private optimization of the sum of dispersed piecewise Lipschitz
functions. We improve over the best-known regret bounds for a variety of
problems, prove regret bounds for problems not previously studied, and give
matching lower bounds. We also give matching upper and lower bounds on the
utility loss due to privacy. Moreover, we uncover dispersion in auction design
and pricing problems.
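To make the online setting concrete, here is a minimal sketch (not the paper's dispersion-based algorithm) of an exponentially weighted forecaster run over a discretized one-dimensional parameter grid against piecewise Lipschitz utilities; the function names, the grid, and the toy step-function utilities are all hypothetical.

```python
import math
import random

def exp_weights_online(utility_fns, grid, eta):
    """Exponentially weighted forecaster over a discretized parameter grid.
    Each round: sample a parameter proportionally to the current weights,
    then update every grid point's weight using the revealed utility."""
    weights = [1.0] * len(grid)
    played = []
    for u in utility_fns:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample a parameter according to the current weight distribution.
        rho = random.choices(grid, weights=probs)[0]
        played.append(rho)
        # Full-information update: observe u on the whole grid.
        weights = [w * math.exp(eta * u(p)) for w, p in zip(weights, grid)]
    return played

# Toy piecewise Lipschitz utilities on [0, 1]: a step at a random threshold,
# so tweaking the parameter past the threshold changes the payoff abruptly.
random.seed(0)
fns = [(lambda t: (lambda p: 1.0 if p >= t else 0.0))(random.random())
       for _ in range(50)]
grid = [i / 100 for i in range(101)]
params = exp_weights_online(fns, grid, eta=0.5)
print(len(params))  # one parameter played per round
```

The discretization step is where dispersion matters in spirit: if discontinuities of the utility functions clustered inside one grid cell, no fixed grid point would be near-optimal.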
Lower bounds and aggregation in density estimation
In this paper we prove the optimality of an aggregation procedure. We prove
lower bounds for model-selection-type aggregation of density estimators under
the Kullback-Leibler (KL) divergence, the Hellinger distance, and the
L1-distance. The lower bound with respect to the KL divergence can be
achieved by the online-type estimate suggested, among others, by Yang (2000).
Combining these results, we obtain an optimal rate of aggregation, in the
sense of Tsybakov (2003), as a function of the sample size n.
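As a rough illustration of the online-type aggregation idea attributed to Yang (2000), here is a hedged sketch of a progressive mixture rule: each candidate density is weighted by its likelihood on the data seen so far, and the per-step mixture weights are averaged over time. The candidate densities and the averaging scheme are simplified assumptions, not the paper's estimator.

```python
import math

def progressive_mixture(densities, sample):
    """Aggregate fixed candidate densities by a progressive mixture rule:
    weight each candidate by exp(cumulative log-likelihood) on the prefix
    of the data, and average the mixture weights over all prefixes."""
    n, m = len(sample), len(densities)
    log_w = [0.0] * m
    mixtures = []
    for i in range(n + 1):
        mx = max(log_w)                      # stabilize the softmax
        w = [math.exp(lw - mx) for lw in log_w]
        s = sum(w)
        mixtures.append([wi / s for wi in w])
        if i < n:
            for j, f in enumerate(densities):
                log_w[j] += math.log(f(sample[i]))
    # Final aggregate: average of the per-prefix mixture weights.
    return [sum(mixtures[i][j] for i in range(n + 1)) / (n + 1)
            for j in range(m)]

# Two candidate densities on [0, 1]: uniform, and a linear tilt toward 1.
f1 = lambda x: 1.0
f2 = lambda x: 2.0 * x
data = [0.9, 0.8, 0.95, 0.7, 0.85]   # data concentrated near 1 favors f2
weights = progressive_mixture([f1, f2], data)
print(round(sum(weights), 6))        # weights sum to 1
```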
Using Probability to Reason about Soft Deadlines
Soft deadlines are significant in systems in which a bound on the response time is important, but the failure to meet the response time is not a disaster. Soft deadlines occur, for example, in telephony and switching networks. We investigate how to put probabilistic bounds on the time-complexity of a concurrent logic program by combining (on-line) profiling with an (off-line) probabilistic complexity analysis. The profiling collects information on the likelihood of case selection and the analysis uses this information to infer the probability of an agent terminating within k steps. Although the approach does not reason about synchronization, we believe that its simplicity and good (essentially quadratic) complexity mean that it is a promising first step in reasoning about soft deadlines.
Random Access in Persistent Strings and Segment Selection
We consider compact representations of collections of similar strings that
support random access queries. The collection of strings is given by a rooted
tree where edges are labeled by an edit operation (inserting, deleting, or
replacing a character) and a node represents the string obtained by applying
the sequence of edit operations on the path from the root to the node. The goal
is to compactly represent the entire collection while supporting fast random
access to any part of a string in the collection. This problem captures natural
scenarios such as representing the past history of an edited document or
representing highly-repetitive collections. Given a tree with n nodes, we
show how to represent the corresponding collection in near-linear space with
fast query time. This improves the previous time-space trade-offs
for the problem. Additionally, we show a lower bound proving that the query
time is optimal for any solution using near-linear space.
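For intuition, here is a naive sketch of the version-tree model: it materializes the full string at a node by replaying the edits along the root-to-node path, which costs time proportional to the path length; the compact representation in the paper is designed to avoid exactly this cost. The encoding of nodes and operations below is hypothetical.

```python
def materialize(parent, ops, node):
    """Rebuild the string at `node` by replaying the edit operations on
    the root-to-node path.  `parent` maps node -> parent (root -> None);
    `ops` maps node -> (kind, pos, char) with kind in {'ins','del','rep'};
    the root carries no edit and represents the empty string."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    chars = []
    for v in reversed(path):          # root first, edits in order
        if v not in ops:              # the root: no edit operation
            continue
        kind, pos, ch = ops[v]
        if kind == 'ins':
            chars.insert(pos, ch)
        elif kind == 'del':
            del chars[pos]
        else:                         # 'rep': replace a character
            chars[pos] = ch
    return ''.join(chars)

# A tiny version tree: 0 is the root; 3 and 4 branch off node 2.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 2}
ops = {1: ('ins', 0, 'a'), 2: ('ins', 1, 'b'),
       3: ('rep', 0, 'c'), 4: ('ins', 2, 'c')}
print(materialize(parent, ops, 3))   # -> "cb"
print(materialize(parent, ops, 4))   # -> "abc"
```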
To achieve our bounds for random access in persistent strings we show how to
reduce the problem to the following natural geometric selection problem on line
segments. Consider a set of horizontal line segments in the plane. Given
parameters x and i, a segment selection query returns the i-th smallest
segment (the segment with the i-th smallest y-coordinate) among the segments
crossing the vertical line at x-coordinate x. The segment selection
problem is to preprocess a set of horizontal line segments into a compact data
structure that supports fast segment selection queries. We present a solution
that uses near-linear space and supports segment selection queries
efficiently, where n is the number of segments. Furthermore, we prove
that this query time is also optimal for any solution using near-linear
space.
Comment: Extended abstract at ISAAC 202
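A brute-force sketch of the segment selection query itself (a linear scan plus a sort, far from the compact data structure the paper builds); the tuple encoding of segments is an assumption:

```python
def segment_selection(segments, x, i):
    """Return the segment with the i-th smallest y-coordinate (1-indexed)
    among the horizontal segments crossing the vertical line at x.
    Segments are (y, x_left, x_right) tuples; O(n log n) per query."""
    crossing = [(y, xl, xr) for (y, xl, xr) in segments if xl <= x <= xr]
    crossing.sort()                       # sort by y-coordinate
    return crossing[i - 1]

segs = [(1, 0, 10), (2, 5, 6), (3, 0, 3), (4, 2, 8)]
print(segment_selection(segs, 5, 2))      # -> (2, 5, 6)
```

At x = 5 the segments with y-coordinates 1, 2, and 4 cross the vertical line, so the 2nd smallest is the segment at y = 2.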