Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Data-driven algorithm design, that is, choosing the best algorithm for a
specific application, is a crucial problem in modern data science.
Practitioners often optimize over a parameterized algorithm family, tuning
parameters based on problems from their domain. These procedures have
historically come with no guarantees, though a recent line of work studies
algorithm selection from a theoretical perspective. We advance the foundations
of this field in several directions: we analyze online algorithm selection,
where problems arrive one-by-one and the goal is to minimize regret, and
private algorithm selection, where the goal is to find good parameters over a
set of problems without revealing sensitive information contained therein. We
study important algorithm families, including SDP-rounding schemes for problems
formulated as integer quadratic programs, and greedy techniques for canonical
subset selection problems. In these cases, the algorithm's performance is a
volatile and piecewise Lipschitz function of its parameters, since tweaking the
parameters can completely change the algorithm's behavior. We give a sufficient
and general condition, dispersion, defining a family of piecewise Lipschitz
functions that can be optimized online and privately, which includes the
functions measuring the performance of the algorithms we study. Intuitively, a
set of piecewise Lipschitz functions is dispersed if no small region contains
many of the functions' discontinuities. We present general techniques for
online and private optimization of the sum of dispersed piecewise Lipschitz
functions. We improve over the best-known regret bounds for a variety of
problems, prove regret bounds for problems not previously studied, and give
matching lower bounds. We also give matching upper and lower bounds on the
utility loss due to privacy. Moreover, we uncover dispersion in auction design
and pricing problems.
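To make the online setting concrete, here is a minimal sketch (not the paper's dispersion-based algorithm) of an exponentially weighted forecaster run over a discretized one-dimensional parameter grid against piecewise Lipschitz utilities; the function names, the grid, and the toy step-function utilities are all hypothetical.

```python
import math
import random

def exp_weights_online(utility_fns, grid, eta):
    """Exponentially weighted forecaster over a discretized parameter grid.
    Each round: sample a parameter proportionally to the current weights,
    then update every grid point's weight using the revealed utility."""
    weights = [1.0] * len(grid)
    played = []
    for u in utility_fns:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample a parameter according to the current weight distribution.
        rho = random.choices(grid, weights=probs)[0]
        played.append(rho)
        # Full-information update: observe u on the whole grid.
        weights = [w * math.exp(eta * u(p)) for w, p in zip(weights, grid)]
    return played

# Toy piecewise Lipschitz utilities on [0, 1]: a step at a random threshold,
# so tweaking the parameter past the threshold changes the payoff abruptly.
random.seed(0)
fns = [(lambda t: (lambda p: 1.0 if p >= t else 0.0))(random.random())
       for _ in range(50)]
grid = [i / 100 for i in range(101)]
params = exp_weights_online(fns, grid, eta=0.5)
print(len(params))  # one parameter played per round
```

The discretization step is where dispersion matters in spirit: if discontinuities of the utility functions clustered inside one grid cell, no fixed grid point would be near-optimal.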
Lower bounds and aggregation in density estimation
In this paper we prove the optimality of an aggregation procedure. We prove
lower bounds for model-selection-type aggregation of density estimators under
the Kullback-Leibler (KL) divergence, the Hellinger distance, and the
L1-distance. The lower bound with respect to the KL divergence can be
achieved by the online-type estimate suggested, among others, by Yang (2000).
Combining these results, we obtain an optimal rate of aggregation, in the
sense of Tsybakov (2003), as a function of the sample size n.
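As a rough illustration of the online-type aggregation idea attributed to Yang (2000), here is a hedged sketch of a progressive mixture rule: each candidate density is weighted by its likelihood on the data seen so far, and the per-step mixture weights are averaged over time. The candidate densities and the averaging scheme are simplified assumptions, not the paper's estimator.

```python
import math

def progressive_mixture(densities, sample):
    """Aggregate fixed candidate densities by a progressive mixture rule:
    weight each candidate by exp(cumulative log-likelihood) on the prefix
    of the data, and average the mixture weights over all prefixes."""
    n, m = len(sample), len(densities)
    log_w = [0.0] * m
    mixtures = []
    for i in range(n + 1):
        mx = max(log_w)                      # stabilize the softmax
        w = [math.exp(lw - mx) for lw in log_w]
        s = sum(w)
        mixtures.append([wi / s for wi in w])
        if i < n:
            for j, f in enumerate(densities):
                log_w[j] += math.log(f(sample[i]))
    # Final aggregate: average of the per-prefix mixture weights.
    return [sum(mixtures[i][j] for i in range(n + 1)) / (n + 1)
            for j in range(m)]

# Two candidate densities on [0, 1]: uniform, and a linear tilt toward 1.
f1 = lambda x: 1.0
f2 = lambda x: 2.0 * x
data = [0.9, 0.8, 0.95, 0.7, 0.85]   # data concentrated near 1 favors f2
weights = progressive_mixture([f1, f2], data)
print(round(sum(weights), 6))        # weights sum to 1
```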
Using Probability to Reason about Soft Deadlines
Soft deadlines are significant in systems in which a bound on the response time is important, but the failure to meet the response time is not a disaster. Soft deadlines occur, for example, in telephony and switching networks. We investigate how to put probabilistic bounds on the time-complexity of a concurrent logic program by combining (on-line) profiling with an (off-line) probabilistic complexity analysis. The profiling collects information on the likelihood of case selection and the analysis uses this information to infer the probability of an agent terminating within k steps. Although the approach does not reason about synchronization, we believe that its simplicity and good (essentially quadratic) complexity mean that it is a promising first step in reasoning about soft deadlines.
Random Access in Persistent Strings and Segment Selection
We consider compact representations of collections of similar strings that
support random access queries. The collection of strings is given by a rooted
tree where edges are labeled by an edit operation (inserting, deleting, or
replacing a character) and a node represents the string obtained by applying
the sequence of edit operations on the path from the root to the node. The goal
is to compactly represent the entire collection while supporting fast random
access to any part of a string in the collection. This problem captures natural
scenarios such as representing the past history of an edited document or
representing highly-repetitive collections. Given a tree with n nodes, we
show how to represent the corresponding collection in near-linear space with
fast query time. This improves the previous time-space trade-offs
for the problem. Additionally, we show a lower bound proving that the query
time is optimal for any solution using near-linear space.
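For intuition, here is a naive sketch of the version-tree model: it materializes the full string at a node by replaying the edits along the root-to-node path, which costs time proportional to the path length; the compact representation in the paper is designed to avoid exactly this cost. The encoding of nodes and operations below is hypothetical.

```python
def materialize(parent, ops, node):
    """Rebuild the string at `node` by replaying the edit operations on
    the root-to-node path.  `parent` maps node -> parent (root -> None);
    `ops` maps node -> (kind, pos, char) with kind in {'ins','del','rep'};
    the root carries no edit and represents the empty string."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    chars = []
    for v in reversed(path):          # root first, edits in order
        if v not in ops:              # the root: no edit operation
            continue
        kind, pos, ch = ops[v]
        if kind == 'ins':
            chars.insert(pos, ch)
        elif kind == 'del':
            del chars[pos]
        else:                         # 'rep': replace a character
            chars[pos] = ch
    return ''.join(chars)

# A tiny version tree: 0 is the root; 3 and 4 branch off node 2.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 2}
ops = {1: ('ins', 0, 'a'), 2: ('ins', 1, 'b'),
       3: ('rep', 0, 'c'), 4: ('ins', 2, 'c')}
print(materialize(parent, ops, 3))   # -> "cb"
print(materialize(parent, ops, 4))   # -> "abc"
```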
To achieve our bounds for random access in persistent strings we show how to
reduce the problem to the following natural geometric selection problem on line
segments. Consider a set of horizontal line segments in the plane. Given
parameters x and i, a segment selection query returns the i-th smallest
segment (the segment with the i-th smallest y-coordinate) among the segments
crossing the vertical line at x-coordinate x. The segment selection
problem is to preprocess a set of horizontal line segments into a compact data
structure that supports fast segment selection queries. We present a solution
that uses near-linear space and supports segment selection queries
efficiently, where n is the number of segments. Furthermore, we prove
that this query time is also optimal for any solution using near-linear
space.
Comment: Extended abstract at ISAAC 202
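A brute-force sketch of the segment selection query itself (a linear scan plus a sort, far from the compact data structure the paper builds); the tuple encoding of segments is an assumption:

```python
def segment_selection(segments, x, i):
    """Return the segment with the i-th smallest y-coordinate (1-indexed)
    among the horizontal segments crossing the vertical line at x.
    Segments are (y, x_left, x_right) tuples; O(n log n) per query."""
    crossing = [(y, xl, xr) for (y, xl, xr) in segments if xl <= x <= xr]
    crossing.sort()                       # sort by y-coordinate
    return crossing[i - 1]

segs = [(1, 0, 10), (2, 5, 6), (3, 0, 3), (4, 2, 8)]
print(segment_selection(segs, 5, 2))      # -> (2, 5, 6)
```

At x = 5 the segments with y-coordinates 1, 2, and 4 cross the vertical line, so the 2nd smallest is the segment at y = 2.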