
    Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization

    Data-driven algorithm design, that is, choosing the best algorithm for a specific application, is a crucial problem in modern data science. Practitioners often optimize over a parameterized algorithm family, tuning parameters based on problems from their domain. These procedures have historically come with no guarantees, though a recent line of work studies algorithm selection from a theoretical perspective. We advance the foundations of this field in several directions: we analyze online algorithm selection, where problems arrive one-by-one and the goal is to minimize regret, and private algorithm selection, where the goal is to find good parameters over a set of problems without revealing sensitive information contained therein. We study important algorithm families, including SDP-rounding schemes for problems formulated as integer quadratic programs, and greedy techniques for canonical subset selection problems. In these cases, the algorithm's performance is a volatile and piecewise Lipschitz function of its parameters, since tweaking the parameters can completely change the algorithm's behavior. We give a general sufficient condition, dispersion, defining a family of piecewise Lipschitz functions that can be optimized online and privately, which includes the functions measuring the performance of the algorithms we study. Intuitively, a set of piecewise Lipschitz functions is dispersed if no small region contains many of the functions' discontinuities. We present general techniques for online and private optimization of the sum of dispersed piecewise Lipschitz functions. We improve over the best-known regret bounds for a variety of problems, prove regret bounds for problems not previously studied, and give matching lower bounds. We also give matching upper and lower bounds on the utility loss due to privacy. Moreover, we uncover dispersion in auction design and pricing problems.
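    As a toy illustration of the dispersion condition described above, the sketch below checks a one-dimensional version of the intuition in the abstract: no interval of a given width contains too many discontinuities. The function name, the pooling of all functions' discontinuities into one list, and the parameters `w` and `k` are illustrative assumptions, not the paper's formal definition.

```python
import bisect

def is_dispersed(discontinuities, w, k):
    """Check a simple 1-D version of the dispersion intuition:
    no interval of width w contains more than k of the given
    discontinuity points (pooled across all functions).
    `w` and `k` are hypothetical dispersion parameters."""
    points = sorted(discontinuities)
    for i, left in enumerate(points):
        # Count points falling in the half-open window [left, left + w).
        j = bisect.bisect_left(points, left + w)
        if j - i > k:
            return False
    return True

# Example: 100 well-spread discontinuities vs. an added tight cluster.
spread = [t / 100 for t in range(100)]
clustered = spread + [0.5 + 1e-6 * t for t in range(50)]
print(is_dispersed(spread, w=0.05, k=10))     # True: at most 5 per window
print(is_dispersed(clustered, w=0.05, k=10))  # False: 55 points near 0.5
```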

    Lower bounds and aggregation in density estimation

    In this paper we prove the optimality of an aggregation procedure. We prove lower bounds for model-selection-type aggregation of $M$ density estimators with respect to the Kullback-Leibler (KL) divergence, the Hellinger distance, and the $L_1$-distance. The lower bound, with respect to the KL distance, can be achieved by the online-type estimate suggested, among others, by Yang (2000). Combining these results, we state that $\log M/n$ is an optimal rate of aggregation in the sense of Tsybakov (2003), where $n$ is the sample size.
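    To make the optimality statement concrete, the following is a sketch of the rate in the model-selection aggregation framework of Tsybakov (2003); the constant $C$ and the excess-risk formulation are notational assumptions, not quoted from the paper.

```latex
% Model-selection aggregation: given estimators f_1,\dots,f_M built from a
% sample of size n, an aggregate \tilde f_n attains the optimal rate if its
% excess risk over the best of the M estimators is of order \psi_n(M):
\[
  \mathbb{E}\, d\bigl(f, \tilde f_n\bigr)
  \;-\; \min_{1 \le j \le M} d\bigl(f, f_j\bigr)
  \;\le\; C\, \psi_n(M),
  \qquad
  \psi_n(M) = \frac{\log M}{n},
\]
% where d is the Kullback-Leibler divergence; the paper's lower bounds show
% \psi_n(M) cannot be improved, and Yang's (2000) online estimate attains it.
```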

    Using Probability to Reason about Soft Deadlines

    Soft deadlines are significant in systems in which a bound on the response time is important, but the failure to meet the response time is not a disaster. Soft deadlines occur, for example, in telephony and switching networks. We investigate how to put probabilistic bounds on the time-complexity of a concurrent logic program by combining (on-line) profiling with an (off-line) probabilistic complexity analysis. The profiling collects information on the likelihood of case selection, and the analysis uses this information to infer the probability of an agent terminating within $k$ steps. Although the approach does not reason about synchronization, we believe that its simplicity and good (essentially quadratic) complexity mean that it is a promising first step in reasoning about soft deadlines.
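    A heavily simplified sketch of the kind of inference described: treat each resolution step as independently selecting a terminating (base-case) clause with the profiled likelihood. This independence model and the function below are assumptions made for illustration; the paper's analysis of concurrent logic programs is more refined.

```python
def prob_within_k_steps(p_base_case, k):
    """Toy model (an assumption, not the paper's actual analysis):
    at each step the agent selects a terminating clause with the
    profiled probability p_base_case, independently across steps.
    Then the termination time is geometric, and
        P(T <= k) = 1 - (1 - p_base_case)**k."""
    return 1.0 - (1.0 - p_base_case) ** k

# Example: profiling suggests the base case is chosen 20% of the time.
print(prob_within_k_steps(0.2, k=10))  # ~0.893
```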

    Random Access in Persistent Strings and Segment Selection

    We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly repetitive collections. Given a tree with $n$ nodes, we show how to represent the corresponding collection in $O(n)$ space and $O(\log n/\log\log n)$ query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower bound proving that the query time is optimal for any solution using near-linear space. To achieve our bounds for random access in persistent strings, we show how to reduce the problem to the following natural geometric selection problem on line segments. Consider a set of horizontal line segments in the plane. Given parameters $i$ and $j$, a segment selection query returns the $j$th smallest segment (the segment with the $j$th smallest $y$-coordinate) among the segments crossing the vertical line through $x$-coordinate $i$. The segment selection problem is to preprocess a set of horizontal line segments into a compact data structure that supports fast segment selection queries. We present a solution that uses $O(n)$ space and supports segment selection queries in $O(\log n/\log\log n)$ time, where $n$ is the number of segments. Furthermore, we prove that this query time is also optimal for any solution using near-linear space.
    Comment: Extended abstract at ISAAC 202
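    The query semantics can be pinned down with a naive baseline that scans all segments per query; this is only a sketch of the problem statement, not the paper's compact $O(\log n/\log\log n)$-time structure.

```python
def segment_select(segments, i, j):
    """Naive segment selection query: among the horizontal segments
    (x1, x2, y) whose x-range contains i, return the one with the
    j-th smallest y-coordinate (1-indexed). Linear time per query."""
    crossing = [s for s in segments if s[0] <= i <= s[1]]
    crossing.sort(key=lambda s: s[2])  # order by y-coordinate
    return crossing[j - 1]

# Example: three horizontal segments; the vertical line x = 2 crosses two.
segs = [(0, 5, 1.0), (1, 3, 2.0), (4, 6, 3.0)]
print(segment_select(segs, i=2, j=2))  # (1, 3, 2.0)
```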