3,585 research outputs found

On Local Regret

Author: Bowling Michael
Zinkevich Martin
Publication date: 01/01/2012

Online learning aims to perform nearly as well as the best hypothesis in hindsight. For some hypothesis classes, though, even finding the best hypothesis offline is challenging. In such offline cases, local search techniques are often employed and only local optimality guaranteed. For online decision-making with such hypothesis classes, we introduce local regret, a generalization of regret that aims to perform nearly as well as only nearby hypotheses. We then present a general algorithm to minimize local regret with arbitrary locality graphs. We also show how the graph structure can be exploited to drastically speed learning. These algorithms are then demonstrated on a diverse set of online problems: online disjunct learning, online Max-SAT, and online decision tree learning. Comment: This is the longer version of the same-titled paper appearing in the Proceedings of the Twenty-Ninth International Conference on Machine Learning (ICML), 201

arXiv.org e-Print Archive

CiteSeerX

A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

Author: Olver Neil
Schalekamp Frans
Stougie Leen
van der Ster Suzanne
van Zuylen Anke
Publication date: 01/01/2018

We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the rooted Subtree Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm is combinatorial and its running time is quadratic in the input size. To prove the approximation guarantee, we construct a feasible dual solution for a novel linear programming formulation. In addition, we show this linear program is stronger than previously known formulations, and we give a compact formulation, showing that it can be solved in polynomial tim

arXiv.org e-Print Archive

VU Research Portal

CWI's Institutional Repository

Prefix Discrepancy, Smoothed Analysis, and Combinatorial Vector Balancing

Author: Bansal Nikhil
Jiang Haotian
Meka Raghu
Singla Sahil
Sinha Makrand
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)
Publication date: 13/11/2021

A well-known result of Banaszczyk in discrepancy theory concerns the prefix discrepancy problem (also known as the signed series problem): given a sequence of $T$ unit vectors in $\mathbb{R}^d$, find $\pm$ signs for each of them such that the signed sum vector along any prefix has a small $\ell_\infty$-norm? This problem is central to proving upper bounds for the Steinitz problem, and the popular Komlós problem is a special case where one is only concerned with the final signed sum vector instead of all prefixes. Banaszczyk gave an $O(\sqrt{\log d + \log T})$ bound for the prefix discrepancy problem. We investigate the tightness of Banaszczyk's bound and consider natural generalizations of prefix discrepancy: We first consider a smoothed analysis setting, where a small amount of additive noise perturbs the input vectors. We show an exponential improvement in $T$ compared to Banaszczyk's bound. Using a primal-dual approach and a careful chaining argument, we show that one can achieve a bound of $O(\sqrt{\log d + \log\log T})$ with high probability in the smoothed setting. Moreover, this smoothed analysis bound is the best possible without further improvement on Banaszczyk's bound in the worst case. We also introduce a generalization of the prefix discrepancy problem where the discrepancy constraints correspond to paths on a DAG on $T$ vertices. We show that an analog of Banaszczyk's $O(\sqrt{\log d + \log T})$ bound continues to hold in this setting for adversarially given unit vectors and that the $\sqrt{\log T}$ factor is unavoidable for DAGs. We also show that the dependence on $T$ cannot be improved significantly in the smoothed case for DAGs. We conclude by exploring a more general notion of vector balancing, which we call combinatorial vector balancing. We obtain near-optimal bounds in this setting, up to poly-logarithmic factors. Comment: 22 pages. Appear in ITCS 202
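To make the quantity in this abstract concrete, here is a minimal Python sketch that measures the prefix discrepancy of a given signing, that is, the largest $\ell_\infty$-norm over all prefixes of the signed sum. The function names `prefix_discrepancy` and `greedy_signs` are illustrative and do not come from the paper. The greedy rule is a naive heuristic included only for comparison; it is not Banaszczyk's method, nor the primal-dual algorithm the abstract mentions, and it carries no such guarantee.

```python
from typing import List, Sequence


def prefix_discrepancy(vectors: Sequence[Sequence[float]],
                       signs: Sequence[int]) -> float:
    """Return the largest l_inf norm over all prefixes of the signed sum."""
    d = len(vectors[0])
    running = [0.0] * d          # signed sum of the current prefix
    worst = 0.0
    for v, s in zip(vectors, signs):
        for i in range(d):
            running[i] += s * v[i]
        worst = max(worst, max(abs(x) for x in running))
    return worst


def greedy_signs(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Naive heuristic (an assumption of this sketch, not from the paper):
    pick the sign that keeps the current prefix sum smaller in l_inf norm."""
    d = len(vectors[0])
    running = [0.0] * d
    signs: List[int] = []
    for v in vectors:
        plus = max(abs(r + x) for r, x in zip(running, v))
        minus = max(abs(r - x) for r, x in zip(running, v))
        s = 1 if plus <= minus else -1
        signs.append(s)
        running = [r + s * x for r, x in zip(running, v)]
    return signs


if __name__ == "__main__":
    vecs = [[1.0, 0.0]] * 4      # four copies of the same unit vector
    print(prefix_discrepancy(vecs, [1, 1, 1, 1]))
    print(prefix_discrepancy(vecs, greedy_signs(vecs)))
```

On four copies of the same unit vector, the all-plus signing lets the prefix sums grow linearly, while alternating signs keep every prefix bounded.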

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Phase transition in the sample complexity of likelihood-based phylogeny inference

Author: Roch Sebastien
Sly Allan
Publication date: 18/07/2017

Reconstructing evolutionary trees from molecular sequence data is a fundamental problem in computational biology. Stochastic models of sequence evolution are closely related to spin systems that have been extensively studied in statistical physics and that connection has led to important insights on the theoretical properties of phylogenetic reconstruction algorithms as well as the development of new inference methods. Here, we study maximum likelihood, a classical statistical technique which is perhaps the most widely used in phylogenetic practice because of its superior empirical accuracy. At the theoretical level, except for its consistency, that is, the guarantee of eventual correct reconstruction as the size of the input data grows, much remains to be understood about the statistical properties of maximum likelihood in this context. In particular, the best bounds on the sample complexity or sequence-length requirement of maximum likelihood, that is, the amount of data required for correct reconstruction, are exponential in the number, $n$, of tips, far from known lower bounds based on information-theoretic arguments. Here we close the gap by proving a new upper bound on the sequence-length requirement of maximum likelihood that matches up to constants the known lower bound for some standard models of evolution. More specifically, for the $r$-state symmetric model of sequence evolution on a binary phylogeny with bounded edge lengths, we show that the sequence-length requirement behaves logarithmically in $n$ when the expected amount of mutation per edge is below what is known as the Kesten-Stigum threshold. In general, the sequence-length requirement is polynomial in $n$. Our results imply moreover that the maximum likelihood estimator can be computed efficiently on randomly generated data provided sequences are as above. Comment: To appear in Probability Theory and Related Field

arXiv.org e-Print Archive

Princeton University Open Access Repository

Computing Optimal Steiner Trees in Polynomial Space

Author: Fomin Fedor
Grandoni Fabrizio
Kratsch Dieter
Lokshtanov Daniel
Saurabh Saket
Publication date: 18/06/2018

Given an $n$-node edge-weighted graph and a subset of $k$ terminal nodes, the NP-hard (weighted) Steiner tree problem is to compute a minimum-weight tree which spans the terminals. All the known algorithms for this problem which improve on trivial $O(1.62^n)$-time enumeration are based on dynamic programming, and require exponential space. Motivated by the fact that exponential-space algorithms are typically impractical, in this paper we address the problem of designing faster polynomial-space algorithms. Our first contribution is a simple $O((27/4)^k n^{O(\log k)})$-time polynomial-space algorithm for the problem. This algorithm is based on a variant of the classical tree-separator theorem: every Steiner tree has a node whose removal partitions the tree in two forests, containing at most $2k/3$ terminals each. Exploiting separators of logarithmic size which evenly partition the terminals, we are able to reduce the running time to $O(4^{k} n^{O(\log^{2} k)})$. This improves on trivial enumeration for roughly $k < n/3$, which covers most of the cases of practical interest. Combining the latter algorithm (for small $k$) with trivial enumeration (for large $k$) we obtain an $O(1.59^n)$-time polynomial-space algorithm for the weighted Steiner tree problem. As a second contribution of this paper, we present an $O(1.55^n)$-time polynomial-space algorithm for the cardinality version of the problem, where all edge weights are one. This result is based on an improved branching strategy. The refined branching is based on a charging mechanism which shows that, for large values of $k$, convenient local configurations of terminals and non-terminals exist. The analysis of the algorithm relies on the Measure & Conquer approach: the non-standard measure used here is a linear combination of the number of nodes and number of non-terminals. Using a recent result in Nederlof (International colloquium on automata, languages and programming (ICALP), pp. 713-725, 2009), the running time can be reduced to $O(1.36^n)$. The previous best algorithm for the cardinality case runs in $O(1.42^n)$ time and exponential spac
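As a point of reference for the enumeration baseline mentioned in this abstract, the following Python sketch solves small Steiner tree instances by brute force in polynomial space. It tries every subset of non-terminals and takes a minimum spanning tree of the subgraph induced by the terminals plus that subset. The function names, the `(weight, u, v)` edge format, and the use of Kruskal's algorithm are assumptions of this sketch. It takes roughly $2^{n-k}$ spanning-tree computations and does not attempt to match any running time quoted above or to reproduce the paper's separator-based or branching algorithms.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v), an assumed input format


def mst_weight(nodes: Set[int], edges: List[Edge]) -> Optional[float]:
    """Kruskal on the subgraph induced by `nodes`; None if it is disconnected."""
    parent = {v: v for v in nodes}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, used = 0, 0
    for w, a, b in sorted(edges):
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
                total += w
                used += 1
    return total if used == len(nodes) - 1 else None


def steiner_brute_force(n: int, edges: List[Edge],
                        terminals: Iterable[int]) -> Optional[float]:
    """Minimum weight of a tree spanning `terminals` (at least one terminal).

    Nodes are 0..n-1. Subsets are generated lazily, so the working space
    stays polynomial even though the running time is exponential.
    """
    term = set(terminals)
    others = [v for v in range(n) if v not in term]
    best: Optional[float] = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            w = mst_weight(term | set(extra), edges)
            if w is not None and (best is None or w < best):
                best = w
    return best


if __name__ == "__main__":
    # Terminals 0, 1, 2; node 3 is a cheap hub joining all three.
    edges = [(1, 0, 3), (1, 1, 3), (1, 2, 3), (5, 0, 1), (5, 1, 2)]
    print(steiner_brute_force(4, edges, [0, 1, 2]))
```

The approach is correct because an optimal Steiner tree is always a minimum spanning tree of the subgraph induced by its own vertex set, so scanning all vertex sets that contain the terminals cannot miss it.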

RERO DOC Digital Library

Constant-time dynamic (∆+1)-coloring

Author: Henzinger M.
Peng P.
Publication venue: Schloss Dagstuhl – Leibniz Center for Informatics
Publication date: 01/01/2020

We give a fully dynamic (Las-Vegas style) algorithm with constant expected amortized time per update that maintains a proper (∆ + 1)-vertex coloring of a graph with maximum degree at most ∆. This improves upon the previous O(log ∆)-time algorithm by Bhattacharya et al. (SODA 2018). We show that our result does not only have optimal running time, but is also optimal in the sense that already deciding whether a ∆-coloring exists in a dynamically changing graph with maximum degree at most ∆ takes Ω(log n) time per operation
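For orientation, here is a minimal Python sketch of the problem setting: maintaining a proper (∆ + 1)-vertex coloring under edge insertions and deletions. The class `NaiveDynamicColoring` and its methods are hypothetical names introduced for illustration. The update rule is the naive one, which rescans a vertex's neighborhood for a free color and therefore costs O(∆) per conflicting insertion. It is not the constant-time algorithm of the paper, whose details the abstract does not give.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class NaiveDynamicColoring:
    """Keeps a proper coloring with colors 0..max_degree (naive sketch)."""

    def __init__(self, max_degree: int) -> None:
        self.max_degree = max_degree
        self.adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        self.color: Dict[Hashable, int] = {}

    def _free_color(self, v: Hashable) -> int:
        # v has at most max_degree neighbors, so one of the
        # max_degree + 1 colors is always unused around it.
        used = {self.color[u] for u in self.adj[v]}
        for c in range(self.max_degree + 1):
            if c not in used:
                return c
        raise AssertionError("degree bound violated")

    def insert_edge(self, u: Hashable, v: Hashable) -> None:
        if u == v:
            raise ValueError("self-loops are not allowed")
        if v in self.adj[u]:
            return  # edge already present
        if len(self.adj[u]) >= self.max_degree or len(self.adj[v]) >= self.max_degree:
            raise ValueError("insertion would exceed the maximum degree")
        for w in (u, v):
            self.color.setdefault(w, 0)  # new vertices start with color 0
        self.adj[u].add(v)
        self.adj[v].add(u)
        if self.color[u] == self.color[v]:
            self.color[v] = self._free_color(v)  # recolor one endpoint

    def delete_edge(self, u: Hashable, v: Hashable) -> None:
        # Deleting an edge never breaks a proper coloring.
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def is_proper(self) -> bool:
        return all(self.color[u] != self.color[v]
                   for u in self.adj for v in self.adj[u])


if __name__ == "__main__":
    g = NaiveDynamicColoring(max_degree=2)
    for a, b in [(0, 1), (1, 2), (0, 2)]:   # build a triangle
        g.insert_edge(a, b)
    print(g.color, g.is_proper())
```

Recoloring only one endpoint is enough because that endpoint has at most ∆ neighbors and the palette has ∆ + 1 colors, so a free color always exists. The difficulty the abstract addresses is finding that color without scanning the whole neighborhood.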

Dagstuhl Research Online Publication Server

White Rose Research Online