
    Minimizing Convex Functions with Integral Minimizers

    Given a separation oracle $\mathsf{SO}$ for a convex function $f$ that has an integral minimizer inside a box with radius $R$, we show how to find an exact minimizer of $f$ using at most (a) $O(n(n + \log(R)))$ calls to $\mathsf{SO}$ and $\mathsf{poly}(n, \log(R))$ arithmetic operations, or (b) $O(n \log(nR))$ calls to $\mathsf{SO}$ and $\exp(n) \cdot \mathsf{poly}(\log(R))$ arithmetic operations. When the set of minimizers of $f$ has integral extreme points, our algorithm outputs an integral minimizer of $f$. This improves upon the previously best oracle complexity of $O(n^2(n + \log(R)))$ for polynomial-time algorithms, obtained by [Grötschel, Lovász and Schrijver, Prog. Comb. Opt. 1984, Springer 1988] over thirty years ago. For the Submodular Function Minimization problem, our result immediately implies a strongly polynomial algorithm that makes at most $O(n^3)$ calls to an evaluation oracle, and an exponential-time algorithm that makes at most $O(n^2 \log(n))$ calls to an evaluation oracle. These improve upon the previously best $O(n^3 \log^2(n))$ oracle complexity for strongly polynomial algorithms given in [Lee, Sidford and Wong, FOCS 2015] and [Dadush, Végh and Zambelli, SODA 2018], and an exponential-time algorithm with oracle complexity $O(n^3 \log(n))$ given in the former work. Our result is achieved via a reduction to the Shortest Vector Problem in lattices. We show how an approximately shortest vector of a certain lattice can be used to effectively reduce the dimension of the problem. Our analysis of the oracle complexity is based on a potential function that simultaneously captures the size of the search set and the density of the lattice, which we analyze via technical tools from convex geometry.
    Comment: This version of the paper simplifies and generalizes the results in an earlier version which will appear in SODA 202
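
    To make the separation-oracle model concrete, here is a minimal one-dimensional sketch, not the paper's lattice-based $n$-dimensional algorithm: for a convex $f$ on $\{-R, \dots, R\}$ with an integral minimizer, the sign of a discrete subgradient acts as a separation oracle, and bisection recovers the exact minimizer in $O(\log R)$ oracle calls. The function f below is an arbitrary stand-in.

        # Toy 1-D illustration of exact minimization with a separation oracle.
        # This is NOT the paper's n-dimensional lattice-based algorithm; it only
        # shows how each oracle call shrinks the search box around an integral
        # minimizer.

        def make_separation_oracle(f):
            """Return an oracle that, at integer x, reports which side the minimizer is on."""
            def oracle(x):
                # Discrete "subgradient sign": f(x+1) - f(x) >= 0 means the
                # minimizer lies in (-inf, x]; otherwise it lies in [x+1, +inf).
                return f(x + 1) - f(x) >= 0
            return oracle

        def minimize_convex_integral(f, R):
            """Exact integral minimizer of convex f on {-R, ..., R} in O(log R) oracle calls."""
            oracle = make_separation_oracle(f)
            lo, hi = -R, R
            while lo < hi:
                mid = (lo + hi) // 2
                if oracle(mid):      # minimizer is at mid or to its left
                    hi = mid
                else:                # minimizer is strictly to the right of mid
                    lo = mid + 1
            return lo

        if __name__ == "__main__":
            f = lambda x: (x - 3) ** 2                  # stand-in convex function
            print(minimize_convex_integral(f, R=100))   # -> 3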

    Algorithms and Adaptivity Gaps for Stochastic k-TSP

    Given a metric $(V,d)$ and a $\textsf{root} \in V$, the classic $\textsf{k-TSP}$ problem is to find a minimum-length tour originating at the $\textsf{root}$ that visits at least $k$ nodes in $V$. In this work, motivated by applications where the input to an optimization problem is uncertain, we study two stochastic versions of $\textsf{k-TSP}$. In Stoch-Reward $k$-TSP, originally defined by Ene-Nagarajan-Saket [ENS17], each vertex $v$ in the given metric $(V,d)$ contains a stochastic reward $R_v$. The goal is to adaptively find a tour of minimum expected length that collects reward at least $k$; here "adaptively" means our next decision may depend on previous outcomes. Ene et al. gave an $O(\log k)$-approximation adaptive algorithm for this problem, and left open whether there is an $O(1)$-approximation algorithm. We resolve their open question and even give an $O(1)$-approximation non-adaptive algorithm for this problem. We also introduce and obtain similar results for the Stoch-Cost $k$-TSP problem. In this problem each vertex $v$ has a stochastic cost $C_v$, and the goal is to visit and select at least $k$ vertices so as to minimize the expected sum of tour length and cost of the selected vertices. This problem generalizes the Price of Information framework [Singla18] from deterministic probing costs to metric probing costs. Our techniques are based on two crucial ideas: "repetitions" and "critical scaling". Using Freedman's and Jogdeo-Samuels' inequalities, we show that for our problems, if we truncate the random variables at an ideal threshold and repeat, then their expected values form a good surrogate. Unfortunately, this ideal threshold is adaptive, as it depends on how far we are from achieving our target $k$, so we truncate at various different scales and identify a "critical" scale.
    Comment: ITCS 202
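
    As a toy illustration of the "repetitions" idea (a sketch only; the paper's threshold choice and concentration analysis are more subtle): truncating a stochastic reward at a threshold and repeating makes the empirical mean of the truncated reward a stable surrogate for progress toward the target $k$. The distribution and thresholds below are hypothetical stand-ins.

        # Toy Monte Carlo sketch of the "truncate and repeat" surrogate idea.
        # The reward distribution and thresholds tau are illustrative stand-ins;
        # the paper selects a "critical" scale among geometrically spaced ones.

        import random

        def truncated_reward(draw, tau):
            """One visit's reward, truncated at threshold tau."""
            return min(draw(), tau)

        def surrogate_value(draw, tau, repetitions=10_000):
            """Empirical mean of the truncated reward across repeated visits."""
            return sum(truncated_reward(draw, tau) for _ in range(repetitions)) / repetitions

        if __name__ == "__main__":
            random.seed(0)
            draw = lambda: random.expovariate(1.0)   # stand-in stochastic reward R_v
            for tau in (0.5, 1.0, 2.0, 8.0):         # try several scales
                print(f"tau={tau:4}: surrogate ~= {surrogate_value(draw, tau):.3f}")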

    Forward and Inverse Approximation Theory for Linear Temporal Convolutional Networks

    We present a theoretical analysis of the approximation properties of convolutional architectures when applied to the modeling of temporal sequences. Specifically, we prove an approximation rate estimate (Jackson-type result) and an inverse approximation theorem (Bernstein-type result), which together provide a comprehensive characterization of the types of sequential relationships that can be efficiently captured by a temporal convolutional architecture. The rate estimate improves upon a previous result via the introduction of a refined complexity measure, whereas the inverse approximation theorem is new.
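
    For intuition, a linear temporal convolutional layer of the kind analyzed here maps an input sequence to $y_t = \sum_{s \ge 0} w_s x_{t-s}$, i.e., a causal convolution. The sketch below, with an arbitrary stand-in kernel, is only meant to fix this definition, not to reproduce the paper's constructions.

        # Minimal causal (temporal) convolution: y_t = sum_{s>=0} w_s * x_{t-s}.
        # The kernel w is an arbitrary stand-in; deep linear TCNs compose such maps.

        import numpy as np

        def causal_conv(x, w):
            """Causal convolution of sequence x with kernel w (no future leakage)."""
            T, K = len(x), len(w)
            y = np.zeros(T)
            for t in range(T):
                for s in range(min(K, t + 1)):
                    y[t] += w[s] * x[t - s]
            return y

        if __name__ == "__main__":
            x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # unit impulse
            w = np.array([0.5, 0.3, 0.2])             # stand-in kernel
            print(causal_conv(x, w))                  # impulse response recovers w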

    Approximation theory of transformer networks for sequence modeling

    The transformer is a widely applied architecture in sequence modeling applications, but the theoretical understanding of its working principles is limited. In this work, we investigate the ability of transformers to approximate sequential relationships. We first prove a universal approximation theorem for the transformer hypothesis space. From its derivation, we identify a novel notion of regularity under which we can prove an explicit approximation rate estimate. This estimate reveals key structural properties of the transformer and suggests the types of sequence relationships that the transformer is adapted to approximating. In particular, it allows us to concretely discuss the structural bias between the transformer and classical sequence modeling methods, such as recurrent neural networks. Our findings are supported by numerical experiments.
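
    To fix notation, the core of the transformer hypothesis space is the attention map. A minimal single-head self-attention is sketched below; the dimensions and weights are arbitrary stand-ins, and masking, multiple heads, residual connections, layer norm, and feed-forward blocks are omitted.

        # Minimal single-head self-attention: Attn(X) = softmax(Q K^T / sqrt(d)) V.
        # Weights are random stand-ins; real transformers add masking, multiple
        # heads, residual connections, layer norm, and feed-forward blocks.

        import numpy as np

        def softmax(z, axis=-1):
            z = z - z.max(axis=axis, keepdims=True)   # numerical stability
            e = np.exp(z)
            return e / e.sum(axis=axis, keepdims=True)

        def self_attention(X, Wq, Wk, Wv):
            """X: (T, d) sequence; returns the attended (T, d) sequence."""
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            scores = Q @ K.T / np.sqrt(K.shape[1])
            return softmax(scores, axis=-1) @ V

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            T, d = 4, 8
            X = rng.normal(size=(T, d))
            Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
            print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)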

    Sparse Submodular Function Minimization

    In this paper we study the problem of minimizing a submodular function $f : 2^V \rightarrow \mathbb{R}$ that is guaranteed to have a $k$-sparse minimizer. We give a deterministic algorithm that computes an additive $\epsilon$-approximate minimizer of such $f$ in $\widetilde{O}(\mathsf{poly}(k) \log(|f|/\epsilon))$ parallel depth using a polynomial number of queries to an evaluation oracle of $f$, where $|f| = \max_{S \subseteq V} |f(S)|$. Further, we give a randomized algorithm that computes an exact minimizer of $f$ with high probability using $\widetilde{O}(|V| \cdot \mathsf{poly}(k))$ queries and polynomial time. When $k = \widetilde{O}(1)$, our algorithms use either nearly-constant parallel depth or a nearly-linear number of evaluation oracle queries. All previous algorithms for this problem use either $\Omega(|V|)$ parallel depth or $\Omega(|V|^2)$ queries. In contrast to state-of-the-art weakly-polynomial and strongly-polynomial time algorithms for SFM, our algorithms use first-order optimization methods, e.g., mirror descent and follow-the-regularized-leader. We introduce what we call sparse dual certificates, which encode information about the structure of sparse minimizers, and both our parallel and sequential algorithms provide new algorithmic tools that allow first-order optimization methods to compute them efficiently. Correspondingly, our algorithm does not invoke fast matrix multiplication or general linear system solvers, and in this sense is more combinatorial than previous state-of-the-art methods.
    Comment: Accepted to FOCS 202
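
    The first-order methods mentioned above operate on the Lovász extension of $f$, whose value at any $x \in [0,1]^n$ can be computed with $n + 1$ evaluation-oracle queries by sorting the coordinates. Below is a sketch of that standard evaluation, with a stand-in submodular function; it illustrates the oracle model, not the paper's algorithms.

        # Evaluating the Lovász extension f_L at x in [0,1]^n with n+1 oracle calls.
        # Sort coordinates in decreasing order; f_L(x) is a weighted sum of the
        # marginals of f along the resulting chain of prefix sets.

        def lovasz_extension(f, x):
            """f: evaluation oracle on frozensets; x: list of floats in [0,1]."""
            n = len(x)
            order = sorted(range(n), key=lambda i: -x[i])    # decreasing coordinates
            value = 0.0
            prev_set, prev_val = frozenset(), f(frozenset())
            for i in order:
                cur_set = prev_set | {i}
                cur_val = f(cur_set)                  # one oracle call per prefix set
                value += x[i] * (cur_val - prev_val)  # marginal weighted by x_i
                prev_set, prev_val = cur_set, cur_val
            return value

        if __name__ == "__main__":
            # Stand-in submodular f: cut function of the path graph 0-1-2.
            edges = [(0, 1), (1, 2)]
            f = lambda S: sum(1 for u, v in edges if (u in S) != (v in S))
            print(lovasz_extension(f, [0.9, 0.5, 0.1]))   # -> 0.8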

    A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics

    We present a unified (randomized) polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due to terminals not covered by the subgraph. Under our unified framework, the solution subgraph needs to be Eulerian for PCTSP, while it needs to be a tree for PCSTP. Before our work, not even a QPTAS for these problems in doubling metrics was known. Our unified PTAS is based on the previous dynamic programming frameworks proposed in [Talwar STOC 2004] and [Bartal, Gottlieb, Krauthgamer STOC 2012]. However, since it is unknown which part of the optimal cost is due to edge lengths and which part is due to penalties of uncovered terminals, we need to develop new techniques to apply the previous divide-and-conquer strategies and sparse instance decompositions.
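
    As a small aid to parsing the objective (a sketch only; the PTAS itself is a hierarchical dynamic program), the cost of a candidate subgraph is the total weight of its edges plus the penalties of the terminals it fails to cover, as computed below with stand-in data.

        # Prize-collecting objective: edge weight of the solution subgraph plus
        # penalties of uncovered terminals. All data below are stand-ins.

        def pc_cost(solution_edges, edge_weight, penalties, covered):
            """solution_edges: iterable of edges; covered: set of covered terminals."""
            length = sum(edge_weight[e] for e in solution_edges)
            penalty = sum(p for t, p in penalties.items() if t not in covered)
            return length + penalty

        if __name__ == "__main__":
            edge_weight = {("a", "b"): 2.0, ("b", "c"): 1.5, ("c", "d"): 4.0}
            penalties = {"a": 1.0, "c": 3.0, "d": 0.5}    # terminal -> penalty
            tour = [("a", "b"), ("b", "c")]               # covers a, b, c but not d
            print(pc_cost(tour, edge_weight, penalties, covered={"a", "b", "c"}))  # 4.0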

    Parallel Submodular Function Minimization

    We consider the parallel complexity of submodular function minimization (SFM). We provide a pair of methods which obtain two new query-versus-depth trade-offs for a submodular function defined on subsets of $n$ elements that has integer values between $-M$ and $M$. The first method has depth $2$ and query complexity $n^{O(M)}$, and the second method has depth $\widetilde{O}(n^{1/3} M^{2/3})$ and query complexity $O(\mathrm{poly}(n, M))$. Despite a line of work on improved parallel lower bounds for SFM, prior to our work the only known algorithms for parallel SFM either followed from more general methods for sequential SFM or from highly-parallel minimization of convex $\ell_2$-Lipschitz functions. Interestingly, to obtain our second result we provide the first highly-parallel algorithm for minimizing $\ell_\infty$-Lipschitz functions over the hypercube, which obtains near-optimal depth for achieving constant accuracy.
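
    To pin down the terms: "depth" counts adaptive rounds of evaluation queries, where each round may issue many queries in parallel. The trivial baseline below finds an exact SFM minimizer with depth $1$ and $2^n$ queries, which is the kind of trade-off the paper's depth-$2$, $n^{O(M)}$-query method improves upon for bounded integer $M$; it is a sketch of the model, not of the paper's algorithms.

        # Depth-1 baseline for parallel SFM: one round issues all 2^n evaluation
        # queries at once, then the minimum is taken offline. The paper's methods
        # need far fewer queries (n^{O(M)}) at depth 2 when |f(S)| <= M is integer.

        from itertools import chain, combinations

        def all_subsets(elements):
            return chain.from_iterable(combinations(elements, r)
                                       for r in range(len(elements) + 1))

        def sfm_depth_one(f, elements):
            """One parallel round of queries (depth 1), then an offline min."""
            queries = [(frozenset(S), f(frozenset(S))) for S in all_subsets(elements)]
            return min(queries, key=lambda sv: sv[1])

        if __name__ == "__main__":
            edges = [(0, 1), (1, 2), (2, 3)]              # stand-in: path-graph cut
            f = lambda S: sum(1 for u, v in edges if (u in S) != (v in S))
            best_set, best_val = sfm_depth_one(f, range(4))
            print(sorted(best_set), best_val)             # -> [] 0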