Minimizing Convex Functions with Integral Minimizers
Given a separation oracle $\mathrm{SO}$ for a convex function $f$ that has an
integral minimizer inside a box with radius $R$, we show how to find an exact
minimizer of $f$ using at most (a) $O(n(n + \log R))$ calls to $\mathrm{SO}$
and $\mathrm{poly}(n, \log R)$ arithmetic operations, or (b) $O(n \log(nR))$
calls to $\mathrm{SO}$ and $\exp(O(n)) \cdot \mathrm{poly}(\log R)$ arithmetic
operations. When the set of minimizers of $f$ has integral extreme points, our
algorithm outputs an integral minimizer of $f$. This improves upon the
previously best oracle complexity of $O(n^2(n + \log R))$ for polynomial time
algorithms obtained by [Gr\"otschel, Lov\'asz and Schrijver, Prog. Comb. Opt.
1984, Springer 1988] over thirty years ago.
For the Submodular Function Minimization problem, our result immediately
implies a strongly polynomial algorithm that makes at most $O(n^3)$ calls to an
evaluation oracle, and an exponential time algorithm that makes at most
$O(n^2 \log n)$ calls to an evaluation oracle. These improve upon the
previously best $O(n^3 \log^2 n)$ oracle complexity for strongly polynomial
algorithms given in [Lee, Sidford and Wong, FOCS 2015] and [Dadush, V\'egh and
Zambelli, SODA 2018], and an exponential time algorithm with $O(n^3 \log n)$
oracle complexity given in the former work.
Our result is achieved via a reduction to the Shortest Vector Problem in
lattices. We show how an approximately shortest vector of a certain lattice can
be used to effectively reduce the dimension of the problem. Our analysis of the
oracle complexity is based on a potential function that simultaneously captures
the size of the search set and the density of the lattice, which we analyze via
technical tools from convex geometry.
Comment: This version of the paper simplifies and generalizes the results in
an earlier version which will appear in SODA 2021.
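To make the dimension-reduction step concrete, here is a minimal sketch in Python. It is our illustration of the general idea, not the paper's algorithm; the box $[-R, R]^n$, the vector $v$, and the slicing rule are all assumptions we introduce for the example:

```python
# Illustrative sketch, not the paper's algorithm: a short integer vector v
# lets us slice a box [-R, R]^n known to contain an integral minimizer x*.
# Since <v, x*> is an integer with |<v, x*>| <= ||v||_1 * R, branching on
# c = <v, x*> confines the search to the hyperplane {x : <v, x> = c},
# removing one dimension; a short v keeps the number of slices small.
import numpy as np

def candidate_hyperplanes(v: np.ndarray, R: float) -> range:
    """All integer values c that <v, x> can take over x in [-R, R]^n."""
    bound = int(np.floor(np.sum(np.abs(v)) * R))
    return range(-bound, bound + 1)

v = np.array([1, -1, 0, 2])   # a hypothetical short lattice vector
R = 10
print(len(candidate_hyperplanes(v, R)))  # 2 * 40 + 1 = 81 slices to branch on
```

The point of using an (approximately) shortest vector is precisely that $\|v\|_1$ is small, so only a few hyperplane slices need to be explored.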
Algorithms and Adaptivity Gaps for Stochastic k-TSP
Given a metric $(V, d)$ and a root vertex $\rho \in V$, the classic
\textsf{k-TSP} problem is to find a tour originating at the root $\rho$
of minimum length that visits at least $k$ nodes in $V$. In this work,
motivated by applications where the input to an optimization problem is
uncertain, we study two stochastic versions of \textsf{k-TSP}.
In Stoch-Reward $k$-TSP, originally defined by Ene-Nagarajan-Saket [ENS17],
each vertex $v$ in the given metric contains a stochastic reward $R_v$.
The goal is to adaptively find a tour of minimum expected length that collects
at least reward $k$; here "adaptively" means our next decision may depend on
previous outcomes. Ene et al. give an $O(\log k)$-approximation adaptive
algorithm for this problem, and left open whether there is an
$O(1)$-approximation algorithm. We completely resolve their open question and
even give an $O(1)$-approximation \emph{non-adaptive} algorithm for this problem.
We also introduce and obtain similar results for the Stoch-Cost $k$-TSP
problem. In this problem each vertex $v$ has a stochastic cost $C_v$, and the
goal is to visit and select at least $k$ vertices to minimize the expected
\emph{sum} of tour length and cost of selected vertices. This problem
generalizes the Price of Information framework [Singla18] from deterministic
probing costs to metric probing costs.
Our techniques are based on two crucial ideas: "repetitions" and "critical
scaling". We show using Freedman's and Jogdeo-Samuels' inequalities that for
our problems, if we truncate the random variables at an ideal threshold and
repeat, then their expected values form a good surrogate. Unfortunately, this
ideal threshold is adaptive as it depends on how far we are from achieving our
target $k$, so we truncate at various different scales and identify a
"critical" scale.
Comment: ITCS 2020
Forward and Inverse Approximation Theory for Linear Temporal Convolutional Networks
We present a theoretical analysis of the approximation properties of
convolutional architectures when applied to the modeling of temporal sequences.
Specifically, we prove an approximation rate estimate (Jackson-type result) and
an inverse approximation theorem (Bernstein-type result), which together
provide a comprehensive characterization of the types of sequential
relationships that can be efficiently captured by a temporal convolutional
architecture. The rate estimate improves upon a previous result via the
introduction of a refined complexity measure, whereas the inverse approximation
theorem is new.
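For orientation, the following LaTeX schematic shows the generic shape of such a pair of statements; the norm, the complexity measure $C(H)$, the model size $m$, and the rate $\alpha$ are placeholders of ours, not the paper's exact quantities:

```latex
% Schematic forward (Jackson-type) and inverse (Bernstein-type) results;
% C(H) is a complexity measure of the target H and m the model size.
\begin{align*}
  \text{Jackson: } & \inf_{\widehat{H} \in \mathcal{H}_m}
      \big\| H - \widehat{H} \big\| \le \frac{C(H)}{m^{\alpha}}, \\
  \text{Bernstein: } & \inf_{\widehat{H} \in \mathcal{H}_m}
      \big\| H - \widehat{H} \big\| = O(m^{-\alpha})
      \;\Longrightarrow\; C(H) < \infty .
\end{align*}
```

Read together, the forward direction bounds how fast approximation error decays for targets of finite complexity, while the inverse direction says only such targets admit that decay, which is what makes the characterization comprehensive.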
Approximation theory of transformer networks for sequence modeling
The transformer is a widely applied architecture in sequence modeling
applications, but the theoretical understanding of its working principles is
limited. In this work, we investigate the ability of transformers to
approximate sequential relationships. We first prove a universal approximation
theorem for the transformer hypothesis space. From its derivation, we identify
a novel notion of regularity under which we can prove an explicit approximation
rate estimate. This estimate reveals key structural properties of the
transformer and suggests the types of sequence relationships that the
transformer is adapted to approximating. In particular, it allows us to
concretely discuss the structural bias between the transformer and classical
sequence modeling methods, such as recurrent neural networks. Our findings are
supported by numerical experiments.
Sparse Submodular Function Minimization
In this paper we study the problem of minimizing a submodular function
$f : 2^V \rightarrow \mathbb{R}$ that is guaranteed to have a $k$-sparse
minimizer. We give a deterministic algorithm that computes an additive
$\epsilon$-approximate minimizer of such $f$ in
$\widetilde{O}(\mathrm{poly}(k) \log(|f|/\epsilon))$ parallel depth using a
polynomial number of queries to an evaluation oracle of $f$, where
$|f| = \max_{S \subseteq V} |f(S)|$. Further, we give a randomized algorithm
that computes an exact minimizer of $f$ with high probability using
$\widetilde{O}(|V| \cdot \mathrm{poly}(k))$ queries and polynomial time. When
$k = \widetilde{O}(1)$, our algorithms use either nearly-constant parallel
depth or a nearly-linear number of evaluation oracle queries. All previous
algorithms for this problem either use polynomially large parallel depth or a
number of evaluation oracle queries super-linear in $|V|$.
In contrast to state-of-the-art weakly-polynomial and strongly-polynomial
time algorithms for SFM, our algorithms use first-order optimization methods,
e.g., mirror descent and follow-the-regularized-leader. We introduce what we
call {\em sparse dual certificates}, which encode information on the structure
of sparse minimizers, and both our parallel and sequential algorithms provide
new algorithmic tools for allowing first-order optimization methods to
efficiently compute them. Correspondingly, our algorithms do not invoke fast
matrix multiplication or general linear system solvers and in this sense are
more combinatorial than previous state-of-the-art methods.
Comment: Accepted to FOCS 2023
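As background for the first-order approach, here is a minimal Python sketch of the classical baseline such methods build on: projected subgradient descent on the Lovász extension. It is not the paper's algorithm (in particular, it has no sparse dual certificates), and the toy function, step size, and rounding rule are our assumptions:

```python
# Classical baseline, not the paper's algorithm: projected subgradient
# descent on the Lovász extension of a normalized (f(empty) = 0) submodular
# f. Its minimum over [0, 1]^n equals min_S f(S), and Edmonds' greedy rule
# yields a subgradient from n evaluation-oracle calls.
import numpy as np

def lovasz_subgradient(f, x):
    """Subgradient of the Lovász extension at x via the greedy order."""
    g = np.zeros(len(x))
    prev, S = 0.0, set()                    # uses f(empty set) = 0
    for i in np.argsort(-x):                # coordinates in decreasing order
        S.add(int(i))
        val = f(S)
        g[i] = val - prev                   # marginal gain f(S) - f(S \ {i})
        prev = val
    return g

def minimize_sfm(f, n, steps=2000, lr=0.05):
    x = np.full(n, 0.5)
    for _ in range(steps):
        x -= lr * lovasz_subgradient(f, x)
        np.clip(x, 0.0, 1.0, out=x)         # project back onto the cube
    return {i for i in range(n) if x[i] > 0.5}  # round to a set

# Toy example with the 1-sparse minimizer {1}: f(S) = [0 in S] - [1 in S].
f = lambda S: (0 in S) - (1 in S)
print(minimize_sfm(f, n=4))                 # expect {1}
```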
A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics
We present a unified (randomized) polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due to terminals not covered by the subgraph. Under our unified framework, the solution subgraph needs to be Eulerian for PCTSP, while it needs to be a tree for PCSTP. Before our work, even a QPTAS for the problems in doubling metrics was not known.
Our unified PTAS is based on the previous dynamic programming frameworks proposed in [Talwar STOC 2004] and [Bartal, Gottlieb, Krauthgamer STOC 2012]. However, since it is unknown which part of the optimal cost is due to edge lengths and which part is due to penalties of uncovered terminals, we need to develop new techniques to apply previous divide-and-conquer strategies and sparse instance decompositions.
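To pin down the objective, here is a small Python helper computing the prize-collecting cost of a candidate solution; the function and its names are ours, purely to restate the definition above:

```python
# Hypothetical helper, just to make the objective concrete: the cost of a
# prize-collecting solution is its total edge weight plus the penalties of
# the terminals it fails to cover.

def pc_cost(edges, weight, covered, penalty):
    """edges: iterable of (u, v); weight: dict[(u, v)] -> float;
    covered: set of terminals touched by the subgraph;
    penalty: dict[terminal] -> float."""
    edge_cost = sum(weight[e] for e in edges)
    penalty_cost = sum(p for t, p in penalty.items() if t not in covered)
    return edge_cost + penalty_cost

# Example: one edge (a, b) of weight 3 covering terminal a, leaving
# terminal c uncovered with penalty 5 -> total cost 8.
print(pc_cost([("a", "b")], {("a", "b"): 3.0}, {"a", "b"}, {"a": 2.0, "c": 5.0}))
```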
Parallel Submodular Function Minimization
We consider the parallel complexity of submodular function minimization
(SFM). We provide a pair of methods which obtain two new query versus depth
trade-offs for a submodular function defined on subsets of $n$ elements that
has integer values between $-M$ and $M$. The first method has depth $2$ and
query complexity $n^{O(M)}$ and the second method has depth
$\widetilde{O}(n^{1/3} M^{2/3})$ and query complexity $\mathrm{poly}(n, M)$.
Despite a line of work on improved parallel lower bounds for SFM, prior to our
work the only known algorithms for parallel SFM either followed from more
general methods for sequential SFM or from highly-parallel minimization of
convex $\ell_\infty$-Lipschitz functions. Interestingly, to obtain our second
result we provide the first highly-parallel algorithm for minimizing
$\ell_\infty$-Lipschitz functions over the hypercube which obtains near-optimal
depth for obtaining constant accuracy …
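To illustrate what a query versus depth trade-off means, here is a toy Python sketch of one extreme of the trade-off curve; it is our illustration only and reflects nothing of the paper's methods:

```python
# Toy illustration (ours, not from the paper) of the query/depth trade-off:
# "depth" counts rounds of adaptivity, where each round may issue many
# evaluation-oracle queries in parallel. The extreme below uses a single
# round with 2^n queries; the interesting regime is far fewer queries at
# slightly larger depth.
from itertools import chain, combinations

def brute_force_sfm(f, n):
    """Depth 1: query f on every subset in one parallel batch, take the min."""
    subsets = list(chain.from_iterable(
        combinations(range(n), r) for r in range(n + 1)))
    values = [f(set(S)) for S in subsets]   # one parallel round of 2^n queries
    return set(subsets[values.index(min(values))])

# Example: a minimizer of f(S) = [0 in S] - |S ∩ {1, 2}| is {1, 2}.
f = lambda S: (0 in S) - len(S & {1, 2})
print(brute_force_sfm(f, n=4))              # expect {1, 2}
```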