110 research outputs found

    Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation

    We consider the problem of designing sublinear time algorithms for estimating the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given access to an $n \times n$ distance matrix $D$ that specifies pairwise distances between $n$ points, the goal is to estimate the TSP cost by performing only sublinear (in the size of $D$) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any $\varepsilon > 0$, there exists an $\tilde{O}(n/\varepsilon^{O(1)})$ time algorithm that returns a $(1 + \varepsilon)$-approximate estimate of the MST cost. This result immediately implies an $\tilde{O}(n/\varepsilon^{O(1)})$ time algorithm to estimate the TSP cost to within a $(2 + \varepsilon)$ factor for any $\varepsilon > 0$. However, no $o(n^2)$ time algorithms are known to approximate metric TSP to a factor that is strictly better than $2$. On the other hand, there were also no known barriers that rule out the existence of $(1 + \varepsilon)$-approximate estimation algorithms for metric TSP with $\tilde{O}(n)$ time for any fixed $\varepsilon > 0$. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. We also show that the problem of estimating metric TSP cost is closely connected to the problem of estimating the size of a maximum matching in a graph. Comment: ICALP 202
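The step from an MST estimate to a factor-2 TSP estimate rests on a standard fact about metric instances: the MST weight is at most the optimal tour cost, and the optimal tour cost is at most twice the MST weight. The sketch below illustrates only that sandwich. It is not the sublinear algorithm from the abstract: it runs an exact $O(n^2)$ Prim computation that reads the whole matrix, and the function names `mst_weight` and `tsp_cost_bounds` are hypothetical.

```python
def mst_weight(dist):
    """Exact MST weight of the complete graph given by a symmetric distance matrix.

    Uses array-based Prim in O(n^2) time, so it reads every matrix entry
    (this is NOT sublinear; it only illustrates the bound).
    """
    n = len(dist)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        # Relax the edges leaving the newly added vertex.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


def tsp_cost_bounds(dist):
    """Return (lower, upper) with lower <= optimal metric TSP cost <= upper.

    Assumes dist satisfies the triangle inequality.
    """
    w = mst_weight(dist)
    return w, 2 * w


# Four points on a line at positions 0, 1, 2, 3.
points = [0, 1, 2, 3]
dist = [[abs(a - b) for b in points] for a in points]
lower, upper = tsp_cost_bounds(dist)
```

Replacing the exact MST weight with a $(1 + \varepsilon)$-approximate estimate widens the upper bound accordingly, which is where the $(2 + \varepsilon)$ factor in the abstract comes from.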

    Streaming Verification of Graph Properties

    Streaming interactive proofs (SIPs) are a framework for outsourced computation. A computationally limited streaming client (the verifier) hands over a large data set to an untrusted server (the prover) in the cloud, and the two parties run a protocol to confirm the correctness of the result with high probability. SIPs are particularly interesting for problems that are hard to solve (or even approximate) well in a streaming setting. The most notable of these problems is finding maximum matchings, which has received intense interest in recent years but has strong lower bounds even for constant factor approximations. In this paper, we present efficient streaming interactive proofs that can verify maximum matchings exactly. Our results cover all flavors of matchings (bipartite/non-bipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP. In particular, these are the first efficient results for weighted matchings and for metric TSP in any streaming verification model. Comment: 26 pages, 2 figure, 1 tabl

    Certified Computation from Unreliable Datasets

    A wide range of learning tasks require human input in labeling massive data. The collected data, though, are usually low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data sets. In this work, we provide a generic approach that is based on verification of only a few records of the data set to guarantee high quality learning outcomes for various optimization objectives. Our method identifies small sets of critical records and verifies their validity. We show that many problems only need $\text{poly}(1/\varepsilon)$ verifications to ensure that the output of the computation is at most a factor of $(1 \pm \varepsilon)$ away from the truth. For any given instance, we provide an instance optimal solution that verifies the minimum possible number of records to approximately certify correctness. Then, using this instance optimal formulation of the problem, we prove our main result: "every function that satisfies some Lipschitz continuity condition can be certified with a small number of verifications". We show that the required Lipschitz continuity condition is satisfied even by some NP-complete problems, which illustrates the generality and importance of this theorem. In case this certification step fails, an invalid record will be identified. Removing these records and repeating until success guarantees that the result will be accurate and will depend only on the verified records. Surprisingly, as we show, for several computation tasks more efficient methods are possible. These methods always guarantee that the produced result is not affected by the invalid records, since any invalid record that affects the output will be detected and verified.

    Doctor of Philosophy

    The contributions of this dissertation are centered around designing new algorithms in the general area of sublinear algorithms such as streaming, coresets and sublinear verification, with a special interest in problems arising from data analysis including data summarization, clustering, matrix problems and massive graphs. In the first part, we focus on summaries and coresets, which are among the main techniques for designing sublinear algorithms for massive data sets. We initiate the study of coresets for uncertain data and study coresets for various types of range counting queries on uncertain data. We focus mainly on the indecisive model of locational uncertainty since it comes up frequently in real-world applications when multiple readings of the same object are made. In this model, each uncertain point has a probability density describing its location, defined as $k$ distinct locations. Our goal is to construct a subset of the uncertain points, including their locational uncertainty, so that range counting queries can be answered by examining only this subset. For each type of query we provide coreset constructions with approximation-size trade-offs. We show that random sampling can be used to construct each type of coreset, and we also provide significantly improved bounds using discrepancy-based techniques on axis-aligned range queries. In the second part, we focus on designing sublinear-space algorithms for approximate computations on massive graphs. In particular, we consider graph MAX CUT and correlation clustering problems and develop sampling-based approaches to construct truly sublinear ($o(n)$) sized coresets for graphs that have polynomial (i.e., $n^{\delta}$ for any $\delta > 0$) average degree. Our technique is based on analyzing properties of random induced subprograms of the linear program formulations of the problems. We demonstrate this technique with two examples.
Firstly, we present a sublinear-sized coreset to approximate the value of the MAX CUT in a graph to a $(1+\epsilon)$ factor. To the best of our knowledge, all the known methods in this regime rely crucially on near-regularity assumptions. Secondly, we apply the same framework to construct a sublinear-sized coreset for correlation clustering. Our coreset construction also suggests 2-pass streaming algorithms for computing the MAX CUT and correlation clustering objective values, which are left as future work at the time of writing this dissertation. Finally, we focus on streaming verification algorithms as another model for designing sublinear algorithms. We give the first polylog space and sublinear (in number of edges) communication protocols for any streaming verification problems in graphs. We present efficient streaming interactive proofs that can verify maximum matching exactly. Our results cover all flavors of matchings (bipartite/nonbipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP and exact triangle counting, as well as for graph primitives such as the number of connected components, bipartiteness, minimum spanning tree and connectivity. In particular, these are the first results for weighted matchings and for metric TSP in any streaming verification model. Our streaming verifiers use only polylogarithmic space while exchanging only polylogarithmic communication with the prover in addition to the output size of the relevant solution. We also initiate a study of streaming interactive proofs (SIPs) for problems in data analysis and present efficient SIPs for some fundamental problems. We present protocols for clustering and shape fitting including minimum enclosing ball (MEB), width of a point set, $k$-centers and the $k$-slab problem.
We also present protocols for fundamental matrix analysis problems: we provide an improved protocol for rectangular matrix problems, which in turn can be used to verify $k$ (approximate) eigenvectors of an $n \times n$ integer matrix $A$. In general, our solutions use polylogarithmic rounds of communication and polylogarithmic total communication and verifier space.

    05201 Abstracts Collection -- Design and Analysis of Randomized and Approximation Algorithms

    From 15.05.05 to 20.05.05, the Dagstuhl Seminar 05201 "Design and Analysis of Randomized and Approximation Algorithms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    Sublinear Algorithms for $(1.5+\epsilon)$-Approximate Matching

    We study sublinear time algorithms for estimating the size of a maximum matching. After a long line of research, the problem was finally settled by Behnezhad [FOCS'22], in the regime where one is willing to pay an approximation factor of $2$. Very recently, Behnezhad et al. [SODA'23] improved the approximation factor to $(2-\frac{1}{2^{O(1/\gamma)}})$ using $n^{1+\gamma}$ time. This improvement over the factor $2$ is, however, minuscule, and they asked if even a $1.99$-approximation is possible in $n^{2-\Omega(1)}$ time. We give a strong affirmative answer to this open problem by showing $(1.5+\epsilon)$-approximation algorithms that run in $n^{2-\Theta(\epsilon^{2})}$ time. Our approach is conceptually simple and diverges from all previous sublinear-time matching algorithms: we show a sublinear time algorithm for computing a variant of the edge-degree constrained subgraph (EDCS), a concept that has previously been exploited in dynamic [Bernstein Stein ICALP'15, SODA'16], distributed [Assadi et al. SODA'19] and streaming [Bernstein ICALP'20] settings, but never before in the sublinear setting. Independent work: Behnezhad, Roghani and Rubinstein [BRR'23] independently showed sublinear algorithms similar to our Theorem 1.2 in both adjacency list and matrix models. Furthermore, in [BRR'23], they show additional results on strictly better-than-1.5 approximate matching algorithms in both upper and lower bound sides.
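For readers unfamiliar with the EDCS, the sketch below checks the standard definition used in the dynamic-matching literature the abstract cites: a subgraph $H$ of $G$ is an EDCS with parameters $\beta > \beta^-$ if every edge of $H$ has $\deg_H(u) + \deg_H(v) \le \beta$ and every edge of $G \setminus H$ has $\deg_H(u) + \deg_H(v) \ge \beta^-$. The abstract speaks of "a variant" of the EDCS, which may differ from this definition; the function name `is_edcs` and its arguments are hypothetical, and the code assumes a simple undirected graph with no self-loops.

```python
from collections import defaultdict


def is_edcs(g_edges, h_edges, beta, beta_minus):
    """Check the standard EDCS conditions for H as a subgraph of G.

    g_edges, h_edges: iterables of (u, v) pairs of a simple undirected graph.
    Returns True iff H is contained in G, every edge of H has edge-degree
    at most beta, and every edge of G outside H has edge-degree at least
    beta_minus (edge-degree = deg_H(u) + deg_H(v)).
    """
    g = {frozenset(e) for e in g_edges}
    h = {frozenset(e) for e in h_edges}
    if not h <= g:
        return False  # H must be a subgraph of G

    # Vertex degrees measured inside H only.
    deg = defaultdict(int)
    for e in h:
        for v in e:
            deg[v] += 1

    def edge_degree(e):
        return sum(deg[v] for v in e)

    in_h_ok = all(edge_degree(e) <= beta for e in h)
    outside_ok = all(edge_degree(e) >= beta_minus for e in g - h)
    return in_h_ok and outside_ok


# Path 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
whole_path_ok = is_edcs(path, path, beta=4, beta_minus=3)
empty_h_ok = is_edcs(path, [], beta=4, beta_minus=3)
```

On the path, taking $H = G$ satisfies both conditions for $\beta = 4$, while the empty subgraph fails because every omitted edge has edge-degree $0 < \beta^-$.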

    Sublinear Algorithm And Lower Bound For Combinatorial Problems

    As the scale of the problems we want to solve in real life becomes larger, the input sizes of these problems could be much larger than the memory of a single computer. In these cases, the classical algorithms may no longer be feasible options, even when they run in linear time and linear space, as the input size is too large. In this thesis, we study various combinatorial problems in different computation models that process large input sizes using limited resources. In particular, we consider the query model, streaming model, and massively parallel computation model. In addition, we also study the tradeoffs between the adaptivity and performance of algorithms in these models. We first consider two graph problems, the vertex coloring problem and the metric traveling salesman problem (TSP). The main results are structural results for these problems, which give frameworks for achieving sublinear algorithms for these problems in different models. We also show that the sublinear algorithms for the (Δ + 1)-coloring problem are tight. We then consider the graph sparsification problem, which is an important technique for designing sublinear algorithms. We give a proof of the existence of a linear size hypergraph cut sparsifier, along with a polynomial-time algorithm that computes one. We also consider sublinear algorithms for this problem in the streaming and query models. Finally, we study the round complexity of submodular function minimization (SFM). In particular, we give a polynomial lower bound on the number of rounds needed to compute s-t max flow (a special case of SFM) in the streaming model. We also prove a polynomial lower bound on the number of rounds needed to solve the general SFM problem in polynomial queries.