Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation
We consider the problem of designing sublinear time algorithms for estimating
the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given
access to an n × n distance matrix D that specifies pairwise distances
between n points, the goal is to estimate the TSP cost by performing only
sublinear (in the size of D) queries. For the closely related problem of
estimating the weight of a metric minimum spanning tree (MST), it is known that
for any ε > 0, there exists an Õ(n/ε^{O(1)})
time algorithm that returns a (1 + ε)-approximate estimate of the
MST cost. This result immediately implies an Õ(n/ε^{O(1)})
time algorithm to estimate the TSP cost to within a (2 + ε) factor
for any ε > 0. However, no o(n²) time algorithms are known to
approximate metric TSP to a factor that is strictly better than 2. On the
other hand, there were also no known barriers that rule out the existence of
(1 + ε)-approximate estimation algorithms for metric TSP with
Õ(n) time for any fixed ε > 0. In this paper, we make
progress on both algorithms and lower bounds for estimating metric TSP cost. We
also show that the problem of estimating metric TSP cost is closely connected
to the problem of estimating the size of a maximum matching in a graph.

Comment: ICALP 2020
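The reduction from TSP estimation to MST estimation rests on the classical sandwich mst(D) ≤ tsp(D) ≤ 2 · mst(D): doubling every MST edge yields an Eulerian multigraph whose walk can be shortcut into a tour, losing at most a factor 2 under the triangle inequality. A minimal sketch of this relationship on a toy metric (the brute-force tour and the Prim-style MST below are purely illustrative, not the paper's sublinear estimator):

```python
import itertools

def mst_weight(d):
    """Prim's algorithm on a distance matrix; returns total MST weight."""
    n = len(d)
    in_tree = {0}
    total = 0.0
    while len(in_tree) < n:
        w, v = min((d[u][x], x) for u in in_tree
                   for x in range(n) if x not in in_tree)
        total += w
        in_tree.add(v)
    return total

def tsp_cost(d):
    """Exact optimal tour cost by brute force (only sensible for tiny n)."""
    n = len(d)
    best = float("inf")
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        best = min(best, sum(d[a][b] for a, b in zip(tour, tour[1:])))
    return best

# A small metric: points on a line at positions 0, 1, 3, 6.
pos = [0, 1, 3, 6]
d = [[abs(a - b) for b in pos] for a in pos]
mst, opt = mst_weight(d), tsp_cost(d)
assert mst <= opt <= 2 * mst  # the sandwich behind the 2-approximation
```

Any (1 + ε)-approximate MST estimate therefore immediately gives a (2 + ε)-approximate TSP estimate, which is exactly the baseline the paper sets out to beat.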
Streaming Verification of Graph Properties
Streaming interactive proofs (SIPs) are a framework for outsourced
computation. A computationally limited streaming client (the verifier) hands
over a large data set to an untrusted server (the prover) in the cloud and the
two parties run a protocol to confirm the correctness of result with high
probability. SIPs are particularly interesting for problems that are hard to
solve (or even approximate) well in a streaming setting. The most notable of
these problems is finding maximum matchings, which has received intense
interest in recent years but has strong lower bounds even for constant factor
approximations.
In this paper, we present efficient streaming interactive proofs that can
verify maximum matchings exactly. Our results cover all flavors of matchings
(bipartite/non-bipartite and weighted). In addition, we also present streaming
verifiers for approximate metric TSP. In particular, these are the first
efficient results for weighted matchings and for metric TSP in any streaming
verification model.

Comment: 26 pages, 2 figures, 1 table
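A core primitive in such verification protocols is algebraic fingerprinting: the verifier keeps only a constant-size evaluation of a polynomial over the stream, which later lets it check, with high probability, that a multiset claimed by the prover matches what it saw. A minimal sketch over a prime field (illustrative only; full SIPs layer low-degree extensions and sum-check on top of this idea):

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1; all arithmetic is mod P

def fingerprint(items, r):
    """Multiset fingerprint: product of (r - x) mod P over all items x.

    Two multisets agree iff their fingerprints agree at all points; they
    disagree at a random r with probability at most |items| / P.
    """
    fp = 1
    for x in items:
        fp = fp * (r - x) % P
    return fp

random.seed(0)
r = random.randrange(P)   # secret random evaluation point, kept by verifier
stream = [5, 3, 5, 8]     # what the verifier observes, one item at a time
claim = [3, 5, 8, 5]      # prover's claimed multiset, in any order
assert fingerprint(stream, r) == fingerprint(claim, r)        # accepted
assert fingerprint(stream, r) != fingerprint([3, 5, 8, 9], r) # caught w.h.p.
```

Because the fingerprint is updated one item at a time, the verifier's state is a single field element regardless of the stream's length.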
Certified Computation from Unreliable Datasets
A wide range of learning tasks require human input in labeling massive data.
The collected data though are usually low quality and contain inaccuracies and
errors. As a result, modern science and business face the problem of learning
from unreliable data sets.
In this work, we provide a generic approach that is based on
\textit{verification} of only a few records of the data set to guarantee high
quality learning outcomes for various optimization objectives. Our method
identifies small sets of critical records and verifies their validity. We show
that many problems only need poly(1/ε) verifications to
ensure that the output of the computation is at most a factor of (1 ± ε) away from the truth. For any given instance, we provide an
\textit{instance optimal} solution that verifies the minimum possible number of
records to approximately certify correctness. Then using this instance optimal
formulation of the problem we prove our main result: "every function that
satisfies some Lipschitz continuity condition can be certified with a small
number of verifications". We show that the required Lipschitz continuity
condition is satisfied even by some NP-complete problems, which illustrates the
generality and importance of this theorem.
In case this certification step fails, an invalid record will be identified.
Removing these records and repeating until success guarantees that the result
will be accurate and will depend only on the verified records. Surprisingly, as
we show, for several computation tasks more efficient methods are possible.
These methods always guarantee that the produced result is not affected by the
invalid records, since any invalid record that affects the output will be
detected and verified.
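The verify-and-repeat loop described above can be illustrated on the simplest objective, computing the minimum over unreliably recorded values: only the record currently claiming to be the minimum is critical, so each round verifies one record and removes it if invalid. A toy sketch (the `verify` oracle stands in for the human verification step and is an assumption of this example, not an API from the paper):

```python
def certified_min(records, verify):
    """Return the minimum over records whose values survive verification.

    `records` maps record ids to (possibly erroneous) values; `verify(rid)`
    returns True iff the stored value of record rid is valid. Only records
    that claim to be the minimum are ever verified, so the number of
    verifications equals one plus the number of invalid records below the
    true minimum.
    """
    remaining = dict(records)
    while remaining:
        rid = min(remaining, key=remaining.get)  # the single critical record
        if verify(rid):
            return remaining[rid]  # certified: no valid record is smaller
        del remaining[rid]         # invalid record detected; remove, retry
    return None

data = {"a": 7, "b": 2, "c": 5}   # record "b" is a corrupted entry
result = certified_min(data, lambda rid: rid != "b")
assert result == 5                # "b" rejected, "c" certified as the min
```

The output provably depends only on verified and untouched-valid records, mirroring the guarantee stated in the abstract.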
Doctor of Philosophy dissertation

The contributions of this dissertation are centered around designing new algorithms in the general area of sublinear algorithms, such as streaming, coresets and sublinear verification, with a special interest in problems arising from data analysis, including data summarization, clustering, matrix problems and massive graphs.

In the first part, we focus on summaries and coresets, which are among the main techniques for designing sublinear algorithms for massive data sets. We initiate the study of coresets for uncertain data and study coresets for various types of range counting queries on uncertain data. We focus mainly on the indecisive model of locational uncertainty, since it comes up frequently in real-world applications when multiple readings of the same object are made. In this model, each uncertain point has a probability density describing its location, defined as k distinct locations. Our goal is to construct a subset of the uncertain points, including their locational uncertainty, so that range counting queries can be answered by examining only this subset. For each type of query we provide coreset constructions with approximation-size trade-offs. We show that random sampling can be used to construct each type of coreset, and we also provide significantly improved bounds using discrepancy-based techniques for axis-aligned range queries.

In the second part, we focus on designing sublinear-space algorithms for approximate computations on massive graphs. In particular, we consider the graph MAX CUT and correlation clustering problems and develop sampling-based approaches to construct truly sublinear (o(n))-sized coresets for graphs that have polynomial (i.e., n^δ for any δ > 0) average degree. Our technique is based on analyzing properties of random induced subprograms of the linear program formulations of the problems. We demonstrate this technique with two examples.
Firstly, we present a sublinear-sized coreset to approximate the value of the MAX CUT in a graph to a (1 + ε) factor. To the best of our knowledge, all the known methods in this regime rely crucially on near-regularity assumptions. Secondly, we apply the same framework to construct a sublinear-sized coreset for correlation clustering. Our coreset construction also suggests 2-pass streaming algorithms for computing the MAX CUT and correlation clustering objective values, which are left as future work at the time of writing this dissertation.

Finally, we focus on streaming verification algorithms as another model for designing sublinear algorithms. We give the first polylog-space and sublinear (in the number of edges) communication protocols for streaming verification problems in graphs. We present efficient streaming interactive proofs that can verify maximum matchings exactly. Our results cover all flavors of matchings (bipartite/non-bipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP and exact triangle counting, as well as for graph primitives such as the number of connected components, bipartiteness, minimum spanning tree and connectivity. In particular, these are the first results for weighted matchings and for metric TSP in any streaming verification model. Our streaming verifiers use only polylogarithmic space while exchanging only polylogarithmic communication with the prover, in addition to the output size of the relevant solution. We also initiate a study of streaming interactive proofs (SIPs) for problems in data analysis and present efficient SIPs for some fundamental problems. We present protocols for clustering and shape fitting, including minimum enclosing ball (MEB), width of a point set, k-centers and the k-slab problem.
We also present protocols for fundamental matrix analysis problems: we provide an improved protocol for rectangular matrix problems, which in turn can be used to verify (approximate) eigenvectors of an integer matrix. In general, our solutions use polylogarithmic rounds of communication and polylogarithmic total communication and verifier space.
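The sampling idea behind the MAX CUT coreset can be shown in miniature: sample a small vertex set, solve the induced problem exactly, and rescale by (n/k)² to account for the sampled edge density. This rescaled estimate concentrates on dense graphs; the brute-force solver and naive sampler below are a toy stand-in, not the dissertation's LP-based coreset construction:

```python
import random

def max_cut_value(vertices, edges):
    """Exact MAX CUT on a tiny vertex set by enumerating all bipartitions."""
    vs = list(vertices)
    best = 0
    for mask in range(1 << len(vs)):
        side = {v for i, v in enumerate(vs) if mask >> i & 1}
        best = max(best, sum(1 for u, v in edges
                             if (u in side) != (v in side)))
    return best

def sampled_estimate(n, edges, k, seed=0):
    """Estimate MAX CUT from the subproblem induced on k sampled vertices."""
    random.seed(seed)
    s = set(random.sample(range(n), k))
    induced = [(u, v) for u, v in edges if u in s and v in s]
    return (n / k) ** 2 * max_cut_value(s, induced)

# K4: the true max cut is 4; any 3 sampled vertices induce a triangle
# (max cut 2), so the rescaled estimate is (4/3)^2 * 2.
k4_edges = [(u, v) for u in range(4) for v in range(u + 1, 4)]
est = sampled_estimate(4, k4_edges, 3)
```

The gap between `est` and the true value on this tiny instance is exactly the sampling error that the coreset analysis controls on large dense graphs.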
05201 Abstracts Collection -- Design and Analysis of Randomized and Approximation Algorithms
From 15.05.05 to 20.05.05, the Dagstuhl Seminar 05201 "Design and Analysis of Randomized and Approximation Algorithms" was held
in the International Conference and Research Center (IBFI),
Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Sublinear Algorithms for (1.5+ε)-Approximate Matching
We study sublinear time algorithms for estimating the size of maximum
matching. After a long line of research, the problem was finally settled by
Behnezhad [FOCS'22], in the regime where one is willing to pay an approximation
factor of 2. Very recently, Behnezhad et al. [SODA'23] improved the
approximation factor to (2 − 1/2^{O(1/γ)}) using n^{1+γ}
time. This improvement over the factor 2 is, however, minuscule and they
asked if even 1.99-approximation is possible in n^{2−δ} time. We
give a strong affirmative answer to this open problem by showing
(1.5 + ε)-approximation algorithms that run in
n^{2−Θ(ε²)} time. Our approach is conceptually simple and
diverges from all previous sublinear-time matching algorithms: we show a
sublinear time algorithm for computing a variant of the edge-degree constrained
subgraph (EDCS), a concept that has previously been exploited in dynamic
[Bernstein Stein ICALP'15, SODA'16], distributed [Assadi et al. SODA'19] and
streaming [Bernstein ICALP'20] settings, but never before in the sublinear
setting. Independent work: Behnezhad, Roghani and Rubinstein [BRR'23]
independently showed sublinear algorithms similar to our Theorem 1.2 in both
adjacency list and matrix models. Furthermore, in [BRR'23], they show
additional results on strictly better-than-1.5 approximate matching algorithms
in both upper and lower bound sides.
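An EDCS with parameter β is a subgraph H of G in which every edge of H has degree sum deg_H(u) + deg_H(v) ≤ β, while every edge of G missing from H has deg_H(u) + deg_H(v) ≥ β − 1; such a subgraph always contains a large matching. A minimal local-fixing construction (a standard potential argument shows the loop terminates; this is the basic offline procedure, not the paper's sublinear-time variant):

```python
from collections import defaultdict

def build_edcs(edges, beta):
    """Construct an edge-degree constrained subgraph (EDCS) by local fixing."""
    H = set()
    deg = defaultdict(int)

    def degsum(e):
        return deg[e[0]] + deg[e[1]]

    changed = True
    while changed:
        changed = False
        for e in list(H):
            if degsum(e) > beta:      # upper bound violated: drop the edge
                H.remove(e)
                deg[e[0]] -= 1
                deg[e[1]] -= 1
                changed = True
        for e in edges:
            if e not in H and degsum(e) < beta - 1:  # lower bound: add it
                H.add(e)
                deg[e[0]] += 1
                deg[e[1]] += 1
                changed = True
    return H

# On a path, beta = 2 forces H to be a matching, and the lower bound
# forces that matching to be maximal.
H = build_edcs([(0, 1), (1, 2), (2, 3)], beta=2)
assert H == {(0, 1), (2, 3)}
```

Each removal or insertion strictly decreases a bounded potential function of the degree sums, which is why the while-loop cannot cycle.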
Sublinear Algorithm And Lower Bound For Combinatorial Problems
As the scale of the problems we want to solve in real life grows, input sizes can far exceed the memory of a single computer. In such cases, classical algorithms may no longer be feasible options, even when they run in linear time and linear space.
In this thesis, we study various combinatorial problems in different computation models that process large inputs using limited resources. In particular, we consider the query model, the streaming model, and the massively parallel computation model. In addition, we also study the tradeoffs between the adaptivity and performance of algorithms in these models.

We first consider two graph problems: the vertex coloring problem and the metric traveling salesman problem (TSP). The main results are structural results for these problems, which give frameworks for achieving sublinear algorithms for them in different models. We also show that the sublinear algorithms for the (∆ + 1)-coloring problem are tight. We then consider the graph sparsification problem, an important technique for designing sublinear algorithms. We prove the existence of a linear-size hypergraph cut sparsifier, along with a polynomial-time algorithm that computes one. We also consider sublinear algorithms for this problem in the streaming and query models. Finally, we study the round complexity of submodular function minimization (SFM). In particular, we give a polynomial lower bound on the number of rounds needed to compute s-t max flow (a special case of SFM) in the streaming model. We also prove a polynomial lower bound on the number of rounds needed to solve the general SFM problem using polynomially many queries.
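The (∆ + 1)-coloring problem mentioned above always admits a solution by a simple greedy argument: a vertex has at most ∆ neighbors, so at most ∆ of the ∆ + 1 colors can be blocked when it is colored. A minimal sketch of why ∆ + 1 colors suffice (the thesis's sublinear algorithms are far more involved; this is only the existence argument in code):

```python
def greedy_coloring(adj):
    """Color a graph with at most Δ+1 colors, where Δ is the max degree.

    `adj` maps each vertex to its neighbor list; returns vertex -> color.
    """
    delta = max((len(nbrs) for nbrs in adj.values()), default=0)
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        # at most Δ neighbors are colored, so some color in 0..Δ is free
        color[v] = next(c for c in range(delta + 1) if c not in used)
    return color

# 4-cycle: Δ = 2, so 3 colors always suffice (here only 2 are used)
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = greedy_coloring(adj)
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```

The interesting question in the sublinear setting is finding such a coloring without reading the whole graph, which is where the tight bounds of the thesis apply.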