An exponential lower bound for Individualization-Refinement algorithms for Graph Isomorphism
The individualization-refinement paradigm provides a strong toolbox for
testing isomorphism of two graphs, and indeed the currently fastest
implementations of isomorphism solvers all follow this approach. While these
solvers are fast in practice, from a theoretical point of view no general
lower bounds on the worst-case complexity of these tools are known. In
fact, it is an open question whether individualization-refinement algorithms
can achieve upper bounds on the running time similar to the more theoretical
techniques based on a group theoretic approach.
In this work we give a negative answer to this question and construct a
family of graphs on which algorithms based on the individualization-refinement
paradigm require exponential time. Unlike a previous construction of
Miyazaki, which only applies to a specific implementation within the
individualization-refinement framework, our construction is immune to changing
the cell selector or to adding various heuristic invariants to the algorithm.
Furthermore, our graphs also provide exponential lower bounds in the case when
the k-dimensional Weisfeiler-Leman algorithm is used to replace the standard
color refinement operator, and the arguments even work when the entire
automorphism group of the inputs is initially provided to the algorithm.
Comment: 21 pages
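The refinement step these solvers build on is the classic color refinement (1-WL) operator. As a rough illustration of what that operator computes, here is a minimal Python sketch; the graph representation and function name are our own, not taken from any particular solver:

```python
def color_refinement(adj, colors):
    """Iteratively refine a vertex coloring until it is equitable.

    adj:    dict mapping each vertex to a list of its neighbors.
    colors: dict mapping each vertex to an initial color (int).
    Returns the coarsest stable refinement of the initial coloring.
    """
    colors = dict(colors)
    while True:
        # New color of v = (old color, multiset of neighbors' old colors).
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Canonically renumber the signatures as small integers.
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if new_colors == colors:  # coloring is stable (equitable)
            return colors
        colors = new_colors

# Example: a 4-cycle is vertex-transitive, so one color class remains.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(color_refinement(adj, {v: 0 for v in adj}))
```

Individualization then assigns a single vertex a fresh color and re-runs this refinement, branching over the choice of vertex; the construction in the paper forces any such branching scheme to explore exponentially many branches.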
On the Optimality of Pseudo-polynomial Algorithms for Integer Programming
In the classic Integer Programming (IP) problem, the objective is to decide
whether, for a given m x n matrix A and an m-vector b, there is a non-negative
integer n-vector x such that Ax = b. Solving
(IP) is an important step in numerous algorithms and it is important to obtain
an understanding of the precise complexity of this problem as a function of
natural parameters of the input.
The classic pseudo-polynomial time algorithm of Papadimitriou [J. ACM 1981]
for instances of (IP) with a constant number of constraints was only recently
improved upon by Eisenbrand and Weismantel [SODA 2018] and Jansen and Rohwedder
[ArXiv 2018]. We continue this line of work and show that under the Exponential
Time Hypothesis (ETH), the algorithm of Jansen and Rohwedder is nearly optimal.
We also show that when the matrix A is assumed to be non-negative, a
component of Papadimitriou's original algorithm is already nearly optimal under
ETH.
This motivates us to pick up the line of research initiated by Cunningham and
Geelen [IPCO 2007] who studied the complexity of solving (IP) with non-negative
matrices in which the number of constraints may be unbounded, but the
branch-width of the column-matroid corresponding to the constraint matrix is a
constant. We prove a lower bound on the complexity of solving (IP) for such
instances and obtain optimal results with respect to a closely related
parameter, path-width. Specifically, we prove matching upper and lower bounds
for (IP) when the path-width of the corresponding column-matroid is a constant.
Comment: 29 pages, to appear in ESA 2018
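To make the pseudo-polynomial regime concrete, here is a minimal sketch of the textbook dynamic program for (IP) with a non-negative constraint matrix: partial right-hand sides are the states, and each column of A is a possible transition. This illustrates the general idea only; it is not the algorithm of Papadimitriou, Eisenbrand and Weismantel, or Jansen and Rohwedder.

```python
def ip_feasible(A, b):
    """Decide whether A x = b has a non-negative integer solution,
    assuming every entry of A and b is a non-negative integer.

    Dynamic program over partial right-hand sides: a vector r is a
    state if r = A x for some non-negative integer vector x; every
    state stays inside the grid bounded componentwise by b.
    """
    m = len(A)
    columns = [tuple(row[j] for row in A) for j in range(len(A[0]))]
    target = tuple(b)
    reachable = {(0,) * m}
    frontier = [(0,) * m]
    while frontier:
        r = frontier.pop()
        if r == target:
            return True
        for col in columns:  # try appending one more unit of each variable
            nxt = tuple(r[i] + col[i] for i in range(m))
            if all(nxt[i] <= target[i] for i in range(m)) and nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return False

# x1*(1,0) + x2*(1,2) + x3*(0,1) = (3,4)? Yes: x = (2, 1, 2).
print(ip_feasible([[1, 1, 0], [0, 2, 1]], [3, 4]))  # True
```

The state space has at most (b_1+1)...(b_m+1) vectors, which is pseudo-polynomial for a constant number m of constraints; the results above pin down how far such running times can be improved.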
Supporting User-Defined Functions on Uncertain Data
Uncertain data management has become crucial in many sensing and scientific applications. As user-defined functions (UDFs) become widely used in these applications, an important task is to capture result uncertainty for queries that evaluate UDFs on uncertain data. In this work, we provide a general framework for supporting UDFs on uncertain data. Specifically, we propose a learning approach based on Gaussian processes (GPs) to compute approximate output distributions of a UDF when evaluated on uncertain input, with guaranteed error bounds. We also devise an online algorithm to compute such output distributions, which employs a suite of optimizations to improve accuracy and performance. Our evaluation using both real-world and synthetic functions shows that our proposed GP approach can outperform the state-of-the-art sampling approach with up to two orders of magnitude improvement for a variety of UDFs.
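As a rough illustration of the idea (not the paper's actual framework), the following sketch fits a GP surrogate to a black-box UDF with scikit-learn and then pushes an uncertain Gaussian input through it by Monte Carlo; the UDF, kernel, and input distribution are all illustrative choices:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def udf(x):
    """Stand-in for an expensive black-box user-defined function."""
    return np.sin(3 * x) + 0.5 * x

# 1. Learn a cheap GP surrogate of the UDF from a few evaluations.
X_train = np.linspace(-2, 2, 15).reshape(-1, 1)
y_train = udf(X_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_train, y_train)

# 2. Propagate an uncertain input (here Gaussian) through the surrogate
#    by Monte Carlo to approximate the UDF's output distribution.
rng = np.random.default_rng(0)
x_samples = rng.normal(loc=0.3, scale=0.2, size=2000).reshape(-1, 1)
mean_pred, std_pred = gp.predict(x_samples, return_std=True)
# Fold in the GP's own predictive uncertainty by sampling once per input.
y_samples = rng.normal(mean_pred, std_pred)

print("output mean  ~", y_samples.mean())
print("output stdev ~", y_samples.std())
```

The surrogate is what makes this practical: once the GP is trained, approximating the output distribution no longer requires evaluating the expensive UDF itself.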
Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Data-driven algorithm design, that is, choosing the best algorithm for a
specific application, is a crucial problem in modern data science.
Practitioners often optimize over a parameterized algorithm family, tuning
parameters based on problems from their domain. These procedures have
historically come with no guarantees, though a recent line of work studies
algorithm selection from a theoretical perspective. We advance the foundations
of this field in several directions: we analyze online algorithm selection,
where problems arrive one-by-one and the goal is to minimize regret, and
private algorithm selection, where the goal is to find good parameters over a
set of problems without revealing sensitive information contained therein. We
study important algorithm families, including SDP-rounding schemes for problems
formulated as integer quadratic programs, and greedy techniques for canonical
subset selection problems. In these cases, the algorithm's performance is a
volatile and piecewise Lipschitz function of its parameters, since tweaking the
parameters can completely change the algorithm's behavior. We give a sufficient
and general condition, dispersion, defining a family of piecewise Lipschitz
functions that can be optimized online and privately, which includes the
functions measuring the performance of the algorithms we study. Intuitively, a
set of piecewise Lipschitz functions is dispersed if no small region contains
many of the functions' discontinuities. We present general techniques for
online and private optimization of the sum of dispersed piecewise Lipschitz
functions. We improve over the best-known regret bounds for a variety of
problems, prove regret bounds for problems not previously studied, and give
matching lower bounds. We also give matching upper and lower bounds on the
utility loss due to privacy. Moreover, we uncover dispersion in auction design
and pricing problems.
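As a toy illustration of the online-learning side, the sketch below runs a generic exponential-weights learner over a discretized parameter space on a sequence of piecewise Lipschitz utility functions whose discontinuities drift from round to round, the kind of "spread out" discontinuities dispersion asks for. This is the canonical full-information learner, not the paper's algorithm, and all functions and constants are illustrative:

```python
import numpy as np

def exponential_weights(utility_fns, grid, eta):
    """Full-information exponential weights over a discretized
    one-dimensional parameter space.

    utility_fns: per-round utility functions u_t(rho) in [0, 1]
                 (piecewise Lipschitz in rho).
    grid:        discretization of the parameter space.
    eta:         learning rate.
    """
    weights = np.ones_like(grid, dtype=float)
    rng = np.random.default_rng(0)
    total, played = 0.0, []
    for u in utility_fns:
        probs = weights / weights.sum()
        rho = rng.choice(grid, p=probs)             # sample a parameter
        total += u(rho)
        utilities = np.array([u(r) for r in grid])  # full information
        weights *= np.exp(eta * utilities)          # reward good parameters
        played.append(rho)
    return played, total

# Toy rounds: a threshold (discontinuity) that drifts across rounds.
rounds = [(lambda rho, t=t: float(rho > 0.3 + 0.01 * t) * rho) for t in range(50)]
grid = np.linspace(0.0, 1.0, 201)
_, total = exponential_weights(rounds, grid, eta=0.5)
print("total utility:", total)
```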
Structurally Parameterized d-Scattered Set
In d-Scattered Set we are given an (edge-weighted) graph and are asked to
select at least k vertices, so that the distance between any pair is at least
d, thus generalizing Independent Set. We provide upper and lower bounds on
the complexity of this problem with respect to various standard graph
parameters. In particular, we show the following:
- For any fixed d, an algorithm whose running time is single-exponential in
the treewidth tw of the input graph.
- A tight SETH-based lower bound matching this algorithm's performance. These
generalize known results for Independent Set.
- d-Scattered Set is W[1]-hard parameterized by vertex cover (for
edge-weighted graphs) or by feedback vertex set (for unweighted graphs), even
if d is an additional parameter.
- A single-exponential algorithm parameterized by vertex cover for unweighted
graphs, complementing the above-mentioned hardness.
- An algorithm parameterized by tree-depth (td), as well as a matching
ETH-based lower bound, both for unweighted graphs.
We complement these mostly negative results by providing an FPT approximation
scheme parameterized by treewidth. In particular, we give an algorithm which,
for any error parameter ε > 0, runs in time fixed-parameter tractable in the
treewidth, and returns a d/(1+ε)-scattered set of size k, if a d-scattered
set of the same size exists.
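For intuition, the defining property is easy to check: a set S is d-scattered if every pair of its vertices is at distance at least d. A small Python sketch for unweighted graphs (the representation and names are our own):

```python
from collections import deque

def is_d_scattered(adj, S, d):
    """Check that every pair of vertices in S is at distance >= d in an
    unweighted graph given as an adjacency dict. For d = 2 this is
    exactly the independent-set condition."""
    for s in S:
        # BFS from s to depth d - 1; no other chosen vertex may appear.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            if dist[v] >= d - 1:
                continue
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        if any(t != s and t in dist for t in S):
            return False
    return True

# Path 0-1-2-3-4: {0, 4} is 3-scattered (distance 4), {0, 2} is not.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_d_scattered(path, {0, 4}, 3))  # True
print(is_d_scattered(path, {0, 2}, 3))  # False
```

Verification is thus easy; the results above concern how hard it is to find such a set under various structural parameters.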
Analysis of Noisy Evolutionary Optimization When Sampling Fails
In noisy evolutionary optimization, sampling is a common strategy to deal
with noise. By the sampling strategy, the fitness of a solution is evaluated
multiple times (called \emph{sample size}) independently, and its true fitness
is then approximated by the average of these evaluations. Previous studies on
sampling are mainly empirical. In this paper, we first investigate the effect
of sample size from a theoretical perspective. By analyzing the (1+1)-EA on the
noisy LeadingOnes problem, we show that as the sample size increases, the
running time can drop from exponential to polynomial, but then return to
exponential. This suggests that a proper sample size is crucial in practice.
Then, we investigate what strategies can work when sampling with any fixed
sample size fails. By two illustrative examples, we prove that using parent or
offspring populations can be better. Finally, we construct an artificial noisy
example to show that when using neither sampling nor populations is effective,
adaptive sampling (i.e., sampling with an adaptive sample size) can work. This
provides, for the first time, theoretical support for the use of adaptive
sampling.
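For intuition, here is a minimal sketch of the (1+1)-EA with a fixed sample size k on a noisy LeadingOnes instance; the additive Gaussian noise model and all constants are illustrative choices, not the paper's exact setting:

```python
import random

def leading_ones(x):
    """Number of leading one-bits; the noiseless fitness."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count

def noisy_fitness(x, sigma=1.0):
    """One noisy evaluation: true fitness plus additive Gaussian noise."""
    return leading_ones(x) + random.gauss(0.0, sigma)

def one_plus_one_ea_with_sampling(n=20, k=10, max_evals=200_000):
    """(1+1)-EA that estimates fitness by averaging k independent
    noisy evaluations (sample size k)."""
    def estimate(x):
        return sum(noisy_fitness(x) for _ in range(k)) / k

    x = [random.randint(0, 1) for _ in range(n)]
    evals = 0
    # For demonstration only, stop once the true optimum is reached.
    while leading_ones(x) < n and evals < max_evals:
        # Standard bit mutation: flip each bit independently with prob 1/n.
        y = [b ^ (random.random() < 1 / n) for b in x]
        if estimate(y) >= estimate(x):  # accept on (estimated) ties
            x = y
        evals += 2 * k
    return x, evals

random.seed(1)
_, used = one_plus_one_ea_with_sampling()
print("evaluations used:", used)
```

Varying k in this sketch mirrors the paper's theoretical finding: too small a sample size leaves the selection step dominated by noise, while too large a sample size wastes evaluations on every comparison.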