SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression
This paper deals with the problem of finding the globally optimal subset of h
elements from a larger set of n elements in d space dimensions so as to
minimize a quadratic criterion, with a special emphasis on applications to
computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The
computation of the LTSE is a challenging subset selection problem involving a
nonlinear program with continuous and binary variables, linked in a highly
nonlinear fashion. The selection of a globally optimal subset using the branch
and bound (BB) algorithm is limited to problems in very low dimension,
typically d < 5, as the complexity of the problem increases exponentially with d.
We introduce a bold pruning strategy in the BB algorithm that results in a
significant reduction in computing time, at the price of a negligible loss of
accuracy. The novelty of our algorithm is that the bounds at nodes of the BB tree
come from pseudo-convexifications derived using a linearization technique with
approximate bounds for the nonlinear terms. The approximate bounds are computed
solving an auxiliary semidefinite optimization problem. We show through a
computational study that our algorithm performs well in a wide set of the most
difficult instances of the LTSE problem.
Comment: 12 pages, 3 figures, 2 tables
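To make the subset selection problem concrete, here is a minimal sketch of exhaustive LTS for a straight-line fit. It is not the branch and bound algorithm of the abstract: it tries every h-subset, which is only feasible for tiny n. The function names `ols_line` and `lts_brute_force` are hypothetical, and the restriction to one regressor plus intercept is an assumption made to keep the example dependency-free.

```python
from itertools import combinations


def ols_line(points):
    """Closed-form least-squares fit y = a + b*x.

    Returns (a, b, rss), where rss is the residual sum of squares.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0  # flat line if all x coincide
    a = mean_y - b * mean_x
    rss = sum((y - a - b * x) ** 2 for x, y in points)
    return a, b, rss


def lts_brute_force(points, h):
    """Exhaustive LTS: fit every h-subset and keep the smallest RSS.

    Returns (a, b, rss, subset). Cost grows like C(n, h).
    """
    if not 1 <= h <= len(points):
        raise ValueError("h must satisfy 1 <= h <= n")
    best = None
    for subset in combinations(points, h):
        a, b, rss = ols_line(subset)
        if best is None or rss < best[2]:
            best = (a, b, rss, subset)
    return best


# Six points on y = 2x + 1 plus two gross outliers.
data = [(x, 2 * x + 1) for x in range(6)] + [(6, 100), (7, -50)]
a, b, rss, kept = lts_brute_force(data, h=6)
```

Minimizing the sum of the h smallest squared residuals over all coefficient vectors is equivalent to minimizing the ordinary least-squares criterion over all h-subsets, which is why the enumeration above is exact. The combinatorial cost of that enumeration is what motivates bounding and pruning.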
Least quantile regression via modern optimization
We address the Least Quantile of Squares (LQS) (and in particular the Least
Median of Squares) regression problem using modern optimization methods. We
propose a Mixed Integer Optimization (MIO) formulation of the LQS problem which
allows us to find a provably global optimal solution for the LQS problem. Our
MIO framework has the appealing characteristic that if we terminate the
algorithm early, we obtain a solution with a guarantee on its sub-optimality.
We also propose continuous optimization methods based on first-order
subdifferential methods, sequential linear optimization and hybrid combinations
of them to obtain near optimal solutions to the LQS problem. The MIO algorithm
is found to benefit significantly from high quality solutions delivered by our
continuous optimization based methods. We further show that the MIO approach
leads to (a) an optimal solution for any dataset, where the data-points
$(y_i, \mathbf{x}_i)$'s are not necessarily in general position, (b) a simple
proof of the breakdown point of the LQS objective value that holds for any
dataset and (c) an extension to situations where there are polyhedral
constraints on the regression coefficient vector. We report computational
results with both synthetic and real-world datasets showing that the MIO
algorithm with warm starts from the continuous optimization methods solves small
($n = 100$) and medium ($n = 500$) size problems to provable optimality in under
two hours, and outperforms all publicly available methods for large-scale
($n = 10{,}000$) LQS problems.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1223 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
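The LQS criterion itself is easy to state in code. The sketch below evaluates the q-th smallest absolute residual of a candidate line and runs a crude search over lines through pairs of data points. This is a simple heuristic chosen for illustration, not the MIO formulation or the first-order methods of the abstract, and it does not guarantee a global optimum. The names `lqs_objective` and `lqs_pairwise_search` are hypothetical.

```python
from itertools import combinations


def lqs_objective(a, b, points, q):
    """q-th smallest absolute residual of the line y = a + b*x.

    Ranking absolute residuals gives the same minimizer as ranking
    squared residuals, since squaring is monotone on [0, inf).
    """
    residuals = sorted(abs(y - a - b * x) for x, y in points)
    return residuals[q - 1]


def lqs_pairwise_search(points, q):
    """Heuristic: evaluate every line through two data points.

    Returns (a, b, value) for the best candidate, or None if no
    two points have distinct x coordinates.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not representable as y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        value = lqs_objective(a, b, points, q)
        if best is None or value < best[2]:
            best = (a, b, value)
    return best


# Six points on y = 2x + 1 plus two outliers; q = 5 of n = 8.
data = [(x, 2 * x + 1) for x in range(6)] + [(6, 100), (7, -50)]
fit = lqs_pairwise_search(data, q=5)
```

A solution found this way could serve as a warm start for an exact method, which is the role the abstract assigns to its continuous optimization heuristics.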
All non-trivial variants of 3-LDT are equivalent
The popular 3-SUM conjecture states that there is no strongly subquadratic
time algorithm for checking if a given set of integers contains three distinct
elements that sum up to zero. A closely related problem is to check if a given
set of integers contains distinct $x_1, x_2, x_3$ such that $x_1 + x_2 = 2x_3$.
This can be reduced to 3-SUM in almost-linear time, but surprisingly a reverse
reduction establishing 3-SUM hardness was not known.
We provide such a reduction, thus resolving an open question of Erickson. In
fact, we consider a more general problem called 3-LDT parameterized by integer
parameters $\alpha_1, \alpha_2, \alpha_3$ and $t$. In this problem, we need to
check if a given set of integers contains distinct elements $x_1, x_2, x_3$
such that $\alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 = t$. For some combinations
of the parameters, every instance of this problem is a NO-instance or there
exists a simple almost-linear time algorithm. We call such variants trivial. We
prove that all non-trivial variants of 3-LDT are equivalent under subquadratic
reductions. Our main technical contribution is an efficient deterministic
procedure based on the famous Behrend's construction that partitions a given
set of integers into few subsets that avoid a chosen linear equation.
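For reference, here is a quadratic-time baseline for this kind of problem. It assumes the usual form of 3-variate linear degeneracy testing: given integers a1, a2, a3 and t, decide whether a set contains pairwise distinct x1, x2, x3 with a1*x1 + a2*x2 + a3*x3 = t. That form, the function name `three_ldt`, and the requirement a3 != 0 are assumptions of this sketch. Setting (a1, a2, a3, t) = (1, 1, 1, 0) gives 3-SUM.

```python
def three_ldt(numbers, a1, a2, a3, t):
    """Return a witness (x1, x2, x3) or None.

    Baseline: enumerate ordered pairs (x1, x2), solve for x3, and
    look it up in a hash set. Expected time is quadratic in the
    number of distinct inputs.
    """
    if a3 == 0:
        raise ValueError("this sketch assumes a3 != 0")
    values = set(numbers)
    for x1 in values:
        for x2 in values:
            if x1 == x2:
                continue
            rest = t - a1 * x1 - a2 * x2
            if rest % a3 != 0:
                continue  # x3 would not be an integer
            x3 = rest // a3
            if x3 in values and x3 != x1 and x3 != x2:
                return (x1, x2, x3)
    return None


# 3-SUM: three distinct elements summing to zero.
print(three_ldt([-5, 2, 3, 10], 1, 1, 1, 0) is not None)   # True
# Three-term arithmetic progression: x1 + x2 = 2 * x3.
print(three_ldt([1, 2, 4, 8], 1, 1, -2, 0) is not None)    # False
```

The 3-SUM conjecture quoted in the abstract says that, for the 3-SUM setting of the parameters, no algorithm improves on this running time by a polynomial factor.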
Data Structures Meet Cryptography: 3SUM with Preprocessing
This paper shows several connections between data structure problems and
cryptography against preprocessing attacks. Our results span data structure
upper bounds, cryptographic applications, and data structure lower bounds, as
summarized next.
First, we apply Fiat-Naor inversion, a technique with cryptographic origins,
to obtain a data structure upper bound. In particular, our technique yields a
suite of algorithms with space $S$ and (online) time $T$ for a preprocessing
version of the $N$-input 3SUM problem where $S^3 \cdot T = \widetilde{O}(N^6)$.
This disproves a strong conjecture (Goldstein et al., WADS 2017) that there is
no data structure that solves this problem for $S = N^{2-\delta}$ and
$T = N^{1-\delta}$ for any constant $\delta > 0$.
Secondly, we show equivalence between lower bounds for a broad class of
(static) data structure problems and one-way functions in the random oracle
model that resist a very strong form of preprocessing attack. Concretely, given
a random function $F: [N] \to [N]$ (accessed as an oracle) we show how to
compile it into a function $G^F: [N^2] \to [N^2]$ which resists $S$-bit
preprocessing attacks that run in query time $T$ where
$ST = O(N^{2-\varepsilon})$ (assuming a corresponding data structure lower bound
on 3SUM). In contrast, a classical result of Hellman tells us that $F$ itself
can be more easily inverted, say with $N^{2/3}$-bit preprocessing in $N^{2/3}$
time. We also show that much stronger lower bounds follow from the hardness of
kSUM. Our results can be equivalently interpreted as security against
adversaries that are very non-uniform, or have large auxiliary input, or as
security in the face of a powerfully backdoored random oracle.
Thirdly, we give non-adaptive lower bounds for 3SUM and a range of geometric
problems which match the best known lower bounds for static data structure
problems.
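The space/time tradeoff can be illustrated by its two trivial endpoints. The sketch assumes that the preprocessing version of 3SUM works as follows: two lists of numbers are preprocessed, and each query asks whether some pair, one element from each list, sums to a given target. That problem statement and the class names are assumptions of this example. Neither class implements the Fiat-Naor based algorithms of the abstract.

```python
class PairSumTable:
    """Quadratic-space endpoint: store every pairwise sum.

    Space is proportional to N^2; each query is one hash lookup.
    """

    def __init__(self, left, right):
        self.sums = {x + y: (x, y) for x in left for y in right}

    def query(self, target):
        return self.sums.get(target)


class LinearScan:
    """Linear-space endpoint: hash one list, scan the other.

    Space is proportional to N; each query takes O(N) lookups.
    """

    def __init__(self, left, right):
        self.left = list(left)
        self.right = set(right)

    def query(self, target):
        for x in self.left:
            if target - x in self.right:
                return (x, target - x)
        return None


left, right = [1, 4, 9], [10, 20]
table = PairSumTable(left, right)
scan = LinearScan(left, right)
print(table.query(24), scan.query(24))   # (4, 20) (4, 20)
print(table.query(25), scan.query(25))   # None None
```

The interesting regime lies between these endpoints, where a data structure uses less than quadratic space and still answers queries in less than linear time.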
Segmentation and wake removal of seafaring vessels in optical satellite images
This paper aims at the segmentation of seafaring vessels in optical satellite images, which allows an accurate length estimation. In maritime situation awareness, vessel length is an important parameter to classify a vessel. The proposed segmentation system consists of robust foreground-background separation, wake detection and ship-wake separation, simultaneous position and profile clustering and a special module for small vessel segmentation. We compared our system with a baseline implementation on 53 vessels that were observed with GeoEye-1. The results show that the relative L1 error in the length estimation is reduced from 3.9 to 0.5, which is an improvement of 87%. We learned that the wake removal is an important element for the accurate segmentation and length estimation of ships.
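The reported 87% follows directly from the two error values. A minimal check, where the helper name `relative_reduction` is hypothetical:

```python
def relative_reduction(before, after):
    """Fraction by which an error shrinks when going from before to after."""
    return (before - after) / before


# Relative L1 length error: baseline 3.9, proposed system 0.5.
print(f"{relative_reduction(3.9, 0.5):.1%}")   # 87.2%
```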
On the Least Median Square Problem
We consider the exact and approximate computational complexity of the multivariate LMS linear regression estimator. The LMS estimator is among the most widely used robust linear statistical estimators. Given a set of n points in $\mathbb{R}^d$ and a parameter k, the problem is equivalent to computing the slab bounded by two parallel hyperplanes of minimum separation that contains k of the points. We present algorithms for the exact and approximate versions of the multivariate LMS problem. We also provide nearly matching lower bounds on the computational complexity of these problems.
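The slab formulation is easiest to see in one dimension, where a slab is just an interval and the problem reduces to finding the shortest interval that covers k of the n values. The sketch below handles only this one-dimensional special case, not the multivariate algorithms of the abstract. The function name `shortest_interval` is hypothetical.

```python
def shortest_interval(values, k):
    """Shortest closed interval containing at least k of the values.

    After sorting, an optimal interval can be taken to start and end
    at data points, so it suffices to slide a window of k consecutive
    sorted values. Runs in O(n log n) time.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= n")
    xs = sorted(values)
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


low, high = shortest_interval([10, 1, 2, 3, 50, 4], k=4)
print(low, high)                 # 1 4
print((low + high) / 2)          # 2.5, the midpoint of the slab
```

The midpoint of the returned interval is the one-dimensional LMS location estimate for coverage k, and half the interval width is the corresponding k-th smallest absolute residual.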