8 research outputs found

    SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression

    Full text link
    This paper deals with the problem of finding the globally optimal subset of h elements from a larger set of n elements in d space dimensions so as to minimize a quadratic criterion, with an special emphasis on applications to computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The computation of the LTSE is a challenging subset selection problem involving a nonlinear program with continuous and binary variables, linked in a highly nonlinear fashion. The selection of a globally optimal subset using the branch and bound (BB) algorithm is limited to problems in very low dimension, tipically d<5, as the complexity of the problem increases exponentially with d. We introduce a bold pruning strategy in the BB algorithm that results in a significant reduction in computing time, at the price of a negligeable accuracy lost. The novelty of our algorithm is that the bounds at nodes of the BB tree come from pseudo-convexifications derived using a linearization technique with approximate bounds for the nonlinear terms. The approximate bounds are computed solving an auxiliary semidefinite optimization problem. We show through a computational study that our algorithm performs well in a wide set of the most difficult instances of the LTSE problem.Comment: 12 pages, 3 figures, 2 table

    Least quantile regression via modern optimization

    Get PDF
    We address the Least Quantile of Squares (LQS) (and in particular the Least Median of Squares) regression problem using modern optimization methods. We propose a Mixed Integer Optimization (MIO) formulation of the LQS problem which allows us to find a provably global optimal solution for the LQS problem. Our MIO framework has the appealing characteristic that if we terminate the algorithm early, we obtain a solution with a guarantee on its sub-optimality. We also propose continuous optimization methods based on first-order subdifferential methods, sequential linear optimization and hybrid combinations of them to obtain near optimal solutions to the LQS problem. The MIO algorithm is found to benefit significantly from high quality solutions delivered by our continuous optimization based methods. We further show that the MIO approach leads to (a) an optimal solution for any dataset, where the data-points (yi,xi)(y_i,\mathbf{x}_i)'s are not necessarily in general position, (b) a simple proof of the breakdown point of the LQS objective value that holds for any dataset and (c) an extension to situations where there are polyhedral constraints on the regression coefficient vector. We report computational results with both synthetic and real-world datasets showing that the MIO algorithm with warm starts from the continuous optimization methods solve small (n=100n=100) and medium (n=500n=500) size problems to provable optimality in under two hours, and outperform all publicly available methods for large-scale (n=n={}10,000) LQS problems.Comment: Published in at http://dx.doi.org/10.1214/14-AOS1223 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    All non-trivial variants of 3-LDT are equivalent

    Full text link
    The popular 3-SUM conjecture states that there is no strongly subquadratic time algorithm for checking if a given set of integers contains three distinct elements that sum up to zero. A closely related problem is to check if a given set of integers contains distinct x1,x2,x3x_1, x_2, x_3 such that x1+x2=2x3x_1+x_2=2x_3. This can be reduced to 3-SUM in almost-linear time, but surprisingly a reverse reduction establishing 3-SUM hardness was not known. We provide such a reduction, thus resolving an open question of Erickson. In fact, we consider a more general problem called 3-LDT parameterized by integer parameters α1,α2,α3\alpha_1, \alpha_2, \alpha_3 and tt. In this problem, we need to check if a given set of integers contains distinct elements x1,x2,x3x_1, x_2, x_3 such that α1x1+α2x2+α3x3=t\alpha_1 x_1+\alpha_2 x_2 +\alpha_3 x_3 = t. For some combinations of the parameters, every instance of this problem is a NO-instance or there exists a simple almost-linear time algorithm. We call such variants trivial. We prove that all non-trivial variants of 3-LDT are equivalent under subquadratic reductions. Our main technical contribution is an efficient deterministic procedure based on the famous Behrend's construction that partitions a given set of integers into few subsets that avoid a chosen linear equation

    Data Structures Meet Cryptography: 3SUM with Preprocessing

    Full text link
    This paper shows several connections between data structure problems and cryptography against preprocessing attacks. Our results span data structure upper bounds, cryptographic applications, and data structure lower bounds, as summarized next. First, we apply Fiat--Naor inversion, a technique with cryptographic origins, to obtain a data structure upper bound. In particular, our technique yields a suite of algorithms with space SS and (online) time TT for a preprocessing version of the NN-input 3SUM problem where S3⋅T=O~(N6)S^3\cdot T = \widetilde{O}(N^6). This disproves a strong conjecture (Goldstein et al., WADS 2017) that there is no data structure that solves this problem for S=N2−δS=N^{2-\delta} and T=N1−δT = N^{1-\delta} for any constant δ>0\delta>0. Secondly, we show equivalence between lower bounds for a broad class of (static) data structure problems and one-way functions in the random oracle model that resist a very strong form of preprocessing attack. Concretely, given a random function F:[N]→[N]F: [N] \to [N] (accessed as an oracle) we show how to compile it into a function GF:[N2]→[N2]G^F: [N^2] \to [N^2] which resists SS-bit preprocessing attacks that run in query time TT where ST=O(N2−ε)ST=O(N^{2-\varepsilon}) (assuming a corresponding data structure lower bound on 3SUM). In contrast, a classical result of Hellman tells us that FF itself can be more easily inverted, say with N2/3N^{2/3}-bit preprocessing in N2/3N^{2/3} time. We also show that much stronger lower bounds follow from the hardness of kSUM. Our results can be equivalently interpreted as security against adversaries that are very non-uniform, or have large auxiliary input, or as security in the face of a powerfully backdoored random oracle. Thirdly, we give non-adaptive lower bounds for 3SUM and a range of geometric problems which match the best known lower bounds for static data structure problems

    Segmentation and wake removal of seafaring vessels in optical satellite images

    Get PDF
    ABSTRACT This paper aims at the segmentation of seafaring vessels in optical satellite images, which allows an accurate length estimation. In maritime situation awareness, vessel length is an important parameter to classify a vessel. The proposed segmentation system consists of robust foreground-background separation, wake detection and ship-wake separation, simultaneous position and profile clustering and a special module for small vessel segmentation. We compared our system with a baseline implementation on 53 vessels that were observed with GeoEye-1. The results show that the relative L1 error in the length estimation is reduced from 3.9 to 0.5, which is an improvement of 87%. We learned that the wake removal is an important element for the accurate segmentation and length estimation of ships

    On the Least Median Square Problem

    No full text
    We consider the exact and approximate computational complexity of the multivariate LMS linear regression estimator. The LMS estimator is among the most widely used robust linear statistical estimators. Given a set of n points in IR and a parameter k, the problem is equivalent to computing the slab bounded by two parallel hyperplanes of minimum separation that contains k of the points. We present algorithms for the exact and approximate versions of the multivariate LMS problem. We also provide nearly matching lower bounds on the computational complexity of these problems
    corecore