
    Deterministic Time-Space Tradeoffs for k-SUM

    Given a set of numbers, the k-SUM problem asks for a subset of k numbers that sums to zero. When the numbers are integers, the time and space complexity of k-SUM is generally studied in the word-RAM model; when the numbers are reals, the complexity is studied in the real-RAM model, and space is measured by the number of reals held in memory at any point. We present a time- and space-efficient deterministic self-reduction for the k-SUM problem which holds in both models and has many interesting consequences. To illustrate:
    * 3-SUM is in deterministic time $O(n^2 \lg\lg(n)/\lg(n))$ and space $O\left(\sqrt{\frac{n \lg(n)}{\lg\lg(n)}}\right)$. In general, any polylogarithmic-time improvement over quadratic time for 3-SUM can be converted into an algorithm with an identical time improvement but low space complexity as well.
    * 3-SUM is in deterministic time $O(n^2)$ and space $O(\sqrt{n})$, derandomizing an algorithm of Wang.
    * A popular conjecture states that 3-SUM requires $n^{2-o(1)}$ time on the word-RAM. We show that the 3-SUM Conjecture is in fact equivalent to the (seemingly weaker) conjecture that every $O(n^{0.51})$-space algorithm for 3-SUM requires at least $n^{2-o(1)}$ time on the word-RAM.
    * For $k \ge 4$, k-SUM is in deterministic $O(n^{k-2+2/k})$ time and $O(\sqrt{n})$ space.
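    For context, the quadratic-time baseline that these results improve on is the textbook two-pointer scan over a sorted array. The sketch below is that classic algorithm, not the paper's self-reduction; note that it also uses linear working space for the sorted copy, in contrast with the paper's sublinear-space bounds.

```python
def three_sum(nums):
    """Return a triple (a, b, c) from nums with a + b + c == 0, or None.

    Classic O(n^2)-time baseline: sort once, then for each fixed element
    scan the remainder with two pointers.
    """
    nums = sorted(nums)  # O(n) extra space for the sorted copy
    n = len(nums)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = nums[i] + nums[lo] + nums[hi]
            if s == 0:
                return (nums[i], nums[lo], nums[hi])
            if s < 0:
                lo += 1  # sum too small: advance the left pointer
            else:
                hi -= 1  # sum too large: retreat the right pointer
    return None

print(three_sum([8, -25, 4, 10, -7, 3]))  # (-7, 3, 4)
```

    The sorted copy is what dominates the space here; the paper's contribution is precisely to keep (and even slightly beat) the quadratic running time while shrinking the footprint to $O(\sqrt{n})$.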

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
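    To illustrate just the Early Dropping heuristic on a single, non-partitioned data set, here is a minimal sketch of p-value-based forward selection. The partial-correlation test is a standard Gaussian conditional-independence test chosen for self-containedness, an assumption rather than the tests PFBP itself uses; the row/column partitioning, meta-analysis combination of local p-values, Early Stopping, and Early Return are not shown.

```python
import numpy as np
from scipy import stats

def partial_corr_pvalue(x, y, Z):
    """p-value for H0: corr(x, y | Z) == 0, via residualization on Z and
    Fisher's z-transform (standard Gaussian CI test; illustrative only)."""
    if Z.shape[1] > 0:
        A = np.column_stack([np.ones(len(x)), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    n, k = len(x), Z.shape[1]
    z = np.arctanh(np.clip(r, -0.9999, 0.9999)) * np.sqrt(max(n - k - 3, 1))
    return 2 * stats.norm.sf(abs(z))

def forward_select_early_drop(X, y, alpha=0.01, max_features=10):
    """Forward selection with Early Dropping: features whose conditional
    p-value exceeds alpha are removed from all subsequent iterations."""
    selected, remaining = [], set(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        Z = X[:, selected]
        pvals = {j: partial_corr_pvalue(X[:, j], y, Z) for j in remaining}
        remaining = {j for j, p in pvals.items() if p <= alpha}  # Early Dropping
        if not remaining:
            break
        best = min(remaining, key=pvals.get)  # most significant feature wins
        selected.append(best)
        remaining.discard(best)
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
y = 2 * X[:, 3] - X[:, 7] + rng.normal(size=500)
print(forward_select_early_drop(X, y))  # typically [3, 7]; a false positive may slip in
```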

    Fast Deterministic Selection

    The Median of Medians (also known as BFPRT) algorithm, although a landmark theoretical achievement, is seldom used in practice because it and its variants are slower than simple approaches based on sampling. The main contribution of this paper is a fast linear-time deterministic selection algorithm, QuickselectAdaptive, based on a refined definition of MedianOfMedians. The algorithm's performance brings deterministic selection, along with its desirable properties of reproducible runs, predictable run times, and immunity to pathological inputs, into the range of practicality. We demonstrate results on independent and identically distributed random inputs and on normally-distributed inputs. Measurements show that QuickselectAdaptive is faster than state-of-the-art baselines.
    Comment: Pre-publication draft
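    For reference, the textbook MedianOfMedians selection that QuickselectAdaptive refines looks roughly as follows. This is the classic groups-of-five formulation, not the paper's refined pivot rule.

```python
def select(a, k):
    """Return the k-th smallest element (0-indexed) of list a in
    deterministic O(n) time via the textbook median-of-medians pivot."""
    if len(a) <= 5:
        return sorted(a)[k]
    # The median of each group of 5, then (recursively) the median of
    # those medians, guarantees that roughly 30% of the elements fall
    # on each side of the pivot, bounding the recursion.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    n_eq = len(a) - len(lo) - len(hi)  # copies equal to the pivot
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + n_eq:
        return pivot
    return select(hi, k - len(lo) - n_eq)

print(select([9, 1, 7, 3, 8, 5, 2, 6, 4, 0], 4))  # 4
```

    The constant factors hidden in this recursion are exactly what make the classic version slow in practice and what the paper's refinements target.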

    Parallel Wavelet Tree Construction

    We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. We show experimentally, on a 40-core machine with two-way hyper-threading, that we outperform the existing parallel algorithms by 1.3-5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms show good scalability with increasing thread count, input size, and alphabet size. We also discuss extensions to variants of the standard wavelet tree.
    Comment: This is a longer version of the paper that appears in the Proceedings of the IEEE Data Compression Conference, 201
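    For orientation, here is a minimal sketch of the standard sequential levelwise construction that such parallel algorithms accelerate; the function name and levelwise (pointerless) representation are illustrative assumptions, not the paper's implementation. Each level stores one bit per symbol, and a stable sort by the bits seen so far routes every symbol to its node at the next level.

```python
import math

def wavelet_tree_levels(seq, sigma):
    """Sequential construction of a levelwise wavelet tree over alphabet
    [0, sigma): returns one concatenated bitvector per level. Level l
    stores bit (levels-1-l) of each symbol, with symbols ordered stably
    by their higher bits, i.e., grouped by wavelet-tree node."""
    levels = max(1, math.ceil(math.log2(max(sigma, 2))))
    cur, out = list(seq), []
    for lvl in range(levels):
        bit = levels - 1 - lvl
        out.append([(c >> bit) & 1 for c in cur])
        # Python's sort is stable, so sorting by the prefix of bits seen
        # so far refines the previous grouping without reordering nodes.
        cur = sorted(cur, key=lambda c: c >> bit)
    return out

print(wavelet_tree_levels([3, 0, 2, 1, 3, 0], 4))
# [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
```

    The whole construction is a stable MSD radix sort in disguise, which is why each level is amenable to parallelization with polylogarithmic depth.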

    Exploiting hybrid parallelism in the kinematic analysis of multibody systems based on group equations

    Computational kinematics is a fundamental tool for the design, simulation, control, optimization, and dynamic analysis of multibody systems. The analysis of complex multibody systems and the need for real-time solutions require kinematic and dynamic formulations that reduce computational cost, the selection and efficient use of the most appropriate solvers, and the exploitation of all available computer resources through parallel computing techniques. The topological approach based on group equations and natural coordinates reduces computation time in comparison with well-known global formulations and enables the use of parallelism techniques, which can be applied at different levels: simultaneous solution of equations, use of multithreading routines, or a combination of both. This paper studies and compares this topological formulation and these parallel techniques to ascertain which combination performs better in two applications. The first application uses dedicated systems for the real-time control of small multibody systems, defined by a small number of equations and small linear systems, so shared-memory parallelism in combination with linear algebra routines is analyzed on a small multicore system and on a Raspberry Pi; the control of a Stewart platform is used as a case study. The second application studies large multibody systems in which the kinematic analysis must be performed many times during the design process. A simulator that allows us to control the formulation, the solver, the parallel techniques, and the size of the problem has been developed and tested on more powerful computational systems with larger multicores and a GPU.
    This work was supported by the Spanish MINECO, as well as European Commission FEDER funds, under grant TIN2015-66972-C5-3-
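    The two parallelism levels mentioned above can be sketched minimally as follows; the names solve_group and solve_stage are hypothetical, and linearized group equations A q = b stand in for the actual natural-coordinate group equations, which are generally nonlinear and solved iteratively.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_group(args):
    """Solve one kinematic group's (linearized) equations A q = b.
    np.linalg.solve may itself run on multithreaded BLAS/LAPACK, which is
    the second, nested parallelism level discussed in the paper."""
    A, b = args
    return np.linalg.solve(A, b)

def solve_stage(groups, workers=4):
    """Groups in the same stage of the topological decomposition are
    mutually independent, so their systems can be solved simultaneously
    (first parallelism level); LAPACK releases the GIL, so threads scale."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_group, groups))

# Hypothetical stage with three independent 3x3 groups; adding 3*I makes
# each matrix strictly diagonally dominant, hence nonsingular.
rng = np.random.default_rng(0)
stage = [(rng.random((3, 3)) + 3 * np.eye(3), rng.random(3)) for _ in range(3)]
print(solve_stage(stage))
```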