
    Products of Euclidean Metrics and Applications to Proximity Questions among Curves

    The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For p-products of Euclidean metrics, for any p ≥ 1, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms; our approach is especially efficient when the length of the curves is bounded. 2012 ACM Subject Classification: Theory of computation → Data structures design and analysis
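
    As a rough illustration of the distance notion mentioned in this abstract, the sketch below computes a traversal-based distance between two discretized curves by dynamic programming. It assumes the common formulation in which the cost of a monotone traversal is the ℓp norm of its pairwise Euclidean distances, so that p = ∞ gives the discrete Fréchet distance and p = 1 gives Dynamic Time Warping. The function name `curve_distance` and this exact formulation are assumptions made for illustration, not taken from the paper, and the code is a plain quadratic-time baseline, not the ANN data structure the abstract describes.

```python
from math import dist, inf


def curve_distance(a, b, p=inf):
    """Distance between non-empty point sequences a and b (hypothetical helper).

    Minimizes, over monotone traversals of both curves, the l_p norm of the
    Euclidean distances between paired points. p=inf is discrete Frechet,
    p=1 is Dynamic Time Warping.
    """
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated cost of a traversal ending at (a[i], b[j]).
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            step = d if p == inf else d ** p
            if i == 0 and j == 0:
                cost[i][j] = step
                continue
            # A traversal may advance on either curve, or on both at once.
            prev = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            # Max-accumulation for p=inf, sum of p-th powers otherwise.
            cost[i][j] = max(prev, step) if p == inf else prev + step
    total = cost[-1][-1]
    return total if p == inf else total ** (1.0 / p)


if __name__ == "__main__":
    P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(curve_distance(P, Q))        # discrete Frechet
    print(curve_distance(P, Q, p=1))   # Dynamic Time Warping
```

    The table has one entry per pair of vertices, so time and space are proportional to the product of the two curve lengths, which is why bounded curve length matters for efficiency.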

    Learning Algebraic Varieties from Samples

    We seek to determine a real algebraic variety from a fixed finite subset of points. Existing methods are studied and new methods are developed. Our focus lies on aspects of topology and algebraic geometry, such as dimension and defining polynomials. All algorithms are tested on a range of datasets and made available in a Julia package.

    Products of Euclidean Metrics, Applied to Proximity Problems among Curves

    Approximate Nearest Neighbor (ANN) search is a fundamental computational problem that has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets, whereas complex shapes have not been sufficiently addressed. Here, we focus on distance functions between discretized curves in Euclidean space: They appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For ℓp-products of Euclidean metrics, for any constant p, we propose simple and efficient data structures for ANN based on randomized projections: These data structures are of independent interest. Furthermore, they serve to solve proximity questions under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distance functions. These are two very popular and practical approaches to comparing such curves. We offer, for both approaches, the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our methods; our algorithm is especially efficient when the length of the curves is bounded. Finally, we focus on discrete Fréchet distance when the ambient space is high dimensional and derive complexity bounds in terms of doubling dimension as well as an improved approximate near neighbor search.

    CASP-DM: Context Aware Standard Process for Data Mining

    We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for handling context and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

    The Quaternion-Based Spatial Coordinate and Orientation Frame Alignment Problems

    We review the general problem of finding a global rotation that transforms a given set of points and/or coordinate frames (the "test" data) into the best possible alignment with a corresponding set (the "reference" data). For 3D point data, this "orthogonal Procrustes problem" is often phrased in terms of minimizing a root-mean-square deviation or RMSD corresponding to a Euclidean distance measure relating the two sets of matched coordinates. We focus on quaternion eigensystem methods that have been exploited to solve this problem for at least five decades in several different bodies of scientific literature where they were discovered independently. While numerical methods for the eigenvalue solutions dominate much of this literature, it has long been realized that the quaternion-based RMSD optimization problem can also be solved using exact algebraic expressions based on the form of the quartic equation solution published by Cardano in 1545; we focus on these exact solutions to expose the structure of the entire eigensystem for the traditional 3D spatial alignment problem. We then explore the structure of the less-studied orientation data context, investigating how quaternion methods can be extended to solve the corresponding 3D quaternion orientation frame alignment (QFA) problem, noting the interesting equivalence of this problem to the rotation-averaging problem, which also has been the subject of independent literature threads. We conclude with a brief discussion of the combined 3D translation-orientation data alignment problem. Appendices are devoted to a tutorial on quaternion frames, a related quaternion technique for extracting quaternions from rotation matrices, and a review of quaternion rotation-averaging methods relevant to the orientation-frame alignment problem. Supplementary Material covers extensions of quaternion methods to the 4D problem.
    Comment: This replaces an early draft that lacked a number of important references to previous work. There are also additional graphics elements. The extensions to 4D data and additional details are worked out in the Supplementary Material appended to the main text