Products of Euclidean Metrics and Applications to Proximity Questions among Curves
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For ℓp-products of Euclidean metrics, for any p ≥ 1, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms; our approach is especially efficient when the length of the curves is bounded. 2012 ACM Subject Classification: Theory of computation → Data structures design and analysis
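As a rough illustration of the "randomized projections" mentioned in this abstract, the following sketch applies a generic Gaussian random projection to Euclidean points. It is not the paper's data structure: the function names (`make_projection`, `project`, `euclidean`), the seed, and the choice of a dense Gaussian matrix are assumptions made only for this example.

```python
import math
import random


def make_projection(dim_in, dim_out, seed=0):
    """Build a dim_out x dim_in matrix of scaled Gaussian entries.

    Scaling by 1/sqrt(dim_out) keeps squared distances unbiased in
    expectation (the usual Johnson-Lindenstrauss style normalization).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, point):
    """Map a point to the lower-dimensional space (matrix-vector product)."""
    return [sum(a * x for a, x in zip(row, point)) for row in matrix]


def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A typical use would be to build one matrix with `make_projection(d, k)`, project every data point once, and then compare projected points with `euclidean`; how well distances are preserved depends on `k` and is only probabilistic.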
Learning Algebraic Varieties from Samples
We seek to determine a real algebraic variety from a fixed finite subset of
points. Existing methods are studied and new methods are developed. Our focus
lies on aspects of topology and algebraic geometry, such as dimension and
defining polynomials. All algorithms are tested on a range of datasets and made
available in a Julia package.
Products of Euclidean Metrics, Applied to Proximity Problems among Curves
Approximate Nearest Neighbor (ANN) search is a fundamental computational problem that has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets, whereas complex shapes have not been sufficiently addressed. Here, we focus on distance functions between discretized curves in Euclidean space: They appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For ℓp-products of Euclidean metrics, for any constant p, we propose simple and efficient data structures for ANN based on randomized projections: These data structures are of independent interest. Furthermore, they serve to solve proximity questions under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distance functions. These are two very popular and practical approaches to comparing such curves. We offer, for both approaches, the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our methods; our algorithm is especially efficient when the length of the curves is bounded. Finally, we focus on discrete Fréchet distance when the ambient space is high dimensional and derive complexity bounds in terms of doubling dimension as well as an improved approximate near neighbor search
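One way to picture a distance that covers both discrete Fréchet and Dynamic Time Warping is a single dynamic program over traversals of the two curves, aggregating pointwise Euclidean distances with an ℓp norm. The sketch below assumes that formulation (p = ∞ for discrete Fréchet, p = 1 for DTW); the function name `traversal_distance` and the requirement that both curves are non-empty are assumptions of this example, and it computes the distance exactly rather than answering ANN queries.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def traversal_distance(P, Q, p=math.inf):
    """l_p distance over the best joint traversal of curves P and Q.

    p = inf aggregates with max (discrete Frechet);
    p = 1 aggregates with a sum (Dynamic Time Warping).
    Both curves are assumed to be non-empty lists of points.
    """
    n, m = len(P), len(Q)
    use_max = math.isinf(p)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = euclidean(P[i], Q[j])
            cost = d if use_max else d ** p
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            # Best value among the three admissible predecessor cells.
            best = min(
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
            )
            D[i][j] = max(cost, best) if use_max else cost + best
    total = D[n - 1][m - 1]
    return total if use_max else total ** (1.0 / p)
```

For two parallel three-point segments at unit offset, for example, the p = ∞ value is the offset itself, while the p = 1 value sums the offset over the matched pairs. The table costs O(nm) time and space, which is why bounded curve length matters in practice.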
CASP-DM: Context Aware Standard Process for Data Mining
We propose an extension of the Cross Industry Standard Process for Data
Mining (CRISP-DM) which addresses specific challenges of machine learning and
data mining for handling context and model reuse. This new general
context-aware process model is mapped to the CRISP-DM reference model, proposing
some new or enhanced outputs.
The Quaternion-Based Spatial Coordinate and Orientation Frame Alignment Problems
We review the general problem of finding a global rotation that transforms a
given set of points and/or coordinate frames (the "test" data) into the best
possible alignment with a corresponding set (the "reference" data). For 3D
point data, this "orthogonal Procrustes problem" is often phrased in terms of
minimizing a root-mean-square deviation or RMSD corresponding to a Euclidean
distance measure relating the two sets of matched coordinates. We focus on
quaternion eigensystem methods that have been exploited to solve this problem
for at least five decades in several different bodies of scientific literature
where they were discovered independently. While numerical methods for the
eigenvalue solutions dominate much of this literature, it has long been
realized that the quaternion-based RMSD optimization problem can also be solved
using exact algebraic expressions based on the form of the quartic equation
solution published by Cardano in 1545; we focus on these exact solutions to
expose the structure of the entire eigensystem for the traditional 3D spatial
alignment problem. We then explore the structure of the less-studied
orientation data context, investigating how quaternion methods can be extended
to solve the corresponding 3D quaternion orientation frame alignment (QFA)
problem, noting the interesting equivalence of this problem to the
rotation-averaging problem, which also has been the subject of independent
literature threads. We conclude with a brief discussion of the combined 3D
translation-orientation data alignment problem. Appendices are devoted to a
tutorial on quaternion frames, a related quaternion technique for extracting
quaternions from rotation matrices, and a review of quaternion
rotation-averaging methods relevant to the orientation-frame alignment problem.
Supplementary Material covers extensions of quaternion methods to the 4D
problem.
Comment: This replaces an early draft that lacked a number of important
references to previous work. There are also additional graphics elements. The
extensions to 4D data and additional details are worked out in the
Supplementary Material appended to the main text.
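The quaternion eigensystem approach to 3D point alignment described in this abstract can be sketched as follows: build a 4×4 symmetric matrix from the cross-covariance of the matched points (in the form given by Horn, 1987), take the eigenvector of its largest eigenvalue as the unit quaternion, and convert it to a rotation. This sketch uses a shifted power iteration as a simple numerical stand-in, not the exact algebraic (quartic) solution the abstract focuses on. The function names, the starting vector, and the assumptions that both point sets are already centered and non-degenerate are illustrative only.

```python
import math


def profile_matrix(test, ref):
    """4x4 symmetric matrix whose top eigenvector is the optimal quaternion."""
    S = [[sum(x[a] * y[b] for x, y in zip(test, ref)) for b in range(3)]
         for a in range(3)]
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    return [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]


def dominant_eigenvector(N, iterations=500):
    """Power iteration on N + shift*I; the shift makes all eigenvalues >= 0."""
    shift = max(sum(abs(v) for v in row) for row in N)  # Gershgorin bound
    q = [1.0, 0.7, 0.4, 0.1]  # arbitrary start; assumed not orthogonal to the answer
    for _ in range(iterations):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i]
             for i in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        q = [c / norm for c in q]
    return q


def rotation_from_quaternion(q):
    """Rotation matrix of the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rmsd_after_rotation(test, ref, R):
    """Root-mean-square deviation between R applied to test and ref."""
    total = 0.0
    for x, y in zip(test, ref):
        rx = [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]
        total += sum((a - b) ** 2 for a, b in zip(rx, y))
    return math.sqrt(total / len(test))
```

A caller would chain the steps as `R = rotation_from_quaternion(dominant_eigenvector(profile_matrix(test, ref)))` and then check the fit with `rmsd_after_rotation(test, ref, R)`. Because q and −q describe the same rotation, the sign of the returned eigenvector does not matter.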