98 research outputs found

    On the General Chain Pair Simplification Problem

    Get PDF
    The Chain Pair Simplification problem (CPS) was posed by Bereg et al. who were motivated by the problem of efficiently computing and visualizing the structural resemblance between a pair of protein backbones. In this problem, given two polygonal chains of lengths n and m, the goal is to simplify both of them simultaneously, so that the lengths of the resulting simplifications as well as the discrete Frechet distance between them are bounded. When the vertices of the simplifications are arbitrary (i.e., not necessarily from the original chains), the problem is called General CPS (GCPS). In this paper we consider for the first time the complexity of GCPS under both the discrete Frechet distance (GCPS-3F) and the Hausdorff distance (GCPS-2H). (In the former version, the quality of the two simplifications is measured by the discrete Fr\u27echet distance, and in the latter version it is measured by the Hausdorff distance.) We prove that GCPS-3F is polynomially solvable, by presenting an widetilde-O((n+m)^6 min{n,m}) time algorithm for the corresponding minimization problem. We also present an O((n+m)^4) 2-approximation algorithm for the problem. On the other hand, we show that GCPS-2H is NP-complete, and present an approximation algorithm for the problem

    FRESH: Fréchet similarity with hashing

    Get PDF
    This paper studies the r-range search problem for curves under the continuous Fréchet distance: given a dataset S of n polygonal curves and a threshold >0 , construct a data structure that, for any query curve q, efficiently returns all entries in S with distance at most r from q. We propose FRESH, an approximate and randomized approach for r-range search, that leverages on a locality sensitive hashing scheme for detecting candidate near neighbors of the query curve, and on a subsequent pruning step based on a cascade of curve simplifications. We experimentally compare FRESH to exact and deterministic solutions, and we show that high performance can be reached by suitably relaxing precision and recall

    STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE

    Get PDF
    This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaption gives a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees than currently exists. For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting and identifying these genes has become an important problem in phylogenetics. The R sofware package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalizion of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method. A few first theoretical steps in adaption of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there some ambiguity about how to best proceed. We show that convex hulls are one reasonable approach to the problem. The Nye-PCA- algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method

    Efficient motif discovery in spatial trajectories using discrete Fréchet distance

    Get PDF
    National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiativ
    corecore