777,307 research outputs found

    CloudTree: A Library to Extend Cloud Services for Trees

    Full text link
    In this work, we propose a library that enables on a cloud the creation and management of tree data structures from a cloud client. As a proof of concept, we implement a new cloud service CloudTree. With CloudTree, users are able to organize big data into tree data structures of their choice that are physically stored in a cloud. We use caching, prefetching, and aggregation techniques in the design and implementation of CloudTree to enhance performance. We have implemented the services of Binary Search Trees (BST) and Prefix Trees as current members in CloudTree and have benchmarked their performance using the Amazon Cloud. The idea and techniques in the design and implementation of a BST and prefix tree is generic and thus can also be used for other types of trees such as B-tree, and other link-based data structures such as linked lists and graphs. Preliminary experimental results show that CloudTree is useful and efficient for various big data applications

    Maximum Inner-Product Search using Tree Data-structures

    Full text link
    The problem of {\em efficiently} finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in literature. However, a closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.Comment: Under submission in KDD 201

    Random Indexing K-tree

    Get PDF
    Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

    Building Efficient and Compact Data Structures for Simplicial Complexes

    Get PDF
    The Simplex Tree (ST) is a recently introduced data structure that can represent abstract simplicial complexes of any dimension and allows efficient implementation of a large range of basic operations on simplicial complexes. In this paper, we show how to optimally compress the Simplex Tree while retaining its functionalities. In addition, we propose two new data structures called the Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List under various settings.Comment: An extended abstract appeared in the proceedings of SoCG 201

    Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses

    Full text link
    In this work, we adapt a method based on multiple hypothesis tracking (MHT) that has been shown to give state-of-the-art vessel segmentation results in interactive settings, for the purpose of extracting trees. Regularly spaced tubular templates are fit to image data forming local hypotheses. These local hypotheses are used to construct the MHT tree, which is then traversed to make segmentation decisions. However, some critical parameters in this method are scale-dependent and have an adverse effect when tracking structures of varying dimensions. We propose to use statistical ranking of local hypotheses in constructing the MHT tree, which yields a probabilistic interpretation of scores across scales and helps alleviate the scale-dependence of MHT parameters. This enables our method to track trees starting from a single seed point. Our method is evaluated on chest CT data to extract airway trees and coronary arteries. In both cases, we show that our method performs significantly better than the original MHT method.Comment: Accepted for publication at the International Journal of Medical Physics and Practic
    corecore