CloudTree: A Library to Extend Cloud Services for Trees
In this work, we propose a library that enables on a cloud the creation and
management of tree data structures from a cloud client. As a proof of concept,
we implement a new cloud service CloudTree. With CloudTree, users are able to
organize big data into tree data structures of their choice that are physically
stored in a cloud. We use caching, prefetching, and aggregation techniques in
the design and implementation of CloudTree to enhance performance. We have
implemented the services of Binary Search Trees (BST) and Prefix Trees as
current members in CloudTree and have benchmarked their performance using the
Amazon Cloud. The ideas and techniques in the design and implementation of the BST
and prefix tree are generic and thus can also be used for other types of trees
such as B-trees, and for other link-based data structures such as linked lists and
graphs. Preliminary experimental results show that CloudTree is useful and
efficient for various big data applications.
Maximum Inner-Product Search using Tree Data-structures
The problem of efficiently finding the best match for a query in a
given set with respect to the Euclidean distance or the cosine similarity has
been extensively studied in the literature. However, a closely related problem of
efficiently finding the best match with respect to the inner product has never
been explored in the general setting to the best of our knowledge. In this
paper we consider this general problem and contrast it with the existing
best-match algorithms. First, we propose a general branch-and-bound algorithm
using a tree data structure. Subsequently, we present a dual-tree algorithm for
the case where there are multiple queries. Finally, we present a new data
structure for increasing the efficiency of the dual-tree algorithm. These
branch-and-bound algorithms involve novel bounds suited for the purpose of
best-matching with inner products. We evaluate our proposed algorithms on a
variety of data sets from various applications, and exhibit up to five orders
of magnitude improvement in query time over the naive search technique.
Comment: Under submission in KDD 201
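The branch-and-bound idea in this abstract can be illustrated with a minimal Python sketch. It assumes a simple ball tree (the abstract only says "a tree data structure") and prunes with the bound q·c + r‖q‖ on the inner product between a query q and any point in a ball with center c and radius r, which follows from the Cauchy-Schwarz inequality. The names `BallNode` and `max_inner_product` are hypothetical and not taken from the paper, and the dual-tree variant is not shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


class BallNode:
    """Hypothetical ball-tree node: a center, a radius, and up to two children."""

    def __init__(self, points, leaf_size=8):
        dim = len(points[0])
        self.points = points
        self.center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        self.radius = max(math.dist(self.center, p) for p in points)
        self.children = []
        if len(points) > leaf_size and self.radius > 0:
            # Split at the median along the dimension with the largest spread.
            spread = [max(p[i] for p in points) - min(p[i] for p in points)
                      for i in range(dim)]
            axis = spread.index(max(spread))
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.children = [BallNode(ordered[:mid], leaf_size),
                             BallNode(ordered[mid:], leaf_size)]

    def upper_bound(self, q, q_norm):
        # For any p in the ball: q.p = q.c + q.(p - c) <= q.c + r * ||q||.
        return dot(q, self.center) + self.radius * q_norm


def max_inner_product(root, q):
    """Return (best score, best point) for query q, pruning with the ball bound."""
    q_norm = math.sqrt(dot(q, q))
    best = (-math.inf, None)

    def visit(node):
        nonlocal best
        if node.upper_bound(q, q_norm) <= best[0]:
            return  # no point in this ball can beat the current best
        if not node.children:
            for p in node.points:
                score = dot(q, p)
                if score > best[0]:
                    best = (score, p)
            return
        # Visit the more promising child first so pruning kicks in earlier.
        for child in sorted(node.children,
                            key=lambda c: c.upper_bound(q, q_norm),
                            reverse=True):
            visit(child)

    visit(root)
    return best
```

Calling `max_inner_product(BallNode(points), query)` should return the same best score as a linear scan over `points`, up to floating-point rounding; the tree only changes how many points are examined.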
Random Indexing K-tree
Random Indexing (RI) K-tree is the combination of two algorithms for
clustering. Many large scale problems exist in document clustering. RI K-tree
scales well with large inputs due to its low complexity. It also exhibits
features that are useful for managing a changing collection. Furthermore, it
solves previous issues with sparse document vectors when using K-tree. The
algorithms and data structures are defined, explained and motivated. Specific
modifications to K-tree are made for use with RI. Experiments have been
executed to measure quality. The results indicate that RI K-tree improves
document cluster quality over the original K-tree algorithm.
Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted.
Removed clevere
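As a rough illustration of why Random Indexing avoids sparse document vectors, the sketch below builds dense, fixed-width document vectors by summing sparse ternary index vectors, one per term. This is the general Random Indexing idea only; the dimension, the number of nonzeros, and the function names `index_vector` and `document_vector` are assumptions for illustration, and the K-tree clustering step is not shown.

```python
import random


def index_vector(term, dim=64, nonzeros=4, seed=0):
    """Sparse ternary index vector for a term, as {position: +1 or -1}.

    The generator is seeded from the term, so the same term always maps to
    the same vector without storing a vocabulary table.
    """
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), nonzeros)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def document_vector(tokens, dim=64, nonzeros=4, seed=0):
    """Dense document vector: the sum of the index vectors of its tokens."""
    vec = [0] * dim
    for token in tokens:
        for pos, sign in index_vector(token, dim, nonzeros, seed).items():
            vec[pos] += sign
    return vec
```

Because the representation is a plain sum, a document vector can be updated incrementally as tokens arrive, which fits the abstract's remark about managing a changing collection.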
Building Efficient and Compact Data Structures for Simplicial Complexes
The Simplex Tree (ST) is a recently introduced data structure that can
represent abstract simplicial complexes of any dimension and allows efficient
implementation of a large range of basic operations on simplicial complexes. In
this paper, we show how to optimally compress the Simplex Tree while retaining
its functionalities. In addition, we propose two new data structures called the
Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the
compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List
under various settings.
Comment: An extended abstract appeared in the proceedings of SoCG 201
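For readers unfamiliar with the structure being compressed, here is a minimal Python sketch of an uncompressed simplex tree, assuming the usual description of it as a trie over sorted vertex labels in which every node stands for one simplex. The class and method names are hypothetical, and the compression scheme, the Maximal Simplex Tree, and the Simplex Array List from the paper are not shown.

```python
from itertools import combinations


class SimplexTree:
    """Trie over sorted vertex labels; each non-root node is one simplex."""

    def __init__(self):
        self.root = {}  # nested dicts: vertex label -> child node

    def _insert_path(self, simplex):
        node = self.root
        for vertex in simplex:
            node = node.setdefault(vertex, {})

    def insert(self, simplex):
        """Insert a simplex together with all of its faces."""
        vertices = sorted(set(simplex))
        for k in range(1, len(vertices) + 1):
            for face in combinations(vertices, k):
                self._insert_path(face)

    def contains(self, simplex):
        """Membership test: follow the sorted vertices down from the root."""
        node = self.root
        for vertex in sorted(set(simplex)):
            if vertex not in node:
                return False
            node = node[vertex]
        return True

    def count(self):
        """Number of simplices stored, which equals the number of trie nodes."""
        def walk(node):
            return sum(1 + walk(child) for child in node.values())
        return walk(self.root)
```

Inserting the triangle `[1, 2, 3]` stores 7 simplices (3 vertices, 3 edges, 1 triangle). Storing every face explicitly like this is the kind of redundancy that makes a more compact representation attractive.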
Extracting Tree-structures in CT data by Tracking Multiple Statistically Ranked Hypotheses
In this work, we adapt a method based on multiple hypothesis tracking (MHT)
that has been shown to give state-of-the-art vessel segmentation results in
interactive settings, for the purpose of extracting trees. Regularly spaced
tubular templates are fit to the image data, forming local hypotheses. These local
hypotheses are used to construct the MHT tree, which is then traversed to make
segmentation decisions. However, some critical parameters in this method are
scale-dependent and have an adverse effect when tracking structures of varying
dimensions. We propose to use statistical ranking of local hypotheses in
constructing the MHT tree, which yields a probabilistic interpretation of
scores across scales and helps alleviate the scale-dependence of MHT
parameters. This enables our method to track trees starting from a single seed
point. Our method is evaluated on chest CT data to extract airway trees and
coronary arteries. In both cases, we show that our method performs
significantly better than the original MHT method.
Comment: Accepted for publication at the International Journal of Medical
Physics and Practic
