Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees which, we prove, must terminate for a fixed data set
in a chain of nested partitions that satisfies a natural homogeneity requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection.
Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a
conference
Optimal Hierarchical Layouts for Cache-Oblivious Search Trees
This paper proposes a general framework for generating cache-oblivious
layouts for binary search trees. A cache-oblivious layout attempts to minimize
cache misses on any hierarchical memory, independent of the number of memory
levels and attributes at each level such as cache size, line size, and
replacement policy. Recursively partitioning a tree into contiguous subtrees
and prescribing an ordering amongst the subtrees, Hierarchical Layouts
generalize many commonly used layouts for trees such as in-order, pre-order and
breadth-first. They also generalize the various flavors of the van Emde Boas
layout, which have previously been used as cache-oblivious layouts.
Hierarchical Layouts thus unify all previous attempts at deriving layouts for
search trees.
The paper then derives a new locality measure (the Weighted Edge Product)
that mimics the probability of cache misses at multiple levels, and shows that
layouts that reduce this measure perform better. We analyze the various degrees
of freedom in the construction of Hierarchical Layouts, and investigate the
relative effect of each of these decisions in the construction of
cache-oblivious layouts. Optimizing the Weighted Edge Product for complete
binary search trees, we introduce the MinWEP layout, and show that it
outperforms previously used cache-oblivious layouts by almost 20%.
Comment: Extended version with proofs added to the appendix
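As a rough illustration of the layouts named in this abstract, the following Python sketch orders the nodes of a complete binary tree breadth-first and in one van Emde Boas variant, then compares the two with a simple proxy: the average number of memory blocks touched on a root-to-leaf search. The function names, the heap-style node numbering, the floor(height/2) split, and the block-count proxy are all assumptions made for this example. The proxy is not the paper's Weighted Edge Product, and the sketch is not the MinWEP construction.

```python
def bfs_layout(height):
    """Breadth-first order: heap indices 1 .. 2**height - 1 as they are."""
    return list(range(1, 1 << height))


def veb_layout(root, height):
    """One van Emde Boas variant for the complete subtree at `root`.

    Nodes use heap numbering (children of n are 2n and 2n + 1).
    The tree is cut at floor(height / 2); other variants cut elsewhere.
    """
    if height == 1:
        return [root]
    top_h = height // 2
    bottom_h = height - top_h
    order = veb_layout(root, top_h)  # top subtree is stored first
    # Descendants of `root` at depth top_h are the bottom subtree roots.
    first = root << top_h
    for sub_root in range(first, first + (1 << top_h)):
        order.extend(veb_layout(sub_root, bottom_h))
    return order


def avg_blocks_per_search(order, height, block_size):
    """Hypothetical locality proxy: mean number of distinct blocks of
    `block_size` consecutive slots touched on a root-to-leaf path."""
    pos = {node: i for i, node in enumerate(order)}
    leaves = range(1 << (height - 1), 1 << height)
    total = 0
    for leaf in leaves:
        blocks = set()
        node = leaf
        while node >= 1:  # walk up to the root
            blocks.add(pos[node] // block_size)
            node //= 2
        total += len(blocks)
    return total / len(leaves)


if __name__ == "__main__":
    h, b = 4, 3
    print("breadth-first:", avg_blocks_per_search(bfs_layout(h), h, b))
    print("van Emde Boas:", avg_blocks_per_search(veb_layout(1, h), h, b))
```

For a tree of height 4 and blocks of 3 nodes, this proxy gives 3 blocks per search for the breadth-first order and 2 for the van Emde Boas order, which matches the intuition that recursive layouts keep search paths inside fewer blocks.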
Capturing Hiproofs in HOL Light
Hierarchical proof trees (hiproofs for short) add structure to ordinary proof
trees, by allowing portions of trees to be hierarchically nested. The
additional structure can be used to abstract away from details, or to label
particular portions to explain their purpose. In this paper we present two
complementary methods for capturing hiproofs in HOL Light, along with a tool to
produce web-based visualisations. The first method uses tactic recording, by
modifying tactics to record their arguments and construct a hierarchical tree;
this allows a tactic proof script to be modified. The second method uses proof
recording, which extends the HOL Light kernel to record hierarchical proof trees
alongside theorems. This method is less invasive, but requires care to manage
the size of the recorded objects. We have implemented both methods, resulting
in two systems: Tactician and HipCam.
SenseCam image localisation using hierarchical SURF trees
The SenseCam is a wearable camera that automatically takes photos of the wearer's activities, generating thousands of images per day.
Automatically organising these images for efficient search and retrieval is a challenging task, but can be simplified by providing
semantic information with each photo, such as the wearer's location during capture time. We propose a method for automatically determining the wearer's location using an annotated image database, described using SURF interest point descriptors. We show that SURF out-performs SIFT in matching SenseCam images and that matching can be done efficiently using hierarchical trees of SURF descriptors. Additionally, by re-ranking the top images using bi-directional SURF matches, location matching performance is improved further.
Partial match queries in relaxed K-dt trees
The study of partial match queries on random hierarchical multidimensional data structures dates back to Ph. Flajolet and C. Puech’s 1986 seminal paper on partial match retrieval. It was not until recently that fixed (as opposed to random) partial match queries were studied for random relaxed K-d trees, random standard K-d trees, and random 2-dimensional quad trees. Based on those results it seemed
natural to classify the general form of the cost of fixed partial match queries into two families: that of either random hierarchical structures or perfectly balanced structures, as conjectured by Duch, Lau and Martínez (On the Cost of Fixed Partial Queries in K-d trees, Algorithmica, 75(4):684–723, 2016). Here we show that this conjecture does not hold by introducing relaxed K-dt trees and providing the average-case analysis for random partial match queries, as well as some advances on the average-case analysis for fixed partial match queries on them. In fact, this cost, for fixed partial match queries, does not follow the conjectured forms.
Peer Reviewed. Postprint (author's final draft)
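To make the notion of a partial match query concrete, here is a minimal Python sketch of a plain relaxed K-d tree, in which each node draws its discriminating coordinate uniformly at random on insertion. It illustrates only the baseline structure: the abstract does not describe how K-dt trees are built, so that variant is not attempted here. The names `Node`, `insert`, and `partial_match`, and the use of `None` for unspecified coordinates, are assumptions of the example.

```python
import random


class Node:
    def __init__(self, point, disc):
        self.point = point  # tuple of K coordinates
        self.disc = disc    # discriminating coordinate of this node
        self.left = None
        self.right = None


def insert(root, point, rng):
    """Insert `point`; a new node picks its discriminant at random."""
    if root is None:
        return Node(point, rng.randrange(len(point)))
    if point[root.disc] < root.point[root.disc]:
        root.left = insert(root.left, point, rng)
    else:  # ties go right
        root.right = insert(root.right, point, rng)
    return root


def partial_match(root, query, out):
    """Collect points matching `query` (None = unspecified coordinate).

    Returns the number of nodes visited, the usual cost measure.
    """
    if root is None:
        return 0
    visited = 1
    if all(q is None or q == p for q, p in zip(query, root.point)):
        out.append(root.point)
    q = query[root.disc]
    if q is None:
        # Unspecified coordinate: both subtrees may hold matches.
        visited += partial_match(root.left, query, out)
        visited += partial_match(root.right, query, out)
    elif q < root.point[root.disc]:
        visited += partial_match(root.left, query, out)
    else:
        visited += partial_match(root.right, query, out)
    return visited


if __name__ == "__main__":
    rng = random.Random(1)
    root = None
    for _ in range(1000):
        root = insert(root, (rng.random(), rng.random()), rng)
    matches = []
    cost = partial_match(root, (0.5, None), matches)
    print("nodes visited:", cost)
```

A fixed query, in the sense of the abstract, corresponds to holding the specified coordinates constant (as with `0.5` above) while the tree is random; a random query would draw those coordinates from the same distribution as the data.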
Correlation, hierarchies, and networks in financial markets
We discuss some methods to quantitatively investigate the properties of
correlation matrices. Correlation matrices play an important role in portfolio
optimization and in several other quantitative descriptions of asset price
dynamics in financial markets. Specifically, we discuss how to define and
obtain hierarchical trees, correlation based trees and networks from a
correlation matrix. The hierarchical clustering and other procedures performed
on the correlation matrix to detect statistically reliable aspects of the
correlation matrix are seen as filtering procedures of the correlation matrix.
We also discuss a method to associate a hierarchically nested factor model to a
hierarchical tree obtained from a correlation matrix. The information retained
in filtering procedures and its stability with respect to statistical
fluctuations are quantified by using the Kullback-Leibler distance.
Comment: 37 pages, 9 figures, 3 tables
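The step from a correlation matrix to a hierarchical tree can be sketched in a few lines of Python. The abstract does not state which distance or linkage the authors use, so this example assumes the common transformation d_ij = sqrt(2(1 - rho_ij)) and single linkage; both choices, along with the function names, are assumptions. The implementation is a naive one meant for small matrices.

```python
import math


def correlation_to_distance(corr):
    """Map correlations to distances via d = sqrt(2 * (1 - rho)).

    This transformation is an assumption of the sketch, not taken
    from the abstract.
    """
    n = len(corr)
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j])))
             for j in range(n)] for i in range(n)]


def single_linkage(dist):
    """Naive single-linkage clustering.

    Returns the merges in order as (cluster_a, cluster_b, height),
    where clusters are frozensets of row indices.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair across the two clusters.
                d = min(dist[i][j]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (a, b)] + [merged]
    return merges


if __name__ == "__main__":
    corr = [[1.0, 0.9, 0.2],
            [0.9, 1.0, 0.3],
            [0.2, 0.3, 1.0]]
    for left, right, height in single_linkage(correlation_to_distance(corr)):
        print(sorted(left), sorted(right), round(height, 3))
```

In this toy matrix the two strongly correlated series (indices 0 and 1) merge first, and the third joins them at a larger height; the sequence of merge heights is what defines the hierarchical tree.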