Efficient learning of neighbor representations for boundary trees and forests
We introduce a semiparametric approach to neighbor-based classification. We
build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015),
which enables fast neighbor-based classification, regression and retrieval in
large datasets. While boundary trees use a Euclidean measure of similarity,
the Differentiable Boundary Tree algorithm by Zoran et al. (2017) was introduced
to learn low-dimensional representations of complex input data, on which
semantic similarity can be calculated to train boundary trees. As its authors
point out, the differentiable boundary tree approach has a few
limitations that prevent it from scaling to large datasets. In this paper, we
introduce Differentiable Boundary Sets, an algorithm that overcomes the
computational issues of the differentiable boundary tree scheme and also
improves its classification accuracy and data representability. Our algorithm
is efficiently implementable with existing tools and offers a significant
reduction in training time. We test and compare the algorithms on the well-known
MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by
Xiao et al. (2017). Comment: 9 pages, 2 figures
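As a concrete reference point for the baseline this abstract builds on, here is a minimal sketch of a Euclidean boundary tree, written from our reading of Mathy et al. (2015): a query greedily descends from the root to whichever of the current node and its children is closest, and a training example is stored as a new child only when the node reached carries the wrong label. The class and method names and the optional `max_children` cap are hypothetical illustrations; this is the non-differentiable baseline, not the Differentiable Boundary Sets algorithm.

```python
import math
from typing import List, Optional, Sequence


class Node:
    """One stored example: a point x with label y."""

    def __init__(self, x: Sequence[float], y: int) -> None:
        self.x = list(x)
        self.y = y
        self.children: List["Node"] = []


class BoundaryTree:
    """Minimal Euclidean boundary tree (illustrative sketch)."""

    def __init__(self, max_children: Optional[int] = None) -> None:
        self.root: Optional[Node] = None
        self.max_children = max_children  # None means unlimited children

    def _traverse(self, x: Sequence[float]) -> Node:
        # Greedy descent: move to whichever of the current node and its
        # children is closest to x; stop when the current node itself wins.
        node = self.root
        while True:
            candidates = list(node.children)
            if self.max_children is None or len(node.children) < self.max_children:
                candidates.append(node)
            best = min(candidates, key=lambda n: math.dist(n.x, x))
            if best is node:
                return node
            node = best

    def train(self, x: Sequence[float], y: int) -> None:
        if self.root is None:
            self.root = Node(x, y)
            return
        final = self._traverse(x)
        # Store x only if the current tree would misclassify it.
        if final.y != y:
            final.children.append(Node(x, y))

    def predict(self, x: Sequence[float]) -> int:
        return self._traverse(x).y


tree = BoundaryTree()
for x, y in [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((0.1, 0.2), 0), ((0.9, 0.8), 1)]:
    tree.train(x, y)
print(tree.predict((0.05, 0.1)), tree.predict((0.95, 0.9)))  # 0 1
```

Because only misclassified examples are stored, the tree in this example keeps just two of the four training points: the root and one child.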
Fast Algorithms and Efficient Statistics: N-point Correlation Functions
We present here a new algorithm for the fast computation of N-point
correlation functions in large astronomical data sets. The algorithm is based
on kd-trees decorated with cached sufficient statistics, allowing
for orders-of-magnitude speed-ups over the naive non-tree-based implementation
of correlation functions. We further discuss the use of controlled
approximations within the computation, which allow further acceleration. In
summary, our algorithm now makes it possible to compute exact, all-pairs
measurements of the 2-, 3- and 4-point correlation functions for cosmological
data sets like the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the
next generation of Cosmic Microwave Background experiments (see Szapudi et al.
2000). Comment: To appear in Proceedings of MPA/MPE/ESO Conference "Mining the Sky",
July 31 - August 4, 2000, Garching, Germany
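The idea of caching sufficient statistics in kd-tree nodes can be sketched for the simplest case, exact 2-point pair counting. In this hypothetical sketch each node caches only its point count and bounding box, and a dual-tree recursion either prunes a node pair (boxes farther apart than r), accepts it wholesale using the cached counts (boxes entirely within r), or splits further. The names `KDNode` and `pair_count`, the leaf size and the median split are assumptions; the controlled approximations and the 3- and 4-point cases from the abstract are not shown.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    """kd-tree node caching its point count and bounding box."""

    def __init__(self, points: List[Point], leaf_size: int = 8) -> None:
        dims = len(points[0])
        self.count = len(points)  # cached sufficient statistic
        self.lo = [min(p[d] for p in points) for d in range(dims)]
        self.hi = [max(p[d] for p in points) for d in range(dims)]
        self.points = points
        self.left: Optional["KDNode"] = None
        self.right: Optional["KDNode"] = None
        if len(points) > leaf_size:
            # Split at the median of the widest bounding-box dimension.
            d = max(range(dims), key=lambda k: self.hi[k] - self.lo[k])
            pts = sorted(points, key=lambda p: p[d])
            mid = len(pts) // 2
            self.left = KDNode(pts[:mid], leaf_size)
            self.right = KDNode(pts[mid:], leaf_size)


def min_dist(a: KDNode, b: KDNode) -> float:
    """Smallest possible distance between a point in box a and one in box b."""
    gaps = (max(blo - ahi, alo - bhi, 0.0)
            for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi))
    return math.sqrt(sum(g * g for g in gaps))


def max_dist(a: KDNode, b: KDNode) -> float:
    """Largest possible distance between a point in box a and one in box b."""
    spans = (max(bhi - alo, ahi - blo)
             for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi))
    return math.sqrt(sum(s * s for s in spans))


def pair_count(a: KDNode, b: KDNode, r: float) -> int:
    """Number of ordered pairs (p in a, q in b) with |p - q| <= r."""
    if min_dist(a, b) > r:
        return 0  # prune: no pair can be that close
    if max_dist(a, b) <= r:
        return a.count * b.count  # every pair qualifies: use cached counts
    if a.left is None and b.left is None:
        return sum(1 for p in a.points for q in b.points if math.dist(p, q) <= r)
    # Otherwise split the larger node that still has children.
    if b.left is None or (a.left is not None and a.count >= b.count):
        return pair_count(a.left, b, r) + pair_count(a.right, b, r)
    return pair_count(a, b.left, r) + pair_count(a, b.right, r)


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(500)]
root = KDNode(data)
dd = pair_count(root, root, 0.05)       # ordered pairs, including 500 self-pairs
distinct_pairs = (dd - len(data)) // 2  # unordered pairs with i < j
```

Counting the data set against itself yields ordered pairs with self-pairs included, hence the correction in the last line. A correlation-function estimate would compare such counts against the same counts on a random catalog.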
Accelerating Nearest Neighbor Search on Manycore Systems
We develop methods for accelerating metric similarity search that are
effective on modern hardware. Our algorithms factor into easily parallelizable
components, making them simple to deploy and efficient on multicore CPUs and
GPUs. Despite the simple structure of our algorithms, their search performance
is provably sublinear in the size of the database, with a factor dependent only
on its intrinsic dimensionality. We demonstrate that our methods provide
substantial speedups on a range of datasets and hardware platforms. In
particular, we present results on a 48-core server machine, on graphics
hardware, and on a multicore desktop.
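The abstract does not spell out its algorithms, so the following is only an illustrative sketch of the general pattern it describes: a search built entirely from brute-force scans, each of which parallelizes trivially. It picks random representatives, assigns every database point to its nearest representative, and answers a query by scanning the representatives and then one list. This is similar in spirit to a ball cover but is not claimed to be the authors' method; `TwoStageIndex`, `scan` and the choice of about √n representatives are hypothetical, and this one-shot version is approximate, with none of the sublinearity guarantees stated in the abstract.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Point = Sequence[float]


def scan(query: Point, ids: List[int], db: List[Point]) -> int:
    """Brute-force scan of a candidate list: the simple, parallel-friendly kernel."""
    return min(ids, key=lambda i: math.dist(query, db[i]))


class TwoStageIndex:
    """Random representatives plus per-representative lists (hypothetical)."""

    def __init__(self, db: List[Point], n_reps: int, seed: int = 0) -> None:
        self.db = db
        self.reps = random.Random(seed).sample(range(len(db)), n_reps)
        # Assign every database point to its closest representative.
        self.lists: Dict[int, List[int]] = {r: [] for r in self.reps}
        for i, p in enumerate(db):
            self.lists[scan(p, self.reps, db)].append(i)

    def query(self, q: Point) -> int:
        rep = scan(q, self.reps, self.db)         # stage 1: all representatives
        return scan(q, self.lists[rep], self.db)  # stage 2: one owned list

    def batch_query(self, queries: List[Point], workers: int = 4) -> List[int]:
        # Queries are independent, so they map directly onto a worker pool.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.query, queries))


rng = random.Random(1)
db = [tuple(rng.random() for _ in range(3)) for _ in range(2000)]
index = TwoStageIndex(db, n_reps=int(math.sqrt(len(db))))
queries = [tuple(rng.random() for _ in range(3)) for _ in range(100)]
answers = index.batch_query(queries)
```

Python threads only illustrate the data-parallel structure here; because of the global interpreter lock, real speedups would need processes, vectorized kernels or GPU code.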
High Dimensional Clustering with -nets
Clustering, a fundamental task in data science and machine learning, groups a
set of objects in such a way that objects in the same cluster are closer to
each other than to those in other clusters. In this paper, we consider a
well-known structure, so-called -nets, which rigorously captures the
properties of clustering. We devise algorithms that improve the run-time of
approximating -nets in high-dimensional spaces with and
metrics from to , where .
These algorithms are also used to improve a framework that provides approximate
solutions to other high dimensional distance problems. Using this framework,
several important related problems can also be solved efficiently, e.g.,
-approximate th-nearest neighbor distance,
-approximate Min-Max clustering, -approximate
-center clustering. In addition, we build an algorithm that
-approximates greedy permutations in time where is the spread of the input. This
algorithm is used to -approximate -center with the same time
complexity. Comment: Accepted by AAAI201
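The symbols in "-nets" and in the approximation factors above were lost in extraction. Assuming the structure is the standard r-net (a subset of centers such that every point lies within distance r of some center and the centers are pairwise more than r apart), the sketch below gives exact quadratic-time baselines for the two objects the abstract approximates: a greedy r-net, and the greedy (farthest-point) permutation, whose first k points are the classic 2-approximation for k-center due to Gonzalez. Function names are hypothetical, and the abstract's faster high-dimensional algorithms are not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def greedy_r_net(points: List[Point], r: float) -> List[int]:
    """Exact r-net by a single greedy pass (quadratic in the worst case).

    Covering: every skipped point is within r of an earlier center.
    Packing: each new center is more than r from all previous centers.
    """
    centers: List[int] = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[c]) > r for c in centers):
            centers.append(i)
    return centers


def greedy_permutation(points: List[Point]) -> Tuple[List[int], List[float]]:
    """Farthest-point traversal: returns the order and each insertion radius."""
    n = len(points)
    order, radii = [0], [math.inf]
    chosen = {0}
    nearest = [math.dist(points[0], p) for p in points]  # distance to chosen set
    for _ in range(n - 1):
        far = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        order.append(far)
        radii.append(nearest[far])
        chosen.add(far)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(points[far], points[i]))
    return order, radii


def k_center(points: List[Point], k: int) -> List[int]:
    """First k points of the greedy permutation: a 2-approximation (Gonzalez)."""
    return greedy_permutation(points)[0][:k]


rng = random.Random(2)
pts = [(rng.random(), rng.random()) for _ in range(200)]
net = greedy_r_net(pts, 0.2)
order, radii = greedy_permutation(pts)
centers = k_center(pts, 5)
```

The insertion radii of the greedy permutation are non-increasing, and the covering radius of the first k points equals the (k+1)-th radius, which is what the k-center guarantee rests on.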
Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of the maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs. Comment: 11 pages
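A serial sketch of the pruning ideas described above: core numbers from a peeling decomposition, a greedy heuristic clique as the initial lower bound, skipping every vertex whose core number is below the incumbent size (a clique of size s lies in the (s-1)-core), and a simple branch and bound over forward neighbors. All function names and the vertex ordering are assumptions, and the parallel exploration with shared bounds that the abstract describes is omitted.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def from_edges(edges: List[Tuple[int, int]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def core_numbers(g: Graph) -> Dict[int, int]:
    """Core number of each vertex by repeatedly peeling a minimum-degree vertex."""
    deg = {v: len(g[v]) for v in g}
    alive, core, k = set(g), {}, 0
    while alive:
        v = min(alive, key=lambda u: deg[u])
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for u in g[v]:
            if u in alive:
                deg[u] -= 1
    return core


def heuristic_clique(g: Graph, core: Dict[int, int]) -> List[int]:
    """Greedy lower bound: grow a clique from each vertex, high-core neighbors first."""
    best: List[int] = []
    for v in g:
        if core[v] + 1 <= len(best):
            continue
        clique = [v]
        for u in sorted(g[v], key=lambda w: core[w], reverse=True):
            if all(u in g[w] for w in clique):
                clique.append(u)
        if len(clique) > len(best):
            best = clique
    return best


def expand(g: Graph, r: List[int], p: Set[int], best: List[int]) -> List[int]:
    """Branch and bound: r is the current clique, p the vertices adjacent to all of r."""
    if not p:
        return r if len(r) > len(best) else best
    for v in list(p):
        if len(r) + len(p) <= len(best):
            return best  # bound: even taking all of p cannot beat best
        p.remove(v)
        best = expand(g, r + [v], p & g[v], best)
    return best


def max_clique(g: Graph) -> List[int]:
    core = core_numbers(g)
    best = heuristic_clique(g, core)
    order = sorted(g, key=lambda u: core[u])
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        # A clique of size s lies in the (s-1)-core, so skip v if core[v] < len(best).
        if core[v] < len(best):
            continue
        # Look only at later neighbors so each clique is searched from its first vertex.
        p = {u for u in g[v] if pos[u] > pos[v] and core[u] >= len(best)}
        best = expand(g, [v], p, best)
    return best


g = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                (3, 4), (4, 5), (5, 6), (6, 7), (5, 7)])
clique = max_clique(g)  # the 4-clique {0, 1, 2, 3}
```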
Perseus: Randomized Point-based Value Iteration for POMDPs
Partially observable Markov decision processes (POMDPs) form an attractive
and principled framework for agent planning under uncertainty. Point-based
approximate techniques for POMDPs compute a policy based on a finite set of
points collected in advance from the agent's belief space. We present a
randomized point-based value iteration algorithm called Perseus. The algorithm
performs approximate value backup stages, ensuring that in each backup stage
the value of each point in the belief set is improved; the key observation is
that a single backup may improve the value of many belief points. In contrast to
other point-based methods, Perseus backs up only a (randomly selected) subset
of points in the belief set, sufficient for improving the value of each belief
point in the set. We show how the same idea can be extended to deal with
continuous action spaces. Experimental results show the potential of Perseus in
large-scale POMDP problems.
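The backup-stage logic in the abstract can be made concrete on a small problem. This sketch assumes the classic two-state tiger POMDP as the test problem and a uniform grid of beliefs in place of points collected by exploring the belief space; both are illustrative choices, as are all names. Each stage repeatedly backs up a randomly chosen belief that has not yet improved, keeps the old vector when the backup does not help, and drops every belief whose value the new set already matches or exceeds, so the value at each belief point never decreases between stages. Continuous actions are not covered.

```python
import random
from typing import List, Sequence

GAMMA = 0.95
STATES = (0, 1)      # 0: tiger behind left door, 1: tiger behind right door
ACTIONS = (0, 1, 2)  # 0: listen, 1: open left, 2: open right
OBS = (0, 1)         # 0: hear tiger left, 1: hear tiger right

Vector = List[float]


def trans(s: int, a: int, s2: int) -> float:
    if a == 0:
        return 1.0 if s == s2 else 0.0  # listening leaves the state unchanged
    return 0.5  # opening a door resets the tiger uniformly


def obs_prob(a: int, s2: int, o: int) -> float:
    if a == 0:
        return 0.85 if o == s2 else 0.15  # noisy hint about the tiger
    return 0.5  # no information after opening a door


def reward(s: int, a: int) -> float:
    if a == 0:
        return -1.0
    opened = a - 1  # index of the opened door
    return -100.0 if opened == s else 10.0


def dot(b: Sequence[float], alpha: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(b, alpha))


def value(V: List[Vector], b: Sequence[float]) -> float:
    return max(dot(b, alpha) for alpha in V)


def backup(V: List[Vector], b: Sequence[float]) -> Vector:
    """Point-based Bellman backup: the best new alpha vector at belief b."""
    best, best_val = None, float("-inf")
    for a in ACTIONS:
        g_a = [reward(s, a) for s in STATES]
        for o in OBS:
            # Back-project each alpha vector through (a, o); keep the best one for b.
            projections = [
                [sum(obs_prob(a, s2, o) * trans(s, a, s2) * alpha[s2] for s2 in STATES)
                 for s in STATES]
                for alpha in V
            ]
            g = max(projections, key=lambda proj: dot(b, proj))
            g_a = [g_a[s] + GAMMA * g[s] for s in STATES]
        if dot(b, g_a) > best_val:
            best, best_val = g_a, dot(b, g_a)
    return best


def perseus_stage(V: List[Vector], B: List[Vector], rng: random.Random) -> List[Vector]:
    """One stage: back up random points until no point in B has a lower value."""
    new_V: List[Vector] = []
    todo = list(B)
    while todo:
        b = rng.choice(todo)
        alpha = backup(V, b)
        if dot(b, alpha) < value(V, b):
            alpha = max(V, key=lambda v: dot(b, v))  # keep the old vector instead
        new_V.append(alpha)
        # Drop every point whose value new_V already matches or exceeds.
        todo = [bp for bp in todo if value(new_V, bp) < value(V, bp)]
    return new_V


belief_set = [[p / 20, 1 - p / 20] for p in range(21)]
lower = min(reward(s, a) for s in STATES for a in ACTIONS) / (1 - GAMMA)
history = [[[lower, lower]]]
rng = random.Random(0)
for _ in range(30):
    history.append(perseus_stage(history[-1], belief_set, rng))
```

Starting from a single vector equal to the smallest reward divided by (1 - GAMMA) gives a lower bound on the optimal value, so every later stage can only raise the values at the belief points.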