
1,395 research outputs found

Efficient learning of neighbor representations for boundary trees and forests

Author: Adikari Tharindu
Draper Stark C.
Publication date: 25/10/2018

We introduce a semiparametric approach to neighbor-based classification. We build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015), which enables fast neighbor-based classification, regression, and retrieval in large datasets. While boundary trees use a Euclidean measure of similarity, the Differentiable Boundary Tree algorithm by Zoran et al. (2017) was introduced to learn low-dimensional representations of complex input data, on which semantic similarity can be calculated to train boundary trees. As its authors point out, the differentiable boundary tree approach has a few limitations that prevent it from scaling to large datasets. In this paper, we introduce Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree scheme and also improves its classification accuracy and data representability. Our algorithm is efficiently implementable with existing tools and offers a significant reduction in training time. We test and compare the algorithms on the well-known MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by Xiao et al. (2017). Comment: 9 pages, 2 figures

arXiv.org e-Print Archive

Crossref

Fast Algorithms and Efficient Statistics: N-point Correlation Functions

Author: Alex Gray
Alex Szalay
Andrew Moore
Andy Connolly
Andy Genovese
Istvan Szapudi
Jeff Schneider
Larry Grone
Larry Wasserman
Nick Kanidoris II
Robert Nichol
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000

We present a new algorithm for the fast computation of N-point correlation functions in large astronomical data sets. The algorithm is based on kd-trees decorated with cached sufficient statistics, which allows speed-ups of orders of magnitude over the naive, non-tree-based implementation of correlation functions. We further discuss the use of controlled approximations within the computation, which allows for further acceleration. In summary, our algorithm now makes it possible to compute exact, all-pairs measurements of the 2-, 3-, and 4-point correlation functions for cosmological data sets like the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the next generation of Cosmic Microwave Background experiments (see Szapudi et al. 2000). Comment: To appear in Proceedings of MPA/MPE/ESO Conference "Mining the Sky", July 31 - August 4, 2000, Garching, Germany
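To illustrate the idea of a kd-tree decorated with cached statistics, here is a minimal Python sketch of pair counting within a radius, the basic ingredient of a 2-point correlation estimate. It is not the authors' algorithm: it is a simple single-tree traversal, and the names `Node`, `count_within`, `pair_count_tree`, and `pair_count_naive` are hypothetical. Each node caches its point count and bounding box, so a whole subtree can be counted or discarded without visiting its points.

```python
def sqdist(p, q):
    """Squared Euclidean distance, summed coordinate by coordinate."""
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
    return total

class Node:
    """kd-tree node that caches its bounding box and point count."""
    def __init__(self, points, depth=0, leaf_size=8):
        dim = len(points[0])
        self.count = len(points)  # cached sufficient statistic
        self.lo = [min(p[i] for p in points) for i in range(dim)]
        self.hi = [max(p[i] for p in points) for i in range(dim)]
        self.points = None
        self.left = self.right = None
        if len(points) <= leaf_size:
            self.points = points  # leaf: keep the raw points
        else:
            axis = depth % dim
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.left = Node(pts[:mid], depth + 1, leaf_size)
            self.right = Node(pts[mid:], depth + 1, leaf_size)

def count_within(q, node, r2):
    """Count points in the subtree whose squared distance to q is <= r2."""
    dmin2 = dmax2 = 0.0
    for x, lo, hi in zip(q, node.lo, node.hi):
        dmin2 += max(lo - x, 0.0, x - hi) ** 2
        dmax2 += max(abs(x - lo), abs(x - hi)) ** 2
    if dmin2 > r2:
        return 0           # box entirely outside the radius: prune
    if dmax2 <= r2:
        return node.count  # box entirely inside: use the cached count
    if node.points is not None:
        return sum(1 for p in node.points if sqdist(q, p) <= r2)
    return count_within(q, node.left, r2) + count_within(q, node.right, r2)

def pair_count_tree(points, r):
    """Number of unordered pairs of distinct points within distance r."""
    root = Node(points)
    total = sum(count_within(p, root, r * r) for p in points)
    return (total - len(points)) // 2  # drop self-pairs, halve ordered pairs

def pair_count_naive(points, r):
    """Reference O(n^2) count used to check the tree-based version."""
    n = len(points)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sqdist(points[i], points[j]) <= r * r)
```

The two functions should agree on any input; the tree version saves work only when many boxes fall entirely inside or outside the search radius.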

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Accelerating Nearest Neighbor Search on Manycore Systems

Author: Cayton Lawrence
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.

arXiv.org e-Print Archive

CiteSeerX

Crossref

MPG.PuRe

High Dimensional Clustering with $r$-nets

Author: Avarikioti Georgia
Ryser Alain
Wang Yuyi
Wattenhofer Roger
Publication date: 06/11/2018

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $\ell_1$ and $\ell_2$ metrics from $\tilde{O}(dn^{2-\Theta(\sqrt{\epsilon})})$ to $\tilde{O}(dn + n^{2-\alpha})$, where $\alpha = \Omega(\epsilon^{1/3}/\log(1/\epsilon))$. These algorithms are also used to improve a framework that provides approximate solutions to other high-dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+\epsilon)$-approximate $k$th-nearest neighbor distance, $(4+\epsilon)$-approximate Min-Max clustering, and $(4+\epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+\epsilon)$-approximates greedy permutations in time $\tilde{O}((dn + n^{2-\alpha}) \cdot \log{\Phi})$, where $\Phi$ is the spread of the input. This algorithm is used to $(2+\epsilon)$-approximate $k$-center with the same time complexity. Comment: Accepted by AAAI201
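For orientation, the object being approximated can be built exactly by a simple quadratic-time greedy scan. The sketch below is that baseline, not the paper's sub-quadratic algorithm. It assumes one common definition of an $r$-net (chosen centers are pairwise at least $r$ apart, and every input point lies within distance less than $r$ of some center); the paper's exact inequalities may differ. The function name `greedy_r_net` is hypothetical.

```python
import math

def greedy_r_net(points, r):
    """Naive O(d * n^2) r-net under the Euclidean metric.

    A point becomes a center only if every previously chosen center
    is at distance >= r from it (assumed definition, see lead-in).
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers

pts = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (2.2, 0.1), (5.0, 5.0)]
net = greedy_r_net(pts, 1.0)
```

With these sample points and $r = 1$, the scan keeps `(0.0, 0.0)`, `(2.0, 0.0)`, and `(5.0, 5.0)`; the other two points are each covered by a center closer than $r$.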

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

Author: Assefaw H. Gebremedhin
David F. Gleich
Md. Mostofa Ali Patwary
Ryan A. Rossi
Publication date: 25/12/2013

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch-and-bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: computing temporal strong components and compressing graphs. Comment: 11 pages
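The core-number pruning mentioned in the abstract rests on a standard fact: a vertex in a clique of size s has core number at least s - 1. The Python sketch below illustrates that bound only; it is not the authors' implementation, it is sequential, and it uses a simple quadratic peeling loop rather than a linear-time one. The names `core_numbers` and `prune_by_core` are hypothetical, and `lower_bound` stands for the size of a clique already found by some heuristic.

```python
def core_numbers(adj):
    """Core number of every vertex, by repeatedly peeling a minimum-degree vertex.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core values never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def prune_by_core(adj, lower_bound):
    """Drop vertices that cannot lie in a clique larger than lower_bound."""
    core = core_numbers(adj)
    keep = {v for v in adj if core[v] + 1 > lower_bound}
    return {v: adj[v] & keep for v in keep}

# Triangle 0-1-2 with a path 2-3-4 attached.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
pruned = prune_by_core(graph, lower_bound=2)
```

Here an edge (a clique of size 2) serves as the lower bound, so only the triangle's vertices survive; a branch-and-bound search would then explore the much smaller pruned graph.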

arXiv.org e-Print Archive

CiteSeerX

Perseus: Randomized Point-based Value Iteration for POMDPs

Author: Spaan M. T. J.
Vlassis N.
Publication venue: 'AI Access Foundation'
Publication date: 09/09/2011

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.

arXiv.org e-Print Archive

Crossref