Tree-Independent Dual-Tree Algorithms
Dual-tree algorithms are a widely used class of branch-and-bound algorithms.
Unfortunately, developing dual-tree algorithms for use with different trees and
problems is often complex and burdensome. We introduce a four-part logical
split: the tree, the traversal, the point-to-point base case, and the pruning
rule. We provide a meta-algorithm which allows development of dual-tree
algorithms in a tree-independent manner and easy extension to entirely new
types of trees. Representations are provided for five common algorithms; for
k-nearest neighbor search, this leads to a novel, tighter pruning bound. The
meta-algorithm also allows straightforward extensions to massively parallel
settings.

Comment: accepted in ICML 2013
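As an illustration of the four-part split, here is a minimal hypothetical sketch: a ball tree stands in for the tree, a generic dual depth-first recursion for the traversal, and a 1-nearest-neighbor base case and pruning rule for the problem-specific parts. The node structure, names, and bounds are illustrative assumptions, not the paper's actual interface.

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

class Node:
    """Minimal ball-tree node; any space tree could be substituted."""
    def __init__(self, points, leaf_size=2):
        self.points = points
        d = len(points[0])
        self.center = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
        self.radius = max(dist(self.center, p) for p in points)
        self.children = []
        if len(points) > leaf_size:
            # Split along the axis of widest spread.
            axis = max(range(d), key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.children = [Node(pts[:mid], leaf_size), Node(pts[mid:], leaf_size)]

def min_dist(a, b):
    """Lower bound on the distance between any point in a and any point in b."""
    return max(0.0, dist(a.center, b.center) - a.radius - b.radius)

# Problem-specific parts, here for 1-nearest-neighbor search.
best = {}  # query point -> (best distance so far, neighbor)

def base_case(q, r):
    """Point-to-point base case: try r as a neighbor candidate for q."""
    if q != r:
        d = dist(q, r)
        if d < best.get(q, (math.inf, None))[0]:
            best[q] = (d, r)

def score(q_node, r_node):
    """Pruning rule: visit only if some query's bound could still improve."""
    bound = max(best.get(q, (math.inf, None))[0] for q in q_node.points)
    return min_dist(q_node, r_node) <= bound

# The tree-independent part: a generic dual depth-first traversal.
def dual_traverse(q_node, r_node):
    if not score(q_node, r_node):
        return  # pruned: no descendant pair can contribute
    if not q_node.children and not r_node.children:
        for q in q_node.points:
            for r in r_node.points:
                base_case(q, r)
        return
    for qc in (q_node.children or [q_node]):
        for rc in (r_node.children or [r_node]):
            dual_traverse(qc, rc)

points = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (2.0, 3.0)]
root = Node(points)
dual_traverse(root, root)
```

Swapping in a different base case and pruning rule yields a different dual-tree algorithm (range search, kernel summation, and so on) without touching the traversal or the tree, which is the point of the four-part split.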
Multibody Multipole Methods
A three-body potential function can account for interactions among triples of
particles which are uncaptured by pairwise interaction functions such as
Coulombic or Lennard-Jones potentials. Likewise, a multibody potential of order
n can account for interactions among n-tuples of particles uncaptured by
interaction functions of lower orders. To date, computing multibody
potential functions for large numbers of particles has not been practical due
to the prohibitive scaling cost. In this paper we describe a fast tree-code for
efficiently approximating multibody potentials that can be factorized as
products of functions of pairwise distances. For the first time, we show how to
derive a Barnes-Hut type algorithm for handling interactions among more than
two particles. Our algorithm uses two approximation schemes: 1) a deterministic
method based on series expansions; 2) a Monte Carlo approximation based on
the central limit theorem. Our approach guarantees a user-specified bound on
the absolute or relative error in the computed potential with an asymptotic
probability guarantee. We provide speedup results on a three-body dispersion
potential, the Axilrod-Teller potential.

Comment: To appear in the Journal of Computational Physics
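For concreteness, the Axilrod-Teller triple-dipole term can be written down directly from its standard form; the naive sum below visits every particle triple and therefore scales cubically in the particle count, which is the cost a tree-code is designed to avoid. The coefficient c9 is material-dependent and its value here is arbitrary.

```python
import itertools
import math

def axilrod_teller(p1, p2, p3, c9=1.0):
    """Axilrod-Teller three-body term for one particle triple:
    c9 * (1 + 3 cos(a1) cos(a2) cos(a3)) / (r12 * r13 * r23)^3,
    where a1, a2, a3 are the interior angles of the triangle p1 p2 p3."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    r12, r13, r23 = sub(p2, p1), sub(p3, p1), sub(p3, p2)
    d12, d13, d23 = (math.sqrt(dot(v, v)) for v in (r12, r13, r23))
    # Cosines of the interior angles at p1, p2, p3.
    cos1 = dot(r12, r13) / (d12 * d13)
    cos2 = dot(sub(p1, p2), r23) / (d12 * d23)
    cos3 = dot(sub(p1, p3), sub(p2, p3)) / (d13 * d23)
    return c9 * (1.0 + 3.0 * cos1 * cos2 * cos3) / (d12 * d13 * d23) ** 3

def total_potential(points, c9=1.0):
    """Naive sum over all N-choose-3 triples: O(N^3)."""
    return sum(axilrod_teller(p, q, r, c9)
               for p, q, r in itertools.combinations(points, 3))
```

For an equilateral triangle with unit sides, each interior angle is 60 degrees, so the single triple contributes c9 * (1 + 3/8) = 1.375 * c9.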
Structure in the 3D Galaxy Distribution: I. Methods and Example Results
Three methods for detecting and characterizing structure in point data, such
as that generated by redshift surveys, are described: classification using
self-organizing maps, segmentation using Bayesian blocks, and density
estimation using adaptive kernels. The first two methods are new, and allow
detection and characterization of structures of arbitrary shape and at a wide
range of spatial scales. These methods should elucidate not only clusters, but
also the more distributed, wide-ranging filaments and sheets, and further allow
the possibility of detecting and characterizing an even broader class of
shapes. The methods are demonstrated and compared in application to three data
sets: a carefully selected volume-limited sample from the Sloan Digital Sky
Survey redshift data, a similarly selected sample from the Millennium
Simulation, and a set of points independently drawn from a uniform probability
distribution -- a so-called Poisson distribution. We demonstrate a few of the
many ways in which these methods elucidate large scale structure in the
distribution of galaxies in the nearby Universe.

Comment: Re-posted after referee corrections, along with a partially re-written
introduction. 80 pages, 31 figures, ApJ in press. For full-sized figures
please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pdf
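Of the three methods, the adaptive-kernel density estimate is the simplest to sketch. The version below is a one-dimensional, hypothetical illustration (the paper works with 3D galaxy positions) following the standard Abramson/Silverman recipe: a fixed-bandwidth pilot estimate, then per-point bandwidths that widen in sparse regions and narrow in dense ones.

```python
import math

def gauss_kde(x, data, h):
    """Fixed-bandwidth Gaussian kernel density estimate at x."""
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / (
        len(data) * h * math.sqrt(2 * math.pi))

def adaptive_kde(data, pilot_h, alpha=0.5):
    """Return an adaptive-kernel density estimator built from data."""
    # Pilot densities at the data points themselves.
    pilot = [gauss_kde(d, data, pilot_h) for d in data]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))  # geometric mean
    # Local bandwidths: wider kernels where the pilot density is low.
    local_h = [pilot_h * (p / g) ** (-alpha) for p in pilot]
    def f(x):
        return sum(math.exp(-0.5 * ((x - d) / h) ** 2) / h
                   for d, h in zip(data, local_h)) / (len(data) * math.sqrt(2 * math.pi))
    return f

data = [0.0, 0.1, 0.2, 5.0, 5.1]          # two "clusters" of points
f = adaptive_kde(data, pilot_h=0.5)        # smooth density with two modes
```

Each Gaussian kernel integrates to one, so the estimate remains a proper density regardless of how the local bandwidths vary.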
The RODEO Approach for Nonparametric Density Estimation
The regularization of derivative expectation operator (RODEO) approach developed by Lafferty and Wasserman (2008) is a regularization technique designed for a wide range of nonparametric kernel smoothers.
The approach penalizes the reduction in the kernel smoother's bias that accompanies a reduction in bandwidth, along a smooth path of decreasing bandwidth parameter values, in order to avoid overfitting. Dimensions with little local variation are effectively smoothed out, implicitly carrying out a form of variable selection. Under certain conditions, faster rates of convergence for the mean integrated squared error can be achieved, which makes the approach particularly attractive in high dimensions. In this thesis we apply the RODEO approach to local polynomial density estimation, implemented in the R package lpderodeo. We evaluate the implementation on a few examples and compare its performance against eight other nonparametric density estimation approaches. Our findings suggest that, with regard to the applied performance metrics, the RODEO approach performs worse than the other approaches considered. Furthermore, our implementation suffers from long computation times due to a naive evaluation sequence. Our main finding, however, is that the theoretical framework proposed by Liu, Lafferty, and Wasserman (2007) has severe shortcomings: a simple rotation of the data makes the algorithm fail in practice.
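The core RODEO idea can be sketched for a Gaussian product-kernel density estimate at a single query point: starting from a large bandwidth in every dimension, each bandwidth is shrunk while the derivative of the estimate with respect to that bandwidth remains significant relative to a noise threshold. This is a simplified, hypothetical reading of the Liu, Lafferty, and Wasserman recipe, not the lpderodeo implementation; the function names, threshold, and constants are illustrative assumptions.

```python
import math
import random

def rodeo_bandwidths(data, x, h0=1.0, beta=0.9, h_min=0.01):
    """Greedily shrink per-dimension bandwidths of a Gaussian product-kernel
    density estimate at query point x, RODEO-style."""
    n, d = len(data), len(x)
    h = [h0] * d
    active = set(range(d))
    while active:
        for j in list(active):
            # Z_j = d f_hat(x) / d h_j, averaged over per-point contributions.
            contribs = []
            for row in data:
                u = [(x[k] - row[k]) / h[k] for k in range(d)]
                w = 1.0
                for k in range(d):
                    w *= math.exp(-0.5 * u[k] ** 2) / (math.sqrt(2 * math.pi) * h[k])
                contribs.append(w * (u[j] ** 2 - 1) / h[j])
            z = sum(contribs) / n
            var = sum((c - z) ** 2 for c in contribs) / (n - 1)
            lam = math.sqrt(2 * var * math.log(n) / n)  # noise threshold
            if abs(z) > lam and h[j] * beta > h_min:
                h[j] *= beta           # derivative still significant: shrink
            else:
                active.discard(j)      # freeze this dimension

    return h

random.seed(1)
# Sharp structure in dimension 0, nearly flat density in dimension 1.
data = [(random.gauss(0.0, 0.1), random.gauss(0.0, 4.0)) for _ in range(200)]
h = rodeo_bandwidths(data, x=(0.0, 0.0))
```

The relevant dimension ends up with a much smaller bandwidth than the nearly flat one. Note that the bandwidths shrink coordinate-wise, so the selection is inherently axis-aligned, which is consistent with the thesis's central finding that a simple rotation of the data defeats the algorithm.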