
    Tree-Independent Dual-Tree Algorithms

    Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings. Comment: accepted in ICML 201
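The four-part split can be illustrated with a small sketch for 1-nearest-neighbor search. It assumes a simple ball tree built by median splits (the tree), a depth-first traversal that recurses into every child pair (the traversal), a distance-updating base case, and the basic ball-distance pruning rule. This is not the paper's meta-algorithm or its tighter k-nearest-neighbor bound, and the names `Node`, `build`, `NearestNeighborRules` and `dual_tree` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Part 1 (tree): a node bounding its points by a ball."""
    idx: List[int]      # indices of all points below this node
    center: Point       # centroid of those points
    radius: float       # max distance from center to any of them
    children: List["Node"] = field(default_factory=list)


def build(points: List[Point], idx: List[int], leaf_size: int = 4) -> Node:
    """Median split on the widest dimension until nodes hold <= leaf_size points."""
    dim = len(points[0])
    center = tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))
    node = Node(idx, center, max(math.dist(center, points[i]) for i in idx))
    if len(idx) > leaf_size:
        d = max(range(dim), key=lambda k: max(points[i][k] for i in idx) - min(points[i][k] for i in idx))
        order = sorted(idx, key=lambda i: points[i][d])
        mid = len(order) // 2
        node.children = [build(points, order[:mid], leaf_size), build(points, order[mid:], leaf_size)]
    return node


class NearestNeighborRules:
    """Parts 3 and 4: the point-to-point base case and the pruning rule (1-NN)."""

    def __init__(self, queries: List[Point], refs: List[Point]):
        self.queries, self.refs = queries, refs
        self.best_dist = [math.inf] * len(queries)
        self.best_idx = [-1] * len(queries)

    def base_case(self, q: int, r: int) -> None:
        d = math.dist(self.queries[q], self.refs[r])
        if d < self.best_dist[q]:
            self.best_dist[q], self.best_idx[q] = d, r

    def can_prune(self, qn: Node, rn: Node) -> bool:
        # Prune when no reference in rn can beat the worst current candidate in qn.
        min_dist = max(0.0, math.dist(qn.center, rn.center) - qn.radius - rn.radius)
        return min_dist > max(self.best_dist[q] for q in qn.idx)


def dual_tree(qn: Node, rn: Node, rules: NearestNeighborRules) -> None:
    """Part 2 (traversal): depth-first, recursing into every child pair."""
    if rules.can_prune(qn, rn):
        return
    if not qn.children and not rn.children:
        for q in qn.idx:
            for r in rn.idx:
                rules.base_case(q, r)
        return
    for qc in qn.children or [qn]:
        for rc in rn.children or [rn]:
            dual_tree(qc, rc, rules)


rng = random.Random(0)
refs = [(rng.random(), rng.random()) for _ in range(200)]
queries = [(rng.random(), rng.random()) for _ in range(50)]
rules = NearestNeighborRules(queries, refs)
dual_tree(build(queries, list(range(len(queries)))), build(refs, list(range(len(refs)))), rules)
```

Swapping in a different tree only requires `build` to produce nodes with `idx`, `center`, `radius` and `children`; the traversal and the rules stay untouched, which is the point of the split.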

    Multibody Multipole Methods

    A three-body potential function can account for interactions among triples of particles that are not captured by pairwise interaction functions such as the Coulomb or Lennard-Jones potentials. Likewise, a multibody potential of order $n$ can account for interactions among $n$-tuples of particles that are not captured by interaction functions of lower order. To date, computing multibody potential functions for a large number of particles has not been possible because of their $O(N^n)$ scaling cost. In this paper we describe a fast tree-code for efficiently approximating multibody potentials that can be factorized as products of functions of pairwise distances. For the first time, we show how to derive a Barnes-Hut-type algorithm for handling interactions among more than two particles. Our algorithm uses two approximation schemes: 1) a deterministic method based on series expansions; 2) a Monte Carlo approximation based on the central limit theorem. Our approach guarantees a user-specified bound on the absolute or relative error in the computed potential, with an asymptotic probability guarantee. We provide speedup results on a three-body dispersion potential, the Axilrod-Teller potential. Comment: To appear in Journal of Computational Physic
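As a concrete reference point, the Axilrod-Teller energy of a triple with pairwise distances r_ab, r_ac, r_bc and interior angles γ_a, γ_b, γ_c is ν(1 + 3 cos γ_a cos γ_b cos γ_c) / (r_ab r_ac r_bc)^3. By the law of cosines, each cosine is a function of the pairwise distances. The sketch below computes the naive O(N^3) total and a plain Monte Carlo estimate over uniformly sampled triples with a CLT-based interval. It only illustrates these two ingredients: the paper's tree-code applies its approximations to tuples of tree nodes, which is not shown here. The strength constant `nu` and all function names are assumptions.

```python
import itertools
import math
import random


def axilrod_teller(a, b, c, nu=1.0):
    """Axilrod-Teller energy of one triple; nu is an assumed strength constant."""
    rab, rac, rbc = math.dist(a, b), math.dist(a, c), math.dist(b, c)
    # Interior angles from the law of cosines: everything depends on pairwise distances.
    cos_a = (rab**2 + rac**2 - rbc**2) / (2 * rab * rac)
    cos_b = (rab**2 + rbc**2 - rac**2) / (2 * rab * rbc)
    cos_c = (rac**2 + rbc**2 - rab**2) / (2 * rac * rbc)
    return nu * (1 + 3 * cos_a * cos_b * cos_c) / (rab * rac * rbc) ** 3


def exact_total(points, nu=1.0):
    """Naive O(N^3) sum over all unordered triples."""
    return sum(axilrod_teller(a, b, c, nu) for a, b, c in itertools.combinations(points, 3))


def monte_carlo_total(points, samples, nu=1.0, z=1.96, rng=random):
    """Estimate the total from uniformly sampled triples.

    Returns (estimate, half_width). By the central limit theorem, the interval
    estimate +/- half_width covers the exact total with probability of roughly
    95% (for z = 1.96) once `samples` is large.
    """
    n_triples = math.comb(len(points), 3)
    values = [axilrod_teller(*rng.sample(points, 3), nu=nu) for _ in range(samples)]
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / (samples - 1)
    return n_triples * mean, z * n_triples * math.sqrt(var / samples)


rng = random.Random(0)
pts = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(40)]
print(exact_total(pts))
print(monte_carlo_total(pts, samples=2000, rng=rng))
```

Because the energy is symmetric in its three arguments, sampling three distinct points uniformly is the same as sampling an unordered triple uniformly.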

    Structure in the 3D Galaxy Distribution: I. Methods and Example Results

    Three methods for detecting and characterizing structure in point data, such as that generated by redshift surveys, are described: classification using self-organizing maps, segmentation using Bayesian blocks, and density estimation using adaptive kernels. The first two methods are new, and allow detection and characterization of structures of arbitrary shape and at a wide range of spatial scales. These methods should elucidate not only clusters, but also the more distributed, wide-ranging filaments and sheets, and further allow the possibility of detecting and characterizing an even broader class of shapes. The methods are demonstrated and compared in application to three data sets: a carefully selected volume-limited sample from the Sloan Digital Sky Survey redshift data, a similarly selected sample from the Millennium Simulation, and a set of points drawn independently from a uniform probability distribution (a so-called Poisson distribution). We demonstrate a few of the many ways in which these methods elucidate large-scale structure in the distribution of galaxies in the nearby Universe. Comment: Re-posted after referee corrections along with a partially rewritten introduction. 80 pages, 31 figures, ApJ in press. For full-sized figures please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
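The abstract does not specify which adaptive kernel is used. One common choice is Abramson's square-root law: a fixed-bandwidth pilot estimate f̃ sets a per-point bandwidth h_i = h (f̃(x_i) / g)^(-1/2), where g is the geometric mean of the pilot values at the data points. Dense regions then get narrow kernels and sparse regions get wide ones. The sketch below assumes that scheme with an isotropic Gaussian kernel in any dimension. It may differ from the estimator applied to the SDSS and Millennium samples, and `adaptive_kde` is a hypothetical name.

```python
import math


def gaussian_kernel(x, xi, h):
    """Isotropic Gaussian kernel in len(x) dimensions with bandwidth h."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, xi)) / (h * h)
    return math.exp(-0.5 * sq) / ((2 * math.pi) ** (d / 2) * h ** d)


def adaptive_kde(data, h, alpha=0.5):
    """Adaptive KDE with per-point bandwidths h_i = h * (pilot(x_i) / g) ** -alpha.

    alpha = 0.5 is Abramson's square-root law; g is the geometric mean of the
    fixed-bandwidth pilot estimate evaluated at the data points.
    """
    n = len(data)
    pilot = [sum(gaussian_kernel(xi, xj, h) for xj in data) / n for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    bandwidths = [h * (p / g) ** (-alpha) for p in pilot]

    def density(x):
        # Each kernel integrates to 1, so the average is a proper density.
        return sum(gaussian_kernel(x, xi, hi) for xi, hi in zip(data, bandwidths)) / n

    return density, bandwidths


# A tight cluster plus one isolated point (1-D points written as 1-tuples).
data = [(0.1 * k,) for k in range(-5, 6)] + [(8.0,)]
density, bandwidths = adaptive_kde(data, h=0.5)
print(bandwidths[-1] > max(bandwidths[:-1]))  # → True: the isolated point gets the widest kernel
```

For 3D galaxy positions the same functions accept 3-tuples. The pilot pass is O(N^2), so a tree-based or gridded evaluation would be needed at survey scale.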

    The RODEO Approach for Nonparametric Density Estimation

    The regularization of derivative expectation operator (RODEO) approach developed by Lafferty and Wasserman (2008) is a regularization technique designed for a wide range of nonparametric kernel smoothers. The approach applies regularization by penalizing the bias reduction associated with a bandwidth reduction along a smooth path of decreasing bandwidth parameter values, in order to avoid overfitting. Dimensions with small local variation are effectively smoothed out, so variable selection is carried out implicitly. Under certain conditions, faster rates of convergence for the mean integrated squared error can be achieved, which makes the approach attractive for applications in high dimensions. In this paper we apply the RODEO approach to local polynomial density estimation. We implemented the approach in the R package lpderodeo. We apply our implementation to a few examples and evaluate its performance in a comparative study against a sample of eight other approaches to nonparametric density estimation. Our findings suggest that the approach does not work well compared with the other approaches considered, with regard to the applied performance metrics. Furthermore, our implementation suffers from long computation times due to a naive evaluation order. Our main finding, however, is that the theoretical framework proposed by Liu, Lafferty, and Wasserman (2007) has severe shortcomings: we demonstrate that a simple rotation of the data makes the algorithm fail in practice.
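The greedy bandwidth-shrinking idea can be sketched at a single query point x under several assumptions. It uses a plain Gaussian product-kernel density estimator rather than the local polynomial estimator in lpderodeo. The derivative Z_j = ∂f̂/∂h_j is estimated as the mean of per-sample terms, and it is tested against a threshold s_j·sqrt(2 log n), where s_j is the standard error of that mean. All bandwidths start at `h0`. While Z_j is significant, h_j shrinks by the factor `beta`; otherwise dimension j is frozen. The estimators in Liu, Lafferty and Wasserman (2007) differ in detail, and `rodeo_density`, `h0`, `beta` and `h_min` are hypothetical.

```python
import math
import random


def product_kernel(x, xi, h):
    """Gaussian product kernel prod_k K_{h_k}(x_k - xi_k)."""
    w = 1.0
    for xk, xik, hk in zip(x, xi, h):
        u = (xk - xik) / hk
        w *= math.exp(-0.5 * u * u) / (math.sqrt(2 * math.pi) * hk)
    return w


def rodeo_density(x, data, h0=1.0, beta=0.9, h_min=1e-3):
    """Greedy RODEO-style bandwidth selection at one query point (needs len(data) >= 2).

    Returns (density estimate at x, final bandwidths).
    """
    n, d = len(data), len(x)
    h = [h0] * d
    active = set(range(d))
    while active:
        weights = [product_kernel(x, xi, h) for xi in data]
        for j in sorted(active):
            # dK_h(u)/dh = K_h(u) * (u^2 / h^3 - 1 / h), with u = x_j - X_ij (unscaled)
            z = [w * ((x[j] - xi[j]) ** 2 / h[j] ** 3 - 1.0 / h[j]) for w, xi in zip(weights, data)]
            z_mean = sum(z) / n
            # Estimated variance of z_mean: sample variance of the terms divided by n
            s2 = sum((zi - z_mean) ** 2 for zi in z) / ((n - 1) * n)
            threshold = math.sqrt(s2 * 2 * math.log(n))
            if abs(z_mean) > threshold and h[j] * beta > h_min:
                h[j] *= beta          # significant derivative: keep shrinking
            else:
                active.discard(j)     # not significant: freeze this bandwidth
    density = sum(product_kernel(x, xi, h) for xi in data) / n
    return density, h


rng = random.Random(0)
# Hypothetical data: dimension 0 is standard normal, dimension 1 is uniform on [0, 1].
data = [(rng.gauss(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(200)]
estimate, bandwidths = rodeo_density((0.0, 0.5), data)
print(estimate, bandwidths)
```

Because each bandwidth is tied to a coordinate axis, rerunning the sketch on a rotated copy of the same data is one way to probe the rotation sensitivity reported above.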