K-nearest Neighbor Search by Random Projection Forests

Authors: Li Zhenpeng; Wang Honggang; Wang Jin; Wang Yingjie; Yan Donghui
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/12/2018

K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy in terms of fast decay in the missing rate of kNNs and in the discrepancy of the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by an ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conferenc
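The ensemble idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Gaussian random directions and a median split at every node, whereas the abstract speaks of "carefully chosen" projections. The function names (`build_rp_tree`, `build_rp_forest`, `rp_forest_knn`) and the `leaf_size` parameter are hypothetical.

```python
import math
import random


def build_rp_tree(points, indices, leaf_size, rng):
    """Recursively split `indices` along a random direction (assumed: median split)."""
    if len(indices) <= leaf_size:
        return {"leaf": list(indices)}
    dim = len(points[0])
    # Assumption: an unrefined Gaussian direction, not the paper's refined choice.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = {i: sum(d * x for d, x in zip(direction, points[i])) for i in indices}
    ordered = sorted(indices, key=proj.get)
    mid = len(ordered) // 2
    return {
        "direction": direction,
        "threshold": proj[ordered[mid]],
        "left": build_rp_tree(points, ordered[:mid], leaf_size, rng),
        "right": build_rp_tree(points, ordered[mid:], leaf_size, rng),
    }


def build_rp_forest(points, n_trees, leaf_size, seed=0):
    """Grow `n_trees` independent random projection trees over all points."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    rng = random.Random(seed)
    everything = list(range(len(points)))
    return [build_rp_tree(points, everything, leaf_size, rng) for _ in range(n_trees)]


def find_leaf(node, query):
    """Route the query down one tree and return the indices stored in its leaf."""
    while "leaf" not in node:
        p = sum(d * x for d, x in zip(node["direction"], query))
        node = node["left"] if p < node["threshold"] else node["right"]
    return node["leaf"]


def rp_forest_knn(points, forest, query, k):
    """Pool the leaf candidates from every tree, then rank them by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(find_leaf(tree, query))
    return sorted(candidates, key=lambda i: math.dist(points[i], query))[:k]
```

Each tree contributes one leaf of candidates, so a true neighbor is missed only if every tree separates it from the query. The trees are built independently, which is what makes the construction easy to parallelize.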

arXiv.org e-Print Archive

Crossref

B-urns

Authors: Chauvin Brigitte; Gardy Danièle; Pouyanne Nicolas; Ton-That Dai-Hai
Publication date: 22/07/2015

The fringe of a B-tree with parameter $m$ is considered as a particular Pólya urn with $m$ colors. More precisely, the asymptotic behaviour of this fringe, when the number of stored keys tends to infinity, is studied through the composition vector of the fringe nodes. We establish its typical behaviour together with the fluctuations around it. The well known phase transition in Pólya urns has the following effect on B-trees: for $m\leq 59$, the fluctuations are asymptotically Gaussian, though for $m\geq 60$, the composition vector is oscillating; after scaling, the fluctuations of such an urn strongly converge to a random variable $W$. This limit is $\mathbb C$-valued and it does not seem to follow any classical law. Several properties of $W$ are shown: existence of exponential moments, characterization of its distribution as the solution of a smoothing equation, existence of a density relatively to the Lebesgue measure on $\mathbb C$, support of $W$. Moreover, a few representations of the composition vector for various values of $m$ illustrate the different kinds of convergence.

arXiv.org e-Print Archive

HAL UVSQ

Harmonic analysis of finite lamplighter random walks

Authors: Scarabotti Fabio; Tolli Filippo
Publication date: 22/01/2007

Recently, several papers have been devoted to the analysis of lamplighter random walks, in particular when the underlying graph is the infinite path $\mathbb{Z}$. In the present paper, we develop a spectral analysis for lamplighter random walks on finite graphs. In the general case, we use the $C_2$-symmetry to reduce the spectral computations to a series of eigenvalue problems on the underlying graph. In the case the graph has a transitive isometry group $G$, we also describe the spectral analysis in terms of the representation theory of the wreath product $C_2\wr G$. We apply our theory to the lamplighter random walks on the complete graph and on the discrete circle. These examples were already studied by Haggstrom and Jonasson by probabilistic methods. Comment: 29 page

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

Determinantal Processes and Independence

Authors: Hough J. Ben; Krishnapur Manjunath; Peres Yuval; Virág Bálint
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2006

We give a probabilistic introduction to determinantal and permanental point processes. Determinantal processes arise in physics (fermions, eigenvalues of random matrices) and in combinatorics (nonintersecting paths, random spanning trees). They have the striking property that the number of points in a region $D$ is a sum of independent Bernoulli random variables, with parameters which are eigenvalues of the relevant operator on $L^2(D)$. Moreover, any determinantal process can be represented as a mixture of determinantal projection processes. We give a simple explanation for these known facts, and establish analogous representations for permanental processes, with geometric variables replacing the Bernoulli variables. These representations lead to simple proofs of existence criteria and central limit theorems, and unify known results on the distribution of absolute values in certain processes with radially symmetric distributions. Comment: Published at http://dx.doi.org/10.1214/154957806000000078 in the Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org
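The counting property stated in this abstract has a direct computational reading: once the eigenvalues of the operator restricted to the region are known, the distribution of the number of points is that of a sum of independent Bernoulli variables. The sketch below assumes the eigenvalues are already available as a finite list of numbers in [0, 1]; computing them from a kernel is outside its scope, and the function name `count_pmf` is hypothetical.

```python
def count_pmf(eigenvalues):
    """Return pmf[n] = P(N = n), where N is a sum of independent
    Bernoulli(lam) variables, one per eigenvalue lam in [0, 1]."""
    pmf = [1.0]  # with no variables, the sum is 0 with probability 1
    for lam in eigenvalues:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("each eigenvalue must lie in [0, 1]")
        nxt = [0.0] * (len(pmf) + 1)
        for n, p in enumerate(pmf):
            nxt[n] += p * (1.0 - lam)   # this Bernoulli contributes 0
            nxt[n + 1] += p * lam       # this Bernoulli contributes 1
        pmf = nxt
    return pmf


def count_mean_and_variance(eigenvalues):
    """Mean and variance of the count, by independence of the Bernoullis."""
    mean = sum(eigenvalues)
    variance = sum(lam * (1.0 - lam) for lam in eigenvalues)
    return mean, variance
```

For a projection process every eigenvalue is 0 or 1, so the count is deterministic, which is consistent with the representation of a general determinantal process as a mixture of projection processes.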

arXiv.org e-Print Archive

CiteSeerX

Crossref

Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

Authors: Krennrich Gerhard; Lee Robert M.; Letsios Dimitrios; Misener Ruth; Mistry Miten
Publication date: 25/09/2019

Decision trees usefully represent sparse, high dimensional and noisy data. Having learned a function from this data, we may want to thereafter integrate the function into a larger decision-making problem, e.g., for picking the best chemical process catalyst. We study a large-scale, industrially-relevant mixed-integer nonlinear nonconvex optimization problem involving both gradient-boosted trees and penalty functions mitigating risk. This mixed-integer optimization problem with convex penalty terms broadly applies to optimizing pre-trained regression tree models. Decision makers may wish to optimize discrete models to repurpose legacy predictive models, or they may wish to optimize a discrete model that represents a data set particularly well. We develop several heuristic methods to find feasible solutions, and an exact branch-and-bound algorithm leveraging structural properties of the gradient-boosted trees and penalty functions. We computationally test our methods on a concrete mixture design instance and a chemical catalysis industrial instance.

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Martingales and Profile of Binary Search Trees

Authors: Chauvin Brigitte; Klein Thierry; Marckert Jean-Francois; Rouault Alain
Publication date: 01/01/2004

We are interested in the asymptotic analysis of the binary search tree (BST) under the random permutation model. Via an embedding in a continuous-time model, we get new results, in particular the asymptotic behavior of the profile.
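The random permutation model can be simulated directly: insert the keys of a uniformly random permutation one by one into an initially empty BST. The sketch below is a discrete simulation only, not the continuous-time embedding used in the paper. It takes the profile to be the number of stored keys at each depth, which is an assumption (the paper may count external nodes instead), and the function names are hypothetical.

```python
import random


def bst_profile(keys):
    """Insert distinct `keys` in the given order into a BST and return
    profile[d] = number of stored keys at depth d (root at depth 0)."""
    left, right = {}, {}   # child links, keyed by parent key
    profile = []
    root = None
    for key in keys:
        depth = 0
        if root is None:
            root = key
        else:
            node = root
            while True:
                depth += 1
                child = left if key < node else right
                if node in child:
                    node = child[node]      # keep descending
                else:
                    child[node] = key       # attach as a new leaf
                    break
        if depth == len(profile):
            profile.append(0)
        profile[depth] += 1
    return profile


def random_bst_profile(n, seed=None):
    """Profile of a BST built from a uniformly random permutation of 0..n-1."""
    rng = random.Random(seed)
    return bst_profile(rng.sample(range(n), n))
```

The insertion is iterative rather than recursive so that a degenerate, path-like tree does not hit Python's recursion limit.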

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

HAL UVSQ