21,753 research outputs found

    K-nearest Neighbor Search by Random Projection Forests

    K-nearest neighbor (kNN) search has wide applications in data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, in terms of fast decay in both the missing rate of kNNs and the discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by ensemble random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
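The ensemble idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian projection directions, the median split, and the names `build_tree`, `build_forest`, `leaf_for`, and `knn` are all assumptions made for illustration, whereas the paper chooses its projections and split points more carefully.

```python
import math
import random


def build_tree(points, idx, leaf_size, rng):
    """Recursively split point indices along random directions.

    Assumption: Gaussian directions and a median split (illustrative only).
    A leaf is a list of indices; an internal node is a 4-tuple.
    """
    if len(idx) <= leaf_size:
        return idx
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = {i: sum(d * x for d, x in zip(direction, points[i])) for i in idx}
    ordered = sorted(idx, key=proj.__getitem__)
    mid = len(ordered) // 2
    split = proj[ordered[mid]]
    left = build_tree(points, ordered[:mid], leaf_size, rng)
    right = build_tree(points, ordered[mid:], leaf_size, rng)
    return (direction, split, left, right)


def build_forest(points, n_trees=10, leaf_size=10, seed=0):
    """Grow independent trees; each one could be built on a separate core."""
    rng = random.Random(seed)
    all_idx = list(range(len(points)))
    return [build_tree(points, all_idx, leaf_size, rng) for _ in range(n_trees)]


def leaf_for(tree, q):
    """Route a query down one tree to the leaf it falls into."""
    node = tree
    while isinstance(node, tuple):
        direction, split, left, right = node
        value = sum(d * x for d, x in zip(direction, q))
        node = left if value < split else right
    return node


def knn(points, forest, q, k):
    """Pool the candidate leaves from all trees, then rank by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, q))
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]
```

A single tree may separate a query from a true neighbor; taking the union of leaves over several trees is what makes that miss less likely as the ensemble grows.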

    B-urns

    The fringe of a B-tree with parameter $m$ is considered as a particular Pólya urn with $m$ colors. More precisely, the asymptotic behaviour of this fringe, when the number of stored keys tends to infinity, is studied through the composition vector of the fringe nodes. We establish its typical behaviour together with the fluctuations around it. The well-known phase transition in Pólya urns has the following effect on B-trees: for $m \leq 59$, the fluctuations are asymptotically Gaussian, whereas for $m \geq 60$, the composition vector is oscillating; after scaling, the fluctuations of such an urn strongly converge to a random variable $W$. This limit is $\mathbb{C}$-valued and it does not seem to follow any classical law. Several properties of $W$ are shown: existence of exponential moments, characterization of its distribution as the solution of a smoothing equation, existence of a density relative to the Lebesgue measure on $\mathbb{C}$, support of $W$. Moreover, a few representations of the composition vector for various values of $m$ illustrate the different kinds of convergence.

    Harmonic analysis of finite lamplighter random walks

    Recently, several papers have been devoted to the analysis of lamplighter random walks, in particular when the underlying graph is the infinite path $\mathbb{Z}$. In the present paper, we develop a spectral analysis for lamplighter random walks on finite graphs. In the general case, we use the $C_2$-symmetry to reduce the spectral computations to a series of eigenvalue problems on the underlying graph. When the graph has a transitive isometry group $G$, we also describe the spectral analysis in terms of the representation theory of the wreath product $C_2 \wr G$. We apply our theory to the lamplighter random walks on the complete graph and on the discrete circle. These examples were already studied by Haggstrom and Jonasson by probabilistic methods. Comment: 29 pages

    Determinantal Processes and Independence

    We give a probabilistic introduction to determinantal and permanental point processes. Determinantal processes arise in physics (fermions, eigenvalues of random matrices) and in combinatorics (nonintersecting paths, random spanning trees). They have the striking property that the number of points in a region $D$ is a sum of independent Bernoulli random variables, with parameters which are eigenvalues of the relevant operator on $L^2(D)$. Moreover, any determinantal process can be represented as a mixture of determinantal projection processes. We give a simple explanation for these known facts, and establish analogous representations for permanental processes, with geometric variables replacing the Bernoulli variables. These representations lead to simple proofs of existence criteria and central limit theorems, and unify known results on the distribution of absolute values in certain processes with radially symmetric distributions. Comment: Published at http://dx.doi.org/10.1214/154957806000000078 in the Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org)
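The Bernoulli representation stated in this abstract can be written out as follows. The symbols $N(D)$, $\lambda_i$, and $B_i$ are notation assumed here for illustration and do not come from the listing; the mean and variance are the elementary consequences of a sum of independent Bernoulli variables.

```latex
% Assumed notation: N(D) is the number of points in D, and
% \lambda_i \in [0,1] are the eigenvalues of the relevant operator on L^2(D).
\[
  N(D) \overset{d}{=} \sum_i B_i,
  \qquad B_i \sim \mathrm{Bernoulli}(\lambda_i) \ \text{independent},
\]
\[
  \mathbb{E}[N(D)] = \sum_i \lambda_i,
  \qquad
  \operatorname{Var}[N(D)] = \sum_i \lambda_i (1 - \lambda_i).
\]
```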

    Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

    Decision trees usefully represent sparse, high-dimensional, and noisy data. Having learned a function from this data, we may want to thereafter integrate the function into a larger decision-making problem, e.g., for picking the best chemical process catalyst. We study a large-scale, industrially relevant mixed-integer nonlinear nonconvex optimization problem involving both gradient-boosted trees and penalty functions mitigating risk. This mixed-integer optimization problem with convex penalty terms broadly applies to optimizing pre-trained regression tree models. Decision makers may wish to optimize discrete models to repurpose legacy predictive models, or they may wish to optimize a discrete model that represents a data set particularly well. We develop several heuristic methods to find feasible solutions, and an exact branch-and-bound algorithm leveraging structural properties of the gradient-boosted trees and penalty functions. We computationally test our methods on a concrete mixture design instance and a chemical catalysis industrial instance.

    Martingales and Profile of Binary Search Trees

    We are interested in the asymptotic analysis of the binary search tree (BST) under the random permutation model. Via an embedding in a continuous time model, we get new results, in particular the asymptotic behavior of the profile
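As a small illustration of the object studied here, the sketch below builds a BST from a key sequence and counts nodes per depth. Treating the profile as the number of internal nodes at each depth is an assumption about the paper's definition, and `bst_profile` is a hypothetical helper name; keys are assumed to be distinct.

```python
import random
from collections import Counter


def bst_profile(keys):
    """Insert distinct keys in order into an unbalanced BST.

    Returns a dict mapping depth -> number of nodes at that depth
    (the root has depth 0).
    """
    left, right = {}, {}  # child pointers, keyed by the parent's key
    profile = Counter()
    root = None
    for key in keys:
        if root is None:
            root = key
            profile[0] += 1
            continue
        node, depth = root, 0
        while True:
            children = left if key < node else right
            depth += 1
            if node in children:
                node = children[node]  # descend one level
            else:
                children[node] = key   # attach the new key as a leaf
                break
        profile[depth] += 1
    return dict(profile)


def random_permutation_profile(n, seed=0):
    """Profile of a BST grown from a uniformly random permutation of 1..n."""
    keys = list(range(1, n + 1))
    random.Random(seed).shuffle(keys)
    return bst_profile(keys)
```

Under the random permutation model, calling `random_permutation_profile` with different seeds gives independent samples of the profile for a fixed tree size.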