K-nearest Neighbor Search by Random Projection Forests
K-nearest neighbor (kNN) search has wide applications in many areas,
including data mining, machine learning, statistics and many applied domains.
Inspired by the success of ensemble methods and the flexibility of tree-based
methodology, we propose random projection forests (rpForests), for kNN search.
rpForests finds kNNs by aggregating results from an ensemble of random
projection trees with each constructed recursively through a series of
carefully chosen random projections. rpForests achieves a remarkable accuracy
in terms of fast decay in the missing rate of kNNs and that of discrepancy in
the kNN distances. rpForests has a very low computational complexity. The
ensemble nature of rpForests makes it easy to run in parallel on multicore or
clustered computers; the running time is expected to be nearly inversely
proportional to the number of cores or machines. We give theoretical insights
by showing the exponential decay of the probability that neighboring points
would be separated by ensemble random projection trees when the ensemble size
increases. Our theory can be used to refine the choice of random projections in
the growth of trees, and experiments show that the effect is remarkable.
Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
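
To illustrate the idea in the abstract above, here is a minimal sketch of kNN search with an ensemble of random projection trees. It is not the authors' implementation: the median split, the Gaussian random directions, and the names build_tree, leaf_of and knn are assumptions made for this example, and the abstract's "carefully chosen" projections are not modeled.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_tree(points, idx, leaf_size, rng):
    """Recursively split the index list idx along random directions.

    A leaf is a list of point indices; an internal node is a tuple
    (direction, threshold, left_subtree, right_subtree).
    """
    if len(idx) <= leaf_size:
        return idx
    # Assumption: plain Gaussian direction and a median split.
    direction = [rng.gauss(0.0, 1.0) for _ in points[0]]
    proj = sorted((dot(direction, points[i]), i) for i in idx)
    mid = len(proj) // 2
    threshold = proj[mid][0]
    left = [i for _, i in proj[:mid]]
    right = [i for _, i in proj[mid:]]
    return (direction, threshold,
            build_tree(points, left, leaf_size, rng),
            build_tree(points, right, leaf_size, rng))

def leaf_of(tree, q):
    """Route the query q down one tree and return the leaf it lands in."""
    while isinstance(tree, tuple):
        direction, threshold, left, right = tree
        tree = left if dot(direction, q) < threshold else right
    return tree

def knn(points, forest, q, k):
    """Pool the leaf candidates from every tree, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_of(tree, q))
    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))
    return sorted(candidates, key=sqdist)[:k]

# Small demo on synthetic data (hypothetical sizes and seed).
rng = random.Random(0)
points = [[rng.random() for _ in range(5)] for _ in range(200)]
forest = [build_tree(points, list(range(200)), 20, rng) for _ in range(10)]
neighbors = knn(points, forest, points[0], 5)
```

Each tree only inspects one leaf, so a single tree can miss true neighbors that fall on the other side of a split; pooling candidates across independently built trees is what reduces that miss rate. The trees are independent of one another, which is why they can be built and queried in parallel.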
B-urns
The fringe of a B-tree with parameter is considered as a particular
Pólya urn with colors. More precisely, the asymptotic behaviour of this
fringe, when the number of stored keys tends to infinity, is studied through
the composition vector of the fringe nodes. We establish its typical behaviour
together with the fluctuations around it. The well known phase transition in
Pólya urns has the following effect on B-trees: for , the
fluctuations are asymptotically Gaussian, though for , the
composition vector is oscillating; after scaling, the fluctuations of such an
urn strongly converge to a random variable . This limit is -valued and it does not seem to follow any classical law. Several properties
of are shown: existence of exponential moments, characterization of its
distribution as the solution of a smoothing equation, existence of a density
relatively to the Lebesgue measure on , support of . Moreover, a
few representations of the composition vector for various values of
illustrate the different kinds of convergence.
Harmonic analysis of finite lamplighter random walks
Recently, several papers have been devoted to the analysis of lamplighter
random walks, in particular when the underlying graph is the infinite path
. In the present paper, we develop a spectral analysis for
lamplighter random walks on finite graphs. In the general case, we use the
-symmetry to reduce the spectral computations to a series of eigenvalue
problems on the underlying graph. In the case the graph has a transitive
isometry group , we also describe the spectral analysis in terms of the
representation theory of the wreath product . We apply our theory to
the lamplighter random walks on the complete graph and on the discrete circle.
These examples were already studied by Haggstrom and Jonasson by probabilistic
methods.
Comment: 29 pages
Determinantal Processes and Independence
We give a probabilistic introduction to determinantal and permanental point
processes. Determinantal processes arise in physics (fermions, eigenvalues of
random matrices) and in combinatorics (nonintersecting paths, random spanning
trees). They have the striking property that the number of points in a region
is a sum of independent Bernoulli random variables, with parameters which
are eigenvalues of the relevant operator on . Moreover, any
determinantal process can be represented as a mixture of determinantal
projection processes. We give a simple explanation for these known facts, and
establish analogous representations for permanental processes, with geometric
variables replacing the Bernoulli variables. These representations lead to
simple proofs of existence criteria and central limit theorems, and unify known
results on the distribution of absolute values in certain processes with
radially symmetric distributions.
Comment: Published at http://dx.doi.org/10.1214/154957806000000078 in the
Probability Surveys (http://www.i-journals.org/ps/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
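
The abstract above states that the number of points of a determinantal process in a region is a sum of independent Bernoulli variables whose parameters are eigenvalues of the relevant operator. As a small sketch of what that implies, the code below computes the distribution of such a count from a given list of eigenvalues. The eigenvalues used here are hypothetical numbers chosen for illustration, and count_distribution is a name invented for this example.

```python
def count_distribution(eigenvalues):
    """Distribution of a sum of independent Bernoulli(lam) variables.

    Returns a list pmf where pmf[n] is the probability of exactly n points.
    """
    pmf = [1.0]
    for lam in eigenvalues:
        nxt = [0.0] * (len(pmf) + 1)
        for n, p in enumerate(pmf):
            nxt[n] += p * (1.0 - lam)   # this Bernoulli contributes 0
            nxt[n + 1] += p * lam       # this Bernoulli contributes 1
        pmf = nxt
    return pmf

# Hypothetical eigenvalues in [0, 1], for illustration only.
eigs = [0.9, 0.5, 0.2]
pmf = count_distribution(eigs)
mean = sum(n * p for n, p in enumerate(pmf))
variance = sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2
```

Under this representation the mean of the count is the sum of the eigenvalues and the variance is the sum of lam * (1 - lam), so the variance never exceeds the mean.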
Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded
Decision trees usefully represent sparse, high dimensional and noisy data.
Having learned a function from this data, we may want to thereafter integrate
the function into a larger decision-making problem, e.g., for picking the best
chemical process catalyst. We study a large-scale, industrially-relevant
mixed-integer nonlinear nonconvex optimization problem involving both
gradient-boosted trees and penalty functions mitigating risk. This
mixed-integer optimization problem with convex penalty terms broadly applies to
optimizing pre-trained regression tree models. Decision makers may wish to
optimize discrete models to repurpose legacy predictive models, or they may
wish to optimize a discrete model that particularly well-represents a data set.
We develop several heuristic methods to find feasible solutions, and an exact,
branch-and-bound algorithm leveraging structural properties of the
gradient-boosted trees and penalty functions. We computationally test our
methods on a concrete mixture design instance and a chemical catalysis
industrial instance.
Martingales and Profile of Binary Search Trees
We are interested in the asymptotic analysis of the binary search tree (BST)
under the random permutation model. Via an embedding in a continuous time
model, we get new results, in particular the asymptotic behavior of the
profile.
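
For readers unfamiliar with the term, the following sketch builds a BST from a uniformly random permutation and records its profile, taken here to mean the number of nodes at each depth. This is only a simulation of the discrete model; it does not implement the continuous-time embedding mentioned in the abstract, and bst_profile is a name made up for this example.

```python
import random

def bst_profile(keys):
    """Insert distinct keys in order into an unbalanced BST.

    Returns a list where entry d is the number of nodes at depth d.
    """
    left, right = {}, {}       # child maps: parent key -> child key
    root = keys[0]
    profile = [1]              # the root sits at depth 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            child = left if key < node else right
            depth += 1
            if node in child:
                node = child[node]
            else:
                child[node] = key
                break
        if depth == len(profile):
            profile.append(0)
        profile[depth] += 1
    return profile

# Random permutation model: every ordering of 1..n is equally likely.
rng = random.Random(1)
n = 1000
perm = list(range(1, n + 1))
rng.shuffle(perm)
profile = bst_profile(perm)
```

The length of the returned list minus one is the height of the tree, and the entries always sum to the number of inserted keys.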