Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection in
decision tree classification. We experimentally evaluate two feature selection
methodologies, based on information extracted from the provided dataset:
\emph{leverage scores-based} and \emph{norm-based} feature selection.
Experimental evaluation of the proposed feature selection techniques indicates
that such approaches may be more effective than naive uniform feature
selection while achieving performance comparable to the random forest
algorithm [3].
Comment: 7 pages, 7 figures, 1 table
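The norm-based variant above can be sketched as follows. This is a hedged illustration only, assuming numpy and scikit-learn; the dataset, tree depth, ensemble size, and the use of squared column norms are arbitrary choices for the sketch, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Illustrative norm-based non-uniform feature sampling for a tree ensemble.
X, y = load_breast_cancer(return_X_y=True)

# Sampling probability of each feature proportional to its squared column norm
norms = np.linalg.norm(X, axis=0) ** 2
probs = norms / norms.sum()

rng = np.random.default_rng(0)
n_trees, k = 25, 8
trees, feature_sets = [], []
for _ in range(n_trees):
    # draw k distinct features, biased toward large-norm columns
    feats = rng.choice(X.shape[1], size=k, replace=False, p=probs)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, feats], y)
    trees.append(tree)
    feature_sets.append(feats)

# majority vote over the ensemble
votes = np.stack([t.predict(X[:, f]) for t, f in zip(trees, feature_sets)])
pred = (votes.mean(axis=0) > 0.5).astype(int)
print((pred == y).mean())
```

The leverage-score variant would replace `probs` with scores derived from the top singular vectors of `X`; the surrounding loop is unchanged.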
Poor performance of broadleaf plantations and possible remedial silvicultural systems - a review
Peer-reviewed
Over the last two decades, planting of broadleaves has been part of forest policy. In addition to the provision of a range of ecosystem services, it is intended that this resource will provide a direct economic stimulus through the supply of quality hardwood. A number of challenges must be met to achieve this objective, particularly as current observations indicate that many first-rotation broadleaf plantations comprise a relatively high proportion of poor-quality stems. A literature review has been carried out on the probable causes of poor performance in broadleaf crops. Silvicultural systems to rehabilitate poor-quality stands are discussed. Subsequent papers will deal with these silvicultural systems in more detail.
Kinetic Solvers with Adaptive Mesh in Phase Space
An Adaptive Mesh in Phase Space (AMPS) methodology has been developed for
solving multi-dimensional kinetic equations by the discrete velocity method. A
Cartesian mesh for both configuration (r) and velocity (v) spaces is produced
using a tree of trees data structure. The mesh in r-space is automatically
generated around embedded boundaries and dynamically adapted to local solution
properties. The mesh in v-space is created on-the-fly for each cell in r-space.
Mappings between neighboring v-space trees are implemented for the advection
operator in configuration space. We have developed new algorithms for solving
the full Boltzmann and linear Boltzmann equations with AMPS. Several recent
innovations were used to calculate the discrete Boltzmann collision integral
with dynamically adaptive mesh in velocity space: importance sampling,
multi-point projection method, and the variance reduction method. We have
developed an efficient algorithm for calculating the linear Boltzmann collision
integral for elastic and inelastic collisions in a Lorentz gas. The new AMPS
technique has been demonstrated for simulations of hypersonic rarefied gas
flows, ion and electron kinetics in weakly ionized plasma, radiation and light
particle transport through thin films, and electron streaming in
semiconductors. We have shown that AMPS minimizes the number of cells in
phase space, reducing computational cost and memory usage for solving
challenging kinetic problems.
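The adaptation criterion can be illustrated with a deliberately simple 1-D sketch: recursive interval bisection in place of the paper's tree-of-trees structure, with refinement driven by local solution variation. The tolerance, depth limit, and test function are arbitrary assumptions:

```python
import numpy as np

def refine(cells, f, tol=0.05, max_level=8):
    """Recursively bisect cells (a, b, level) where |f(b) - f(a)| > tol."""
    out = []
    for a, b, level in cells:
        if level < max_level and abs(f(b) - f(a)) > tol:
            m = 0.5 * (a + b)
            out += refine([(a, m, level + 1), (m, b, level + 1)],
                          f, tol, max_level)
        else:
            out.append((a, b, level))
    return out

# a profile that is steep near x = 0 forces local refinement there
mesh = refine([(-5.0, 5.0, 0)], lambda x: np.tanh(5 * x))
print(len(mesh), min(b - a for a, b, _ in mesh))
```

The resulting cell list is fine near the steep region and coarse elsewhere, which is the cost-saving effect the abstract describes, here reduced to one dimension.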
uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers
The use of multivariate classifiers, especially neural networks and decision
trees, has become commonplace in particle physics. Typically, a series of
classifiers is trained rather than just one to enhance the performance; this is
known as boosting. This paper presents a novel method of boosting that produces
a uniform selection efficiency in a user-defined multivariate space. Such a
technique is ideally suited for amplitude analyses or other situations where
optimizing a single integrated figure of merit is not what is desired.
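The flattening idea can be sketched as an AdaBoost-style loop with an extra per-event weight factor that pushes the selection efficiency toward a uniform target across bins of a chosen variable. Everything here, the simulated data, the binning, and the exponential uniformity factor, is an illustrative assumption, not the uBoost prescription itself:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
m = rng.uniform(0, 1, n)                  # variable where uniformity is wanted
X = np.c_[m, rng.normal(size=n)]
y = (X[:, 1] + 0.5 * m > 0.3).astype(int)

target_eff = 0.5
w = np.ones(n) / n
learners, alphas = [], []
for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum() / w.sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    learners.append(stump)
    alphas.append(alpha)
    # standard AdaBoost reweighting
    w *= np.exp(-alpha * (2 * y - 1) * (2 * pred - 1))
    # uniformity term: up-weight signal events in m-bins whose local
    # selection efficiency falls below the target
    bins = np.digitize(m, np.linspace(0, 1, 11))
    for b in np.unique(bins):
        sig = (bins == b) & (y == 1)
        if sig.any():
            eff = pred[sig].mean()
            w[sig] *= np.exp(0.5 * (target_eff - eff))
    w /= w.sum()

score = sum(a * (2 * l.predict(X) - 1) for a, l in zip(alphas, learners))
print(((score > 0).astype(int) == y).mean())
```

The extra multiplier is bounded and mild, so ordinary boosting behavior is preserved while the per-bin efficiencies are nudged toward the common target.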
Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees
Detection of differential item functioning by use of the logistic modelling
approach has a long tradition. One big advantage of the approach is that it can
be used to investigate non-uniform DIF as well as uniform DIF. The classical
approach makes it possible to detect DIF by distinguishing between multiple groups. We
propose an alternative method that is a combination of recursive partitioning
methods (or trees) and logistic regression methodology to detect uniform and
non-uniform DIF in a nonparametric way. The method outputs trees that
visualize, in a simple way, the structure of DIF in an item, showing which
variables are interacting, and in which way, when generating DIF. In addition, we
consider a logistic regression method in which DIF can be induced by a vector
of covariates, which may include categorical but also continuous covariates.
The methods are investigated in simulation studies and illustrated by two
applications.
Comment: 32 pages, 13 figures, 7 tables
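For a single item, the tree idea reduces to searching candidate splits of a covariate and keeping the split whose group indicator (plus its interaction with ability, to allow non-uniform DIF) most improves a logistic model of the item response. A hedged sketch on simulated data, with all names and the simple split search being assumptions of the illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(2)
n = 3000
theta = rng.normal(size=n)               # latent ability
age = rng.uniform(20, 70, n)             # covariate that may induce DIF
group = (age > 45).astype(float)         # true split (unknown to the method)
logit = theta - 0.5 + 1.0 * group        # uniform DIF: shifted item difficulty
resp = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def loss(features):
    model = LogisticRegression().fit(features, resp)
    return log_loss(resp, model.predict_proba(features)[:, 1])

base = loss(theta[:, None])              # DIF-free model
# candidate split points at empirical quantiles of the covariate
best = min(
    (loss(np.c_[theta, age > c, theta * (age > c)]), c)
    for c in np.quantile(age, np.linspace(0.1, 0.9, 17))
)
print(base - best[0], best[1])           # loss improvement, selected split
```

In the full method this split search is applied recursively, yielding a tree per item; here one level suffices to recover a split close to the true change point at age 45.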
ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that manages Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and offers a good trade-off between point-estimator
precision and credible-interval estimation for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures. Supplementary Information: 14 pages, 5
figures
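The abcrf package itself is in R; a minimal Python analogue of the workflow described above, simulate parameters from the prior, summarize the simulated data, and regress each parameter of interest on the summaries with a forest, might look like the following (the toy model, summary choices, and forest size are assumptions of the sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n_sim, n_obs = 5000, 50

mu = rng.normal(0, 2, n_sim)                        # prior draws of parameter
sims = rng.normal(mu[:, None], 1, (n_sim, n_obs))   # simulated datasets

def summaries(data):
    # deliberately over-complete: the forest weighs the informative ones,
    # so no preliminary selection of statistics is needed
    return np.c_[data.mean(1), np.median(data, 1), data.std(1), data.min(1)]

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(summaries(sims), mu)

obs = rng.normal(1.5, 1, (1, n_obs))                # "observed" data, true mu = 1.5
print(forest.predict(summaries(obs))[0])            # point estimate of mu
```

Note that no tolerance level appears anywhere: the forest plays the role of the accept/reject step. For a vector-valued parameter, one forest per component would be fit, as the abstract advocates.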
On PAC-Bayesian Bounds for Random Forests
Existing guarantees in terms of rigorous upper bounds on the generalization
error for the original random forest algorithm, one of the most frequently used
machine learning methods, are unsatisfying. We discuss and evaluate various
PAC-Bayesian approaches to derive such bounds. The bounds do not require
additional hold-out data, because the out-of-bag samples from the bagging in
the training process can be exploited. A random forest predicts by taking a
majority vote of an ensemble of decision trees. The first approach is to bound
the error of the vote by twice the error of the corresponding Gibbs classifier
(classifying with a single member of the ensemble selected at random). However,
this approach does not take into account the averaging out of the errors
of individual classifiers when taking the majority vote. This effect provides a
significant boost in performance when the errors are independent or negatively
correlated, but when the correlations are strong the advantage from taking the
majority vote is small. The second approach, based on PAC-Bayesian C-bounds,
takes dependencies between ensemble members into account, but it requires
estimating correlations between the errors of the individual classifiers. When
the correlations are high or the estimation is poor, the bounds degrade. In our
experiments, we compute generalization bounds for random forests on various
benchmark data sets. Because the individual decision trees already perform
well, their predictions are highly correlated and the C-bounds do not lead to
satisfactory results. For the same reason, the bounds based on the analysis of
Gibbs classifiers are typically superior and often reasonably tight. Bounds
based on a validation set, which comes at the cost of a smaller training set,
gave better performance guarantees but worse predictive performance in most
experiments.
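The empirical ingredients of the first bound can be sketched with a hand-rolled bagging loop: estimate the Gibbs (single-tree) error from each tree's out-of-bag samples, then compare the majority-vote error against twice that value. The confidence term of the actual PAC-Bayesian bound is omitted, and the dataset and ensemble size are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(4)
n, T = len(y), 100

tree_err = []
votes = np.zeros((T, n))
for t in range(T):
    idx = rng.integers(0, n, n)              # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)    # out-of-bag indices
    clf = DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx])
    tree_err.append((clf.predict(X[oob]) != y[oob]).mean())
    votes[t] = clf.predict(X)

gibbs = np.mean(tree_err)                    # empirical Gibbs error (OOB)
# note: this vote error uses in-bag predictions too, so it is optimistic;
# it only illustrates the factor-of-two relationship
mv_err = ((votes.mean(0) > 0.5).astype(int) != y).mean()
print(gibbs, 2 * gibbs, mv_err)
```

When the trees' errors are strongly correlated, `mv_err` sits close to `gibbs` and the factor of two is loose, which is exactly the weakness of the first approach that the abstract discusses.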
Compact Oblivious Routing
Oblivious routing is an attractive paradigm for large distributed systems in which centralized control and frequent reconfigurations are infeasible or undesired (e.g., costly). Over almost two decades, much progress has been made in devising oblivious routing schemes that guarantee close-to-optimal load, and efficient algorithms for constructing such schemes have been designed. However, a common drawback of existing oblivious routing schemes is that they are not compact: they require large routing tables (of polynomial size), which does not scale.
This paper presents the first oblivious routing scheme which guarantees close-to-optimal load and is compact at the same time, requiring routing tables of polylogarithmic size. Our algorithm maintains the polylogarithmic competitive ratio of existing algorithms, and is hence particularly well-suited for emerging large-scale networks.