187,389 research outputs found

    Non-uniform Feature Sampling for Decision Tree Ensembles

    Full text link
    We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: (i)(i) \emph{leverage scores-based} and (ii)(ii) \emph{norm-based} feature selection. Experimental evaluation of the proposed feature selection techniques indicate that such approaches might be more effective compared to naive uniform feature selection and moreover having comparable performance to the random forest algorithm [3]Comment: 7 pages, 7 figures, 1 tabl

    Poor performance of broadleaf plantations and possible remedial silvicultural systems - a review

    Get PDF
    Peer-reviewedOver the last two decades planting of broadleaves has been part of forest policy. In addition to the provision of a range of ecosystem services, it is intended that this resource will have a direct economic stimulus through the supply of quality hardwood. A number of challenges must be met in order to achieve this objective, particularly as current observations would indicate that many first rotation broadleaf plantations comprise a relatively high proportion of poor quality stems. A literature review has been carried out on the probable causes of poor performance in broadleaf crops. Silvicultural systems to rehabilitate poor quality stands are discussed. Subsequent papers will deal with these silvicultural systems in more detail.COFOR

    Kinetic Solvers with Adaptive Mesh in Phase Space

    Full text link
    An Adaptive Mesh in Phase Space (AMPS) methodology has been developed for solving multi-dimensional kinetic equations by the discrete velocity method. A Cartesian mesh for both configuration (r) and velocity (v) spaces is produced using a tree of trees data structure. The mesh in r-space is automatically generated around embedded boundaries and dynamically adapted to local solution properties. The mesh in v-space is created on-the-fly for each cell in r-space. Mappings between neighboring v-space trees implemented for the advection operator in configuration space. We have developed new algorithms for solving the full Boltzmann and linear Boltzmann equations with AMPS. Several recent innovations were used to calculate the discrete Boltzmann collision integral with dynamically adaptive mesh in velocity space: importance sampling, multi-point projection method, and the variance reduction method. We have developed an efficient algorithm for calculating the linear Boltzmann collision integral for elastic and inelastic collisions in a Lorentz gas. New AMPS technique has been demonstrated for simulations of hypersonic rarefied gas flows, ion and electron kinetics in weakly ionized plasma, radiation and light particle transport through thin films, and electron streaming in semiconductors. We have shown that AMPS allows minimizing the number of cells in phase space to reduce computational cost and memory usage for solving challenging kinetic problems

    uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers

    Get PDF
    The use of multivariate classifiers, especially neural networks and decision trees, has become commonplace in particle physics. Typically, a series of classifiers is trained rather than just one to enhance the performance; this is known as boosting. This paper presents a novel method of boosting that produces a uniform selection efficiency in a user-defined multivariate space. Such a technique is ideally suited for amplitude analyses or other situations where optimizing a single integrated figure of merit is not what is desired

    Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees

    Get PDF
    Detection of differential item functioning by use of the logistic modelling approach has a long tradition. One big advantage of the approach is that it can be used to investigate non-uniform DIF as well as uniform DIF. The classical approach allows to detect DIF by distinguishing between multiple groups. We propose an alternative method that is a combination of recursive partitioning methods (or trees) and logistic regression methodology to detect uniform and non-uniform DIF in a nonparametric way. The output of the method are trees that visualize in a simple way the structure of DIF in an item showing which variables are interacting in which way when generating DIF. In addition we consider a logistic regression method in which DIF can by induced by a vector of covariates, which may include categorical but also continuous covariates. The methods are investigated in simulation studies and illustrated by two applications.Comment: 32 pages, 13 figures, 7 table

    ABC random forests for Bayesian parameter inference

    Get PDF
    This preprint has been reviewed and recommended by Peer Community In Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036). Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. We propose to conduct likelihood-free Bayesian inferences about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest methodology of Breiman (2001) applied in a (non parametric) regression setting. We advocate the derivation of a new random forest for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in terms of robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and is a good trade-off in term of quality of point estimator precision and credible interval estimations for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution. All methods designed here have been incorporated in the R package abcrf (version 1.7) available on CRAN.Comment: Main text: 24 pages, 6 figures Supplementary Information: 14 pages, 5 figure

    On PAC-Bayesian Bounds for Random Forests

    Full text link
    Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not take into account the effect of averaging out of errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach based on PAC-Bayesian C-bounds takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set coming at the cost of a smaller training set gave better performance guarantees, but worse performance in most experiments

    Compact Oblivious Routing

    Get PDF
    Oblivious routing is an attractive paradigm for large distributed systems in which centralized control and frequent reconfigurations are infeasible or undesired (e.g., costly). Over the last almost 20 years, much progress has been made in devising oblivious routing schemes that guarantee close to optimal load and also algorithms for constructing such schemes efficiently have been designed. However, a common drawback of existing oblivious routing schemes is that they are not compact: they require large routing tables (of polynomial size), which does not scale. This paper presents the first oblivious routing scheme which guarantees close to optimal load and is compact at the same time - requiring routing tables of polylogarithmic size. Our algorithm maintains the polylogarithmic competitive ratio of existing algorithms, and is hence particularly well-suited for emerging large-scale networks
    • …
    corecore