Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection in
decision tree classification. We experimentally evaluate two feature selection
methodologies, based on information extracted from the provided dataset:
\emph{leverage scores-based} and \emph{norm-based} feature selection.
Experimental evaluation of the proposed feature selection techniques indicates
that such approaches may be more effective than naive uniform feature
selection while achieving performance comparable to the random forest
algorithm [3].
Comment: 7 pages, 7 figures, 1 table
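The norm-based variant above can be sketched as follows. This is a hedged illustration only, assuming numpy and scikit-learn; the dataset, tree depth, ensemble size, and the use of squared column norms are arbitrary choices for the sketch, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Illustrative norm-based non-uniform feature sampling for a tree ensemble.
X, y = load_breast_cancer(return_X_y=True)

# Sampling probability of each feature proportional to its squared column norm
norms = np.linalg.norm(X, axis=0) ** 2
probs = norms / norms.sum()

rng = np.random.default_rng(0)
n_trees, k = 25, 8
trees, feature_sets = [], []
for _ in range(n_trees):
    # draw k distinct features, biased toward large-norm columns
    feats = rng.choice(X.shape[1], size=k, replace=False, p=probs)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, feats], y)
    trees.append(tree)
    feature_sets.append(feats)

# majority vote over the ensemble
votes = np.stack([t.predict(X[:, f]) for t, f in zip(trees, feature_sets)])
pred = (votes.mean(axis=0) > 0.5).astype(int)
print((pred == y).mean())
```

The leverage-score variant would replace `probs` with scores derived from the top singular vectors of `X`; the surrounding loop is unchanged.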
Poor performance of broadleaf plantations and possible remedial silvicultural systems - a review
Peer-reviewed
Over the last two decades, planting of broadleaves has been part of forest policy. In addition to the provision of a range of ecosystem services, it is intended that this resource will provide a direct economic stimulus through the supply of quality hardwood. A number of challenges must be met to achieve this objective, particularly as current observations indicate that many first-rotation broadleaf plantations comprise a relatively high proportion of poor-quality stems. A literature review has been carried out on the probable causes of poor performance in broadleaf crops. Silvicultural systems to rehabilitate poor-quality stands are discussed. Subsequent papers will deal with these silvicultural systems in more detail.
Kinetic Solvers with Adaptive Mesh in Phase Space
An Adaptive Mesh in Phase Space (AMPS) methodology has been developed for
solving multi-dimensional kinetic equations by the discrete velocity method. A
Cartesian mesh for both configuration (r) and velocity (v) spaces is produced
using a tree of trees data structure. The mesh in r-space is automatically
generated around embedded boundaries and dynamically adapted to local solution
properties. The mesh in v-space is created on-the-fly for each cell in r-space.
Mappings between neighboring v-space trees are implemented for the advection
operator in configuration space. We have developed new algorithms for solving
the full Boltzmann and linear Boltzmann equations with AMPS. Several recent
innovations were used to calculate the discrete Boltzmann collision integral
with dynamically adaptive mesh in velocity space: importance sampling,
multi-point projection method, and the variance reduction method. We have
developed an efficient algorithm for calculating the linear Boltzmann collision
integral for elastic and inelastic collisions in a Lorentz gas. The new AMPS
technique has been demonstrated for simulations of hypersonic rarefied gas
flows, ion and electron kinetics in weakly ionized plasma, radiation and light
particle transport through thin films, and electron streaming in
semiconductors. We have shown that AMPS minimizes the number of cells in
phase space, reducing computational cost and memory usage for solving
challenging kinetic problems.
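The adaptation criterion can be illustrated with a deliberately simple 1-D sketch: recursive interval bisection in place of the paper's tree-of-trees structure, with refinement driven by local solution variation. The tolerance, depth limit, and test function are arbitrary assumptions:

```python
import numpy as np

def refine(cells, f, tol=0.05, max_level=8):
    """Recursively bisect cells (a, b, level) where |f(b) - f(a)| > tol."""
    out = []
    for a, b, level in cells:
        if level < max_level and abs(f(b) - f(a)) > tol:
            m = 0.5 * (a + b)
            out += refine([(a, m, level + 1), (m, b, level + 1)],
                          f, tol, max_level)
        else:
            out.append((a, b, level))
    return out

# a profile that is steep near x = 0 forces local refinement there
mesh = refine([(-5.0, 5.0, 0)], lambda x: np.tanh(5 * x))
print(len(mesh), min(b - a for a, b, _ in mesh))
```

The resulting cell list is fine near the steep region and coarse elsewhere, which is the cost-saving effect the abstract describes, here reduced to one dimension.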
uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers
The use of multivariate classifiers, especially neural networks and decision
trees, has become commonplace in particle physics. Typically, a series of
classifiers is trained rather than just one to enhance the performance; this is
known as boosting. This paper presents a novel method of boosting that produces
a uniform selection efficiency in a user-defined multivariate space. Such a
technique is ideally suited for amplitude analyses or other situations where
optimizing a single integrated figure of merit is not what is desired.
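The flattening idea can be sketched as an AdaBoost-style loop with an extra per-event weight factor that pushes the selection efficiency toward a uniform target across bins of a chosen variable. Everything here, the simulated data, the binning, and the exponential uniformity factor, is an illustrative assumption, not the uBoost prescription itself:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
m = rng.uniform(0, 1, n)                  # variable where uniformity is wanted
X = np.c_[m, rng.normal(size=n)]
y = (X[:, 1] + 0.5 * m > 0.3).astype(int)

target_eff = 0.5
w = np.ones(n) / n
learners, alphas = [], []
for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum() / w.sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    learners.append(stump)
    alphas.append(alpha)
    # standard AdaBoost reweighting
    w *= np.exp(-alpha * (2 * y - 1) * (2 * pred - 1))
    # uniformity term: up-weight signal events in m-bins whose local
    # selection efficiency falls below the target
    bins = np.digitize(m, np.linspace(0, 1, 11))
    for b in np.unique(bins):
        sig = (bins == b) & (y == 1)
        if sig.any():
            eff = pred[sig].mean()
            w[sig] *= np.exp(0.5 * (target_eff - eff))
    w /= w.sum()

score = sum(a * (2 * l.predict(X) - 1) for a, l in zip(alphas, learners))
print(((score > 0).astype(int) == y).mean())
```

The extra multiplier is bounded and mild, so ordinary boosting behavior is preserved while the per-bin efficiencies are nudged toward the common target.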
Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees
Detection of differential item functioning by use of the logistic modelling
approach has a long tradition. One big advantage of the approach is that it can
be used to investigate non-uniform DIF as well as uniform DIF. The classical
approach makes it possible to detect DIF by distinguishing between multiple groups. We
propose an alternative method that is a combination of recursive partitioning
methods (or trees) and logistic regression methodology to detect uniform and
non-uniform DIF in a nonparametric way. The method outputs trees that
visualize, in a simple way, the structure of DIF in an item, showing which
variables are interacting, and in which way, when generating DIF. In addition, we
consider a logistic regression method in which DIF can be induced by a vector
of covariates, which may include categorical but also continuous covariates.
The methods are investigated in simulation studies and illustrated by two
applications.
Comment: 32 pages, 13 figures, 7 tables
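For a single item, the tree idea reduces to searching candidate splits of a covariate and keeping the split whose group indicator (plus its interaction with ability, to allow non-uniform DIF) most improves a logistic model of the item response. A hedged sketch on simulated data, with all names and the simple split search being assumptions of the illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(2)
n = 3000
theta = rng.normal(size=n)               # latent ability
age = rng.uniform(20, 70, n)             # covariate that may induce DIF
group = (age > 45).astype(float)         # true split (unknown to the method)
logit = theta - 0.5 + 1.0 * group        # uniform DIF: shifted item difficulty
resp = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def loss(features):
    model = LogisticRegression().fit(features, resp)
    return log_loss(resp, model.predict_proba(features)[:, 1])

base = loss(theta[:, None])              # DIF-free model
# candidate split points at empirical quantiles of the covariate
best = min(
    (loss(np.c_[theta, age > c, theta * (age > c)]), c)
    for c in np.quantile(age, np.linspace(0.1, 0.9, 17))
)
print(base - best[0], best[1])           # loss improvement, selected split
```

In the full method this split search is applied recursively, yielding a tree per item; here one level suffices to recover a split close to the true change point at age 45.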
ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that manages Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and offers a good trade-off between point-estimator
precision and credible-interval estimation for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures. Supplementary Information: 14 pages, 5
figures
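The abcrf package itself is in R; a minimal Python analogue of the workflow described above, simulate parameters from the prior, summarize the simulated data, and regress each parameter of interest on the summaries with a forest, might look like the following (the toy model, summary choices, and forest size are assumptions of the sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n_sim, n_obs = 5000, 50

mu = rng.normal(0, 2, n_sim)                        # prior draws of parameter
sims = rng.normal(mu[:, None], 1, (n_sim, n_obs))   # simulated datasets

def summaries(data):
    # deliberately over-complete: the forest weighs the informative ones,
    # so no preliminary selection of statistics is needed
    return np.c_[data.mean(1), np.median(data, 1), data.std(1), data.min(1)]

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(summaries(sims), mu)

obs = rng.normal(1.5, 1, (1, n_obs))                # "observed" data, true mu = 1.5
print(forest.predict(summaries(obs))[0])            # point estimate of mu
```

Note that no tolerance level appears anywhere: the forest plays the role of the accept/reject step. For a vector-valued parameter, one forest per component would be fit, as the abstract advocates.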
On PAC-Bayesian Bounds for Random Forests
Existing guarantees in terms of rigorous upper bounds on the generalization
error for the original random forest algorithm, one of the most frequently used
machine learning methods, are unsatisfying. We discuss and evaluate various
PAC-Bayesian approaches to derive such bounds. The bounds do not require
additional hold-out data, because the out-of-bag samples from the bagging in
the training process can be exploited. A random forest predicts by taking a
majority vote of an ensemble of decision trees. The first approach is to bound
the error of the vote by twice the error of the corresponding Gibbs classifier
(classifying with a single member of the ensemble selected at random). However,
this approach does not take into account the averaging out of the errors
of individual classifiers when taking the majority vote. This effect provides a
significant boost in performance when the errors are independent or negatively
correlated, but when the correlations are strong the advantage from taking the
majority vote is small. The second approach, based on PAC-Bayesian C-bounds,
takes dependencies between ensemble members into account, but it requires
estimating correlations between the errors of the individual classifiers. When
the correlations are high or the estimation is poor, the bounds degrade. In our
experiments, we compute generalization bounds for random forests on various
benchmark data sets. Because the individual decision trees already perform
well, their predictions are highly correlated and the C-bounds do not lead to
satisfactory results. For the same reason, the bounds based on the analysis of
Gibbs classifiers are typically superior and often reasonably tight. Bounds
based on a validation set, which comes at the cost of a smaller training set,
gave better performance guarantees but worse predictive performance in most
experiments.
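The empirical ingredients of the first bound can be sketched with a hand-rolled bagging loop: estimate the Gibbs (single-tree) error from each tree's out-of-bag samples, then compare the majority-vote error against twice that value. The confidence term of the actual PAC-Bayesian bound is omitted, and the dataset and ensemble size are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(4)
n, T = len(y), 100

tree_err = []
votes = np.zeros((T, n))
for t in range(T):
    idx = rng.integers(0, n, n)              # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)    # out-of-bag indices
    clf = DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx])
    tree_err.append((clf.predict(X[oob]) != y[oob]).mean())
    votes[t] = clf.predict(X)

gibbs = np.mean(tree_err)                    # empirical Gibbs error (OOB)
# note: this vote error uses in-bag predictions too, so it is optimistic;
# it only illustrates the factor-of-two relationship
mv_err = ((votes.mean(0) > 0.5).astype(int) != y).mean()
print(gibbs, 2 * gibbs, mv_err)
```

When the trees' errors are strongly correlated, `mv_err` sits close to `gibbs` and the factor of two is loose, which is exactly the weakness of the first approach that the abstract discusses.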
Compact Oblivious Routing
Oblivious routing is an attractive paradigm for large distributed systems in which centralized control and frequent reconfigurations are infeasible or undesired (e.g., costly). Over almost two decades, much progress has been made in devising oblivious routing schemes that guarantee close-to-optimal load, and efficient algorithms for constructing such schemes have been designed. However, a common drawback of existing oblivious routing schemes is that they are not compact: they require large routing tables (of polynomial size), which does not scale.
This paper presents the first oblivious routing scheme which guarantees close-to-optimal load and is compact at the same time, requiring routing tables of polylogarithmic size. Our algorithm maintains the polylogarithmic competitive ratio of existing algorithms, and is hence particularly well-suited for emerging large-scale networks.