Classification Tree Pruning Under Covariate Shift
We consider the problem of \emph{pruning} a classification tree, that is,
selecting a suitable subtree that balances bias and variance, in common
situations with inhomogeneous training data. Namely, assuming access to mostly
data from a distribution $P_{X,Y}$, but little data from a desired
distribution $Q_{X,Y}$ with different $X$-marginals, we present the first
efficient procedure for optimal pruning in such situations, when
cross-validation and other penalized variants are grossly inadequate.
Optimality is derived with respect to a notion of \emph{average discrepancy}
(averaged over $X$ space) which significantly relaxes a
recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture
the limits of classification under such a distribution shift. Our relaxed
notion can be viewed as a measure of \emph{relative dimension} between
distributions, as it relates to existing notions of information such as the
Minkowski and Rényi dimensions. Comment: 38 pages, 8 figures
Risk Bounds for Embedded Variable Selection in Classification Trees
The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
Combinatorial Optimization (hybrid meeting)
Combinatorial Optimization deals with optimization problems defined on combinatorial structures such as graphs and networks. Motivated by diverse practical problem setups, the topic has developed into a rich mathematical discipline with many connections to other fields of Mathematics (such as Combinatorics, Convex Optimization and Geometry, and Real Algebraic Geometry). It also has strong ties to Theoretical Computer Science and Operations Research. A series of Oberwolfach Workshops has been crucial for establishing and developing the field. The workshop we report on was a particularly exciting event, due to the depth of the results that were presented, the spectrum of developments that became apparent from the talks, the breadth of the connections to other mathematical fields that were explored, and last but not least because for many of the participants it was the first opportunity in almost two years to exchange ideas and to collaborate during an on-site workshop.
New Inference Concepts for Analysing Complex Data
The main purpose of this workshop was to assemble international leaders from statistics and machine learning to identify important research problems, to cross-fertilize between the disciplines, and ultimately to start coordinated research efforts toward better solutions. The workshop focused on discussing modern methods for analysing complex high-dimensional data, with applications to econometrics, finance, biomedicine, genomics, etc.
An Analogue-Digital Model of Computation: Turing Machines with Physical Oracles
We introduce an abstract analogue-digital model of computation that couples Turing machines to oracles that are physical processes. Since any oracle has the potential to boost the computational power of a Turing machine, the effect on the power of the Turing machine of adding a physical process raises interesting questions. Do physical processes add significantly to the power of Turing machines; can they break the Turing Barrier? Does the power of the Turing machine vary with different physical processes? Specifically, here, we take a physical oracle to be a physical experiment, controlled by the Turing machine, that measures some physical quantity. There are three protocols of communication between the Turing machine and the oracle that simulate the types of error propagation common to analogue-digital devices, namely: infinite precision, unbounded precision, and fixed precision. These three types of precision introduce three variants of the physical oracle model. On fixing one archetypal experiment, we show how to classify the computational power of the three models by establishing the lower and upper bounds. Using new techniques and ideas about timing, we give a complete classification.
Computations with oracles that measure vanishing quantities