
    Classification Tree Pruning Under Covariate Shift

    We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X,Y}$, but little data from a desired distribution $Q_{X,Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of \emph{average discrepancy} $P_X \to Q_X$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of \emph{relative dimension} between distributions, as it relates to existing notions of information such as the Minkowski and Rényi dimensions. (38 pages, 8 figures)
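    To make the setting concrete, here is a minimal sketch (not the paper's optimal procedure): grow a large tree on the plentiful source sample from $P$, then choose among its cost-complexity pruned subtrees using the scarce target sample from $Q$, rather than cross-validating on $P$. All data and names here (X_p, y_p, X_q, y_q) are synthetic placeholders.

```python
# Hedged sketch: prune on abundant source data from P, select the subtree
# on the few target points from Q. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def label(X):
    # noiseless Bayes rule shared by P and Q (only the X-marginals shift)
    return (X[:, 0] + X[:, 1] > 0).astype(int)

# Source sample from P (abundant) and target sample from Q (scarce),
# with shifted X-marginals but the same conditional of Y given X.
X_p = rng.normal(0.0, 1.0, size=(2000, 2))
X_q = rng.normal(1.0, 1.0, size=(50, 2))
y_p, y_q = label(X_p), label(X_q)

# Grow a full tree on P and compute its cost-complexity pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X_p, y_p)
alphas = full.cost_complexity_pruning_path(X_p, y_p).ccp_alphas

# Cross-validating on P is misleading under covariate shift, so score
# each pruned subtree on the small Q sample and keep the best one.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_p, y_p)
     for a in alphas),
    key=lambda t: t.score(X_q, y_q),
)
print("chosen leaves:", best.get_n_leaves(),
      "Q-accuracy:", best.score(X_q, y_q))
```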

    Risk Bounds for Embedded Variable Selection in Classification Trees

    The problems of model and variable selection for classification trees are considered jointly. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
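    As an illustration only, a CART-style penalized risk that additionally charges for the number of distinct split variables might look like the sketch below; the penalty form and the weights lam_leaves and lam_vars are assumptions for exposition, not the paper's criterion or its calibrated constants.

```python
# Hedged sketch: empirical risk plus penalties on tree size (as in CART
# pruning) and on the number of variables the tree actually splits on.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def penalized_risk(tree, X, y, lam_leaves=0.01, lam_vars=0.02):
    """Misclassification risk + size penalty + variable-count penalty.
    The two weights are illustrative tuning constants."""
    emp_risk = 1.0 - tree.score(X, y)
    feats = tree.tree_.feature                  # split feature per node
    n_vars = np.unique(feats[feats >= 0]).size  # leaf nodes are coded -2
    return emp_risk + lam_leaves * tree.get_n_leaves() + lam_vars * n_vars

# Usage: fit subtrees along the cost-complexity pruning path and keep
# the minimizer of the penalized criterion.
X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           random_state=0)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
alphas = full.cost_complexity_pruning_path(X, y).ccp_alphas
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X, y)
         for a in alphas]
best = min(trees, key=lambda t: penalized_risk(t, X, y))
print("leaves:", best.get_n_leaves())
```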

    An Analogue-Digital Model of Computation: Turing Machines with Physical Oracles

    We introduce an abstract analogue-digital model of computation that couples Turing machines to oracles that are physical processes. Since any oracle has the potential to boost the computational power of a Turing machine, the effect of adding a physical process raises interesting questions. Do physical processes add significantly to the power of Turing machines? Can they break the Turing Barrier? Does the power of the Turing machine vary with different physical processes? Specifically, here we take a physical oracle to be a physical experiment, controlled by the Turing machine, that measures some physical quantity. There are three protocols of communication between the Turing machine and the oracle that simulate the types of error propagation common to analogue-digital devices, namely: infinite precision, unbounded precision, and fixed precision. These three types of precision introduce three variants of the physical oracle model. Fixing one archetypal experiment, we show how to classify the computational power of the three models by establishing lower and upper bounds. Using new techniques and ideas about timing, we give a complete classification.
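    A toy simulation can contrast the three precision regimes. Everything here is an assumption made for illustration (the unknown quantity Y, the threshold-style query, and the uniform noise model), not the paper's formal protocols.

```python
# Toy sketch: the oracle holds an unknown physical quantity Y in (0, 1);
# the machine asks whether a dyadic rational z lies below Y, under each
# of the three error-propagation protocols.
import random

Y = 0.6180339887  # the unknown quantity being measured (illustrative)

def infinite_precision(z):
    # the experiment answers exactly, with no error
    return z < Y

def unbounded_precision(z, k):
    # error can be driven below 2**-k, at a time cost growing with k
    return z < Y + random.uniform(-2.0**-k, 2.0**-k)

def fixed_precision(z, eps=1e-3):
    # a hard error bound eps that no amount of extra time reduces
    return z < Y + random.uniform(-eps, eps)

# Bisection extracts bits of Y; under fixed precision the answers become
# unreliable once the interval shrinks to roughly eps, so it stalls there.
lo, hi = 0.0, 1.0
for _ in range(20):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if fixed_precision(mid) else (lo, mid)
print(f"Y ~ {(lo + hi) / 2:.4f} (accuracy limited by eps)")
```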

    A Neyman–Pearson Approach to Statistical Learning

    • …