    Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

    Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: such bias can be caused not only by the statistical effect of multiple comparisons, but also by increasing estimation bias and variance of the splitting criterion when plug-in estimates of entropy measures like the Gini Index are employed. The relevance of these sources of variable selection bias in the different simulation study designs is examined. Variable selection bias due to these sources applies to all classification tree algorithms based on empirical entropy measures like the Gini Index, Deviance and Information Gain, and to both binary and multiway splitting algorithms.
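    The multiple-comparisons effect described above can be illustrated with a small sketch (not the study's own code; all names and parameters are illustrative). Under the null of no association, a continuous predictor offers many more candidate cutpoints than a binary one, so its maximally selected plug-in Gini gain tends to be larger:

```python
import random
from collections import Counter

def gini(labels):
    # Plug-in (empirical) Gini index: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gini_gain(x, y):
    # Maximally selected Gini gain over all binary cutpoints on x.
    parent, n = gini(y), len(y)
    best = 0.0
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        best = max(best, parent - len(left) / n * gini(left) - len(right) / n * gini(right))
    return best

random.seed(0)
n = 50
y = [random.randint(0, 1) for _ in range(n)]          # labels independent of both predictors
x_binary = [random.randint(0, 1) for _ in range(n)]   # at most one candidate cutpoint
x_cont = [random.random() for _ in range(n)]          # n - 1 candidate cutpoints

# The continuous predictor is searched over many more cutpoints, so its
# maximally selected gain tends to come out larger even under the null.
print(best_gini_gain(x_binary, y), best_gini_gain(x_cont, y))
```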

    Complexes of not i-connected graphs

    Complexes of (not) connected graphs, hypergraphs and their homology appear in the construction of knot invariants given by V. Vassiliev. In this paper we study the complexes of not i-connected k-hypergraphs on n vertices. We show that the complex of not 2-connected graphs has the homotopy type of a wedge of (n-2)! spheres of dimension 2n-5. This answers one of the questions raised by Vassiliev in connection with knot invariants. For this case the S_n-action on the homology of the complex is also determined. For complexes of not 2-connected k-hypergraphs we provide a formula for the generating function of the Euler characteristic, and we introduce certain lattices of graphs that encode their topology. We also present partial results for some other cases. In particular, we show that the complex of not (n-2)-connected graphs is Alexander dual to the complex of partial matchings of the complete graph. For not (n-3)-connected graphs we provide a formula for the generating function of the Euler characteristic.

    On partitioning multivariate self-affine time series

    Given a multivariate time series, possibly of high dimension, with unknown and time-varying joint distribution, it is of interest to be able to completely partition the time series into disjoint, contiguous subseries, each of which has different distributional or pattern attributes from the preceding and succeeding subseries. An additional feature of many time series is that they display self-affinity, so that subseries at one time scale are similar to subseries at another after application of an affine transformation. Such qualities are observed in time series from many disciplines, including biology, medicine, economics, finance, and computer science. This paper defines the relevant multiobjective combinatorial optimization problem, under limited assumptions, as a biobjective one, and presents a specialized evolutionary algorithm which finds optimal self-affine time series partitionings with a minimum of choice parameters. The algorithm not only finds partitionings for all possible numbers of partitions given data constraints, but also for self-affinities between these partitionings and some fine-grained partitioning. The resulting set of Pareto-efficient solution sets provides a rich representation of the self-affine properties of a multivariate time series at different locations and time scales.
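    The paper's evolutionary algorithm is not reproduced here. As a minimal, hypothetical stand-in for the single-objective core of the problem, a univariate series can be partitioned into a fixed number k of contiguous segments with minimal within-segment squared deviation, solved exactly for small inputs by dynamic programming:

```python
def segment_cost(x, i, j):
    # Sum of squared deviations of x[i:j] from its segment mean.
    seg = x[i:j]
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def best_partition(x, k):
    # Exact DP: minimal total within-segment cost over partitions of x into
    # exactly k contiguous, non-empty segments; returns (cost, boundaries).
    n = len(x)
    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, n + 1):
        for s in range(1, min(j, k) + 1):
            for i in range(s - 1, j):
                c = cost[i][s - 1] + segment_cost(x, i, j)
                if c < cost[j][s]:
                    cost[j][s], back[j][s] = c, i
    bounds, j = [], n
    for s in range(k, 0, -1):      # walk back through the stored split points
        bounds.append(j)
        j = back[j][s]
    return cost[n][k], sorted(bounds)

# Two regimes with different levels; k = 2 recovers the change point.
series = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
total, boundaries = best_partition(series, 2)
```

    An evolutionary approach becomes attractive precisely where this exact DP does not scale: high-dimensional series, unknown k, and a second (self-affinity) objective.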

    Unbiased split selection for classification trees based on the Gini Index

    The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for variable selection bias in favor of variables with a high amount of missing values when the Gini gain is used as split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extendable to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
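    The exact combinatorial distribution derived in the paper is not reproduced here. As an illustrative approximation of the same idea, a Monte Carlo permutation test yields a p-value for the maximally selected Gini gain, which is comparable across predictors regardless of how many cutpoints each offers (all function names are hypothetical):

```python
import random
from collections import Counter

def _gini(labels):
    # Plug-in Gini index: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def max_gini_gain(x, y):
    # Maximally selected Gini gain over all cutpoints of a continuous predictor.
    parent, n = _gini(y), len(y)
    gains = []
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gains.append(parent - len(left) / n * _gini(left) - len(right) / n * _gini(right))
    return max(gains, default=0.0)

def permutation_p_value(x, y, n_perm=200, seed=1):
    # Monte Carlo stand-in for the exact null distribution: permute the labels
    # to simulate "no association", and compare maximally selected gains.
    rng = random.Random(seed)
    observed = max_gini_gain(x, y)
    yp = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(yp)
        if max_gini_gain(x, yp) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

    Selecting the split variable by smallest p-value, rather than largest raw gain, removes the advantage of predictors that are simply searched over more cutpoints.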

    Tree stability diagnostics and some remedies against instability

    Stability aspects of recursive partitioning procedures are investigated. Using resampling techniques, diagnostic tools to assess single split stability and overall tree stability are introduced. To correct for the procedure's preference for covariates with many unique realizations, corrected p-values are used in the factor selection component of the algorithm. Finally, methods to stabilize tree-based predictors are discussed.
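    A rough sketch of the resampling idea (a toy split selector stands in for a real tree algorithm; this is not the paper's implementation): refit the root split on bootstrap resamples and tally how often each variable is chosen. A dominant selection frequency indicates a stable split:

```python
import random

def split_variable(X, y):
    # Toy root-split selector: choose the column whose class-conditional means
    # differ the most (a stand-in for a real tree's impurity criterion).
    def score(col):
        a = [v for v, yi in zip(col, y) if yi == 0]
        b = [v for v, yi in zip(col, y) if yi == 1]
        if not a or not b:
            return 0.0
        return abs(sum(a) / len(a) - sum(b) / len(b))
    cols = list(zip(*X))
    return max(range(len(cols)), key=lambda j: score(cols[j]))

def split_stability(X, y, n_boot=200, seed=0):
    # Diagnostic: refit the root split on bootstrap resamples and report the
    # relative frequency with which each variable is selected.
    rng = random.Random(seed)
    n = len(y)
    counts = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        j = split_variable([X[i] for i in idx], [y[i] for i in idx])
        counts[j] = counts.get(j, 0) + 1
    return {j: c / n_boot for j, c in counts.items()}

# Variable 0 separates the classes; variable 1 is pure noise.
rng = random.Random(42)
y = [0, 1] * 25
X = [[yi * 3.0 + rng.gauss(0, 0.5), rng.gauss(0, 1)] for yi in y]
freq = split_stability(X, y)
```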