Correcting for selection bias via cross-validation in the classification of microarray data

Author: Chevelu J.
McLachlan G. J.
Zhu J.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2008

There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken from patients of known classification with respect to a number of classes, representing, say, disease status or treatment strategy. As the final versions of these rules are usually based on a small subset of the available genes, there is a selection bias that has to be corrected for in the estimation of the associated error rates. We consider the problem using cross-validation. In particular, we present explicit formulae that are useful in explaining the layers of validation that have to be performed in order to avoid improperly cross-validated estimates. Comment: Published at http://dx.doi.org/10.1214/193940307000000284 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
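To make the selection bias concrete, here is a minimal Python sketch in the spirit of the layered validation the abstract describes; it is not the authors' formulae or code. It assumes pure-noise data, genes ranked by the absolute difference of class means, and a nearest-centroid rule; all function names are hypothetical. When genes are selected once on all samples and only the classifier is cross-validated, the leave-one-out error is typically far below the true 50%; repeating the selection inside every fold removes that optimism.

```python
import random


def simulate_null_data(n_per_class, n_genes, seed=0):
    """Pure-noise 'expression' data: no gene carries any class information."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
         for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


def select_genes(X, y, idx, k):
    """Rank genes by |difference of class means| over samples in idx; keep the top k."""
    counts = [sum(1 for i in idx if y[i] == c) for c in (0, 1)]
    scores = []
    for g in range(len(X[0])):
        sums = [0.0, 0.0]
        for i in idx:
            sums[y[i]] += X[i][g]
        scores.append((abs(sums[0] / counts[0] - sums[1] / counts[1]), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid(X, y, idx, genes, x):
    """Assign x to the class whose centroid (samples in idx, selected genes) is closer."""
    dist = []
    for c in (0, 1):
        members = [i for i in idx if y[i] == c]
        dist.append(sum((x[g] - sum(X[i][g] for i in members) / len(members)) ** 2
                        for g in genes))
    return 0 if dist[0] <= dist[1] else 1


def loo_error(X, y, k, select_inside):
    """Leave-one-out error; select_inside=True repeats gene selection in every fold."""
    n = len(y)
    # Biased variant: genes chosen once, using the sample that will later be left out.
    fixed = None if select_inside else select_genes(X, y, range(n), k)
    errors = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        genes = select_genes(X, y, train, k) if select_inside else fixed
        errors += int(nearest_centroid(X, y, train, genes, X[i]) != y[i])
    return errors / n
```

On pure-noise data, `loo_error(X, y, 10, select_inside=True)` should hover around 0.5, while the `select_inside=False` estimate is usually much smaller; the exact values depend on the seed.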

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Kernel density estimation on the torus

Author: Agnese Panzera
Charles C. Taylor
Marco Di Marzio
Publication venue: Elsevier BV
Publication date: 01/06/2011

Kernel density estimation for multivariate circular data has been formulated only when the sample space is the sphere, but theory for the torus would also be useful. For data lying on a d-dimensional torus (d ≥ 1), we discuss kernel estimation of a density, its mixed partial derivatives, and their squared functionals. We introduce a specific class of product kernels whose order is defined in such a way as to obtain L2-risk formulas whose structure can be compared to their Euclidean counterparts. Our kernels are based on circular densities; however, we also discuss estimation with smaller bias using kernels that take negative values and are functions of circular densities. Practical rules for selecting the degree of smoothing, based on cross-validation, bootstrap and plug-in ideas, are derived. Moreover, we provide specific results on the use of kernels based on the von Mises density. Finally, real-data examples and simulation studies illustrate the findings.
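A minimal Python sketch of a product-kernel density estimate on the d-torus, assuming a von Mises kernel in each coordinate whose concentration κ_j acts as an inverse bandwidth (larger κ_j means less smoothing). The function names and the power-series evaluation of the Bessel function I_0 are illustrative choices, not the paper's code, and selection of κ (cross-validation, bootstrap or plug-in) is not shown.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I_0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:  # stop once terms no longer change the sum
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_kernel(phi, kappa):
    """von Mises density with mean 0 and concentration kappa, evaluated at angle phi."""
    return math.exp(kappa * math.cos(phi)) / (2.0 * math.pi * bessel_i0(kappa))


def torus_kde(point, sample, kappas):
    """Average over observations of the product of per-coordinate von Mises kernels."""
    total = 0.0
    for obs in sample:
        prod = 1.0
        for theta, theta_i, kappa in zip(point, obs, kappas):
            prod *= von_mises_kernel(theta - theta_i, kappa)
        total += prod
    return total / len(sample)
```

Because each factor depends on angles only through a cosine, the estimate is automatically 2π-periodic in every coordinate, and for d = 1 it integrates to one over the circle.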

Crossref

White Rose Research Online

Growing Regression Forests by Classification: Applications to Object Pose Estimation

Author: Chellappa Rama
Hara Kota
Publication date: 01/01/2014

In this work, we propose a novel node splitting method for regression trees and incorporate it into the regression forest framework. Unlike traditional binary splitting, where the splitting rule is selected from a predefined set of binary splitting rules via trial and error, the proposed node splitting method first finds clusters of the training data which at least locally minimize the empirical loss without considering the input space. Then splitting rules which preserve the found clusters as much as possible are determined by casting the problem as a classification problem. Consequently, our new node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the Euclidean target space, we present a variant which can naturally deal with a circular target space by the proper use of circular statistics. We apply the regression forest employing our node splitting to head pose estimation (Euclidean target space) and car direction estimation (circular target space) and demonstrate that the proposed method significantly outperforms state-of-the-art methods (38.5% and 22.5% error reduction, respectively). Comment: Paper accepted by ECCV 201
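The two-stage splitting idea can be sketched in Python under simplifying assumptions: scalar (Euclidean) targets, k-means with k = 2 for the clustering stage, and an axis-aligned threshold as the classifier that reproduces the clusters. These are stand-ins chosen for brevity, not necessarily the clustering or classifier used in the paper; the circular-target variant is not covered, and all function names are hypothetical.

```python
def two_means_1d(targets, iters=50):
    """Lloyd's k-means with k=2 on scalar targets only (inputs are ignored)."""
    centers = [min(targets), max(targets)]
    labels = [0] * len(targets)
    for _ in range(iters):
        labels = [0 if abs(t - centers[0]) <= abs(t - centers[1]) else 1 for t in targets]
        for c in (0, 1):
            members = [t for t, lab in zip(targets, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def majority_errors(labels):
    """Misclassifications if this side of the split predicts its majority cluster."""
    return len(labels) - max(labels.count(0), labels.count(1))


def best_stump(inputs, labels):
    """Axis-aligned split (feature, threshold) that best reproduces the cluster labels."""
    best = None
    for f in range(len(inputs[0])):
        values = sorted(set(x[f] for x in inputs))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2.0  # midpoint between consecutive distinct values
            left = [lab for x, lab in zip(inputs, labels) if x[f] <= thr]
            right = [lab for x, lab in zip(inputs, labels) if x[f] > thr]
            err = majority_errors(left) + majority_errors(right)
            if best is None or err < best[0]:
                best = (err, f, thr)
    return None if best is None else (best[1], best[2])


def split_node(inputs, targets):
    """Stage 1: cluster the targets. Stage 2: find an input-space rule preserving them."""
    return best_stump(inputs, two_means_1d(targets))
```

Unlike an exhaustive search that scores every candidate threshold by the regression loss directly, the split here is judged only by how well it separates the target clusters found in the first stage.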

arXiv.org e-Print Archive

Crossref

Deriving Models for Software Project Effort Estimation By Means of Genetic Programming

Author: Dounias Georgios
Tsakonas Athanasios
Publication venue: KDIR-2009 Workshop (INSTICC Conference)
Publication date: 01/01/2009

Keywords: software engineering, effort estimation, genetic programming, symbolic regression.
This paper presents the application of a computational intelligence methodology to effort estimation for software projects. Namely, we apply a genetic programming model for symbolic regression, aiming to produce mathematical expressions that (1) are highly accurate and (2) can be used for estimating the development effort by revealing relationships between the project's features and the required work. We chose to investigate the effectiveness of this methodology in two software engineering domains. The system proved able to generate models in the form of handy mathematical expressions that are more accurate than those found in the literature.

CiteSeerX

Bournemouth University Research Online

Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

Author: Fercoq Olivier
Gramfort Alexandre
Leclère Vincent
Ndiaye Eugene
Salmon Joseph
Publication venue: IOP Publishing
Publication date: 08/06/2016

In high-dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation and performance. It is customary to consider an $\ell_1$ penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates for addressing high dimension. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter and the noise level. This has been considered under several names in the literature, for instance Scaled-Lasso, Square-root Lasso and Concomitant Lasso estimation, and could be of interest for confidence sets or uncertainty quantification. In this work, after illustrating numerical difficulties for the Concomitant Lasso formulation, we propose a modification we coined Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules that achieve speed efficiency by eliminating irrelevant features early.
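A minimal Python sketch of an alternating coordinate descent, assuming the smoothed concomitant objective $\|y - X\beta\|_2^2/(2n\sigma) + \sigma/2 + \lambda\|\beta\|_1$ minimized over $\beta$ and $\sigma \ge \sigma_0$. For fixed $\sigma$, each coordinate update is a soft-thresholding step with threshold $n\sigma\lambda$; for fixed $\beta$, $\sigma$ has the closed form $\max(\|y - X\beta\|_2/\sqrt{n}, \sigma_0)$. The safe screening rules are omitted, and the function names are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t (returns 0 if |z| <= t)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def smoothed_concomitant_lasso(X, y, lam, sigma0, n_iter=200):
    """Alternate a coordinate-descent pass over beta with the closed-form sigma update."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # r = y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    sigma = max(math.sqrt(sum(r * r for r in resid) / n), sigma0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # rho = x_j^T (r + x_j * beta_j): correlation with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * old
            beta[j] = soft_threshold(rho, n * sigma * lam) / col_sq[j]
            delta = beta[j] - old
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
        # For fixed beta, sigma minimizing ||r||^2/(2 n sigma) + sigma/2 is ||r||/sqrt(n)
        sigma = max(math.sqrt(sum(r * r for r in resid) / n), sigma0)
    return beta, sigma
```

When the columns can fit y exactly, the residual norm, and hence the unconstrained σ update, shrinks toward zero from pass to pass; the floor σ0 is what keeps the 1/σ term finite, which is one way to see why some form of smoothing helps numerically.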

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

Bandwidth choice for nonparametric classification

Author: Hall Peter
Kang Kee-Hoon
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2005

It is shown that, for kernel-based classification with univariate distributions and two populations, optimal bandwidth choice has a dichotomous character. If the two densities cross at just one point, where their curvatures have the same signs, then minimum Bayes risk is achieved using bandwidths which are an order of magnitude larger than those which minimize pointwise estimation error. On the other hand, if the curvature signs are different, or if there are multiple crossing points, then bandwidths of conventional size are generally appropriate. The range of different modes of behavior is narrower in multivariate settings. There, the optimal size of bandwidth is generally the same as that which is appropriate for pointwise density estimation. These properties motivate empirical rules for bandwidth choice. Comment: Published at http://dx.doi.org/10.1214/009053604000000959 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
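For orientation, here is a minimal Python sketch of kernel-based classification for two univariate populations with a common Gaussian-kernel bandwidth h, where h is chosen from a grid by leave-one-out misclassification rate. Leave-one-out selection is an assumption made for illustration; it is not the empirical rule derived in the paper, and the function names are hypothetical. Weighting each class density estimate by its sample proportion turns the rule into a comparison of raw kernel sums.

```python
import math


def kernel_sum(x, points, h):
    """Unnormalized Gaussian kernel sum; (n_k / n) * f_k_hat(x) is proportional to it."""
    return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)


def classify(x, class0, class1, h):
    """Assign x to the population with the larger prior-weighted density estimate."""
    return 0 if kernel_sum(x, class0, h) >= kernel_sum(x, class1, h) else 1


def loo_error_rate(class0, class1, h):
    """Leave-one-out misclassification rate for bandwidth h."""
    errors = 0
    for i, x in enumerate(class0):
        errors += int(classify(x, class0[:i] + class0[i + 1:], class1, h) != 0)
    for i, x in enumerate(class1):
        errors += int(classify(x, class0, class1[:i] + class1[i + 1:], h) != 1)
    return errors / (len(class0) + len(class1))


def choose_bandwidth(class0, class1, grid):
    """Grid value with the smallest leave-one-out error (first one wins ties)."""
    return min(grid, key=lambda h: loo_error_rate(class0, class1, h))
```

Because the criterion is the misclassification rate rather than density-estimation error, the selected h need not resemble a bandwidth tuned for pointwise estimation, which is the distinction the abstract draws.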

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University