High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features.
Comment: 18 pages
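As a rough illustration of the idea in the abstract above, the following sketch fits a Lasso in which each feature is represented by its own centered Gram matrix rather than by its raw values. The abstract does not give the formulation, so everything specific here is an assumption: the objective (1/2)||L - sum_k alpha_k K_k||_F^2 + lam * sum_k alpha_k with alpha_k >= 0, the Gaussian kernel with unit width on standardized data, the coordinate-descent solver, and the names `centered_gram`, `kernelized_lasso`, and `lam` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    """Centered, Frobenius-normalized Gaussian Gram matrix of a 1-D sample vector."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    sq_dist = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H
    return Kc / (np.linalg.norm(Kc) + 1e-12)     # Frobenius normalization

def kernelized_lasso(X, y, lam=0.05, n_iter=200):
    """Non-negative Lasso over per-feature Gram matrices (assumed formulation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Standardize so that a unit kernel width is a sensible default.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    ys = (y - y.mean()) / (y.std() + 1e-12)
    # Column k of A is the vectorized Gram matrix of feature k.
    A = np.stack([centered_gram(Xs[:, k]).ravel() for k in range(d)], axis=1)
    b = centered_gram(ys).ravel()
    G = A.T @ A      # feature-feature dependence (penalizes redundancy)
    c = A.T @ b      # feature-output dependence (rewards relevance)
    alpha = np.zeros(d)
    for _ in range(n_iter):
        for k in range(d):
            # Correlation with the residual, excluding feature k's own term.
            r = c[k] - G[k] @ alpha + G[k, k] * alpha[k]
            alpha[k] = max(0.0, r - lam) / max(G[k, k], 1e-12)
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)   # non-linear in feature 0 only
    print(np.round(kernelized_lasso(X, y), 3))
```

In this sketch the entries of `c` play the role of a kernel-based dependence score between each feature and the output, and the entries of `G` play the same role between pairs of features, which is one way to read the abstract's "non-redundant features with strong statistical dependence on output values". Because the assumed problem is a convex non-negative Lasso, coordinate descent converges to a global optimum.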
Feature selection guided by structural information
In generalized linear regression problems with an abundant number of
features, lasso-type regularization which imposes an $\ell_1$-constraint on the
regression coefficients has become a widely established technique. Deficiencies
of the lasso in certain scenarios, notably strongly correlated design, were
unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301--320]
introduced the elastic net. In this paper we propose to extend the elastic net
by admitting general nonnegative quadratic constraints as a second form of
regularization. The generalized ridge-type constraint will typically make use
of the known association structure of features, for example, by using temporal
or spatial closeness. We study properties of the resulting "structured elastic
net" regression estimation procedure, including basic asymptotics and the issue
of model selection consistency. In this vein, we provide an analog to the
so-called "irrepresentable condition" which holds for the lasso. Moreover, we
outline algorithmic solutions for the structured elastic net within the
generalized linear model family. The rationale and the performance of our
approach are illustrated by means of simulated and real-world data, with a focus
on signal regression.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS302 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
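To make the combination of an l1 penalty with a quadratic structure penalty concrete, here is a minimal sketch for the linear-regression case. The abstract does not state the objective, so the form (1/2)||y - X beta||^2 + lam1 * ||beta||_1 + (lam2/2) * beta' Lambda beta is an assumption, as is the choice Lambda = D'D with D the first-difference matrix (penalizing differences between neighbouring coefficients, as one might for temporally ordered features). The data-augmentation trick and the coordinate-descent solver are standard devices swapped in for illustration, and the names `soft_threshold`, `structured_elastic_net`, `lam1`, and `lam2` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the Lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def structured_elastic_net(X, y, lam1, lam2, n_iter=500):
    """Coordinate descent for an l1 penalty plus a first-difference quadratic penalty."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Row i of D computes beta[i+1] - beta[i]; Lambda = D.T @ D (assumed structure).
    D = np.diff(np.eye(p), axis=0)
    # Augmentation: the quadratic penalty becomes extra least-squares rows,
    # so the problem reduces to an ordinary Lasso on (Xa, ya).
    Xa = np.vstack([X, np.sqrt(lam2) * D])
    ya = np.concatenate([y, np.zeros(p - 1)])
    col_sq = (Xa ** 2).sum(axis=0)
    beta = np.zeros(p)
    resid = ya - Xa @ beta
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            resid += Xa[:, k] * beta[k]           # remove feature k's contribution
            beta[k] = soft_threshold(Xa[:, k] @ resid, lam1) / col_sq[k]
            resid -= Xa[:, k] * beta[k]           # add the updated contribution back
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([1.0, 1.0, 0.0, -1.0, -1.0]) + 0.1 * rng.normal(size=50)
    print(np.round(structured_elastic_net(X, y, lam1=1.0, lam2=5.0), 3))
```

Setting `lam2=0` recovers a plain Lasso and replacing `D` by the identity recovers the ordinary elastic net, which is why the quadratic term can be read as a generalized ridge penalty. Other association structures (for example spatial neighbourhoods) would correspond to a different nonnegative definite matrix in place of `D.T @ D`.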