Better subset regression
To find efficient screening methods for high dimensional linear regression
models, this paper studies the relationship between model fitting and screening
performance. Under a sparsity assumption, we show that, in a general asymptotic setting, a subset that includes the true submodel always yields a smaller residual sum of squares (i.e., better model fitting) than any subset that does not.
This indicates that, for screening important variables, we could follow a
"better fitting, better screening" rule, i.e., pick a "better" subset that has
better model fitting. To seek such a better subset, we consider the
optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and an accelerated version of it are proposed to search for the best subset. Although neither algorithm can guarantee that the subset it yields is the best, their monotonicity property ensures that the returned subset fits the model better than the initial subsets generated by popular screening methods, and it can therefore achieve better screening performance asymptotically. Simulation results show that our methods are very
competitive in high dimensional variable screening even for finite sample
sizes.
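The "better fitting, better screening" rule above admits a compact illustration. The sketch below is ours, not the paper's orthogonalizing subset screening EM algorithm: it simply prefers, among candidate subsets of a fixed size, the one with the smallest residual sum of squares.

```python
import numpy as np
from itertools import combinations

def rss(X, y, subset):
    """Residual sum of squares of the OLS fit on the columns in `subset`."""
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)

def better_subset(X, y, candidates):
    """'Better fitting, better screening': return the candidate subset
    with the smallest RSS."""
    return min(candidates, key=lambda s: rss(X, y, s))

# Toy data: the true submodel is {0, 1}; all other columns are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(100)
candidates = list(combinations(range(10), 2))  # all size-2 subsets
print(better_subset(X, y, candidates))         # typically (0, 1)
```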
Exhuming nonnegative garrote from oblivion using suitable initial estimates - illustration in low and high-dimensional real data
The nonnegative garrote (NNG) is among the first approaches that combine
variable selection and shrinkage of regression estimates. When more than the
derivation of a predictor is of interest, NNG has some conceptual advantages
over the popular lasso. Nevertheless, NNG has received little attention. The
original NNG relies on ordinary least-squares (OLS) estimates, which are highly variable in data with a high degree of multicollinearity (HDM) and do not exist in high-dimensional data (HDD). This might be why NNG is not used in such settings. Alternative initial estimates have been proposed but are hardly used in
practice. Analyzing three structurally different data sets, we demonstrated
that NNG can also be applied in HDM and HDD and compared its performance with
the lasso, adaptive lasso, relaxed lasso, and best subset selection in terms of
variables selected, regression estimates, and prediction. Replacing OLS by
ridge initial estimates in HDM and lasso initial estimates in HDD helped NNG
select simpler models than competing approaches without much increase in
prediction errors. Simpler models are easier to interpret, an important issue
for descriptive modelling. Based on the limited experience from three datasets,
we expect that NNG can be a suitable alternative to the lasso and its extensions. Neutral comparison simulation studies are needed to better understand the properties of variable selection methods, to compare them, and to derive guidance for practice.
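For readers unfamiliar with the two-stage structure of the NNG, here is a minimal sketch assuming scikit-learn: ridge initial estimates in place of OLS (as suggested above for HDM), followed by nonnegative shrinkage factors obtained as a positive lasso on the rescaled design. The penalty values are illustrative, not the tuning used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

def nng(X, y, init_alpha=1.0, garrote_alpha=0.1):
    """Nonnegative garrote with ridge initial estimates.

    Stage 1: initial estimates beta_init (ridge instead of OLS, so the
    procedure remains stable under high multicollinearity).
    Stage 2: nonnegative shrinkage factors c_j >= 0 from a positive
    lasso on the rescaled design Z = X * beta_init.
    """
    beta_init = Ridge(alpha=init_alpha).fit(X, y).coef_
    Z = X * beta_init                       # column j scaled by beta_init[j]
    c = Lasso(alpha=garrote_alpha, positive=True).fit(Z, y).coef_
    return c * beta_init                    # final NNG coefficients

rng = np.random.default_rng(1)
X = rng.standard_normal((80, 20))
y = 2 * X[:, 0] - X[:, 3] + rng.standard_normal(80)
print(np.nonzero(nng(X, y))[0])             # sparse support, e.g. [0 3]
```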
Extreme Value Analysis of Empirical Frame Coefficients and Implications for Denoising by Soft-Thresholding
Denoising by frame thresholding is one of the most basic and efficient
methods for recovering a discrete signal or image from data that are corrupted
by additive Gaussian white noise. The basic idea is to select a frame of
analyzing elements that separates the data into a few large coefficients due to the signal and many small coefficients mainly due to the noise \epsilon_n. Removing all coefficients whose magnitude lies below a certain threshold yields a
reconstruction of the original signal. In order to properly balance the amount
of noise to be removed and the relevant signal features to be kept, a precise
understanding of the statistical properties of thresholding is important. For
that purpose we derive the asymptotic distribution of max_{\omega \in \Omega_n} |\langle \phi_\omega^n, \epsilon_n \rangle| for a wide class of redundant frames (\phi_\omega^n : \omega \in \Omega_n). Based on our theoretical results we give
a rationale for universal extreme value thresholding techniques yielding
asymptotically sharp confidence regions and smoothness estimates corresponding
to prescribed significance levels. The results cover many frames used in
imaging and signal recovery applications, such as redundant wavelet systems,
curvelet frames, or unions of bases. We show that 'generically' a standard Gumbel law results, as is known from the case of orthonormal wavelet bases.
However, for specific highly redundant frames other limiting laws may occur. We
indeed verify that the translation invariant wavelet transform shows a
different asymptotic behaviour.
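The orthonormal special case referred to above can be sketched with soft-thresholding at the classical universal threshold sigma * sqrt(2 log n). The snippet assumes PyWavelets and does not reproduce the paper's Gumbel-based extreme value thresholds for redundant frames.

```python
import numpy as np
import pywt

def denoise_soft(data, wavelet="db4", level=4):
    """Soft-threshold wavelet coefficients at the universal threshold
    sigma * sqrt(2 log n) (orthonormal wavelet basis case)."""
    coeffs = pywt.wavedec(data, wavelet, level=level)
    # Robust noise-level estimate from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    t = sigma * np.sqrt(2 * np.log(len(data)))
    coeffs[1:] = [pywt.threshold(c, t, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

rng = np.random.default_rng(2)
x = np.sin(np.linspace(0, 8 * np.pi, 1024))       # clean signal
noisy = x + 0.3 * rng.standard_normal(x.size)     # additive white noise
print(np.mean((denoise_soft(noisy)[: x.size] - x) ** 2))  # well below 0.09
```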
Convex and non-convex regularization methods for spatial point processes intensity estimation
This paper deals with feature selection procedures for spatial point
processes intensity estimation. We consider regularized versions of estimating
equations, based on the Campbell theorem, derived from two classical functions: the Poisson likelihood and the logistic regression likelihood. We provide general
conditions on the spatial point processes and on penalty functions which ensure
consistency, sparsity and asymptotic normality. We discuss the numerical
implementation and assess finite sample properties in a simulation study.
Finally, an application to tropical forestry datasets illustrates the use of
the proposed methods.
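As a sketch of the convex (lasso-type) case, one can approximate the penalized Poisson likelihood on a discretized observation window and solve it by proximal gradient descent. The unit-weight grid-cell quadrature below is a simplifying assumption; the estimating equations in the paper are more general.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_poisson(X, counts, lam=30.0, step=2e-4, iters=20000):
    """L1-penalized Poisson likelihood for grid-cell counts via proximal
    gradient (ISTA); the intensity in cell i is exp(X[i] @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (np.exp(X @ beta) - counts)  # gradient of -loglik
        beta = soft(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 8))             # covariates on 500 grid cells
true = np.array([0.5, 0, 0, -0.3, 0, 0, 0, 0])
counts = rng.poisson(np.exp(X @ true))        # cell counts of the process
print(np.round(lasso_poisson(X, counts), 2))  # sparse estimate near `true`
```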
Feature selection when there are many influential features
Recent discussion of the success of feature selection methods has argued that
focusing on a relatively small number of features has been counterproductive.
Instead, it is suggested, the number of significant features can be in the
thousands or tens of thousands, rather than (as is commonly supposed at
present) approximately in the range from five to fifty. This change of orders of magnitude in the number of influential features necessitates alterations
to the way in which we choose features and to the manner in which the success
of feature selection is assessed. In this paper, we suggest a general approach
that is suited to cases where the number of relevant features is very large,
and we consider particular versions of the approach in detail. We propose ways
of measuring performance, and we study both theoretical and numerical
properties of the proposed methodology. Published in Bernoulli (http://dx.doi.org/10.3150/13-BEJ536) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
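One simple way to explore the regime described above is to rank features by a marginal statistic and measure recovery when thousands of features are truly influential. The correlation ranking and recall metric below are generic illustrations, not the paper's proposed methodology.

```python
import numpy as np

def marginal_rank(X, y):
    """Rank features by absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(score)[::-1]

# Many weak influential features: 500 of 5000 coefficients are nonzero.
rng = np.random.default_rng(4)
n, p, k = 1000, 5000, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
truth = rng.choice(p, size=k, replace=False)
beta[truth] = 0.3 * rng.choice([-1.0, 1.0], size=k)  # many small effects
y = X @ beta + rng.standard_normal(n)

selected = marginal_rank(X, y)[:k]            # keep the top-k features
recall = np.isin(selected, truth).mean()      # fraction of truth recovered
print(f"recall among top {k}: {recall:.2f}")  # typically well above k/p = 0.1
```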