
    On non-asymptotic bounds for estimation in generalized linear models with highly correlated design

    We study a high-dimensional generalized linear model and penalized empirical risk minimization with an $\ell_1$ penalty. Our aim is to provide a non-trivial illustration that non-asymptotic bounds for the estimator can be obtained without relying on the chaining technique and/or the peeling device.
    Comment: Published at http://dx.doi.org/10.1214/074921707000000319 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
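The abstract's setting, $\ell_1$-penalized empirical risk minimization for a generalized linear model, can be illustrated with a minimal numpy sketch. This is not the paper's estimator or proof technique, just a standard proximal-gradient (ISTA) solver for logistic loss plus an $\ell_1$ penalty; all function names here are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, lam, iters=3000):
    """ISTA sketch for min_beta (1/n) * logistic loss + lam * ||beta||_1."""
    n, p = X.shape
    # Safe step size: the logistic-loss gradient is ||X||_op^2 / (4n)-Lipschitz
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        grad = X.T @ (mu - y) / n              # gradient of mean log-loss
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

The soft-thresholding step is what produces exact zeros in the estimate, which is why non-asymptotic sparsity-type bounds for such estimators are of interest even under correlated designs.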

    The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods

    We consider a linear regression problem in a high-dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, i.e., that the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic) which is based on two penalty terms. The first is the $\ell_1$ norm of the regression coefficients, used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{\beta}^{SL}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso with respect to estimation accuracy. This is especially the case when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters and the calibration based on 10-fold cross-validation yield two S-Lasso solutions with similar performance.
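A standard way to handle an $\ell_1$ + quadratic penalty of this kind is the augmented-data trick: a quadratic penalty $\lambda_2 \|D\beta\|_2^2$ (with $D$ the first-difference matrix for the Smooth-Lasso, or the identity for the Elastic-Net) can be absorbed into the least-squares term by stacking rows, reducing the problem to a plain Lasso. The sketch below shows that reduction only; it is an illustration of the penalty structure described in the abstract, not the paper's own code.

```python
import numpy as np

def smooth_lasso_augment(X, y, lam2):
    """Absorb the quadratic Smooth-Lasso penalty lam2 * sum_j (b_{j+1}-b_j)^2
    into the least-squares term by stacking sqrt(lam2)*D under X (D = first
    differences) and zeros under y.  The remaining problem is a plain Lasso."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)              # (p-1) x p difference matrix
    X_aug = np.vstack([X, np.sqrt(lam2) * D])
    y_aug = np.concatenate([y, np.zeros(p - 1)])
    return X_aug, y_aug
```

With `D` replaced by the identity, the same construction recovers the Elastic-Net's ridge term, which is the sense in which both are special cases of one quadratic-penalty family.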

    On the conditions used to prove oracle results for the Lasso

    Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds in more general situations than it appears from coherence (Bunea et al., 2007b,c) or restricted isometry (Candes and Tao, 2005) assumptions.
    Comment: 33 pages, 1 figure
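The restricted eigenvalue condition mentioned above bounds $\|X\delta\|_2 / (\sqrt{n}\,\|\delta_S\|_2)$ from below over the cone $\|\delta_{S^c}\|_1 \le L\,\|\delta_S\|_1$. A hedged numeric sketch of this quantity follows: it samples random directions, projects them onto the cone, and reports the smallest ratio seen, so it gives only an upper bound on the true RE constant (the exact minimum is a hard optimization problem). The function name and sampling scheme are illustrative, not from the paper.

```python
import numpy as np

def re_constant_probe(X, s, L=3.0, n_dirs=5000, seed=0):
    """Monte-Carlo probe of the restricted eigenvalue quantity
    min ||X d||_2 / (sqrt(n) ||d_S||_2) over |S| <= s and the cone
    ||d_{S^c}||_1 <= L ||d_S||_1.  Returns an UPPER bound on the RE
    constant, since only sampled directions are examined."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_dirs):
        d = rng.standard_normal(p)
        S = np.argsort(-np.abs(d))[:s]          # take largest entries as S
        mask = np.zeros(p, dtype=bool)
        mask[S] = True
        # Shrink the off-support part so d satisfies the cone constraint
        scale = L * np.abs(d[mask]).sum() / max(np.abs(d[~mask]).sum(), 1e-12)
        d[~mask] *= min(1.0, scale)
        ratio = np.linalg.norm(X @ d) / (np.sqrt(n) * np.linalg.norm(d[mask]))
        best = min(best, ratio)
    return best
```

For an orthonormal design ($X^\top X / n = I$) the ratio is $\|d\|_2 / \|d_S\|_2 \ge 1$, so the probe should never drop below 1; for highly correlated designs it can be driven close to 0, which is the regime where the oracle bounds degrade.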

    Evaluation of linear ozone photochemistry parametrizations in a stratosphere-troposphere data assimilation system

    This paper evaluates the performance of various linear ozone photochemistry parametrizations using the stratosphere-troposphere data assimilation system of the Met Office. A set of experiments was run for the period 23 September 2003 to 5 November 2003 using the Cariolle (v1.0 and v2.1), LINOZ and Chem2D-OPP (v0.1 and v2.1) parametrizations. All operational meteorological observations were assimilated, together with ozone retrievals from the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS). Experiments were validated against independent data from the Halogen Occultation Experiment (HALOE) and ozonesondes. Additionally, a simple offline method for comparing the parametrizations is introduced.

    It is shown that in the upper stratosphere and mesosphere, outside the polar night, ozone analyses are controlled by the photochemistry parametrizations and not by the assimilated observations. The most important factor in getting good results at these levels is to pay attention to the ozone and temperature climatologies in the parametrizations. There should be no discrepancies between the climatologies and the assimilated observations or the model, but there is also a competing demand that the climatologies be objectively accurate in themselves. Conversely, in the lower stratosphere outside regions of heterogeneous ozone depletion, the ozone analyses are dominated by observational increments and the photochemistry parametrizations have little influence.

    We investigate a number of known problems in LINOZ and Cariolle v1.0 in more detail than previously, and we find discrepancies in Cariolle v2.1 and Chem2D-OPP v2.1, which are demonstrated to have been removed in the latest available versions (v2.8 and v2.6 respectively). In general, however, all the parametrizations work well through much of the stratosphere, helped by the presence of good quality assimilated MIPAS observations.

    Performance of the MIND detector at a Neutrino Factory using realistic muon reconstruction

    A Neutrino Factory producing an intense beam composed of nu_e(nubar_e) and nubar_mu(nu_mu) from muon decays has been shown to have the greatest sensitivity to the two currently unmeasured neutrino mixing parameters, theta_13 and delta_CP. Using the `wrong-sign muon' signal to measure nu_e to nu_mu (nubar_e to nubar_mu) oscillations in a 50 ktonne Magnetised Iron Neutrino Detector (MIND), sensitivity to delta_CP could be maintained down to small values of theta_13. However, the detector efficiencies used in previous studies were calculated assuming perfect pattern recognition. In this paper, MIND is re-assessed taking into account, for the first time, a realistic pattern recognition for the muon candidate. Reoptimisation of the analysis utilises a combination of methods, including a multivariate analysis similar to the one used in MINOS, to maintain high efficiency while suppressing backgrounds, ensuring that the signal selection efficiency and the background levels are comparable to, or better than, those in previous analyses.