lassopack: Model selection and prediction with regularized regression in Stata
This article introduces lassopack, a suite of programs for regularized
regression in Stata. lassopack implements lasso, square-root lasso, elastic
net, ridge regression, adaptive lasso and post-estimation OLS. The methods are
suitable for the high-dimensional setting where the number of predictors p
may be large and possibly greater than the number of observations, n. We
offer three different approaches for selecting the penalization (`tuning')
parameters: information criteria (implemented in lasso2), K-fold
cross-validation and h-step ahead rolling cross-validation for cross-section,
panel and time-series data (cvlasso), and theory-driven (`rigorous')
penalization for the lasso and square-root lasso for cross-section and panel
data (rlasso). We discuss the theoretical framework and practical
considerations for each approach. We also present Monte Carlo results to
compare the performance of the penalization approaches.
Comment: 52 pages, 6 figures, 6 tables; submitted to Stata Journal; for more
information see https://statalasso.github.io
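
The three selection strategies can also be sketched outside Stata. The snippet below is a minimal Python illustration using scikit-learn on simulated data, not the lasso2/cvlasso/rlasso commands themselves; the BIC form, the plug-in constant and the noise estimate are assumptions, and scikit-learn's alpha is scaled differently from the lambda reported by the Stata commands.

import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, p = 100, 150                        # more predictors than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:5] = 1.0     # sparse truth
y = X @ beta + rng.standard_normal(n)

# 1) Information criterion: fit a grid of penalty levels, keep the BIC minimiser.
def bic(lam):
    fit = Lasso(alpha=lam, max_iter=10_000).fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    df = np.count_nonzero(fit.coef_)
    return n * np.log(rss / n) + np.log(n) * df

lam_ic = min(np.logspace(-2, 0, 30), key=bic)

# 2) K-fold cross-validation (the analogue of cvlasso's default behaviour).
lam_cv = LassoCV(cv=5).fit(X, y).alpha_

# 3) A plug-in ("rigorous") penalty in the spirit of rlasso:
#    proportional to sigma * sqrt(log(p) / n); the constant 1.1 is an assumption.
lam_plugin = 1.1 * np.std(y) * np.sqrt(2 * np.log(p) / n)

print(lam_ic, lam_cv, lam_plugin)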
Stellar Content from high resolution galactic spectra via Maximum A Posteriori
This paper describes STECMAP (STEllar Content via Maximum A Posteriori), a
flexible, non-parametric inversion method for the interpretation of the
integrated light spectra of galaxies, based on synthetic spectra of single
stellar populations (SSPs). We focus on the recovery of a galaxy's star
formation history and stellar age-metallicity relation. We use the high
resolution SSPs produced by PEGASE-HR to quantify the informational content of
the wavelength range 4000 - 6800 Angstroms.
A detailed investigation of the properties of the corresponding simplified
linear problem is performed using singular value decomposition. It turns out to
be a powerful tool for explaining and predicting the behaviour of the
inversion. We provide means of quantifying the fundamental limitations of the
problem considering the intrinsic properties of the SSPs in the spectral range
of interest, as well as the noise in these models and in the data.
We performed a systematic simulation campaign and found that, when the time
elapsed between two bursts of star formation is larger than 0.8 dex, the
properties of each episode can be constrained with a precision of 0.04 dex in
age and 0.02 dex in metallicity from high quality data (R=10 000,
signal-to-noise ratio SNR=100 per pixel), not taking model errors into account.
The described methods and error estimates will be useful in the design and in
the analysis of extragalactic spectroscopic surveys.
Comment: 31 pages, 23 figures, accepted for publication in MNRAS
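
A toy version of the SVD diagnostic described above can be written down directly: for the linearised problem y ≈ A x, with the columns of A holding the single-population template spectra, the singular spectrum of A indicates how many components the data constrain at a given signal-to-noise ratio. The matrix below is random stand-in data, not PEGASE-HR output, and the noise-floor rule is an assumption.

import numpy as np

rng = np.random.default_rng(1)
n_wave, n_ssp = 2000, 40                 # wavelength bins x SSP templates (assumed sizes)
A = rng.standard_normal((n_wave, n_ssp)) @ np.diag(np.linspace(1.0, 0.01, n_ssp))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

snr = 100.0                              # per-pixel signal-to-noise, as quoted above
n_usable = np.sum(s / s[0] > 1.0 / snr)  # components above the noise floor
print(f"constrained components: {n_usable} of {n_ssp}")

# A truncated-SVD (regularised) solution keeps only those components.
y = A @ rng.random(n_ssp) + (1.0 / snr) * rng.standard_normal(n_wave)
k = int(n_usable)
x_tsvd = Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])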
Choosing a penalty for model selection in heteroscedastic regression
We consider the problem of choosing between several models in least-squares
regression with heteroscedastic data. We prove that any penalization procedure
is suboptimal when the penalty is a function of the dimension of the model, at
least for some typical heteroscedastic model selection problems. In particular,
Mallows' Cp is suboptimal in this framework. By contrast, optimal model
selection is possible with data-driven penalties such as resampling or V-fold
penalties. Therefore, it is worth estimating the shape of the penalty from
data, even at the price of a higher computational cost. Simulation experiments
illustrate the existence of a trade-off between statistical accuracy and
computational complexity. As a conclusion, we sketch some rules for choosing a
penalty in least-squares regression, depending on what is known about possible
variations of the noise level.
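
For reference, and with notation assumed here rather than taken from the paper, Mallows' Cp penalizes a candidate model m through its dimension D_m alone,

\[ \operatorname{crit}_{C_p}(m) \;=\; \frac{1}{n}\,\lVert Y - \hat{s}_m \rVert^2 \;+\; \frac{2\hat{\sigma}^2 D_m}{n}, \]

which is exactly the kind of dimension-based penalty shown above to be suboptimal when the noise level varies over the design; resampling and V-fold penalties instead estimate the penalty's shape from the data.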
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
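
A bare-bones sketch of the localization idea, assuming a Gaussian kernel, a fixed bandwidth and scikit-learn's logistic regression (the paper's LLR additionally selects predictors locally and chooses every tuning constant by cross-validation): each query point receives its own logistic fit, weighted by distance in predictor space.

import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_predict(X, y, x0, bandwidth=1.0):
    # Gaussian kernel weights centred at the query point x0
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * bandwidth ** 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y, sample_weight=w)                 # locally weighted fit
    return clf.predict_proba(x0.reshape(1, -1))[0, 1]

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.3 * rng.standard_normal(200) > 0).astype(int)
print(local_logistic_predict(X, y, X[0], bandwidth=0.8))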
Variable Selection in General Multinomial Logit Models
The use of the multinomial logit model is typically restricted to applications with few predictors, because in high-dimensional settings maximum likelihood estimates tend to deteriorate. In this paper we propose a sparsity-inducing penalty that accounts for the special structure of multinomial models. In contrast to existing methods, it penalizes the parameters that are linked to one variable in a grouped way and thus yields variable selection instead of parameter selection. We develop a proximal gradient method that is able to efficiently compute stable estimates. In addition, the penalization is extended to the important case of predictors that vary across response categories. We apply our estimator to the modeling of party choice of voters in Germany, including voter-specific variables like age and gender as well as party-specific features like stance on nuclear energy and immigration.
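
The grouped penalization can be made concrete through its proximal (group soft-thresholding) step: the coefficients a single variable receives across all response categories are shrunk together, so the whole block either survives or is set to zero. The sketch below shows only that one step, not the paper's full algorithm; the penalty level and step size are placeholders.

import numpy as np

def group_soft_threshold(B, lam, step):
    # B: (p, K) matrix, row j = coefficients of variable j across the K categories.
    # Returns the proximal map of step * lam * sum_j ||B[j]||_2, applied row-wise.
    B_new = B.copy()
    for j in range(B.shape[0]):
        norm_j = np.linalg.norm(B[j])
        shrink = max(0.0, 1.0 - step * lam / norm_j) if norm_j > 0 else 0.0
        B_new[j] = shrink * B[j]                 # entire row survives or vanishes
    return B_new

B = np.array([[0.90, -0.40,  0.10],
              [0.05,  0.02, -0.01]])             # variable 2 is weak in every category
print(group_soft_threshold(B, lam=0.5, step=0.3))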
Smoothing ℓ1-penalized estimators for high-dimensional time-course data
When a series of (related) linear models has to be estimated it is often
appropriate to combine the different data-sets to construct more efficient
estimators. We use ℓ1-penalized estimators like the Lasso or the Adaptive
Lasso which can simultaneously do parameter estimation and model selection. We
show that for a time-course of high-dimensional linear models the convergence
rates of the Lasso and of the Adaptive Lasso can be improved by combining the
different time-points in a suitable way. Moreover, the Adaptive Lasso still
enjoys oracle properties and consistent variable selection. The finite sample
properties of the proposed methods are illustrated on simulated data and on a
real problem of motif finding in DNA sequences.
Comment: Published at http://dx.doi.org/10.1214/07-EJS103 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
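
A minimal adaptive-lasso sketch for a single time-point (the paper's improvement comes from combining several time-points, which is not reproduced here): a pilot ridge fit supplies the adaptive weights, the predictors are rescaled accordingly, and an ordinary lasso is run on the rescaled design. The ridge initialiser, the penalty levels and the weight exponent are assumptions.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, p = 80, 120
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.standard_normal(n)

pilot = Ridge(alpha=1.0).fit(X, y).coef_         # initial estimate
w = 1.0 / (np.abs(pilot) + 1e-6)                 # adaptive weights (exponent 1)
X_scaled = X / w                                 # column j divided by w[j]
fit = Lasso(alpha=0.1, max_iter=10_000).fit(X_scaled, y)
beta_adaptive = fit.coef_ / w                    # back to the original scale
print(np.flatnonzero(beta_adaptive))             # indices of selected variables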
- …