A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

Author: Nobel Andrew
Sabourin Jeremy
Valdar William
Publication date: 08/04/2014

We describe a simple, efficient, permutation-based procedure for selecting the penalty parameter in the LASSO. The procedure, which is intended for applications where variable selection is the primary focus, can be applied in a variety of structural settings, including generalized linear models. We briefly discuss connections between permutation selection and existing theory for the LASSO. In addition, we present a simulation study and an analysis of three real data sets in which permutation selection is compared with cross-validation (CV), the Bayesian information criterion (BIC), and a selection method based on recently developed testing procedures for the LASSO.
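The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a permutation-based penalty choice can work, not the authors' implementation. It assumes the LASSO objective (1/(2n))*||y - Xb||^2 + lam*||b||_1 with an unpenalized intercept, and it assumes the median as the summary across permutations. The function names `null_lambda` and `permutation_lambda` are hypothetical.

```python
import numpy as np


def null_lambda(X, y):
    """Smallest penalty at which the LASSO selects no variables.

    Assumes the objective (1/(2n))*||y - Xb||^2 + lam*||b||_1 with an
    unpenalized intercept; that value is max_j |x_j'(y - mean(y))| / n.
    """
    n = X.shape[0]
    return float(np.max(np.abs(X.T @ (y - y.mean()))) / n)


def permutation_lambda(X, y, n_perm=200, seed=0):
    """Median null penalty over random permutations of the response.

    Permuting y breaks any association with X, so each permuted data set
    shows how large a penalty is needed to exclude pure-noise predictors.
    The median is an assumed summary, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    # Standardize columns so one penalty is comparable across predictors
    # (assumes no constant columns).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    lams = [null_lambda(Xs, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.median(lams))


# Toy usage on synthetic data: one real signal among 20 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] + rng.normal(size=200)

lam_perm = permutation_lambda(X, y)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
lam_null_observed = null_lambda(Xs, y)
```

On this toy data the permutation-based penalty falls well below the penalty needed to empty the model on the observed response, so a LASSO fit at `lam_perm` would still retain the strong predictor.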

arXiv.org e-Print Archive

CiteSeerX

A Penalty Approach to Differential Item Functioning in Rasch Models

Author: Schauberger Gunther
Tutz Gerhard
Publication date: 07/12/2012

A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF can consider only a few subpopulations, such as ethnic groups, when investigating whether the solution of items depends on membership in a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. Because a set of covariates can be included, the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method.

Open Access LMU

Doubly Robust Inference when Combining Probability and Non-probability Samples with High-dimensional Data

Author: Bang
Berger
Berger
Bethlehem
Breidt
Brewer
Brookhart
Buchanan
Cao
Chen
Chen
Chen
Chernozhukov
Chipperfield
Conti
De
Deville
DiSogra
Elliott
Fan
Fan
Farrell
Friedman
Fuller
Gao
Grafström
Han
Hunter
Hájek
Johnson
Kang
Keiding
Kim
Kim
Kott
Kott
Lee
McConville
Meng
O’Muircheartaigh
Patrick
Rivers
Rosenbaum
Shao
Shortreed
Stuart
Stuart
Tillé
Tsiatis
Valliant
Publication date: 23/08/2019

Non-probability samples are becoming increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a probability sample that provides high-dimensional representative covariate information about the target population. We propose a two-step approach for variable selection and finite population inference. In the first step, we use penalized estimating equations with folded-concave penalties to select important variables for the sampling score of selection into the non-probability sample and for the outcome model. We show that the penalized estimating equation approach enjoys the selection consistency property for general probability samples. The major technical hurdle is the possible dependence of the sample under the finite population framework. To overcome this challenge, we construct martingales, which enable us to apply the Bernstein concentration inequality for martingales. In the second step, we focus on a doubly robust estimator of the finite population mean and re-estimate the nuisance model parameters by minimizing the asymptotic squared bias of the doubly robust estimator. This estimating strategy mitigates the possible first-step selection error and renders the doubly robust estimator root-n consistent if either the sampling probability or the outcome model is correctly specified.
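The abstract does not give the estimator's formula. As a rough illustration only, the sketch below uses one commonly used doubly robust form for this two-sample setting: inverse-probability-weighted residuals from the non-probability sample plus design-weighted predictions from the probability sample. The function name `doubly_robust_mean`, its argument names, and the choice to estimate the population size by the sum of design weights are all assumptions, and the fitted propensities and predictions are taken as given rather than estimated with the paper's penalized equations.

```python
import numpy as np


def doubly_robust_mean(y_a, m_a, pi_a, m_b, d_b):
    """Doubly robust estimate of a finite population mean (assumed form).

    y_a  : outcomes observed in the non-probability sample A
    m_a  : outcome-model predictions for the units in A
    pi_a : estimated probabilities of inclusion in A (sampling score)
    m_b  : outcome-model predictions for the probability sample B
    d_b  : design weights of the units in B
    """
    y_a, m_a, pi_a = (np.asarray(v, dtype=float) for v in (y_a, m_a, pi_a))
    m_b, d_b = np.asarray(m_b, dtype=float), np.asarray(d_b, dtype=float)
    n_hat = d_b.sum()  # assumption: population size estimated from B
    residual_term = np.sum((y_a - m_a) / pi_a)  # corrects outcome-model error
    prediction_term = np.sum(d_b * m_b)         # projects the model onto B
    return float((residual_term + prediction_term) / n_hat)


# Toy usage: the outcome model is exact on A, so the residual term vanishes
# and the estimate is the design-weighted mean of the predictions in B.
est_exact_model = doubly_robust_mean(
    y_a=[2.0, 4.0], m_a=[2.0, 4.0], pi_a=[0.5, 0.25],
    m_b=[1.0, 3.0, 5.0], d_b=[10.0, 10.0, 10.0],
)

# With an all-zero outcome model the estimate reduces to pure
# inverse-probability weighting of the outcomes in A.
est_ipw_only = doubly_robust_mean(
    y_a=[2.0, 4.0], m_a=[0.0, 0.0], pi_a=[0.5, 0.25],
    m_b=[0.0, 0.0, 0.0], d_b=[10.0, 10.0, 10.0],
)
```

The two toy calls show the two halves of the double robustness idea: either a correct outcome model or correct inclusion probabilities is enough for the corresponding term to carry the estimate.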

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation

Author: Anbari Mohammed El
Celeux Gilles
Marin Jean-Michel
Robert Christian P.
Publication date: 15/11/2011

Using a collection of simulated and real benchmarks, we compare Bayesian and frequentist regularization approaches in a poorly informative situation, where the number of variables is almost equal to the number of observations. This comparison includes new global noninformative approaches for Bayesian variable selection built on Zellner's g-priors that are similar to Liang et al. (2008). The interest of those calibration-free proposals is discussed. The numerical experiments we present highlight the appeal of Bayesian regularization methods when compared with non-Bayesian alternatives. They dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way.

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

Sparse regulatory networks

Author: James Gareth M.
Sabatti Chiara
Zhou Nengfeng
Zhu Ji
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 24/05/2010

In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene, or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L_1 penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS350

arXiv.org e-Print Archive

CiteSeerX

Crossref