A Selective Review of Group Selection in High-Dimensional Models
Grouping structures arise naturally in many statistical modeling problems.
Several methods have been proposed for variable selection that respect grouping
structure in variables. Examples include the group LASSO and several concave
group selection methods. In this article, we give a selective review of group
selection concerning methodological developments, theoretical properties and
computational algorithms. We pay particular attention to group selection
methods involving concave penalties. We address both group selection and
bi-level selection methods. We describe several applications of these methods
in nonparametric additive models, semiparametric regression, seemingly
unrelated regressions, genomic data analysis and genome-wide association
studies. We also highlight some issues that require further study.

Comment: Published at http://dx.doi.org/10.1214/12-STS392 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
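The computational core of many group selection algorithms is the proximal operator of the group LASSO penalty: blockwise soft-thresholding, which zeroes out whole groups at once. A minimal NumPy sketch; the function name, the group partition and the penalty level are illustrative assumptions, not code from the review:

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group LASSO penalty lam * sum_g ||beta_g||_2.

    beta   : coefficient vector
    groups : list of index arrays, one per group (a partition of the indices)
    lam    : penalty level (also used as the proximal step size, for simplicity)
    """
    out = np.zeros_like(beta, dtype=float)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > lam:
            # shrink the whole group toward zero; groups whose norm is
            # below lam are zeroed out together (groupwise sparsity)
            out[g] = (1.0 - lam / norm) * beta[g]
    return out
```

Groups whose Euclidean norm falls below lam are removed as a unit, which is the "all-in or all-out" behaviour that the review contrasts with bi-level selection.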
Smoothing ℓ1-penalized estimators for high-dimensional time-course data
When a series of (related) linear models has to be estimated it is often
appropriate to combine the different data-sets to construct more efficient
estimators. We use ℓ1-penalized estimators like the Lasso or the Adaptive
Lasso which can simultaneously do parameter estimation and model selection. We
show that for a time-course of high-dimensional linear models the convergence
rates of the Lasso and of the Adaptive Lasso can be improved by combining the
different time-points in a suitable way. Moreover, the Adaptive Lasso still
enjoys oracle properties and consistent variable selection. The finite sample
properties of the proposed methods are illustrated on simulated data and on a
real problem of motif finding in DNA sequences.

Comment: Published at http://dx.doi.org/10.1214/07-EJS103 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
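As background for the ℓ1-penalized estimators above, the plain Lasso can be computed by cyclic coordinate descent with soft-thresholding. A minimal sketch assuming the usual (1/2n) least-squares scaling; the function names and data are illustrative, not the paper's time-course estimator:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X beta||^2 + lam ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: leave coordinate j out of the fit
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
    return beta
```

Each coordinate update is a one-dimensional Lasso problem with a closed-form soft-thresholded solution, which is why the full algorithm needs no line search.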
Optimal Inference in Crowdsourced Classification via Belief Propagation
Crowdsourcing systems are popular for solving large-scale labelling tasks
with low-paid workers. We study the problem of recovering the true labels from
the possibly erroneous crowdsourced labels under the popular Dawid-Skene model.
To address this inference problem, several algorithms have recently been
proposed, but the best known guarantee is still significantly larger than the
fundamental limit. We close this gap by introducing a tighter lower bound on
the fundamental limit and proving that Belief Propagation (BP) exactly matches
this lower bound. The guaranteed optimality of BP is the strongest in the sense
that it is information-theoretically impossible for any other algorithm to
correctly label a larger fraction of the tasks. Experimental results suggest
that BP is close to optimal for all regimes considered and improves upon
competing state-of-the-art algorithms.

Comment: This article is partially based on preliminary results published in the proceedings of the 33rd International Conference on Machine Learning (ICML 2016)
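The BP algorithm itself is beyond a short sketch, but the Dawid-Skene setting can be illustrated with the kind of EM baseline such algorithms are compared against: alternately estimate each worker's accuracy and each task's label posterior. This "one-coin" simplification (a single symmetric accuracy per worker) and all names below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def one_coin_em(L, n_iter=20):
    """EM for a one-coin Dawid-Skene model with binary labels.

    L : (workers, tasks) array with entries in {0, 1}, or -1 if the
        worker did not label that task.
    Returns hard labels and the posterior P(true label = 1) per task.
    """
    obs = L >= 0
    # initialise the label posterior with the majority vote
    q = np.where(obs, L, 0).sum(axis=0) / np.maximum(obs.sum(axis=0), 1)
    for _ in range(n_iter):
        # M-step: each worker's accuracy = expected agreement with q
        agree = np.where(obs, np.where(L == 1, q, 1.0 - q), 0.0)
        acc = (agree.sum(axis=1) + 1.0) / (obs.sum(axis=1) + 2.0)  # Laplace-smoothed
        # E-step: posterior log-likelihoods of label 1 vs. label 0 per task
        la, lb = np.log(acc)[:, None], np.log(1.0 - acc)[:, None]
        s1 = np.where(obs & (L == 1), la, 0.0) + np.where(obs & (L == 0), lb, 0.0)
        s0 = np.where(obs & (L == 0), la, 0.0) + np.where(obs & (L == 1), lb, 0.0)
        q = 1.0 / (1.0 + np.exp(s0.sum(axis=0) - s1.sum(axis=0)))
    return (q > 0.5).astype(int), q
```

Unlike plain majority voting, this downweights workers who frequently disagree with the consensus, which is the effect the optimality analysis quantifies.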
PAC-Bayesian bounds for sparse regression estimation with exponential weights
We consider the sparse regression model where the number of parameters p is
larger than the sample size n. The difficulty in high-dimensional problems is
to propose estimators achieving a good compromise between statistical and
computational performance. The BIC estimator, for instance, performs well from
the statistical point of view \cite{BTW07} but can only be computed for values
of p of at most a few tens. The Lasso estimator is the solution of a convex
minimization problem, hence computable for large values of p. However,
stringent conditions on the design are required to establish
fast rates of convergence for this estimator. Dalalyan and Tsybakov
\cite{arnak} propose a method achieving a good compromise between the
statistical and computational aspects of the problem. Their estimator can be
computed for reasonably large p and satisfies nice statistical properties
under weak assumptions on the design. However, \cite{arnak} establishes sparsity
oracle inequalities in expectation for the empirical excess risk only. In this
paper, we propose an aggregation procedure similar to that of \cite{arnak} but
with improved statistical performances. Our main theoretical result is a
sparsity oracle inequality in probability for the true excess risk for a
version of the exponential weights estimator. We also propose an MCMC method to
compute our estimator for reasonably large values of p.

Comment: 19 pages
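The exponential weights idea can be illustrated on a finite dictionary: each candidate estimator receives a weight proportional to exp(-risk/temperature), and predictions are averaged under those weights. This toy version with a uniform prior ignores the sparsity prior and the MCMC computation the paper actually develops; all names are illustrative:

```python
import numpy as np

def exponential_weights(preds, y, temperature=1.0):
    """Aggregate candidate predictors by exponential weighting.

    preds : (M, n) array, row m = predictions of candidate m on the sample
    y     : (n,) observed responses
    """
    risks = ((preds - y) ** 2).mean(axis=1)           # empirical risk per candidate
    w = np.exp(-(risks - risks.min()) / temperature)  # shift for numerical stability
    w /= w.sum()
    return w, w @ preds  # weights and the aggregated prediction
```

Aggregation never commits to a single candidate, which is why oracle inequalities can be obtained under much weaker design conditions than for selection-based estimators.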
A simple forward selection procedure based on false discovery rate control
We propose the use of a new false discovery rate (FDR) controlling procedure
as a penalized model selection method, and compare its performance to that of
other penalized methods over a wide range of realistic settings: nonorthogonal
design matrices, moderate and large pools of explanatory variables, and both
sparse and nonsparse models, in the sense that they may include a small or a
large fraction of the potential variables (or even all of them). The comparison is
done by a comprehensive simulation study, using a quantitative framework for
performance comparisons in the form of empirical minimaxity relative to a
"random oracle": the oracle model selection performance on data dependent
forward selected family of potential models. We show that FDR based procedures
have good performance, and in particular the newly proposed method, emerges as
having empirical minimax performance. Interestingly, using FDR level of 0.05 is
a global best.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS194 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
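The flavour of FDR-based forward selection can be sketched as greedy selection that stops once the entering variable's p-value exceeds a BH-style threshold that relaxes with the step index. The specific threshold i*q/m, the correlation screening and the normal approximation below are simplifications for illustration, not the paper's exact procedure:

```python
import math
import numpy as np

def forward_select_fdr(X, y, q=0.05):
    """Greedy forward selection with an FDR-style stopping rule (sketch)."""
    n, m = X.shape
    selected, r = [], y - y.mean()
    for i in range(1, m + 1):
        # score each remaining variable by correlation with the residual
        scores = np.abs(X.T @ r) / (np.linalg.norm(X, axis=0) * np.linalg.norm(r) + 1e-12)
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        # two-sided p-value for the correlation via a normal approximation
        z = scores[j] * math.sqrt(n)
        p = math.erfc(z / math.sqrt(2.0))
        if p > i * q / m:  # BH-style threshold that grows with each step
            break
        selected.append(j)
        xj = X[:, j] - X[:, j].mean()
        r = r - (xj @ r) / (xj @ xj) * xj  # residual after adding variable j
    return selected
```

Because the threshold grows with the number of variables already admitted, the rule is more permissive for later entries than a fixed per-step significance level, mirroring the BH step-up logic.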
Banking the unbanked: the Mzansi intervention in South Africa
Purpose
This paper aims to understand households' latent decision making in accessing financial services. In this analysis we look at the determinants of the choice of the pre-entry Mzansi account by consumers in South Africa.
Design/methodology/approach
We use 102 variables, grouped in the following categories: basic literacy, understanding of financial terms, targets for financial advice, desired financial education and financial perception. Employing a computationally efficient variable selection algorithm, we study which variables can satisfactorily explain the choice of a Mzansi account.
Findings
The Mzansi intervention is appealing to individuals with basic but insufficient financial education. Aspirations seem to be very influential in revealing the choice of financial services, and to this end Mzansi is perceived as a pre-entry account that does not meet the aspirations of individuals aiming to climb the financial services ladder. We find that Mzansi holders view the account mainly as a vehicle for receiving payments, but are on the other hand debt-averse and inclined to save. Hence, although there is at present no concrete evidence that the Mzansi intervention increases access to finance via diversification (i.e. by recruiting customers into higher-level accounts and services), our analysis shows that this is very likely to be the case.
Originality/value
The issue of demand-side constraints on access to finance has been largely ignored in the theoretical and empirical literature. This paper takes some preliminary steps toward addressing this gap.