Beyond Support in Two-Stage Variable Selection
Numerous variable selection methods rely on a two-stage procedure, where a
sparsity-inducing penalty is used in the first stage to predict the support,
which is then conveyed to the second stage for estimation or inference
purposes. In this framework, the first stage screens variables to find a set of
possibly relevant variables and the second stage operates on this set of
candidate variables, to improve estimation accuracy or to assess the
uncertainty associated with the selection of variables. We advocate that more
information can be conveyed from the first stage to the second one: we use the
magnitude of the coefficients estimated in the first stage to define an
adaptive penalty that is applied at the second stage. We give two examples of
procedures that can benefit from the proposed transfer of information, in
estimation and inference problems respectively. Extensive simulations
demonstrate that this transfer is particularly efficient when each stage
operates on distinct subsamples. This separation plays a crucial role for the
computation of calibrated p-values, allowing control of the False Discovery
Rate. In this setup, the proposed transfer results in sensitivity gains ranging
from 50% to 100% compared to state-of-the-art procedures.
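The transfer described above can be sketched with scikit-learn. Everything below is an illustrative assumption rather than the authors' exact procedure: the first stage is a plain Lasso, the two stages run on disjoint subsamples as the abstract recommends, and the adaptive second-stage penalty is implemented by rescaling each retained column by its first-stage magnitude (a standard adaptive-Lasso device), with arbitrary tuning values.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy data: 200 samples, 50 features, 5 truly active coefficients.
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
y = X @ beta + rng.standard_normal(n)

# Disjoint subsamples: one per stage, as advocated in the abstract.
X1, y1 = X[:n // 2], y[:n // 2]
X2, y2 = X[n // 2:], y[n // 2:]

# Stage 1: sparsity-inducing penalty (here a Lasso) on the first subsample.
stage1 = Lasso(alpha=0.1).fit(X1, y1)
w = np.abs(stage1.coef_)  # coefficient magnitudes, not just the support

# Stage 2: adaptive penalty. Rescaling each retained column by its stage-1
# magnitude is equivalent to a Lasso whose per-coefficient penalty is
# alpha / w_j, so variables with large first-stage coefficients are
# penalized less.
active = w > 0
X2w = X2[:, active] * w[active]
stage2 = Lasso(alpha=0.1).fit(X2w, y2)

# Map back to the original parameterization.
beta_hat = np.zeros(p)
beta_hat[active] = stage2.coef_ * w[active]
```

Passing only the support to stage 2 would amount to setting all nonzero weights to one; carrying the magnitudes is what distinguishes the transfer sketched here.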
Banking the unbanked: the Mzansi intervention in South Africa
Purpose
This paper aims to understand households' latent decision making in accessing financial services. In this analysis, we look at the determinants of the choice of the pre-entry Mzansi account by consumers in South Africa.
Design/methodology/approach
We use 102 variables, grouped in the following categories: basic literacy, understanding financial terms, targets for financial advice, desired financial education and financial perception. Employing a computationally efficient variable selection algorithm, we study which variables can satisfactorily explain the choice of a Mzansi account.
Findings
The Mzansi intervention is appealing to individuals with basic but insufficient financial education. Aspirations seem to be very influential in revealing the choice of financial services, and to this end Mzansi is perceived as a pre-entry account that does not meet the aspirations of individuals aiming to climb the financial services ladder. We find that Mzansi holders view the account mainly as a vehicle for receiving payments, but are, on the other hand, debt-averse and inclined to save. Hence, although there is at present no concrete evidence that the Mzansi intervention increases access to finance via diversification (i.e. by recruiting customers into higher-level accounts and services), our analysis shows that this is very likely to be the case.
Originality/value
The issue of demand-side constraints on access to finance has been largely ignored in the theoretical and empirical literature. This paper undertakes some preliminary steps in addressing this gap.
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
We consider the problems of estimation and selection of parameters endowed
with a known group structure, when the groups are assumed to be sign-coherent,
that is, gathering either nonnegative, nonpositive or null parameters. To
tackle this problem, we propose the cooperative-Lasso penalty. We derive the
optimality conditions defining the cooperative-Lasso estimate for generalized
linear models, and propose an efficient active set algorithm suited to
high-dimensional problems. We study the asymptotic consistency of the estimator
in the linear regression setup and derive its irrepresentable conditions, which
are milder than those of the group-Lasso regarding the matching of groups
with the sparsity pattern of the true parameters. We also address the problem
of model selection in linear regression by deriving an approximation of the
degrees of freedom of the cooperative-Lasso estimator. Simulations comparing
the proposed estimator to the group-Lasso and sparse group-Lasso corroborate our
theoretical results, showing consistent improvements in support recovery for
sign-coherent groups. We finally propose two examples illustrating the wide
applicability of the cooperative-Lasso: first to the processing of ordinal
variables, where the penalty acts as a monotonicity prior; second to the
processing of genomic data, where the set of differentially expressed probes is
enriched by incorporating all the probes of the microarray that are related to
the corresponding genes.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS520 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
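A minimal sketch of the penalty itself may help: for each group, the ℓ2 norms of the positive and negative parts of the coefficients are penalized separately, so sign-coherent groups pay the same as under the group-Lasso while mixed-sign groups pay more. The group structure and values below are made up for illustration, and the efficient active-set fitting algorithm from the paper is not shown.

```python
import numpy as np

def coop_lasso_penalty(beta, groups):
    """Cooperative-Lasso penalty: sum over groups of the l2 norms of the
    positive and negative parts, favoring sign-coherent groups (all >= 0
    or all <= 0) over mixed-sign groups."""
    total = 0.0
    for idx in groups:
        b = beta[idx]
        total += np.linalg.norm(np.maximum(b, 0.0))   # positive part
        total += np.linalg.norm(np.maximum(-b, 0.0))  # negative part
    return total

# Two hypothetical groups of coefficients.
groups = [np.array([0, 1, 2]), np.array([3, 4])]
coherent = np.array([1.0, 2.0, 2.0, -1.0, -1.0])  # each group sign-coherent
mixed = np.array([1.0, -2.0, 2.0, -1.0, 1.0])     # same magnitudes, mixed signs

# For the coherent vector one of the two parts vanishes in each group, so
# the penalty reduces to the group-Lasso value; the mixed vector is charged
# for both parts and costs strictly more.
```

This extra cost on mixed-sign groups is exactly the prior the abstract describes, e.g. acting as a monotonicity prior on ordinal variables.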
A General Framework for Fast Stagewise Algorithms
Forward stagewise regression follows a very simple strategy for constructing
a sequence of sparse regression estimates: it starts with all coefficients
equal to zero, and iteratively updates the coefficient (by a small amount
ε) of the variable that achieves the maximal absolute inner product
with the current residual. This procedure has an interesting connection to the
lasso: under some conditions, it is known that the sequence of forward
stagewise estimates exactly coincides with the lasso path, as the step size
ε goes to zero. Furthermore, essentially the same equivalence holds
outside of least squares regression, with the minimization of a differentiable
convex loss function subject to an ℓ1 norm constraint (the stagewise
algorithm now updates the coefficient corresponding to the maximal absolute
component of the gradient).
Even when they do not match their ℓ1-constrained analogues, stagewise
estimates provide a useful approximation, and are computationally appealing.
Their success in sparse modeling motivates the question: can a simple,
effective strategy like forward stagewise be applied more broadly in other
regularization settings, beyond the ℓ1 norm and sparsity? The current
paper is an attempt to do just this. We present a general framework for
stagewise estimation, which yields fast algorithms for problems such as
group-structured learning, matrix completion, image denoising, and more.
Comment: 56 pages, 15 figures
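Plain forward stagewise, as described in the opening paragraph, takes only a few lines of NumPy; the data, step size ε, and number of steps below are illustrative choices, not values from the paper.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Forward stagewise regression: starting from zero, repeatedly nudge
    (by eps) the coefficient of the variable with the maximal absolute
    inner product with the current residual."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()
    for _ in range(n_steps):
        c = X.T @ r                       # inner products with residual
        j = np.argmax(np.abs(c))          # most correlated variable
        step = eps * np.sign(c[j])        # small update toward the residual
        beta[j] += step
        r -= step * X[:, j]
    return beta

# Toy sparse regression problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = forward_stagewise(X, y)
```

Recording beta after every step traces out the stagewise path, which (per the connection above) approaches the lasso path as eps shrinks.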
Time-Varying Parameters as Ridge Regressions
Time-varying parameter (TVP) models are frequently used in economics to
model structural change. I show that they are in fact ridge regressions.
Instantly, this makes computations, tuning, and implementation much easier than
in the state-space paradigm. Among other things, solving the equivalent dual
ridge problem is computationally very fast even in high dimensions, and the
crucial "amount of time variation" is tuned by cross-validation. Evolving
volatility is dealt with using a two-step ridge regression. I consider
extensions that incorporate sparsity (the algorithm selects which parameters
vary and which do not) and reduced-rank restrictions (variation is tied to a
factor model). To demonstrate the usefulness of the approach, I use it to study
the evolution of monetary policy in Canada. The application requires the
estimation of about 4600 TVPs, a task well within the reach of the new method.
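A sketch of the ridge equivalence for a single time-varying coefficient follows. It is a stripped-down illustration under stated assumptions: a scalar random-walk coefficient, a fixed hand-picked λ instead of cross-validation, and none of the paper's refinements (the dual solver, the volatility step, sparsity or reduced-rank extensions). Note that the first increment u_1 absorbs the initial level and is shrunk along with the rest here.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# Scalar TVP model: y_t = x_t * beta_t + e_t, with beta_t a random walk.
x = rng.standard_normal(T)
beta = 1.0 + np.cumsum(0.05 * rng.standard_normal(T))
y = x * beta + 0.1 * rng.standard_normal(T)

# Rewriting beta_t = sum_{s<=t} u_s turns the model into y = Z u + e with
# Z = diag(x) @ L, where L is the lower-triangular matrix of ones. A ridge
# penalty on the increments u then controls the "amount of time variation".
L = np.tril(np.ones((T, T)))
Z = x[:, None] * L
lam = 1.0  # illustrative; the paper tunes this by cross-validation
u_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(T), Z.T @ y)
beta_hat = L @ u_hat  # recovered time-varying coefficient path
```

As λ grows, the increments are shrunk toward zero and beta_hat collapses toward a constant-coefficient regression; as λ shrinks, the path becomes more flexible.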
Recovering edges in ill-posed inverse problems: optimality of curvelet frames
We consider a model problem of recovering a function from noisy Radon data. The function to be recovered is assumed smooth apart from a discontinuity along a curve, that is, an edge. We use the continuum white-noise model, with noise level ε.
Traditional linear methods for solving such inverse problems behave poorly in the presence of edges. Qualitatively, the reconstructions are blurred near the edges; quantitatively, they give in our model mean squared errors (MSEs) that tend to zero with noise level ε only as O(ε^{1/2}) as ε → 0. A recent innovation--nonlinear shrinkage in the wavelet domain--visually improves edge sharpness and improves MSE convergence to O(ε^{2/3}). However, as we show here, this rate is not optimal.
In fact, essentially optimal performance is obtained by deploying the recently-introduced tight frames of curvelets in this setting. Curvelets are smooth, highly anisotropic elements ideally suited for detecting and synthesizing curved edges. To deploy them in the Radon setting, we construct a curvelet-based biorthogonal decomposition of the Radon operator and build "curvelet shrinkage" estimators based on thresholding of the noisy curvelet coefficients. In effect, the estimator detects edges at certain locations and orientations in the Radon domain and automatically synthesizes edges at corresponding locations and directions in the original domain.
We prove that the curvelet shrinkage can be tuned so that the estimator will attain, within logarithmic factors, the MSE O(ε^{4/5}) as noise level ε → 0. This rate of convergence holds uniformly over a class of functions which are C² except for discontinuities along C² curves, and (except for log terms) is the minimax rate for that class. Our approach is an instance of a general strategy which should apply in other inverse problems; we sketch a deconvolution example.
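The curvelet transform itself is beyond a short sketch, but the shrinkage rule at the heart of such estimators, thresholding noisy coefficients at a level proportional to the noise, can be illustrated generically. The sparse coefficient vector below stands in for a curvelet expansion; this is not the authors' biorthogonal decomposition of the Radon operator, and the universal threshold used is a common generic choice, not the paper's tuning.

```python
import numpy as np

def soft_threshold(coeffs, thresh):
    """Soft-thresholding shrinkage: coefficients below the threshold
    (mostly noise) are set to zero; the rest are pulled toward zero
    by the threshold amount."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresh, 0.0)

rng = np.random.default_rng(0)
eps = 0.1                                   # noise level
true = np.zeros(1000)
true[:10] = 5.0                             # a few large "edge" coefficients
noisy = true + eps * rng.standard_normal(1000)

# Threshold proportional to the noise level; eps * sqrt(2 log n) is the
# usual universal choice for n coefficients.
est = soft_threshold(noisy, eps * np.sqrt(2 * np.log(1000)))
```

Because the signal is concentrated in a few large coefficients while the noise is spread over all of them, thresholding kills almost all pure-noise coordinates while retaining the edges, which is what drives the improved MSE rates quoted above.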