914 research outputs found

    Beyond Support in Two-Stage Variable Selection

    Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables, and the second stage operates on this set of candidate variables to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We argue that more information can be conveyed from the first stage to the second one: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give two examples of procedures that can benefit from the proposed transfer of information, in estimation and inference problems respectively. Extensive simulations demonstrate that this transfer is particularly efficient when each stage operates on distinct subsamples. This separation plays a crucial role in the computation of calibrated p-values, which makes it possible to control the False Discovery Rate. In this setup, the proposed transfer results in sensitivity gains ranging from 50% to 100% compared to state-of-the-art
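
The transfer of coefficient magnitudes from the first stage to the second can be illustrated with a minimal sketch. The abstract does not state the exact form of the adaptive penalty, so the weights used here (1/|coefficient| for selected variables, infinite for screened-out ones, in the style of the adaptive lasso) are an assumption, as are the function names `weighted_lasso` and `two_stage` and the plain coordinate-descent solver.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    X is a list of rows; an infinite weight forces the coefficient to zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0 or weights[j] == float("inf"):
                continue
            # correlation of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def two_stage(X1, y1, X2, y2, lam1, lam2):
    """Stage 1: plain lasso on subsample 1 (screening).
    Stage 2: lasso on subsample 2 with weights derived from stage-1
    magnitudes (assumed form: 1/|beta1_j|, infinite if beta1_j == 0)."""
    p = len(X1[0])
    beta1 = weighted_lasso(X1, y1, lam1, [1.0] * p)
    weights = [1.0 / abs(b) if b != 0.0 else float("inf") for b in beta1]
    beta2 = weighted_lasso(X2, y2, lam2, weights)
    return beta1, beta2
```

Passing disjoint subsamples as `(X1, y1)` and `(X2, y2)` mirrors the setting in which the abstract reports the transfer to be most effective. Variables with large first-stage coefficients are penalized less in the second stage.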

    Banking the unbanked: the Mzansi intervention in South Africa

    Purpose: This paper aims to understand households' latent decision-making behaviour in accessing financial services. We look at the determinants of consumers' choice of the pre-entry Mzansi account in South Africa. Design/methodology/approach: We use 102 variables, grouped in the following categories: basic literacy, understanding financial terms, targets for financial advice, desired financial education and financial perception. Employing a computationally efficient variable selection algorithm, we study which variables can satisfactorily explain the choice of a Mzansi account. Findings: The Mzansi intervention is appealing to individuals with basic but insufficient financial education. Aspirations seem to be very influential in revealing the choice of financial services, and to this end Mzansi is perceived as a pre-entry account that does not meet the aspirations of individuals aiming to climb the financial services ladder. We find that Mzansi holders view the account mainly as a vehicle for receiving payments, but are also debt-averse and inclined to save. Hence, although there is at present no concrete evidence that the Mzansi intervention increases access to finance via diversification (i.e. by recruiting customers into higher-level accounts and services), our analysis shows that this is very likely to be the case. Originality/value: The issue of demand-side constraints on access to finance has been largely ignored in the theoretical and empirical literature. This paper undertakes some preliminary steps in addressing this gap.

    Sparsity with sign-coherent groups of variables via the cooperative-Lasso

    We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, gathering either nonnegative, nonpositive or null parameters. To tackle this problem, we propose the cooperative-Lasso penalty. We derive the optimality conditions defining the cooperative-Lasso estimate for generalized linear models, and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than those of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group and sparse group-Lasso agree with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. We finally propose two examples illustrating the wide applicability of the cooperative-Lasso: first to the processing of ordinal variables, where the penalty acts as a monotonicity prior; second to the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes. Comment: Published at http://dx.doi.org/10.1214/11-AOAS520 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
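
To make the notion of sign-coherence concrete, here is a small sketch of a penalty that favours groups whose nonzero entries share a sign. The abstract does not spell out the penalty, so the form used here (for each group, the Euclidean norm of the positive parts plus the Euclidean norm of the negative parts) is an assumption about how the cooperative-Lasso penalty is defined. The function names and the optional per-group weights are illustrative.

```python
import math


def coop_penalty(beta, groups, weights=None):
    """Assumed cooperative-Lasso-style penalty:
    sum_g w_g * (||beta_g^+||_2 + ||beta_g^-||_2),
    where beta_g^+ and beta_g^- are the positive and negative parts."""
    if weights is None:
        weights = [1.0] * len(groups)
    total = 0.0
    for w, idx in zip(weights, groups):
        pos = math.sqrt(sum(max(beta[j], 0.0) ** 2 for j in idx))
        neg = math.sqrt(sum(min(beta[j], 0.0) ** 2 for j in idx))
        total += w * (pos + neg)
    return total


def group_lasso_penalty(beta, groups, weights=None):
    """Standard group-Lasso penalty, sum_g w_g * ||beta_g||_2, for comparison."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(w * math.sqrt(sum(beta[j] ** 2 for j in idx))
               for w, idx in zip(weights, groups))
```

For a group with entries of a single sign the two penalties coincide. A group with mixed signs, such as (3, -4), costs 3 + 4 = 7 under the assumed cooperative form but only 5 under the group-Lasso, so mixed-sign groups are discouraged.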

    A General Framework for Fast Stagewise Algorithms

    Forward stagewise regression follows a very simple strategy for constructing a sequence of sparse regression estimates: it starts with all coefficients equal to zero, and iteratively updates the coefficient (by a small amount $\epsilon$) of the variable that achieves the maximal absolute inner product with the current residual. This procedure has an interesting connection to the lasso: under some conditions, it is known that the sequence of forward stagewise estimates exactly coincides with the lasso path, as the step size $\epsilon$ goes to zero. Furthermore, essentially the same equivalence holds outside of least squares regression, with the minimization of a differentiable convex loss function subject to an $\ell_1$ norm constraint (the stagewise algorithm now updates the coefficient corresponding to the maximal absolute component of the gradient). Even when they do not match their $\ell_1$-constrained analogues, stagewise estimates provide a useful approximation, and are computationally appealing. Their success in sparse modeling motivates the question: can a simple, effective strategy like forward stagewise be applied more broadly in other regularization settings, beyond the $\ell_1$ norm and sparsity? The current paper is an attempt to do just this. We present a general framework for stagewise estimation, which yields fast algorithms for problems such as group-structured learning, matrix completion, image denoising, and more. Comment: 56 pages, 15 figure
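
The least squares version of forward stagewise described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the paper's general framework. The function name, the plain-list data layout, and the stopping tolerance `tol` are assumptions added for the example.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=1000, tol=1e-9):
    """Forward stagewise regression for least squares.

    X is a list of rows, y a list of responses. Starting from all-zero
    coefficients, each step moves the coefficient of the variable most
    correlated with the current residual by +/- eps.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_steps):
        # inner product of every column with the current residual
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) <= tol:  # residual is (numerically) orthogonal to X
            break
        step = eps if corr[j] > 0 else -eps
        beta[j] += step
        for i in range(n):
            residual[i] -= step * X[i][j]
    return beta
```

Recording `beta` after every step would give the sequence of estimates that the abstract compares with the lasso path. Smaller values of `eps` trace that sequence more finely at the cost of more iterations.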

    Time-Varying Parameters as Ridge Regressions

    Time-varying parameter (TVP) models are frequently used in economics to model structural change. I show that they are in fact ridge regressions. This immediately makes computation, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method.
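
The link between time-varying parameters and ridge regression can be sketched as follows. This is a hedged reconstruction under the common assumption of random-walk coefficients. The symbols $x_t$, $\beta_t$, $u_t$, $\lambda$ and the exact normalization are assumptions and may differ from the paper's notation.

```latex
% Assumed model: linear regression with random-walk coefficients
y_t = x_t^\top \beta_t + \varepsilon_t, \qquad
\beta_t = \beta_{t-1} + u_t, \qquad t = 1, \dots, T.

% Penalized least squares: squared increments are penalized
\min_{\beta_1, \dots, \beta_T} \;
\sum_{t=1}^{T} \bigl( y_t - x_t^\top \beta_t \bigr)^2
+ \lambda \sum_{t=2}^{T} \lVert \beta_t - \beta_{t-1} \rVert_2^2 .

% Substituting \beta_t = \beta_1 + \sum_{s=2}^{t} u_s turns this into a
% ridge regression on the increments u_2, \dots, u_T,
% with \beta_1 left unpenalized:
\min_{\beta_1, u_2, \dots, u_T} \;
\sum_{t=1}^{T} \Bigl( y_t - x_t^\top \beta_1
  - x_t^\top \sum_{s=2}^{t} u_s \Bigr)^2
+ \lambda \sum_{t=2}^{T} \lVert u_t \rVert_2^2 .
```

In this form, $\lambda$ plays the role of the "amount of time variation" mentioned in the abstract: a large $\lambda$ shrinks the increments toward zero and recovers constant coefficients.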

    Recovering edges in ill-posed inverse problems: optimality of curvelet frames

    We consider a model problem of recovering a function $f(x_1,x_2)$ from noisy Radon data. The function $f$ to be recovered is assumed smooth apart from a discontinuity along a $C^2$ curve, that is, an edge. We use the continuum white-noise model, with noise level $\varepsilon$. Traditional linear methods for solving such inverse problems behave poorly in the presence of edges. Qualitatively, the reconstructions are blurred near the edges; quantitatively, they give in our model mean squared errors (MSEs) that tend to zero with noise level $\varepsilon$ only as $O(\varepsilon^{1/2})$ as $\varepsilon\to 0$. A recent innovation--nonlinear shrinkage in the wavelet domain--visually improves edge sharpness and improves MSE convergence to $O(\varepsilon^{2/3})$. However, as we show here, this rate is not optimal. In fact, essentially optimal performance is obtained by deploying the recently introduced tight frames of curvelets in this setting. Curvelets are smooth, highly anisotropic elements ideally suited for detecting and synthesizing curved edges. To deploy them in the Radon setting, we construct a curvelet-based biorthogonal decomposition of the Radon operator and build "curvelet shrinkage" estimators based on thresholding of the noisy curvelet coefficients. In effect, the estimator detects edges at certain locations and orientations in the Radon domain and automatically synthesizes edges at corresponding locations and directions in the original domain. We prove that the curvelet shrinkage can be tuned so that the estimator will attain, within logarithmic factors, the MSE $O(\varepsilon^{4/5})$ as noise level $\varepsilon\to 0$. This rate of convergence holds uniformly over a class of functions which are $C^2$ except for discontinuities along $C^2$ curves, and (except for log terms) is the minimax rate for that class. Our approach is an instance of a general strategy which should apply in other inverse problems; we sketch a deconvolution example.