Banking the unbanked: the Mzansi intervention in South Africa
Purpose
This paper aims to understand households' latent behavioural decision making in accessing financial services. In this analysis we look at the determinants of consumers' choice of the pre-entry Mzansi account in South Africa.
Design/methodology/approach
We use 102 variables, grouped into the following categories: basic literacy, understanding financial terms, targets for financial advice, desired financial education and financial perception. Employing a computationally efficient variable selection algorithm, we study which variables can satisfactorily explain the choice of a Mzansi account.
Findings
The Mzansi intervention is appealing to individuals with basic but insufficient financial education. Aspirations appear to be very influential in revealing the choice of financial services, and to this end Mzansi is perceived as a pre-entry account that does not meet the aspirations of individuals aiming to climb the financial services ladder. We find that Mzansi holders view the account mainly as a vehicle for receiving payments, but on the other hand are debt-averse and inclined to save. Hence, although there is at present no concrete evidence that the Mzansi intervention increases access to finance via diversification (i.e. by recruiting customers into higher-level accounts and services), our analysis shows that this is very likely to be the case.
Originality/value
The issue of demand-side constraints on access to finance has been largely ignored in the theoretical and empirical literature. This paper undertakes some preliminary steps in addressing this gap.
Computationally Efficient Methods for High-Dimensional Statistical Problems
With the ever-increasing amount of computational power available, the horizon of statistical problems that can be tackled broadens accordingly. However, many practitioners have only an ordinary personal computer on which to do their work. The need for computationally efficient methodology is as pressing as ever, and some questions still lack a confident answer for a practitioner working under tight computational constraints. This thesis develops methods for three such problems. The first, introductory, chapter provides an overview of the area and an accessible preamble to the problems these methods address.
In the second chapter we address the problem of modelling a high-dimensional linear regression with categorical predictor variables.
The natural sparsity assumption in this setting is on the number of unique values the coefficients within each categorical variable can take. With this assumption, we introduce a new form of penalty function for tackling this problem. While the number of possible combinations of levels can grow extremely fast with the number of levels, the unique structure of the method enables fast optimisation for this problem. A novel and intricate dynamic programming algorithm computes the exact global optimum over each variable, and is embedded within a block coordinate descent algorithm. This allows such models to be fitted quickly on a laptop computer in a memory-efficient manner. The scaling requirements sufficient for this method to recover the correct groups cannot be relaxed for any estimator; this strong performance is validated by a range of experiments using both simulated and real data.
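As a loose illustration of the kind of per-variable subproblem solved exactly, the sketch below (hypothetical name `fuse_levels`, not the thesis's penalty or algorithm) assumes an orthogonal design so that fusing the levels of one categorical variable into k coefficient values reduces to optimally clustering the per-level means, which a dynamic programme over sorted levels solves exactly.

```python
import numpy as np

def fuse_levels(level_means, level_counts, k):
    """Exact DP for a simplified one-variable subproblem: choose k unique
    coefficient values for the levels of a categorical variable, minimising
    the weighted squared error to the per-level means. After sorting, optimal
    clusters are contiguous, so a DP over split points finds the optimum."""
    order = np.argsort(level_means)
    m, w = level_means[order], level_counts[order]
    n = len(m)
    cw = np.concatenate(([0.0], np.cumsum(w)))
    cwm = np.concatenate(([0.0], np.cumsum(w * m)))
    cwm2 = np.concatenate(([0.0], np.cumsum(w * m ** 2)))

    def cost(i, j):  # weighted SSE of merging sorted levels i..j into one value
        W = cw[j + 1] - cw[i]
        S = cwm[j + 1] - cwm[i]
        return (cwm2[j + 1] - cwm2[i]) - S ** 2 / W

    D = np.full((k + 1, n), np.inf)   # D[c][j]: best cost for levels 0..j with c values
    back = np.zeros((k + 1, n), dtype=int)
    for j in range(n):
        D[1][j] = cost(0, j)
    for c in range(2, k + 1):
        for j in range(c - 1, n):
            for i in range(c - 1, j + 1):
                val = D[c - 1][i - 1] + cost(i, j)
                if val < D[c][j]:
                    D[c][j], back[c][j] = val, i

    fitted = np.empty(n)              # recover the fused value for each level
    j, c = n - 1, k
    while c >= 1:
        i = back[c][j] if c > 1 else 0
        fitted[i:j + 1] = (cwm[j + 1] - cwm[i]) / (cw[j + 1] - cw[i])
        j, c = i - 1, c - 1
    out = np.empty(n)
    out[order] = fitted
    return out

# toy usage: six levels whose means form roughly two groups
means = np.array([0.1, 0.0, 0.2, 1.9, 2.1, 2.0])
counts = np.array([10, 12, 8, 9, 11, 10], dtype=float)
print(fuse_levels(means, counts, k=2))
```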
In the third chapter we explore the possibility that a practitioner has some a priori belief as to which variables are most likely to be important, expressed in the form of a permutation of the columns. Our approach takes this ordering and efficiently computes a grid of solution paths by sequentially removing groups of variables without unnecessary recomputation of coefficients. Typical examples of such orderings include the column norms in the (unscaled) design matrix, or the recentness of observations in time series data. This procedure, combined with selecting the size of the support set by validation on a test set, has performance similar to that of fitting the oracular submodel.
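A minimal sketch of the ordering-plus-validation idea, under assumptions not taken from the abstract (synthetic data, naive refitting of each nested submodel rather than the efficient path computation described above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 200, 50, 5
X = rng.normal(size=(n, p))
X[:, :s] *= 2.0                      # give the important columns larger norms
beta = np.zeros(p); beta[:s] = 2.0
y = X @ beta + rng.normal(size=n)

X_tr, X_val, y_tr, y_val = X[:150], X[150:], y[:150], y[150:]
order = np.argsort(-np.linalg.norm(X_tr, axis=0))   # a priori ordering by column norm

best_err, best_k = np.inf, 0
for k in range(1, p + 1):
    S = order[:k]                                    # first k variables in the ordering
    coef, *_ = np.linalg.lstsq(X_tr[:, S], y_tr, rcond=None)
    err = np.mean((y_val - X_val[:, S] @ coef) ** 2)
    if err < best_err:
        best_err, best_k = err, k
print("selected support size:", best_k)
```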
The fourth chapter concerns the efficient estimation of conditional independence graphs in Gaussian graphical models. Neighbourhood selection is practical, popular, and enjoys good performance, but in large-scale settings it can still have computational demands exceeding the resources available to many practitioners. Screening approaches promise large improvements in speed with only a small price to pay in terms of resulting estimation performance. Although it is well known that nodes adjacent in the conditional independence graph may be uncorrelated, a minimum absolute correlation between adjacent nodes is often tacitly or explicitly assumed in order for screening procedures to be effective. We make use of recent work in covariance estimation and high-dimensional screening of variables to develop a fast, two-stage screening procedure specifically for use within neighbourhood selection that avoids this restrictive assumption. Provided that a weaker version of a minimum edge strength requirement holds over most of the graph, the performance of the post-screening nodewise regressions is not compromised, while being substantially faster than the full procedure. This method is robust to the presence of latent confounders, as well as other scenarios that typically impede the screening of variables. Experiments show that our approach strikes a favourable balance between edge detection and computational efficiency.
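For orientation only, here is a plain nodewise-regression sketch with a simple marginal-correlation screen (hypothetical `neighbourhood_selection`); this is not the thesis's two-stage procedure, which is designed precisely to avoid relying on a minimum marginal correlation:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def neighbourhood_selection(X, screen_size=20):
    """Nodewise regression for a Gaussian graphical model: each node is
    lasso-regressed on the others, after a marginal-correlation screen
    shrinks the candidate neighbour set."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)
    C = np.abs(Xs.T @ Xs / n)                         # absolute correlations
    edges = set()
    for j in range(p):
        others = np.delete(np.arange(p), j)
        cand = others[np.argsort(-C[j, others])[:screen_size]]   # screened candidates
        coef = LassoCV(cv=5).fit(Xs[:, cand], Xs[:, j]).coef_
        for k in cand[np.abs(coef) > 1e-8]:
            edges.add((min(j, k), max(j, k)))          # "OR" rule for symmetrisation
    return edges
```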
Sparse reduced-rank regression for imaging genetics studies: models and applications
We present a novel statistical technique: the sparse reduced-rank regression (sRRR) model, a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step prior to the imaging genetics study. Finally, we present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. To conclude, we illustrate the application perspective of the proposed statistical models in three real imaging genetics datasets and highlight some potential associations.
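A toy, rank-one sketch of the alternating idea behind sRRR (hypothetical `srrr_rank1`; the paper's estimator additionally handles higher ranks, structured penalties and re-sampling-based model selection). The lasso step selects a sparse set of genetic markers for the current phenotype loading, and the loading is then refreshed against the fitted genetic scores.

```python
import numpy as np
from sklearn.linear_model import Lasso

def srrr_rank1(X, Y, alpha=0.1, n_iter=50):
    """Rank-1 sparse reduced-rank regression by alternating minimisation:
    Y ~ X b v' with b sparse (lasso step) and v a unit-norm phenotype loading."""
    v = np.linalg.svd(Y, full_matrices=False)[2][0]   # initial phenotype loading
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # sparse regression of the projected phenotype Y v on the genotypes X
        b = Lasso(alpha=alpha, max_iter=5000).fit(X, Y @ v).coef_
        f = X @ b
        if np.allclose(f, 0):
            break
        v_new = Y.T @ f
        v = v_new / np.linalg.norm(v_new)             # re-estimate the loading
    return b, v
```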
Shrinkage methods for variable selection and prediction with applications to genetic data
Identifying genotypes using genetic material was at first a painstaking laboratory task. In the decades since the first gene was sequenced, techniques have progressed through milestones requiring massive international collaboration. Today's genotype sequencing facilities use high-throughput technology to sequence entire genomes within days. Despite these technological improvements, and the resultant volume of genetic data, the identification of meaningful genotype-phenotype associations has not been as straightforward as was anticipated in the pre-genome era. The genetic architecture of many common diseases is complex, and heritability often cannot be explained when simple statistical tests are used.
This thesis addresses a clinically important problem in statistical genetics: that of predicting disease risk based on genotype information. First, we review progress and current limitations in genetic risk prediction. We then introduce penalised regression. This thesis focusses on ridge regression, a penalised regression approach that has shown promise in risk prediction for high-dimensional data. The choice of the ridge parameter, which controls the amount of penalisation in ridge regression, has not been addressed in the literature with the specific aim of analysing genetic data. We present a method for automatically choosing the ridge parameter based on genome-wide SNP data. Software implementing the method is available to the community. We evaluate the method using simulation studies and a real data example.
A ridge regression model does not indicate the strength of association of individual variants with the outcome, a property that is often of interest to geneticists. To this end we extend a previously proposed test of significance in ridge regression models to high-dimensional data and to the logistic model, which commonly occurs in the biomedical context. This test is evaluated by comparison to a permutation test, which we view as a benchmark. The test is integrated into the software package mentioned above.
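As a rough stand-in for the pipeline described above (the abstract's automatic, SNP-specific choice of the ridge parameter is not reproduced here), a generic cross-validated ridge fit on simulated genotype-like data might look like this:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p = 300, 2000                                       # more SNPs than individuals
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # 0/1/2 genotype coding
beta = np.zeros(p)
beta[rng.choice(p, 40, replace=False)] = rng.normal(0, 0.3, 40)
y = X @ beta + rng.normal(size=n)                      # continuous trait

alphas = np.logspace(0, 4, 30)                         # grid for the ridge parameter
model = RidgeCV(alphas=alphas).fit(X, y)
print("chosen ridge parameter:", model.alpha_)
print("CV r^2:", cross_val_score(RidgeCV(alphas=alphas), X, y, cv=5).mean())
```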
High Dimensional Statistical Modelling with Limited Information
Modern scientific experiments often rely on different statistical tools, regularisation being one of them. Regularisation methods are usually used to avoid overfitting, but we may also want to use them for variable selection, especially when the number of modelling parameters is higher than the total number of observations. However, performing variable selection can often be difficult under limited information, and we may end up with a misspecified model. To overcome this issue, we propose a robust variable selection routine using a Bayesian hierarchical model.
We adapt the framework of Narisetty and He to propose a novel spike and slab prior specification for the regression coefficients. We take inspiration from the imprecise beta model and use a set of beta distributions to specify the prior expectation of the selection probability. We perform a robust Bayesian analysis over this set of distributions in order to incorporate expert opinion in an efficient manner.
We also discuss novel results on likelihood-based approaches for variable selection. We exploit the framework of the adaptive LASSO to propose sensitivity analyses of LASSO-type problems. The sensitivity analysis also gives us a novel non-deterministic classifier for high-dimensional problems, which we illustrate using real datasets.
Finally, we illustrate our novel robust Bayesian variable selection method using synthetic and real-world data. We show the importance of prior elicitation in variable selection as well as model fitting, and compare our method with other Bayesian approaches for variable selection.
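For readers unfamiliar with the adaptive LASSO machinery mentioned above, here is a minimal sketch of the standard column-rescaling formulation (background only; it is not the thesis's sensitivity analysis or Bayesian routine):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, alpha=0.05, gamma=1.0):
    """Standard adaptive lasso: an initial ridge fit defines data-driven weights
    1/|beta_init|^gamma, implemented by rescaling columns before an ordinary
    lasso fit and mapping the coefficients back."""
    beta_init = Ridge(alpha=1.0).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)      # penalty weights
    Xw = X / w                                          # column rescaling trick
    fit = Lasso(alpha=alpha, max_iter=10000).fit(Xw, y)
    return fit.coef_ / w                                # back to the original scale

# toy check: two true signals among twenty predictors
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)
print(np.nonzero(np.abs(adaptive_lasso(X, y)) > 1e-6)[0])
```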
Randomised and L1-penalty approaches to segmentation in time series and regression models
It is a common approach in statistics to assume that the parameters of a stochastic model change. The simplest model involves parameters that can be exactly or approximately piecewise constant. In such a model, the aim is the a posteriori detection of the number and location in time of the changes in the parameters. This thesis develops segmentation methods for non-stationary time series and regression models using randomised methods or methods that involve L1 penalties, which force the coefficients in a regression model to be exactly zero. Randomised techniques are not commonly found in nonparametric statistics, whereas L1 methods draw heavily from the variable selection literature. Considering these two categories together, apart from other contributions, enables a comparison between them by pointing out strengths and weaknesses. This is achieved by organising the thesis into three main parts.
First, we propose a new technique for detecting the number and locations of the change-points in the second-order structure of a time series. The core of the segmentation procedure is the Wild Binary Segmentation method (WBS) of Fryzlewicz (2014), a technique which involves a certain randomised mechanism. The advantage of WBS over the standard Binary Segmentation lies in its localisation feature, thanks to which it works in cases where the spacings between change-points are short. Our main change-point detection statistic is the wavelet periodogram which allows a rigorous estimation of the local autocovariance of a piecewise-stationary process. We provide a proof of consistency and examine the performance of the method on simulated and real data sets.
Second, we study the fused lasso estimator which, in its simplest form, deals with the estimation of a piecewise constant function contaminated with Gaussian noise (Friedman et al., 2007). We show a fast way of implementing the solution path algorithm of Tibshirani and Taylor (2011) and we make a connection between their algorithm and the taut-string method of Davies and Kovac (2001). In addition, a theoretical result and a simulation study indicate that the fused lasso estimator is suboptimal in detecting the location of a change-point.
Finally, we propose a method to estimate regression models in which the coefficients vary with respect to some covariate such as time. In particular, we present a path algorithm based on Tibshirani and Taylor (2011) and the fused lasso method of Tibshirani et al. (2005). Thanks to the adaptability of the fused lasso penalty, our proposed method goes beyond the estimation of piecewise constant models to models where the underlying coefficient function can be piecewise linear, quadratic or cubic. Our simulation studies show that in most cases the method outperforms smoothing splines, a common approach in estimating this class of models.
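As background to the segmentation theme, a minimal sketch of plain binary segmentation for changes in mean is given below; the thesis's WBS-based procedure instead draws random subintervals and targets changes in second-order structure via the wavelet periodogram, so this is only a simplified relative.

```python
import numpy as np

def cusum(x, s, e):
    """CUSUM statistics for a change in mean of x[s:e] at each candidate split."""
    n = e - s
    t = np.arange(1, n)
    left = np.cumsum(x[s:e])[:-1]
    total = x[s:e].sum()
    return np.abs(np.sqrt((n - t) / (n * t)) * left
                  - np.sqrt(t / (n * (n - t))) * (total - left))

def binary_segmentation(x, s=0, e=None, thresh=4.0, min_len=5, out=None):
    """Split at the maximal CUSUM if it exceeds the threshold, then recurse on
    the two halves. In practice the threshold is calibrated to the noise level
    (e.g. proportional to sigma * sqrt(2 log n))."""
    if out is None:
        out = []
    if e is None:
        e = len(x)
    if e - s < 2 * min_len:
        return out
    stat = cusum(x, s, e)
    b = int(np.argmax(stat))
    if stat[b] > thresh:
        cp = s + b + 1
        out.append(cp)
        binary_segmentation(x, s, cp, thresh, min_len, out)
        binary_segmentation(x, cp, e, thresh, min_len, out)
    return sorted(out)

# toy example: mean shifts at 100 and 200
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100), rng.normal(0, 1, 100)])
print(binary_segmentation(x))
```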
Sparse generalised principal component analysis
In this paper, we develop a sparse method for unsupervised dimension reduction for data from an exponential-family distribution. Our idea extends previous work on Generalised Principal Component Analysis by adding L1 and SCAD penalties to introduce sparsity. We demonstrate the significance and advantages of our method with synthetic and real data examples. We focus on the application to text data, which is high-dimensional and non-Gaussian by nature, and discuss the potential advantages of our methodology in achieving dimension reduction.
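A Gaussian-data sketch of the basic idea, using an L1 (soft-thresholding) step inside power iteration; the paper's method generalises this to exponential-family data and also considers SCAD penalties, so the hypothetical `sparse_pc` below is illustrative only.

```python
import numpy as np

def sparse_pc(X, penalty=0.1, n_iter=100):
    """First sparse principal loading via penalised power iteration:
    multiply by the sample covariance, soft-threshold, renormalise."""
    Xc = X - X.mean(0)
    S = Xc.T @ Xc / len(Xc)              # sample covariance
    v = np.linalg.eigh(S)[1][:, -1]      # ordinary leading eigenvector as start
    for _ in range(n_iter):
        u = S @ v
        u = np.sign(u) * np.maximum(np.abs(u) - penalty, 0.0)   # soft-threshold
        if np.allclose(u, 0):
            break
        v = u / np.linalg.norm(u)
    return v

# toy data whose leading component loads on only the first five variables
rng = np.random.default_rng(4)
scores = rng.normal(size=(500, 1))
loading = np.r_[np.ones(5), np.zeros(45)][None, :]
X = scores @ loading + 0.3 * rng.normal(size=(500, 50))
print(np.round(sparse_pc(X, penalty=0.5), 2))
```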