
    Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

    Conditional Random Fields (CRFs) constitute a popular and efficient approach to supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs, based on imposing sparsity through an L1 penalty. We first show how sparsity of the parameter set can be exploited to significantly speed up training and labelling. We then introduce coordinate descent parameter update schemes for CRFs with L1 regularization. We finally provide empirical comparisons of the proposed approach with state-of-the-art CRF training strategies. In particular, it is shown that the proposed approach is able to take advantage of sparsity to speed up processing and hence potentially handle higher-dimensional models.
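    The core of an L1 coordinate descent scheme like the one described above is a soft-thresholding update that drives individual weights exactly to zero, so that zeroed features can be skipped during training and labelling. Below is a minimal sketch of that update; the CRF-specific gradient (observed minus expected feature counts from forward-backward) and curvature are stubbed out as assumptions rather than taken from the paper.

    ```python
    import numpy as np

    def soft_threshold(z, gamma):
        """Closed-form minimiser of 0.5*(w - z)**2 + gamma*|w|;
        shrinks z toward zero and clips small values exactly to zero."""
        return np.sign(z) * max(abs(z) - gamma, 0.0)

    def l1_coordinate_descent(w, grad_k, curv_k, rho, n_sweeps=10):
        """Coordinate descent for an L1-penalised negative log-likelihood.

        grad_k(w, k) -- partial derivative of the unpenalised loss at
                        coordinate k (for a CRF: expected minus observed
                        feature counts; stubbed here as an assumption)
        curv_k(w, k) -- a positive curvature estimate (or upper bound)
        rho          -- L1 regularisation strength
        """
        for _ in range(n_sweeps):
            for k in range(len(w)):
                g = grad_k(w, k)
                h = curv_k(w, k)
                # Newton-style coordinate move, then shrink toward zero.
                w[k] = soft_threshold(w[k] - g / h, rho / h)
        return w
    ```

    Because features with zero weight contribute nothing to the clique potentials, both the coordinate sweeps and the forward-backward recursions can be restricted to the active feature set, which is one plausible source of the speed-ups the abstract reports.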

    Banking the unbanked: the Mzansi intervention in South Africa

    Purpose: This paper aims to understand households' latent decision making in accessing financial services. In this analysis, we look at the determinants of the choice of the pre-entry Mzansi account by consumers in South Africa.
    Design/methodology/approach: We use 102 variables, grouped in the following categories: basic literacy, understanding financial terms, targets for financial advice, desired financial education, and financial perception. Employing a computationally efficient variable selection algorithm, we study which variables can satisfactorily explain the choice of a Mzansi account.
    Findings: The Mzansi intervention appeals to individuals with basic but insufficient financial education. Aspirations seem to be very influential in the choice of financial services, and to this end Mzansi is perceived as a pre-entry account that does not meet the aspirations of individuals aiming to climb the financial services ladder. We find that Mzansi holders view the account mainly as a vehicle for receiving payments, but on the other hand are debt-averse and inclined to save. Hence, although there is at present no concrete evidence that the Mzansi intervention increases access to finance via diversification (i.e. by recruiting customers into higher-level accounts and services), our analysis shows that this is very likely to be the case.
    Originality/value: The issue of demand-side constraints on access to finance has been largely ignored in the theoretical and empirical literature. This paper takes some preliminary steps toward addressing this gap.
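    The abstract does not name the variable selection algorithm; a minimal sketch of one common choice for this kind of task, L1-penalised logistic regression over the 102 survey variables, is given below. The data, settings, and names are illustrative assumptions, not taken from the study.

    ```python
    # A sketch of a variable-selection step of the kind described above,
    # assuming L1-penalised logistic regression; the data below are random
    # placeholders, not the survey data used in the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 102))    # placeholder: 102 survey variables
    y = rng.integers(0, 2, size=500)   # placeholder: 1 = holds a Mzansi account

    model = LogisticRegressionCV(
        penalty="l1", solver="saga", Cs=10, cv=5, max_iter=5000
    ).fit(X, y)

    selected = np.flatnonzero(model.coef_[0])  # variables kept by the L1 penalty
    print(f"{selected.size} of 102 variables retained")
    ```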

    High dimensional stochastic linear contextual bandit with missing covariates

    Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, there are technical challenges that hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most $\zeta_{\min}^2$, where $\zeta_{\min}$ is the minimum probability of observing covariates in the context vector. We illustrate our algorithm for the practical application of experimental design for collecting gene expression data by a sequential selection of class-discriminating DNA probes.
    Comment: Accepted in MLSP 202
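    As a concrete illustration of the setting, the sketch below shows one standard way a linear bandit can correct for partially observed contexts: inverse-probability weighting of the observed coordinates before an optimistic (UCB-style) arm choice. The weighting scheme, the exploration bonus, and all names are assumptions for illustration, not the paper's exact algorithm.

    ```python
    import numpy as np

    def ipw_context(x_obs, mask, zeta):
        """Unbiased context estimate under missingness: divide each observed
        entry by its observation probability zeta; missing entries become 0."""
        return np.where(mask, x_obs / zeta, 0.0)

    def choose_arm(A, b, contexts, masks, zeta, alpha=1.0):
        """Optimistic arm choice on IPW-corrected contexts.
        A, b accumulate the ridge statistics sum(x x^T) + I and sum(r x)."""
        theta = np.linalg.solve(A, b)         # ridge estimate of the parameter
        A_inv = np.linalg.inv(A)
        scores = []
        for x_obs, mask in zip(contexts, masks):
            x = ipw_context(x_obs, mask, zeta)
            bonus = alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(x @ theta + bonus)
        return int(np.argmax(scores))
    ```

    Dividing an observed coordinate by its probability $\zeta$ inflates the estimator's variance by $1/\zeta^2$, which gives some intuition for why the bound above degrades with $\zeta_{\min}$.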

    Improving the Practicality of Model-Based Reinforcement Learning: An Investigation into Scaling up Model-Based Methods in Online Settings

    This thesis is a response to the current scarcity of practical model-based control algorithms in the reinforcement learning (RL) framework. As yet there is no consensus on how best to integrate imperfect transition models into RL whilst mitigating policy improvement instabilities in online settings. Current state-of-the-art policy learning algorithms that surpass human performance often rely on model-free approaches that enjoy unmitigated sampling of transition data. Model-based RL (MBRL) instead attempts to distil experience into transition models that allow agents to plan new policies without needing to return to the environment and sample more data.
    The initial focus of this investigation is on kernel conditional mean embeddings (CMEs) (Song et al., 2009) deployed in an approximate policy iteration (API) algorithm (Grünewälder et al., 2012a). This existing MBRL algorithm boasts theoretically stable policy updates in continuous state and discrete action spaces. The Bellman operator's value function and (transition) conditional expectation are modelled and embedded, respectively, as functions in a reproducing kernel Hilbert space (RKHS). The resulting finite-induced approximate pseudo-MDP (Yao et al., 2014a) can be solved exactly in a dynamic programming algorithm with policy improvement suboptimality guarantees. However, model construction and policy planning scale cubically and quadratically, respectively, with the training set size, rendering the CME impractical for sample-abundant tasks in online settings.
    Three variants of CME API are investigated to strike a balance between stable policy updates and reduced computational complexity. The first variant models the value function and state-action representation explicitly in a parametric CME (PCME) algorithm with favourable computational complexity; however, a soft conservative policy update technique is developed to mitigate policy learning oscillations in the planning process. The second variant returns to the non-parametric embedding and contributes (along with external work) to the compressed CME (CCME): a sparse and computationally more favourable CME. The final variant is a fully end-to-end differentiable embedding trained with stochastic gradient updates; the value function remains modelled in an RKHS such that backpropagation is driven by a non-parametric RKHS loss function. Actively compressed CME (ACCME) satisfies the pseudo-MDP contraction constraint using a sparse softmax activation function. The size of the pseudo-MDP (i.e. the size of the embedding's last layer) is controlled by sparsifying the last-layer weight matrix, extending the truncated gradient method (Langford et al., 2009) with group lasso updates in a novel 'use it or lose it' neuron pruning mechanism. Surprisingly, this technique does not require extensive fine-tuning between control tasks.
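    The 'use it or lose it' pruning idea combines a truncated-gradient-style step with a group lasso penalty over the rows of the last-layer weight matrix, so that whole neurons whose rows shrink to zero norm can be removed. Below is a minimal sketch of that proximal step; the shapes, threshold, and pruning rule are illustrative assumptions, not the thesis's exact scheme.

    ```python
    import numpy as np

    def group_shrink_rows(W, lam, lr):
        """Proximal step for a row-wise group lasso penalty lam * sum_i ||W_i||_2:
        shrink each row's norm by lr * lam, zeroing rows that fall below it
        (the group analogue of a truncated gradient update)."""
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        scale = np.clip(1.0 - lr * lam / np.maximum(norms, 1e-12), 0.0, None)
        return W * scale

    def prune_dead_neurons(W):
        """'Lose it': drop rows whose group norm has been shrunk exactly to
        zero, which shrinks the embedding's last layer (the pseudo-MDP size)."""
        alive = np.linalg.norm(W, axis=1) > 0.0
        return W[alive], alive

    # Typical use, after each SGD step on the embedding loss:
    #   W = group_shrink_rows(W, lam=1e-3, lr=1e-2)
    #   W, alive = prune_dead_neurons(W)
    ```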