    Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models

    Mixtures of Experts (MoE) are successful models for heterogeneous data in many statistical learning problems, including regression, clustering, and classification. They are generally fitted by maximum likelihood estimation via the well-known EM algorithm, and their application to high-dimensional problems therefore remains challenging. We consider the problem of fitting and feature selection in MoE models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with potentially high-dimensional predictors. Unlike state-of-the-art regularized MLE for MoE, the proposed models do not require an approximation of the penalty function. We develop two hybrid EM algorithms: an Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm with a coordinate ascent inner loop. The proposed algorithms obtain sparse solutions automatically, without thresholding, and avoid matrix inversion through univariate parameter updates. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data.
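
    The matrix-inversion-free updates rest on the fact that an ℓ1-penalized weighted least-squares criterion, of the kind that appears in the M-step for an expert's regression coefficients, can be minimized one coordinate at a time in closed form. The following is a minimal sketch of such a univariate soft-thresholding update, assuming a weighted lasso M-step objective; the function names and setup are illustrative, not the authors' exact procedure.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_ascent_mstep(X, y, w, lam, beta=None, n_iters=100):
    """Illustrative M-step sketch: coordinate descent on a weighted lasso
    objective  (1/2) sum_i w_i (y_i - x_i' beta)^2 + lam * ||beta||_1,
    with w the E-step responsibilities (an assumption of this sketch).
    Each coordinate update is univariate and closed-form, so no matrix
    inversion is needed and exact zeros appear without thresholding tricks.
    """
    n, p = X.shape
    if beta is None:
        beta = np.zeros(p)
    r = y - X @ beta                      # current residuals
    for _ in range(n_iters):
        for j in range(p):
            xj = X[:, j]
            # weighted inner product with the partial residual excluding j
            rho = np.dot(w * xj, r + xj * beta[j])
            denom = np.dot(w * xj, xj)
            new_bj = soft_threshold(rho, lam) / denom
            r += xj * (beta[j] - new_bj)  # keep residuals in sync
            beta[j] = new_bj
    return beta
```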

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods aim at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present, from a general perspective, optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
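
    As one concrete instance of the proximal methods surveyed, the basic proximal-gradient (ISTA) iteration for ℓ1-regularized least squares alternates a gradient step on the smooth empirical risk with the proximal operator of the penalty, which for the ℓ1 norm is soft-thresholding. The sketch below is a minimal, illustrative implementation, not code from the paper.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iters=500):
    """Proximal gradient (ISTA) for  (1/2n)||y - X beta||^2 + lam * ||beta||_1.
    Each iteration: gradient step on the smooth loss, then the prox of
    the L1 norm (soft-thresholding) with threshold lam / L.
    """
    n, p = X.shape
    # 1/L step size; L = sigma_max(X)^2 / n bounds the Lipschitz constant
    L = np.linalg.norm(X, 2) ** 2 / n
    beta = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return beta
```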

    Structured analysis of the high-dimensional FMR model

    The finite mixture of regression (FMR) model is a popular tool for accommodating data heterogeneity. In the analysis of FMR models with high-dimensional covariates, it is necessary to conduct regularized estimation and to separate important covariates from noise. In the literature, little attention has been paid to the differences among important covariates, which reflect the underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same in different subpopulations and those that behave differently. It is of interest to conduct structured analysis to identify such structures, which will enable researchers to better understand covariates and their associations with outcomes. Here the FMR model with high-dimensional covariates is considered, and a structured penalization approach is developed for regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The proposed approach can be effectively realized, and its statistical properties are rigorously established. Simulation demonstrates its superiority over alternatives. In the analysis of cancer gene expression data, interesting models/structures missed by the existing analysis are identified.
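
    One common way to encode such a two-type structure, shown here purely as an illustrative sketch rather than the paper's actual penalty, is to decompose each covariate's coefficients across subpopulations into a shared component plus subpopulation-specific deviations and penalize the two parts separately: shrinking the shared part to zero drops the covariate entirely, while shrinking only the deviations declares it important but homogeneous across subpopulations.

```python
import numpy as np

def structured_penalty(B, lam1, lam2):
    """Illustrative structured penalty for a K-subpopulation FMR fit
    (an assumed decomposition, not the paper's exact formulation).
    B is a (K, p) coefficient matrix: row k holds subpopulation k's
    regression coefficients. Each column splits into a shared mean
    effect plus deviations; penalizing the mean performs variable
    selection, penalizing the deviations detects heterogeneous effects.
    """
    mean_effect = B.mean(axis=0)      # common part, one value per covariate
    deviations = B - mean_effect      # subpopulation-specific part
    return lam1 * np.abs(mean_effect).sum() + lam2 * np.abs(deviations).sum()
```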

    A Cyclic Coordinate Descent Method for Convex Optimization on Polytopes

    Coordinate descent algorithms are popular for huge-scale optimization problems due to their low per-iteration cost, but they apply only to problems whose constraint set is separable across coordinates. In this paper, we propose a new variant of the cyclic coordinate descent method that can handle polyhedral constraints, provided that the polyhedral set does not have too many extreme points; examples include the L1-ball and the standard simplex. Loosely speaking, our proposed algorithm, PolyCD, can be viewed as a hybrid of cyclic coordinate descent and the Frank-Wolfe algorithm. We prove that PolyCD has an O(1/k) convergence rate for smooth convex objectives. Inspired by the away-step variant of Frank-Wolfe, we propose PolyCDwA, a variant of PolyCD with away steps that has a linear convergence rate when the loss function is smooth and strongly convex. Empirical studies demonstrate that PolyCDwA achieves strong computational performance on large-scale benchmark problems, including L1-constrained linear regression, L1-constrained logistic regression, and kernel density estimation.
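
    The Frank-Wolfe ingredient of such a hybrid is attractive on sets like the L1-ball because the linear minimization oracle only has to scan the gradient for its largest coordinate. The sketch below shows this oracle inside a plain Frank-Wolfe loop, assuming a user-supplied gradient function; it illustrates the ingredient only and is not the PolyCD algorithm itself.

```python
import numpy as np

def frank_wolfe_l1(grad_fn, x0, radius, n_iters=200):
    """Minimal Frank-Wolfe sketch over the L1-ball of the given radius.
    The linear minimization oracle over the L1-ball returns a signed
    vertex +/- radius * e_i at the coordinate of largest gradient
    magnitude; this cheap oracle is what makes coordinate-descent /
    Frank-Wolfe hybrids attractive on such polytopes.
    """
    x = x0.copy()
    for k in range(n_iters):
        g = grad_fn(x)
        i = np.argmax(np.abs(g))          # steepest coordinate
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])    # extreme point minimizing <g, s>
        step = 2.0 / (k + 2.0)            # classic FW step-size schedule
        x = (1 - step) * x + step * s
    return x
```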