Search CORE

21,612 research outputs found

Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models

Author: Chamroukhi Faicel
Huynh Bao-Tuyen
Publication venue
Publication date: 29/10/2018
Field of study

Mixture of Experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems including regression, clustering and classification. Generally fitted by maximum likelihood estimation via the well-known EM algorithm, their application to high-dimensional problems is still therefore challenging. We consider the problem of fitting and feature selection in MoE models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with potentially high-dimensional predictors. Unlike state-of-the art regularized MLE for MoE, the proposed modelings do not require an approximate of the penalty function. We develop two hybrid EM algorithms: an Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm with coordinate ascent algorithm. The proposed algorithms allow to automatically obtaining sparse solutions without thresholding, and avoid matrix inversion by allowing univariate parameter updates. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data

arXiv.org e-Print Archive

HAL - Normandie Université

Numérisation de Documents Anciens Mathématiques

Hal-Diderot

Differentially Private Model Selection with Penalized and Constrained Likelihood

Author: Chaudhuri K.
Chaudhuri K.
Chaudhuri K.
Dalenius T.
Duchi J. C.
Fienberg S.
Gaboardi M.
Hardt M.
Lei J.
Rubin D. B.
Smith A.
Tibshirani R.
Uhler C.
Publication venue
Publication date: 14/07/2016
Field of study

In statistical disclosure control, the goal of data analysis is twofold: The released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Recommended from our members

Semiparametric estimation for a class of time-inhomogenous diffusion processes

Author: Li M
Wang H
Yu K
Yu Y
Publication venue: Institute of Statistical Science, Academia Sinica & International Chinese Statistical Association
Publication date: 01/01/2009
Field of study

Copyright @ 2009 Institute of Statistical Science, Academia SinicaWe develop two likelihood-based approaches to semiparametrically estimate a class of time-inhomogeneous diffusion processes: log penalized splines (P-splines) and the local log-linear method. Positive volatility is naturally embedded and this positivity is not guaranteed in most existing diffusion models. We investigate different smoothing parameter selections. Separate bandwidths are used for drift and volatility estimation. In the log P-splines approach, different smoothness for different time varying coefficients is feasible by assigning different penalty parameters. We also provide theorems for both approaches and report statistical inference results. Finally, we present a case study using the weekly three-month Treasury bill data from 1954 to 2004. We find that the log P-splines approach seems to capture the volatility dip in mid-1960s the best. We also present an application to calculate a financial market risk measure called Value at Risk (VaR) using statistical estimates from log P-splines

Brunel University Research Archive

Convex and non-convex regularization methods for spatial point processes intensity estimation

Author: Choiruddin Achmad
Coeurjolly Jean-François
Letué Frédérique
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 07/03/2017
Field of study

This paper deals with feature selection procedures for spatial point processes intensity estimation. We consider regularized versions of estimating equations based on Campbell theorem derived from two classical functions: Poisson likelihood and logistic regression likelihood. We provide general conditions on the spatial point processes and on penalty functions which ensure consistency, sparsity and asymptotic normality. We discuss the numerical implementation and assess finite sample properties in a simulation study. Finally, an application to tropical forestry datasets illustrates the use of the proposed methods

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

VBN