Modeling item--item similarities for personalized recommendations on Yahoo! front page
We consider the problem of algorithmically recommending items to users on a
Yahoo! front page module. Our approach is based on a novel multilevel
hierarchical model that we refer to as a User Profile Model with Graphical
Lasso (UPG). The UPG provides a personalized recommendation to users by
simultaneously incorporating both user covariates and historical user
interactions with items in a model-based way. Specifically, we build a per-item
regression model based on a rich set of user covariates and estimate individual
user affinity to items by introducing a latent random vector for each user. The
vector random effects are assumed to be drawn from a prior with a precision
matrix that measures residual partial associations among items. To obtain
better estimates of the precision matrix in high dimensions, the matrix elements
are constrained through a Lasso penalty. Our model is fitted through a
penalized quasi-likelihood procedure coupled with a scalable EM algorithm. We
employ several computational strategies, such as multi-threading and conjugate
gradients, and heavily exploit problem structure to scale our computations in
the E-step. For the M-step we take recourse to a scalable variant of the
Graphical Lasso algorithm for covariance selection. Through extensive
experiments on a new data set obtained from Yahoo! front page and a benchmark
data set from a movie recommender application, we show that our UPG model
significantly improves performance compared to several state-of-the-art methods
in the literature, especially those based on a bilinear random effects model
(BIRE). In particular, we show that the gains of UPG are significant compared
to BIRE when the number of users is large and the number of items to select
from is small. For large item sets and relatively small user sets the results
of UPG and BIRE are comparable. The UPG leads to faster model building and
produces outputs which are interpretable.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), http://dx.doi.org/10.1214/11-AOAS475.
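The covariance-selection ingredient described above can be sketched in a few lines. The toy example below stands in for the M-step, using scikit-learn's GraphicalLasso as an assumed substitute for the paper's scalable variant, applied to simulated per-user item-affinity vectors (the hypothetical output of an E-step):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Hypothetical latent item-affinity vectors, one per user, playing the role
# of the E-step output: 3 items, with items 0 and 1 partially associated.
n_users, n_items = 500, 3
true_prec = np.array([[2.0, 0.8, 0.0],
                      [0.8, 2.0, 0.0],
                      [0.0, 0.0, 2.0]])
affinities = rng.multivariate_normal(
    np.zeros(n_items), np.linalg.inv(true_prec), size=n_users)

# M-step analogue: Lasso-penalized precision estimation (covariance selection).
gl = GraphicalLasso(alpha=0.05).fit(affinities)
prec = gl.precision_
print(np.round(prec, 2))
```

The Lasso penalty (`alpha`) shrinks weak off-diagonal entries of the precision matrix toward zero, so the fitted matrix exposes the residual partial associations among items.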
Factorial graphical lasso for dynamic networks
Dynamic network models describe a growing number of important scientific
processes, from cell biology and epidemiology to sociology and finance. There
are many aspects of dynamical networks that require statistical considerations.
In this paper we focus on determining network structure. Estimating dynamic
networks is a difficult task since the number of components involved in the
system is very large. As a result, the number of parameters to be estimated is
bigger than the number of observations. However, a characteristic of many
networks is that they are sparse. For example, the molecular structure of genes
makes interactions with other components a highly structured and therefore
sparse process.
Penalized Gaussian graphical models have been used to estimate sparse
networks. However, the literature has focused on static networks, which lack
specific temporal constraints. We propose a structured Gaussian dynamical
graphical model, where structures can consist of specific time dynamics, known
presence or absence of links and block equality constraints on the parameters.
Thus, the number of parameters to be estimated is reduced, and the accuracy of
the estimates, including identification of the network, can be improved. Here,
we show that the constrained optimization problem can be solved by taking
advantage of an efficient solver, logdetPPA, developed in convex optimization.
Moreover, model selection methods for checking the sensitivity of the inferred
networks are described. Finally, synthetic and real data illustrate the
proposed methodologies.
Comment: 30 pages, 5 figures.
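The penalized Gaussian graphical model at the core of such approaches maximizes an l1-penalized log-likelihood over the precision matrix Θ given the sample covariance S (standard graphical-lasso form; the structured model described above additionally restricts Θ through time dynamics, known zero links, and block equality constraints):

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0} \;
  \log\det\Theta \;-\; \operatorname{tr}(S\Theta)
  \;-\; \lambda \sum_{i \neq j} \lvert \Theta_{ij} \rvert
```

A zero entry in the estimated Θ corresponds to a missing edge in the inferred network, which is how the penalty delivers sparsity.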
Multinomial Inverse Regression for Text Analysis
Text data, including speeches, stories, and other document forms, are often
connected to sentiment variables that are of interest for research in
marketing, economics, and elsewhere. Such data are also very high dimensional
and difficult to incorporate into statistical analyses. This article introduces
a
straightforward framework of sentiment-preserving dimension reduction for text
data. Multinomial inverse regression is introduced as a general tool for
simplifying predictor sets that can be represented as draws from a multinomial
distribution, and we show that logistic regression of phrase counts onto
document annotations can be used to obtain low dimension document
representations that are rich in sentiment information. To facilitate this
modeling, a novel estimation technique is developed for multinomial logistic
regression with very high-dimension response. In particular, independent
Laplace priors with unknown variance are assigned to each regression
coefficient, and we detail an efficient routine for maximization of the joint
posterior over coefficients and their prior scale. This "gamma-lasso" scheme
yields stable and effective estimation for general high-dimension logistic
regression, and we argue that it will be superior to current methods in many
settings. Guidelines for prior specification are provided, algorithm
convergence is detailed, and estimator properties are outlined from the
perspective of the literature on non-concave likelihood penalization. Related
work on sentiment analysis from statistics, econometrics, and machine learning
is surveyed and connected. Finally, the methods are applied in two detailed
examples and we provide out-of-sample prediction studies to illustrate their
effectiveness.
Comment: Published in the Journal of the American Statistical Association 108,
2013, with discussion (rejoinder: http://arxiv.org/abs/1304.4200). Software is
available in the textir package.
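A toy sketch of the inverse-regression mechanism: each token is regressed (multinomial logit) onto its document's sentiment, and the fitted per-phrase loadings project phrase counts to a one-dimensional, sentiment-rich score. The corpus is simulated, and scikit-learn's ridge-regularized LogisticRegression stands in for the paper's gamma-lasso estimator (both are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Simulated corpus (illustrative, not the paper's data): 4 phrases, where
# phrase 0 leans positive, phrase 1 leans negative, phrases 2-3 are neutral.
n_docs, doc_len, vocab = 200, 50, 4
sentiment = rng.choice([-1.0, 1.0], size=n_docs)
phi_true = np.array([1.5, -1.5, 0.0, 0.0])

counts = np.empty((n_docs, vocab), dtype=int)
for i, y in enumerate(sentiment):
    p = np.exp(y * phi_true)
    counts[i] = rng.multinomial(doc_len, p / p.sum())

# Inverse regression: multinomial logit of token identity onto sentiment.
tokens, ys = [], []
for i in range(n_docs):
    for j in range(vocab):
        tokens += [j] * counts[i, j]
        ys += [sentiment[i]] * counts[i, j]
mnlogit = LogisticRegression(max_iter=1000).fit(np.array(ys)[:, None], tokens)
phi_hat = mnlogit.coef_[:, 0]          # one loading per phrase

# Sufficient reduction: project normalized phrase counts onto the loadings.
z = counts @ phi_hat / counts.sum(axis=1)
print(round(float(np.corrcoef(z, sentiment)[0, 1]), 3))
```

The scalar score z tracks the documents' sentiment because, up to an additive constant, it is the low-dimension sufficient reduction implied by the multinomial model.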
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
We consider the problems of estimation and selection of parameters endowed
with a known group structure, when the groups are assumed to be sign-coherent,
that is, gathering either nonnegative, nonpositive or null parameters. To
tackle this problem, we propose the cooperative-Lasso penalty. We derive the
optimality conditions defining the cooperative-Lasso estimate for generalized
linear models, and propose an efficient active set algorithm suited to
high-dimensional problems. We study the asymptotic consistency of the estimator
in the linear regression setup and derive its irrepresentable conditions, which
are milder than the ones of the group-Lasso regarding the matching of groups
with the sparsity pattern of the true parameters. We also address the problem
of model selection in linear regression by deriving an approximation of the
degrees of freedom of the cooperative-Lasso estimator. Simulations comparing
the proposed estimator to the group and sparse group-Lasso comply with our
theoretical results, showing consistent improvements in support recovery for
sign-coherent groups. We finally propose two examples illustrating the wide
applicability of the cooperative-Lasso: first to the processing of ordinal
variables, where the penalty acts as a monotonicity prior; second to the
processing of genomic data, where the set of differentially expressed probes is
enriched by incorporating all the probes of the microarray that are related to
the corresponding genes.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), http://dx.doi.org/10.1214/11-AOAS520.
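The sign-coherence mechanism can be illustrated through the penalty's proximal operator. Decomposing a group into its positive and negative parts, the prox amounts to group soft-thresholding each signed part separately; this is a sketch of our reading of the optimality conditions, not the authors' code:

```python
import numpy as np

def group_shrink(v, lam):
    """Group soft-thresholding: shrink the l2 norm of v by lam, or zero it."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def coop_lasso_prox(v, lam):
    """Prox of lam * (||v+||_2 + ||v-||_2) for one group: group
    soft-thresholding applied separately to the positive and negative
    parts, so an entire signed part can vanish (sign-coherence)."""
    pos = np.where(v > 0, v, 0.0)
    neg = np.where(v < 0, v, 0.0)
    return group_shrink(pos, lam) + group_shrink(neg, lam)

# A group with one weak negative entry: the negative part is zeroed out,
# leaving a nonnegative, sign-coherent group.
v = np.array([3.0, 2.5, -0.2, 2.8])
print(coop_lasso_prox(v, 1.0))
```

Contrast with the plain group-Lasso prox, which shrinks all four coordinates by one common factor and cannot zero out the lone negative entry on its own.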
Sparse functional regression models: minimax rates and contamination
In functional linear regression and functional generalized linear regression
models, the effect of the predictor function is usually assumed to be spread
across the index space. In this dissertation we consider the sparse functional
linear model and the sparse functional generalized linear model (GLM), where
the impact of the predictor process on the response is only via its value at
one point in the index space, defined as the sensitive point. We are
particularly interested in estimating the sensitive point. The minimax rate of
convergence for estimating the parameters in sparse functional linear
regression is derived. It is shown that the optimal rate for estimating the
sensitive point depends on the roughness of the predictor function, which is
quantified by a "generalized Hurst exponent". The least squares estimator (LSE)
is shown to attain the optimal rate. Also, a lower bound is given on the
minimax risk of estimating the parameters in sparse functional GLM, which also
depends on the generalized Hurst exponent of the predictor process. The order
of the minimax lower bound is the same as that of the weak convergence rate of
the maximum likelihood estimator (MLE), provided that the functional predictor
behaves like a Brownian motion.
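A toy simulation of the sparse functional linear model and its least squares estimator (all settings here are illustrative assumptions): the response depends on a Brownian-motion predictor only through its value at a sensitive point tau0, and the LSE scans the grid for the point minimizing the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse functional linear model (toy settings): the response Y depends on
# the Brownian-motion predictor X only via its value at the sensitive point.
n, m = 300, 201
grid = np.linspace(0.0, 1.0, m)
tau0, beta = 0.6, 2.0
X = np.cumsum(rng.normal(0.0, np.sqrt(1.0 / m), size=(n, m)), axis=1)
j0 = int(np.argmin(np.abs(grid - tau0)))
Y = beta * X[:, j0] + rng.normal(0.0, 0.1, size=n)

# Least squares estimate of the sensitive point: for each grid point,
# regress Y on X(t) (no intercept) and keep the minimizer of the RSS.
rss = np.empty(m)
for j in range(m):
    x = X[:, j]
    b = (x @ Y) / (x @ x)
    rss[j] = np.sum((Y - b * x) ** 2)
tau_hat = grid[int(np.argmin(rss))]
print(tau_hat)
```

Because Brownian paths are rough, X(t) decorrelates quickly away from tau0, which is the feature the minimax rate depends on through the generalized Hurst exponent.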