6,863 research outputs found
Sufficient Dimension Reduction and Modeling Responses Conditioned on Covariates: An Integrated Approach via Convex Optimization
Given observations of a collection of covariates and responses , sufficient dimension reduction (SDR)
techniques aim to identify a mapping
with such that is independent of . The image
summarizes the relevant information in a potentially large number of covariates
that influence the responses . In many contemporary settings, the number
of responses is also quite large, in addition to a large number of
covariates. This leads to the challenge of fitting a succinctly parameterized
statistical model to , which is a problem that is usually not addressed
in a traditional SDR framework. In this paper, we present a computationally
tractable convex relaxation based estimator for simultaneously (a) identifying
a linear dimension reduction of the covariates that is sufficient with
respect to the responses, and (b) fitting several types of structured
low-dimensional models -- factor models, graphical models, latent-variable
graphical models -- to the conditional distribution of . We analyze the
consistency properties of our estimator in a high-dimensional scaling regime.
We also illustrate the performance of our approach on a newsgroup dataset and
on a dataset consisting of financial asset prices.Comment: 34 pages, 1 figur
Markov Network Structure Learning via Ensemble-of-Forests Models
Real world systems typically feature a variety of different dependency types
and topologies that complicate model selection for probabilistic graphical
models. We introduce the ensemble-of-forests model, a generalization of the
ensemble-of-trees model. Our model enables structure learning of Markov random
fields (MRF) with multiple connected components and arbitrary potentials. We
present two approximate inference techniques for this model and demonstrate
their performance on synthetic data. Our results suggest that the
ensemble-of-forests approach can accurately recover sparse, possibly
disconnected MRF topologies, even in presence of non-Gaussian dependencies
and/or low sample size. We applied the ensemble-of-forests model to learn the
structure of perturbed signaling networks of immune cells and found that these
frequently exhibit non-Gaussian dependencies with disconnected MRF topologies.
In summary, we expect that the ensemble-of-forests model will enable MRF
structure learning in other high dimensional real world settings that are
governed by non-trivial dependencies.Comment: 13 pages, 6 figure
A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training
Item neighbourhood methods for collaborative filtering learn a weighted graph
over the set of items, where each item is connected to those it is most similar
to. The prediction of a user's rating on an item is then given by that rating
of neighbouring items, weighted by their similarity. This paper presents a new
neighbourhood approach which we call item fields, whereby an undirected
graphical model is formed over the item graph. The resulting prediction rule is
a simple generalization of the classical approaches, which takes into account
non-local information in the graph, allowing its best results to be obtained
when using drastically fewer edges than other neighbourhood approaches. A fast
approximate maximum entropy training method based on the Bethe approximation is
presented, which uses a simple gradient ascent procedure. When using
precomputed sufficient statistics on the Movielens datasets, our method is
faster than maximum likelihood approaches by two orders of magnitude.Comment: ICML201
Hybrid approximate message passing
Gaussian and quadratic approximations of message passing algorithms on graphs have attracted considerable recent attention due to their computational simplicity, analytic tractability, and wide applicability in optimization and statistical inference problems. This paper presents a systematic framework for incorporating such approximate message passing (AMP) methods in general graphical models. The key concept is a partition of dependencies of a general graphical model into strong and weak edges, with the weak edges representing interactions through aggregates of small, linearizable couplings of variables. AMP approximations based on the Central Limit Theorem can be readily applied to aggregates of many weak edges and integrated with standard message passing updates on the strong edges. The resulting algorithm, which we call hybrid generalized approximate message passing (HyGAMP), can yield significantly simpler implementations of sum-product and max-sum loopy belief propagation. By varying the partition of strong and weak edges, a performance--complexity trade-off can be achieved. Group sparsity and multinomial logistic regression problems are studied as examples of the proposed methodology.The work of S. Rangan was supported in part by the National Science Foundation under Grants 1116589, 1302336, and 1547332, and in part by the industrial affiliates of NYU WIRELESS. The work of A. K. Fletcher was supported in part by the National Science Foundation under Grants 1254204 and 1738286 and in part by the Office of Naval Research under Grant N00014-15-1-2677. The work of V. K. Goyal was supported in part by the National Science Foundation under Grant 1422034. The work of E. Byrne and P. Schniter was supported in part by the National Science Foundation under Grant CCF-1527162. (1116589 - National Science Foundation; 1302336 - National Science Foundation; 1547332 - National Science Foundation; 1254204 - National Science Foundation; 1738286 - National Science Foundation; 1422034 - National Science Foundation; CCF-1527162 - National Science Foundation; NYU WIRELESS; N00014-15-1-2677 - Office of Naval Research
- …