Incorporating Side Information in Probabilistic Matrix Factorization with Gaussian Processes
Probabilistic matrix factorization (PMF) is a powerful method for modeling
data associated with pairwise relationships, finding use in collaborative
filtering, computational biology, and document analysis, among other areas. In
many domains, there is additional information that can assist in prediction.
For example, when modeling movie ratings, we might know when the rating
occurred, where the user lives, or what actors appear in the movie. It is
difficult, however, to incorporate this side information into the PMF model. We
propose a framework for incorporating side information by coupling together
multiple PMF problems via Gaussian process priors. We replace scalar latent
features with functions that vary over the space of side information. The GP
priors on these functions require them to vary smoothly and share information.
We successfully use this new method to predict the scores of professional
basketball games, where side information about the venue and date of the game
is relevant to the outcome.
Comment: 18 pages, 4 figures, Submitted to UAI 201
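To make the coupling idea concrete, here is a minimal Python sketch of the prior only, with no inference. Each latent feature becomes a function of a scalar side-information variable `s` (for instance, the game date) drawn from a GP, and a predicted score is the inner product of two such feature vectors evaluated at the same `s`. The squared-exponential kernel, the names `rbf_kernel`, `sample_latent_functions` and `predicted_scores`, and all sizes are hypothetical choices for illustration, not the paper's model or code.

```python
import numpy as np


def rbf_kernel(s, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance over a 1-D side-information variable s (assumed kernel)."""
    d = s[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_latent_functions(s, n_rows, rank, lengthscale=1.0, seed=0):
    """Draw an (n_rows, rank, len(s)) array; each [i, k, :] is an independent GP sample over s."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(s, lengthscale) + 1e-6 * np.eye(len(s))  # jitter keeps Cholesky stable
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((n_rows, rank, len(s)))
    return z @ L.T  # each trailing vector becomes L @ z, so it has covariance K


def predicted_scores(U, V):
    """R[i, j, t] = sum_k U[i, k, t] * V[j, k, t]: a PMF inner product evaluated at each s_t."""
    return np.einsum("ikt,jkt->ijt", U, V)


# Hypothetical setting: 4 row entities, 3 column entities, rank-2 features, 30 side-info values.
s = np.linspace(0.0, 10.0, 30)
U = sample_latent_functions(s, n_rows=4, rank=2, lengthscale=3.0, seed=1)
V = sample_latent_functions(s, n_rows=3, rank=2, lengthscale=3.0, seed=2)
R = predicted_scores(U, V)
print(R.shape)  # (4, 3, 30)
```

Under this kernel, features at nearby values of `s` are strongly correlated, so a prediction at a new date borrows strength from nearby observed dates; that is the smoothness and information sharing the abstract attributes to the GP priors.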
Parallel Algorithms for Constrained Tensor Factorization via the Alternating Direction Method of Multipliers
Tensor factorization has proven useful in a wide range of applications, from
sensor array processing to communications, speech and audio signal processing,
and machine learning. With few recent exceptions, all tensor factorization
algorithms were originally developed for centralized, in-memory computation on
a single machine; and the few that break away from this mold do not easily
incorporate practically important constraints, such as nonnegativity. A new
constrained tensor factorization framework is proposed in this paper, building
upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that
this simplifies computations, bypassing the need to solve constrained
optimization problems in each iteration, and that it naturally leads to distributed
algorithms suitable for parallel implementation on regular high-performance
computing (e.g., mesh) architectures. This opens the door for many emerging big
data-enabled applications. The methodology is exemplified using nonnegativity
as a baseline constraint, but the proposed framework can more-or-less readily
incorporate many other types of constraints. Numerical experiments are very
encouraging, indicating that the ADMoM-based nonnegative tensor factorization
(NTF) has high potential as an alternative to state-of-the-art approaches.
Comment: Submitted to the IEEE Transactions on Signal Processing
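The abstract's key point, that ADMM replaces a constrained subproblem with an unconstrained linear solve plus a cheap projection, can be seen in a small single-machine Python sketch. It fits a nonnegative rank-`R` CP model to a 3-way tensor by cycling over the three factors and running a few scaled-ADMM iterations per factor, using the split `H = Z` with `Z >= 0`. This is a simplified, non-distributed illustration under our own assumptions (the `trace(G)/R` step size, warm-started dual variables, and the names `cp_reconstruct`, `admm_nonneg_update` and `nonneg_cp_admm` are all hypothetical), not the paper's parallel algorithm.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def admm_nonneg_update(M, G, Z, U, inner_iters=10):
    """Approximately solve min_H 0.5 * ||X_(n) - H W^T||^2 subject to H >= 0 with scaled ADMM.

    Only M = X_(n) W and G = W^T W are needed. Splitting H = Z with Z >= 0 turns each
    iteration into a linear solve followed by an elementwise projection.
    """
    R = G.shape[0]
    rho = max(np.trace(G) / R, 1e-12)  # step-size heuristic (assumed, not from the abstract)
    lhs = G + rho * np.eye(R)          # symmetric positive definite
    for _ in range(inner_iters):
        H = np.linalg.solve(lhs, (M + rho * (Z - U)).T).T  # unconstrained least-squares step
        Z = np.maximum(0.0, H + U)                         # projection onto H >= 0
        U = U + H - Z                                      # scaled dual update
    return Z, U


def nonneg_cp_admm(X, rank, outer_iters=200, seed=0):
    """Alternate over the three factors of a nonnegative CP model of a 3-way tensor X."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    duals = [np.zeros_like(F) for F in factors]
    norm_x = np.linalg.norm(X)
    history = [np.linalg.norm(X - cp_reconstruct(*factors)) / norm_x]
    # MTTKRP contractions for modes 0, 1 and 2 (the other two factors in mode order).
    specs = ["ijk,jr,kr->ir", "ijk,ir,kr->jr", "ijk,ir,jr->kr"]
    for _ in range(outer_iters):
        for n in range(3):
            P, Q = [factors[m] for m in range(3) if m != n]
            M = np.einsum(specs[n], X, P, Q)
            G = (P.T @ P) * (Q.T @ Q)  # Hadamard product of Gram matrices equals W^T W
            factors[n], duals[n] = admm_nonneg_update(M, G, factors[n], duals[n])
        history.append(np.linalg.norm(X - cp_reconstruct(*factors)) / norm_x)
    return factors, history


# Usage on synthetic data with an exact nonnegative rank-2 structure (hypothetical sizes).
rng = np.random.default_rng(1)
X = cp_reconstruct(*[rng.random((d, 2)) for d in (6, 5, 4)])
factors, history = nonneg_cp_admm(X, rank=2)
print(f"relative error: {history[0]:.3f} -> {history[-1]:.3f}")
```

Each inner iteration costs one `R x R` linear solve and one elementwise `max`, so no constrained quadratic program is solved directly. The distributed version the abstract describes would additionally partition the tensor and the MTTKRP work across processors, which this sketch does not attempt.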
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize
a given data set. Inter-battery factor analysis extends this notion to multiple
views of the data. In this paper we show how a nonlinear, nonparametric version
of these models can be recovered through the Gaussian process latent variable
model. This gives us a flexible formalism for multi-view learning where the
latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks.
Learning is performed in a Bayesian manner through the formulation of a
variational compression scheme which gives a rigorous lower bound on the log
likelihood. Our Bayesian framework provides strong regularization during
training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to
our knowledge) published results of learning from dozens of views, even when
data is scarce. We further show experimental results on several different types
of multi-view data sets and for different kinds of tasks, including exploratory
data analysis, generation, ambiguity modelling through latent priors and
classification.
Comment: 49 pages including appendix
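The shared-versus-private structure behind inter-battery factor analysis can be illustrated generatively. In this Python sketch, two views are generated from one latent matrix `X`, each through its own GP with an ARD (per-dimension weighted) RBF kernel; a zero weight means that view ignores the latent dimension, so dimensions used by both views act as shared factors and the rest as private ones. The ARD weighting, the names `ard_rbf` and `sample_views`, and all sizes are assumptions for illustration. The sketch only samples from such a prior; it does not implement the paper's variational compression scheme or its learning procedure.

```python
import numpy as np


def ard_rbf(X, weights, variance=1.0):
    """RBF kernel with per-dimension weights; weight 0 removes a latent dimension from the view."""
    Xw = X * np.sqrt(weights)
    sq = np.sum(Xw**2, axis=1)[:, None] + np.sum(Xw**2, axis=1)[None, :] - 2.0 * Xw @ Xw.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))  # clip tiny negative round-off


def sample_views(X, view_weights, view_dims, noise_var=0.01, seed=0):
    """Sample one noisy GP mapping per view from the same latent points X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    views = []
    for w, d in zip(view_weights, view_dims):
        K = ard_rbf(X, np.asarray(w, dtype=float)) + 1e-6 * np.eye(n)
        L = np.linalg.cholesky(K)
        F = L @ rng.standard_normal((n, d))  # d independent GP output columns
        views.append(F + np.sqrt(noise_var) * rng.standard_normal((n, d)))
    return views


# Hypothetical layout: latent dim 0 is shared, dim 1 is private to view A, dim 2 to view B.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
weights = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
Y_a, Y_b = sample_views(X, weights, view_dims=[5, 8])
print(Y_a.shape, Y_b.shape)  # (40, 5) (40, 8)
```

Learning would run in the opposite direction: given `Y_a` and `Y_b`, infer `X` and the per-view weights, so that the data itself determines which latent dimensions are shared and which are private.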