Bayesian exponential family projections for coupled data sources
Exponential family extensions of principal component analysis (EPCA) have received considerable attention in recent years, demonstrating the growing need for basic modeling tools that do not assume the squared loss or a Gaussian distribution. We extend the EPCA model toolbox by presenting the first exponential family multi-view learning methods in the style of partial least squares and canonical correlation analysis, based on a unified representation of EPCA as matrix factorization of the natural parameters of the exponential family. The models are based on a new family of priors that are generally usable for all such factorizations. We also introduce new inference strategies, and demonstrate how the methods outperform earlier ones when the Gaussianity assumption does not hold.
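As a minimal sketch of what "matrix factorization of the natural parameters" can mean, the snippet below evaluates a Poisson log-likelihood whose natural parameters are the entries of a low-rank product. The Poisson choice, the function names, and the toy matrices are all illustrative assumptions; the abstract does not specify them, and the priors and inference strategies it mentions are not shown.

```python
import math

def natural_parameters(U, V):
    """Return Theta = U V^T for U of shape (n, k) and V of shape (d, k)."""
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V]
            for u_row in U]

def poisson_log_likelihood(X, U, V):
    """Sum over all entries of x * theta - exp(theta) - log(x!).

    For the Poisson family the natural parameter theta is the log-rate,
    and the log-partition function is exp(theta).
    """
    theta = natural_parameters(U, V)
    total = 0.0
    for x_row, t_row in zip(X, theta):
        for x, t in zip(x_row, t_row):
            total += x * t - math.exp(t) - math.lgamma(x + 1)
    return total

# Hypothetical toy data: a 2 x 2 count matrix and a rank-1 factorization.
X = [[0, 1], [2, 0]]
U = [[0.0], [1.0]]
V = [[0.5], [-0.5]]
ll = poisson_log_likelihood(X, U, V)
print(ll)
```

Swapping the log-partition term (here `exp(theta)`) for that of another exponential family member changes the assumed noise model without changing the factorization itself.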
Infinite factorization of multiple non-parametric views
Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering setting, by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. Cluster analysis of co-expression is a standard simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation.
Bayesian Group Factor Analysis
We introduce a factor analysis model that summarizes the dependencies between
observed variable groups, instead of dependencies between individual variables
as standard factor analysis does. A group may correspond to one view of the
same set of objects, one of many data sets tied by co-occurrence, or a set of
alternative variables collected from statistics tables to measure one property
of interest. We show that by assuming group-wise sparse factors, active in a
subset of the sets, the variation can be decomposed into factors explaining
relationships between the sets and factors explaining away set-specific
variation. We formulate the assumptions in a Bayesian model which provides the
factors, and apply the model to two data analysis tasks, in neuroimaging and
chemical systems biology. Comment: 9 pages, 5 figures
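One common way to write a group factor analysis model of this kind is sketched below. The notation (M groups, K factors, precision parameters \tau_m and \alpha_{m,k}) is an assumption introduced here for illustration and is not taken from the abstract.

```latex
% Hypothetical notation: x_i^{(m)} is sample i in group m, z_i the shared factors.
\begin{align}
  z_i &\sim \mathcal{N}(0, I_K), \\
  x_i^{(m)} \mid z_i &\sim \mathcal{N}\!\left(W^{(m)} z_i,\; \tau_m^{-1} I\right),
      \qquad m = 1, \dots, M, \\
  w^{(m)}_{:,k} &\sim \mathcal{N}\!\left(0,\; \alpha_{m,k}^{-1} I\right).
\end{align}
```

Under this sketch, a large \alpha_{m,k} pushes the k-th column of W^{(m)} toward zero, so factor k is switched off in group m. Factors active in several groups then describe relationships between the groups, while factors active in a single group explain away group-specific variation.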
Multi-Channel Stochastic Variational Inference for the Joint Analysis of Heterogeneous Biomedical Data in Alzheimer's Disease
The joint analysis of biomedical data in Alzheimer's Disease (AD) is
important for better clinical diagnosis and to understand the relationship
between biomarkers. However, jointly accounting for heterogeneous measures
poses important challenges related to the modeling of the variability and the
interpretability of the results. These issues are here addressed by proposing a
novel multi-channel stochastic generative model. We assume that a latent
variable generates the data observed through different channels (e.g., clinical
scores, imaging, ...) and describe an efficient way to estimate jointly the
distribution of both latent variable and data generative process. Experiments
on synthetic data show that the multi-channel formulation allows superior data
reconstruction as opposed to the single channel one. Moreover, the derived
lower bound of the model evidence represents a promising model selection
criterion. Experiments on AD data show that the model parameters can be used
for unsupervised patient stratification and for the joint interpretation of the
heterogeneous observations. Because of its general and flexible formulation, we
believe that the proposed method can find important applications as a general
data fusion technique. Comment: accepted for presentation at the MLCN 2018 workshop, in conjunction with
MICCAI 2018, September 20, Granada, Spain
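A lower bound of the kind the abstract describes could take the following form, in which each channel gets its own approximate posterior and is asked to reconstruct every channel. This is a sketch under assumed notation (C channels, encoders q, decoders p) and is not quoted from the abstract.

```latex
% Hypothetical notation: x_c is the data in channel c, z the shared latent variable.
\begin{equation}
  \mathcal{L} \;=\; \frac{1}{C} \sum_{c=1}^{C}
  \left(
    \mathbb{E}_{q(z \mid x_c)}\!\left[\, \sum_{i=1}^{C} \ln p(x_i \mid z) \right]
    \;-\; \mathrm{KL}\!\left( q(z \mid x_c) \,\|\, p(z) \right)
  \right)
\end{equation}
```

In this form the inner sum ties the channels together, because the latent variable inferred from one channel must explain all the others. Dropping the terms with i different from c would recover C independent single-channel models.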
Male fertility
Semen analysis remains the basic examination of male fertility, but its ability to predict the onset of pregnancy is rather poor. The properties of the oocyte also affect the fertilizing capacity of the sperm. The fertility rate, which describes the number of offspring, has declined, and semen quality is also deteriorating in Western countries. Age, lifestyle, diseases and medications affect male fertility, but the mother's lifestyle during pregnancy may affect a man's fertility even more than his own. Many environmental chemicals also threaten the reproductive health of humans and animals.
IntentStreams: Smart parallel search streams for branching exploratory search
The user's understanding of information needs and the information available in the data collection can evolve during an exploratory search session. Search systems tailored for well-defined narrow search tasks may be suboptimal for exploratory search where the user can sequentially refine the expressions of her information needs and explore alternative search directions. A major challenge for exploratory search systems design is how to support such behavior and expose the user to relevant yet novel information that can be difficult to discover by using conventional query formulation techniques. We introduce IntentStreams, a system for exploratory search that provides interactive query refinement mechanisms and parallel visualization of search streams. The system models each search stream via an intent model allowing rapid user feedback. The user interface allows swift initiation of alternative and parallel search streams by direct manipulation that does not require typing. A study with 13 participants shows that IntentStreams provides better support for branching behavior compared to a conventional search system.
Multicamera Action Recognition with Canonical Correlation Analysis and Discriminative Sequence Classification
Proceedings of: 4th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2011, La Palma, Canary Islands, Spain, May 30 - June 3, 2011. This paper presents a feature fusion approach to the recognition of human actions from multiple cameras that avoids the computation of the 3D visual hull. Action descriptors are extracted for each one of the camera views available and projected into a common subspace that maximizes the correlation between each one of the components of the projections. That common subspace is learned using Probabilistic Canonical Correlation Analysis. The action classification is made in that subspace using a discriminative classifier. Results of the proposed method are shown for the classification of the IXMAS dataset.
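For reference, the standard two-view latent variable formulation of probabilistic canonical correlation analysis is sketched below. The symbols are assumptions introduced here, and the abstract does not say how the paper extends the model to more than two cameras.

```latex
% Hypothetical notation: x_1, x_2 are descriptors from two camera views, z the shared subspace.
\begin{align}
  z &\sim \mathcal{N}(0, I_d), \\
  x_1 \mid z &\sim \mathcal{N}\!\left(W_1 z + \mu_1,\; \Psi_1\right), \\
  x_2 \mid z &\sim \mathcal{N}\!\left(W_2 z + \mu_2,\; \Psi_2\right).
\end{align}
```

The posterior mean of z given the observed views serves as the common-subspace projection, which could then be passed to a discriminative classifier as the abstract describes.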
Opportunities and challenges for real-world studies on chronic inflammatory joint diseases through data enrichment and collaboration between national registers : the Nordic example
RMD Open 2018;4:e000655. doi:10.1136/rmdopen-2018-000655. Peer reviewed