Link Prediction via Generalized Coupled Tensor Factorisation
This study deals with the missing link prediction problem: the problem of
predicting the existence of missing connections between entities of interest.
We address link prediction using coupled analysis of relational datasets
represented as heterogeneous data, i.e., datasets in the form of matrices and
higher-order tensors. We propose to use an approach based on probabilistic
interpretation of tensor factorisation models, i.e., Generalised Coupled Tensor
Factorisation, which can simultaneously fit a large class of tensor models to
higher-order tensors/matrices with common latent factors using different loss
functions. Numerical experiments demonstrate that joint analysis of data from
multiple sources via coupled factorisation improves the link prediction
performance and that the selection of the right loss function and tensor model
is crucial for accurately predicting missing links.
Exploring multimodal data fusion through joint decompositions with flexible couplings
A Bayesian framework is proposed to define flexible coupling models for joint
tensor decompositions of multiple data sets. Under this framework, a natural
formulation of the data fusion problem is to cast it in terms of a joint
maximum a posteriori (MAP) estimator. Data-driven scenarios of joint posterior
distributions are provided, including general Gaussian priors and non-Gaussian
coupling priors. We present and discuss implementation issues of algorithms
used to obtain the joint MAP estimator. We also show how this framework can be
adapted to tackle the problem of joint decompositions of large datasets. In the
case of a conditional Gaussian coupling with a linear transformation, we give
theoretical bounds on the data fusion performance using the Bayesian Cramér-Rao
bound. Simulations are reported for hybrid coupling models ranging from simple
additive Gaussian models, to Gamma-type models with positive variables and to
the coupling of data sets which are inherently of different size due to
different resolution of the measurement devices.
Comment: 15 pages, 7 figures, revised versio
Structure-revealing data fusion
BACKGROUND: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors.
RESULTS: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.
CONCLUSIONS: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-239) contains supplementary material, which is available to authorized users.
Tensor Analysis and Fusion of Multimodal Brain Images
Current high-throughput data acquisition technologies probe dynamical systems
with different imaging modalities, generating massive data sets at different
spatial and temporal resolutions posing challenging problems in multimodal data
fusion. A case in point is the attempt to parse out the brain structures and
networks that underpin human cognitive processes by analysis of different
neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the
multimodal, multi-scale nature of neuroimaging data is well reflected by a
multi-way (tensor) structure where the underlying processes can be summarized
by a relatively small number of components or "atoms". We introduce
Markov-Penrose diagrams, an integration of Bayesian DAG and tensor network
notation, to analyze these models. These diagrams not only clarify
matrix and tensor EEG and fMRI time/frequency analysis and inverse problems,
but also help understand multimodal fusion via Multiway Partial Least Squares
and Coupled Matrix-Tensor Factorization. We show here, for the first time, that
Granger causal analysis of brain networks is a tensor regression problem, thus
allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI
recordings shows the potential of the methods and suggests their use in other
scientific domains.
Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE
Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for
count-valued tensor data, and develop scalable inference algorithms (both batch
and online) for dealing with massive tensors. Our generative model can handle
overdispersed counts as well as infer the rank of the decomposition. Moreover,
leveraging a reparameterization of the Poisson distribution as a multinomial
facilitates conjugacy in the model and enables simple and efficient Gibbs
sampling and variational Bayes (VB) inference updates, with a computational
cost that depends only on the number of nonzeros in the tensor. The model also
yields interpretable factors; in our model, each factor
corresponds to a "topic". We develop a set of online inference algorithms that
allow further scaling up the model to massive tensors, for which batch
inference methods may be infeasible. We apply our framework to diverse
real-world applications, such as multiway topic modeling on a scientific
publications database, analyzing a political science data set, and analyzing a
massive household transactions data set.
Comment: ECML PKDD 201
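The Poisson-to-multinomial reparameterization mentioned in this abstract rests on a standard identity: if independent latent counts have Poisson rates λ_1, ..., λ_R, then conditioned on their sum y they follow a multinomial distribution with probabilities λ_r / Σ λ_r. The Python/NumPy sketch below illustrates only that allocation step for a three-way count tensor. It is not the paper's inference algorithm; the function name `allocate_counts`, the factor matrices `U`, `V`, `W`, and the toy sizes are all hypothetical.

```python
import numpy as np

def allocate_counts(Y, U, V, W, rng):
    """Split each nonzero count y_ijk across R latent components.

    Assumes y_ijk is a sum of R independent Poisson counts with rates
    U[i, r] * V[j, r] * W[k, r]; given y_ijk, the split is multinomial.
    """
    R = U.shape[1]
    idx = np.argwhere(Y > 0)            # only nonzero entries are visited
    alloc = np.zeros((len(idx), R), dtype=np.int64)
    for n, (i, j, k) in enumerate(idx):
        rates = U[i] * V[j] * W[k]
        alloc[n] = rng.multinomial(Y[i, j, k], rates / rates.sum())
    return idx, alloc

# Toy example with assumed sizes and positive (gamma-distributed) factors.
rng = np.random.default_rng(0)
Y = rng.poisson(0.3, size=(4, 3, 2))
U = rng.gamma(1.0, 1.0, size=(4, 2))
V = rng.gamma(1.0, 1.0, size=(3, 2))
W = rng.gamma(1.0, 1.0, size=(2, 2))
idx, alloc = allocate_counts(Y, U, V, W, rng)
```

Because zero counts need no allocation, the loop touches only the nonzero entries, which is consistent with the abstract's statement that the computational cost depends only on the number of nonzeros.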