658 research outputs found

    Link Prediction via Generalized Coupled Tensor Factorisation

    Full text link
    This study deals with the missing link prediction problem: the problem of predicting the existence of missing connections between entities of interest. We address link prediction using coupled analysis of relational datasets represented as heterogeneous data, i.e., datasets in the form of matrices and higher-order tensors. We propose to use an approach based on probabilistic interpretation of tensor factorisation models, i.e., Generalised Coupled Tensor Factorisation, which can simultaneously fit a large class of tensor models to higher-order tensors/matrices with com- mon latent factors using different loss functions. Numerical experiments demonstrate that joint analysis of data from multiple sources via coupled factorisation improves the link prediction performance and the selection of right loss function and tensor model is crucial for accurately predicting missing links

    Exploring multimodal data fusion through joint decompositions with flexible couplings

    Full text link
    A Bayesian framework is proposed to define flexible coupling models for joint tensor decompositions of multiple data sets. Under this framework, a natural formulation of the data fusion problem is to cast it in terms of a joint maximum a posteriori (MAP) estimator. Data driven scenarios of joint posterior distributions are provided, including general Gaussian priors and non Gaussian coupling priors. We present and discuss implementation issues of algorithms used to obtain the joint MAP estimator. We also show how this framework can be adapted to tackle the problem of joint decompositions of large datasets. In the case of a conditional Gaussian coupling with a linear transformation, we give theoretical bounds on the data fusion performance using the Bayesian Cramer-Rao bound. Simulations are reported for hybrid coupling models ranging from simple additive Gaussian models, to Gamma-type models with positive variables and to the coupling of data sets which are inherently of different size due to different resolution of the measurement devices.Comment: 15 pages, 7 figures, revised versio

    Structure-revealing data fusion

    Get PDF
    BACKGROUND: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. RESULTS: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. CONCLUSIONS: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-239) contains supplementary material, which is available to authorized users

    Tensor Analysis and Fusion of Multimodal Brain Images

    Get PDF
    Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network notation in order to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE

    Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

    Full text link
    We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also provides a nice interpretability for the factors; in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework on diverse real-world applications, such as \emph{multiway} topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set.Comment: ECML PKDD 201
    corecore