23 research outputs found

    Link Prediction via Generalized Coupled Tensor Factorisation

    This study deals with the missing link prediction problem: the problem of predicting the existence of missing connections between entities of interest. We address link prediction using coupled analysis of relational datasets represented as heterogeneous data, i.e., datasets in the form of matrices and higher-order tensors. We propose to use an approach based on a probabilistic interpretation of tensor factorisation models, i.e., Generalised Coupled Tensor Factorisation, which can simultaneously fit a large class of tensor models to higher-order tensors/matrices with common latent factors using different loss functions. Numerical experiments demonstrate that joint analysis of data from multiple sources via coupled factorisation improves link prediction performance, and that selecting the right loss function and tensor model is crucial for accurately predicting missing links.
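
    The idea of fitting a tensor and a matrix with a common latent factor can be illustrated with a minimal sketch. It is not the Generalised Coupled Tensor Factorisation algorithm from the abstract: it assumes fully observed data, a squared-error loss, a CP model for the tensor, and plain alternating least squares. The function name `coupled_als` and all variable names are hypothetical.

```python
import numpy as np

def coupled_als(X, Y, rank, n_iter=50, seed=0):
    """Jointly fit X[i,j,k] ~ sum_r A[i,r]B[j,r]C[k,r] and Y ~ A @ D.T.

    The factor A is shared between the tensor X and the matrix Y.
    Assumption: squared-error loss and fully observed data.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    D = rng.standard_normal((M, rank))
    losses = []
    for _ in range(n_iter):
        # Shared factor: normal equations combine both data sets.
        rhs = np.einsum('ijk,jr,kr->ir', X, B, C) + Y @ D
        gram = (B.T @ B) * (C.T @ C) + D.T @ D
        A = rhs @ np.linalg.pinv(gram)
        # Tensor-only factors.
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Matrix-only factor.
        D = Y.T @ A @ np.linalg.pinv(A.T @ A)
        X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
        losses.append(np.sum((X - X_hat) ** 2) + np.sum((Y - A @ D.T) ** 2))
    return (A, B, C, D), losses

# Synthetic rank-2 data in which X and Y share the first-mode factor.
rng = np.random.default_rng(1)
A0, B0, C0, D0 = (rng.standard_normal((n, 2)) for n in (8, 7, 6, 5))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
Y = A0 @ D0.T
factors, losses = coupled_als(X, Y, rank=2)
print(losses[0], losses[-1])
```

    Handling missing entries (the actual link prediction setting) or other loss functions would require weighting the updates, which this sketch omits.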

    Factorization of multiple tensors for supervised feature extraction

    © Springer International Publishing AG 2016. Tensors are effective representations for complex and time-varying networks. The factorization of a tensor provides a high-quality low-rank compact basis for each dimension of the tensor, which facilitates the interpretation of important structures of the represented data. Many existing tensor factorization (TF) methods assume there is one tensor that needs to be decomposed into low-rank factors. However, in practice, data are usually generated from different time periods or by different class labels, and are represented by a sequence of multiple tensors associated with different labels. When one needs to analyse and compare multiple tensors, existing TF methods are unsuitable for discovering all potentially useful patterns, as they usually fail to discover either common or unique factors among the tensors: (1) if each tensor is factorized separately, the factor matrices will fail to explicitly capture the common information shared by different tensors, and (2) if tensors are concatenated to form a larger “overall” tensor that is then factorized, the intrinsic unique subspaces that are specific to each tensor will be lost. This issue arises mainly because existing tensor factorization methods handle data observations in an unsupervised way, considering only the features but not the labels of the data. To tackle this problem, we design a novel probabilistic tensor factorization model that takes both features and class labels of tensors into account, and produces informative common and unique factors of all tensors simultaneously. Experimental results on feature extraction in classification problems demonstrate the effectiveness of the factors discovered by our method.

    Efficient Bayesian Model Selection in PARAFAC via Stochastic Thermodynamic Integration

    Parallel factor analysis (PARAFAC) is one of the most popular tensor factorization models. Even though it has proven successful in diverse application fields, the performance of PARAFAC usually hinges upon the rank of the factorization, which is typically specified manually by the practitioner. In this study, we develop a novel parallel and distributed Bayesian model selection technique for rank estimation in large-scale PARAFAC models. The proposed approach integrates ideas from the emerging field of stochastic gradient Markov Chain Monte Carlo, statistical physics, and distributed stochastic optimization. As opposed to the existing methods, which are based on heuristics, our method has a clear mathematical interpretation and significantly lower computational requirements, thanks to data subsampling and parallelization. We provide a formal theoretical analysis of the bias induced by the proposed approach. Our experiments on synthetic and large-scale real datasets show that our method is able to find the optimal model order while being significantly faster than the state of the art.
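
    Thermodynamic integration estimates the log marginal likelihood (the quantity compared across candidate ranks) through the identity log Z = ∫₀¹ E_t[log p(x|θ)] dt, where the expectation is taken under the tempered posterior p_t(θ) ∝ p(x|θ)^t p(θ). The sketch below checks this identity on a toy conjugate Gaussian model, chosen here only because the tempered posterior is available in closed form. It is not the PARAFAC method from the abstract, which estimates the expectations with stochastic gradient MCMC; the function names and the temperature schedule are assumptions.

```python
import numpy as np

def log_evidence_exact(x, sigma2, s02):
    """Exact log marginal likelihood for theta ~ N(0, s02), x_i ~ N(theta, sigma2)."""
    n, S = x.size, x.sum()
    logdet = (n - 1) * np.log(sigma2) + np.log(sigma2 + n * s02)
    quad = (np.sum(x ** 2) - s02 * S ** 2 / (sigma2 + n * s02)) / sigma2
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def log_evidence_ti(x, sigma2, s02, n_temps=2001):
    """Thermodynamic integration with closed-form tempered posteriors."""
    n, S = x.size, x.sum()
    # Temperatures concentrated near 0 (an assumed schedule).
    t = np.linspace(0.0, 1.0, n_temps) ** 3
    # Tempered posterior of theta is Gaussian with this precision and mean.
    prec = 1.0 / s02 + t * n / sigma2
    mean = (t * S / sigma2) / prec
    # E_t[sum_i (x_i - theta)^2] = sum_i (x_i - mean)^2 + n / prec
    sq = np.sum(x ** 2) - 2 * mean * S + n * mean ** 2 + n / prec
    g = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
    # Trapezoidal rule over the temperature grid.
    return np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))

rng = np.random.default_rng(0)
x = 0.5 + rng.standard_normal(20)
exact = log_evidence_exact(x, sigma2=1.0, s02=1.0)
ti = log_evidence_ti(x, sigma2=1.0, s02=1.0)
print(exact, ti)
```

    In a model selection setting, the same estimate would be computed once per candidate model and the model with the largest value retained.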

    Doubly robust Bayesian inference for non-stationary streaming data with β-divergences

    We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with β-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: firstly, we make GBI scalable using Structural Variational approximations that are exact as β→0. Secondly, we give a principled way of choosing the divergence parameter β by minimizing expected predictive loss on-line. We achieve state-of-the-art performance and improve the False Discovery Rate of CPs by more than 80% on real-world data.
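
    The robustness that β-divergences provide can be seen in a small sketch. It uses the density-power form of the β-divergence loss for a Gaussian with known scale, ℓ_β(x; μ) = −p(x|μ)^β/β + (1/(1+β)) ∫ p(z|μ)^{1+β} dz, and compares the resulting location estimate with the sample mean on contaminated data. This is a point-estimate illustration under assumed settings (β = 0.5, σ = 1, grid search), not the changepoint detection algorithm from the abstract; `beta_loss` is a hypothetical helper.

```python
import numpy as np

def beta_loss(x, mu, sigma, beta):
    """Beta-divergence (density-power) loss of a Gaussian N(mu, sigma^2) at x."""
    dens = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)
    # Closed form of the integral of the Gaussian density raised to (1 + beta).
    integral = (2 * np.pi * sigma ** 2) ** (-beta / 2) / np.sqrt(1 + beta)
    return -dens ** beta / beta + integral / (1 + beta)

rng = np.random.default_rng(0)
inliers = rng.standard_normal(95)   # true location is 0
outliers = np.full(5, 50.0)         # gross contamination
data = np.concatenate([inliers, outliers])

# Grid search over candidate locations (assumed range and resolution).
grid = np.linspace(-5.0, 60.0, 6501)
total = beta_loss(data[None, :], grid[:, None], sigma=1.0, beta=0.5).sum(axis=1)
mu_beta = grid[np.argmin(total)]
mu_mean = data.mean()
print(mu_beta, mu_mean)
```

    Because each observation contributes through p(x|μ)^β, points far from the bulk of the data have almost no influence on the estimate, whereas the sample mean is pulled toward the outliers.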

    Exploring multimodal data fusion through joint decompositions with flexible couplings

    A Bayesian framework is proposed to define flexible coupling models for joint tensor decompositions of multiple data sets. Under this framework, a natural formulation of the data fusion problem is to cast it in terms of a joint maximum a posteriori (MAP) estimator. Data-driven scenarios of joint posterior distributions are provided, including general Gaussian priors and non-Gaussian coupling priors. We present and discuss implementation issues of algorithms used to obtain the joint MAP estimator. We also show how this framework can be adapted to tackle the problem of joint decompositions of large datasets. In the case of a conditional Gaussian coupling with a linear transformation, we give theoretical bounds on the data fusion performance using the Bayesian Cramér-Rao bound. Simulations are reported for hybrid coupling models ranging from simple additive Gaussian models, to Gamma-type models with positive variables, and to the coupling of data sets which are inherently of different size due to the different resolutions of the measurement devices.

    Extraction of Temporal Patterns in Multi-rate and Multi-modal Datasets

    We focus on the problem of analyzing corpora composed of irregularly sampled (multi-rate) heterogeneous temporal data. We propose a novel convolutive multi-rate factorization model for extracting multi-modal patterns from such multi-rate data. Our model builds upon previously proposed multi-view (coupled) nonnegative matrix factorization techniques, and extends them by accounting for heterogeneous sample rates and enabling the patterns to have a duration. We illustrate the proposed methodology on the joint study of audiovisual data for speech analysis.