Probabilistic Latent Tensor Factorization Model for Link Pattern Prediction in Multi-relational Networks
This paper addresses the problem of link pattern prediction in collections of
objects connected by multiple relation types, where each type may play a
distinct role. While common link analysis models are limited to single-type
link prediction, we attempt here to capture the correlations among different
relation types and reveal the impact of the various relation types on prediction
quality. To this end, we define the overall relations between object pairs as a
link pattern, which consists of the interaction pattern and the connection
structure in the network, and then use a tensor formalization to jointly model
and predict the link patterns, a task we refer to as the Link Pattern
Prediction (LPP) problem. To address this problem, we propose a Probabilistic
Latent Tensor Factorization (PLTF) model that introduces an additional latent
factor for the multiple relation types, and we give the proposed probabilistic
model a hierarchical Bayesian treatment to avoid overfitting. To learn the
model, we develop an efficient Markov chain Monte Carlo (MCMC) sampling method.
Extensive experiments on several real-world datasets demonstrate significant
improvements over several existing state-of-the-art methods. Comment: 19 pages, 5 figures
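The abstract does not spell out the factorization. One common way to add a latent factor for relation types is a CP (CANDECOMP/PARAFAC)-style trilinear model, where the score for objects i and j under relation r is the sum over k of U[i,k] V[j,k] R[r,k]. The sketch below assumes that form and shows only how predicted link patterns could be averaged over factor samples such as MCMC draws; the names `cp_link_scores` and `posterior_mean_scores` are hypothetical, and the PLTF priors and sampler themselves are not reproduced.

```python
import numpy as np

def cp_link_scores(U, V, R):
    """Score every (object i, object j, relation type r) triple as the trilinear form
    sum_k U[i, k] * V[j, k] * R[r, k] (assumed CP-style model)."""
    return np.einsum("ik,jk,rk->ijr", U, V, R)

def posterior_mean_scores(samples):
    """Average the score tensor over (U, V, R) samples, e.g. draws from an MCMC sampler."""
    return np.mean([cp_link_scores(U, V, R) for U, V, R in samples], axis=0)

rng = np.random.default_rng(0)
n_objects, n_relations, rank = 5, 3, 2
# Hypothetical random factor samples standing in for real MCMC draws
samples = [
    (rng.normal(size=(n_objects, rank)),
     rng.normal(size=(n_objects, rank)),
     rng.normal(size=(n_relations, rank)))  # extra latent factor for relation types
    for _ in range(10)
]
scores = posterior_mean_scores(samples)
# The predicted link pattern of pair (0, 1) is its score vector across relation types
print(scores[0, 1, :])
```

Averaging predictions over samples, rather than plugging in a single point estimate, is what a Bayesian treatment buys in practice: the prediction integrates over factor uncertainty.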
Incorporating Side Information in Probabilistic Matrix Factorization with Gaussian Processes
Probabilistic matrix factorization (PMF) is a powerful method for modeling
data associated with pairwise relationships, finding use in collaborative
filtering, computational biology, and document analysis, among other areas. In
many domains, there is additional information that can assist in prediction.
For example, when modeling movie ratings, we might know when the rating
occurred, where the user lives, or what actors appear in the movie. It is
difficult, however, to incorporate this side information into the PMF model. We
propose a framework for incorporating side information by coupling together
multiple PMF problems via Gaussian process priors. We replace scalar latent
features with functions that vary over the space of side information. The GP
priors on these functions require them to vary smoothly and share information.
We successfully apply this method to predicting the scores of professional
basketball games, where side information about the venue and date of the game
is relevant to the outcome. Comment: 18 pages, 4 figures, Submitted to UAI 201
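To illustrate the idea of replacing scalar latent features with functions of the side information, here is a minimal sketch. It assumes a single scalar side variable (a game date), a squared-exponential kernel, and a hypothetical offense/defense pair of latent functions per team; none of these choices come from the abstract. Each latent feature is drawn from a GP prior, so it varies smoothly with the date, and a PMF-style dot product of the features evaluated at that date gives a score.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of side-information values."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_latent_functions(side_info, n_entities, rank, rng, lengthscale=2.0):
    """Draw one GP-distributed function per (entity, latent dimension).

    Returns shape (n_entities, rank, len(side_info)): each latent feature is a
    smooth function of the side information instead of a scalar."""
    K = rbf_kernel(side_info, side_info, lengthscale)
    K += 1e-6 * np.eye(len(side_info))  # jitter so the Cholesky factorization succeeds
    L = np.linalg.cholesky(K)
    z = rng.normal(size=(n_entities, rank, len(side_info)))
    return np.einsum("tn,ijn->ijt", L, z)  # f = L z for every (entity, dimension)

def predicted_score(offense, defense, a, b, t):
    """PMF-style score of team a against team b at side-information index t."""
    return float(offense[a, :, t] @ defense[b, :, t])

rng = np.random.default_rng(1)
dates = np.linspace(0.0, 10.0, 50)  # hypothetical side information: game date
offense = sample_latent_functions(dates, n_entities=4, rank=3, rng=rng)
defense = sample_latent_functions(dates, n_entities=4, rank=3, rng=rng)
print(predicted_score(offense, defense, a=0, b=1, t=10))
```

This only samples from the prior; fitting the model would condition these functions on observed game scores.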
Effective and Efficient Similarity Index for Link Prediction of Complex Networks
Predicting missing links in incomplete networks, such as protein-protein
interaction networks, or likely but not yet existing links in evolving
networks, such as online friendship networks, can guide further experiments
or provide valuable information for web users. In
this paper, we introduce a local path index to estimate the likelihood of the
existence of a link between two nodes. We propose a network model with
controllable density and noise strength in generating links, and we also collect
data from six real networks. Extensive numerical simulations on both model
networks and real networks demonstrate the high effectiveness and efficiency
of the local path index compared with two well-known and widely used indices,
the common neighbors index and the Katz index. Indeed, the local path index gives
predictions competitive in accuracy with the Katz index while requiring much less
CPU time and memory, which makes it a strong candidate for practical data mining
applications on very large networks. Comment: 8 pages, 5 figures, 3 tables
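The abstract does not state the formula, but the local path index is commonly defined as S = A^2 + εA^3, where A is the adjacency matrix and ε is a small free parameter. Common neighbors is the A^2 term alone, and Katz sums β^l A^l over all path lengths. The sketch below compares the three on a tiny path graph using dense NumPy for readability; on large networks one would use sparse matrices and compute only the needed entries, which is where LP's restriction to paths of length 2 and 3 saves time over the matrix inverse used for Katz here.

```python
import numpy as np

def common_neighbors(A):
    """CN score: number of length-2 paths between each node pair."""
    return A @ A

def local_path_index(A, eps=0.01):
    """LP score: A^2 + eps * A^3, i.e. length-2 paths plus down-weighted length-3 paths."""
    A2 = A @ A
    return A2 + eps * (A2 @ A)

def katz_index(A, beta=0.01):
    """Katz score: sum_{l>=1} beta^l A^l = (I - beta A)^{-1} - I (requires beta < 1/lambda_max)."""
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I

# Path graph 0-1-2-3
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

cn, lp, katz = common_neighbors(A), local_path_index(A), katz_index(A)
# CN assigns 0 to the distance-3 pair (0, 3); LP gives it eps * 1 from the single length-3 path
print(cn[0, 3], lp[0, 3])
```

The extra A^3 term is what lets LP break ties among pairs with equal common-neighbor counts, while stopping at length 3 keeps the cost far below that of the full Katz sum.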
Transposable regularized covariance models with an application to missing data imputation
Missing data estimation is an important challenge with high-dimensional data
arranged in the form of a matrix. Typically this data matrix is transposable,
meaning that either the rows, columns or both can be treated as features. To
model transposable data, we present a modification of the matrix-variate
normal, the mean-restricted matrix-variate normal, in which the rows and
columns each have a separate mean vector and covariance matrix. By placing
additive penalties on the inverse covariance matrices of the rows and columns,
these so-called transposable regularized covariance models allow for maximum
likelihood estimation of the mean and nonsingular covariance matrices. Using
these models, we formulate EM-type algorithms for missing data imputation in
both the multivariate and transposable frameworks. We present theoretical
results exploiting the structure of our transposable models that allow these
models and imputation methods to be applied to high-dimensional data.
Simulations and results on microarray data and the Netflix data show that these
imputation techniques often outperform existing methods and offer a greater
degree of flexibility. Comment: Published at http://dx.doi.org/10.1214/09-AOAS314 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
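The abstract does not give the transposable EM algorithm itself. As a simpler illustration of the multivariate case only, the sketch below fills each missing entry with its Gaussian conditional mean given the observed entries in the same row, and alternates this with re-estimating the mean and covariance. Two assumptions to flag: the ridge term added to the covariance is a stand-in regularizer, not the paper's additive penalty on the inverse covariance; and the loop omits the conditional-covariance correction of an exact EM M-step, so it is an EM-like heuristic. The function names are hypothetical.

```python
import numpy as np

def conditional_mean_impute(x, mu, sigma):
    """Replace NaNs in x by E[x_miss | x_obs] under a N(mu, sigma) model."""
    miss = np.isnan(x)
    obs = ~miss
    out = x.copy()
    if not miss.any():
        return out
    if not obs.any():
        out[miss] = mu[miss]  # nothing observed in this row: fall back to the mean
        return out
    s_mo = sigma[np.ix_(miss, obs)]
    s_oo = sigma[np.ix_(obs, obs)]
    # mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o), using solve instead of an explicit inverse
    out[miss] = mu[miss] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return out

def iterative_impute(X, ridge=0.1, n_iter=20):
    """Alternate between re-estimating (mu, sigma) from the completed data and
    conditional-mean imputation. The ridge term is an assumed regularizer that
    keeps sigma nonsingular."""
    X_filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        sigma = np.cov(X_filled, rowvar=False) + ridge * np.eye(X.shape[1])
        X_filled = np.array([conditional_mean_impute(row, mu, sigma) for row in X])
    return X_filled

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.8, 0.3], [0.8, 1.0, 0.3], [0.3, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=100)
mask = rng.random(X.shape) < 0.1  # hide about 10% of the entries
X[mask] = np.nan
X_hat = iterative_impute(X)
```

Observed entries are never changed; only the masked entries are filled, and strongly correlated columns (here the first two) inform each other's imputations most.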