Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors
We present a scalable Bayesian model for low-rank factorization of massive
tensors with binary observations. The proposed model has the following key
properties: (1) in contrast to the models based on the logistic or probit
likelihood, using a zero-truncated Poisson likelihood for binary data allows
our model to scale up in the number of \emph{ones} in the tensor, which is
especially appealing for massive but sparse binary tensors; (2)
side-information in the form of binary pairwise relationships (e.g., an adjacency
network) between objects in any tensor mode can also be leveraged, which can be
especially useful in "cold-start" settings; and (3) the model admits simple
Bayesian inference via batch, as well as \emph{online} MCMC; the latter allows
scaling up even for \emph{dense} binary data (i.e., when the number of ones in
the tensor/network is also massive). In addition, non-negative factor matrices
in our model provide easy interpretability, and the tensor rank can be inferred
from the data. We evaluate our model on several large-scale real-world binary
tensors, achieving excellent computational scalability, and also demonstrate
its usefulness in leveraging side-information provided in the form of
mode-network(s). Comment: UAI (Uncertainty in Artificial Intelligence) 201
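To make the scaling claim concrete, the following is a minimal sketch (not the paper's implementation) of the log-likelihood of a binary tensor under the Poisson-thresholding link implied by the zero-truncated Poisson construction, P(y=1) = 1 - exp(-lambda), with an assumed rank-R CP parameterization lambda_ijk = sum_r U[i,r] V[j,r] W[k,r]. The function name, link choice, and toy dimensions are illustrative assumptions; the point is that the zeros collapse into a closed-form term, so the cost is proportional to the number of ones.

```python
import numpy as np

def binary_cp_loglik(ones_idx, U, V, W):
    """Log-likelihood of a binary 3-way tensor under the Poisson-thresholding
    link y_ijk = 1(n_ijk >= 1), n_ijk ~ Poisson(lambda_ijk), with a rank-R CP
    parameterization lambda_ijk = sum_r U[i,r] * V[j,r] * W[k,r].

    Only the ones are touched explicitly, so the cost is O(nnz * R): the
    contribution of all the zeros reduces to a product of column sums.
    """
    i, j, k = ones_idx                                     # indices of the ones
    lam_ones = np.einsum('nr,nr,nr->n', U[i], V[j], W[k])  # lambda at the ones

    # ones contribute log P(y=1) = log(1 - exp(-lambda))
    ll_ones = np.sum(np.log1p(-np.exp(-lam_ones)))

    # zeros contribute log P(y=0) = -lambda; summing lambda over ALL entries
    # factorizes as sum_r (1'U_r)(1'V_r)(1'W_r), so no loop over the zeros
    lam_total = np.sum(U.sum(axis=0) * V.sum(axis=0) * W.sum(axis=0))
    ll_zeros = -(lam_total - lam_ones.sum())

    return ll_ones + ll_zeros

# toy usage: a 100 x 80 x 60 binary tensor with 500 ones, rank 5
rng = np.random.default_rng(0)
U, V, W = (rng.gamma(1.0, 0.1, size=(n, 5)) for n in (100, 80, 60))
ones_idx = tuple(rng.integers(0, n, size=500) for n in (100, 80, 60))
print(binary_cp_loglik(ones_idx, U, V, W))
```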
Non-negative Matrix Factorization for Discrete Data with Hierarchical Side-Information
Abstract We present a probabilistic framework for efficient non-negative matrix factorization of discrete (count/binary) data with side-information. The side-information is given as a multi-level structure, taxonomy, or ontology, with nodes at each level being categorical-valued observations. For example, when modeling documents with two-level side-information (documents being at level-zero), level-one may represent (one or more) authors associated with each document and level-two may represent affiliations of each author. The model easily generalizes to more than two levels (or a taxonomy/ontology of arbitrary depth). Our model can learn embeddings of entities present at each level in the data/side-information hierarchy (e.g., documents, authors, affiliations, in the previous example), with appropriate sharing of information across levels. The model also enjoys full local conjugacy, facilitating efficient Gibbs sampling for model inference. Inference cost scales in the number of non-zero entries in the data matrix, which is especially appealing for real-world massive but sparse matrices. We demonstrate the effectiveness of the model on several real-world data sets.
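As a rough illustration of the cross-level sharing, here is a small generative sketch under assumed Gamma-Poisson choices: each entity's non-negative embedding is drawn with its mean tied to the embedding of its parent in the hierarchy, and the document-word count matrix is Poisson with a bilinear rate. The mean-tying via the scale parameter, the specific priors, and all names and dimensions are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 5                                          # latent dimension (illustrative)
n_affil, n_auth, n_doc, n_word = 3, 10, 50, 200

# illustrative parent maps: author -> affiliation, document -> author
auth_affil = rng.integers(0, n_affil, n_auth)
doc_auth = rng.integers(0, n_auth, n_doc)

a0 = 2.0
# level-two (affiliation) embeddings from a flat Gamma prior
A = rng.gamma(1.0, 1.0, (n_affil, R))
# level-one (author) embeddings with mean tied to the parent affiliation
B = rng.gamma(a0, A[auth_affil] / a0)          # E[B] = A[parent]
# level-zero (document) embeddings with mean tied to the parent author
Theta = rng.gamma(a0, B[doc_auth] / a0)        # E[Theta] = B[parent]

# word loadings and the observed document-word count matrix
Beta = rng.gamma(1.0, 0.1, (n_word, R))
X = rng.poisson(Theta @ Beta.T)                # n_doc x n_word counts
print(X.shape, int(X.sum()))
```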
Topic-Based Embeddings for Learning from Large Knowledge Graphs
Abstract We present a scalable probabilistic framework for learning from multi-relational data, given in the form of entity-relation-entity triplets, with a potentially massive number of entities and relations (e.g., in multi-relational networks, knowledge bases, etc.). We define each triplet via a relation-specific bilinear function of the embeddings of the entities associated with it (these embeddings correspond to "topics"). To handle a massive number of relations and the data-sparsity problem (very few observations per relation), we also extend this model to allow sharing of parameters across relations, which leads to a substantial reduction in the number of parameters to be learned. In addition to yielding excellent predictive performance (e.g., for knowledge-base completion tasks), the interpretability of our topic-based embedding framework enables easy qualitative analyses. The computational cost of our models scales in the number of positive triplets, which makes it easy to scale to massive real-world multi-relational data sets, which are usually extremely sparse. We develop simple-to-implement batch as well as online Gibbs sampling algorithms and demonstrate the effectiveness of our models on tasks such as multi-relational link prediction and learning from large knowledge bases.
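A minimal sketch of the scoring idea, under assumed names and an illustrative Bernoulli-Poisson style link (the paper's exact likelihood and parameter-sharing scheme may differ): each triplet is scored by a relation-specific bilinear form of the two entity embeddings, and the relation matrices can be expressed as weighted combinations of a small set of shared components to cut the parameter count.

```python
import numpy as np

def triplet_score(E, R_mats, s, r, o):
    """Bilinear score of a triplet (s, r, o): e_s^T R_r e_o, with E holding the
    entity embeddings ("topics") and R_mats the per-relation K x K matrices.
    With non-negative factors, p = 1 - exp(-score) maps the score to a
    probability of observing the triplet (illustrative link choice)."""
    score = E[s] @ R_mats[r] @ E[o]
    return score, 1.0 - np.exp(-score)

rng = np.random.default_rng(2)
n_ent, n_rel, K = 1000, 20, 8
E = rng.gamma(1.0, 0.1, (n_ent, K))            # non-negative entity embeddings

# parameter sharing across relations (illustrative): each relation matrix is a
# weighted combination of M << n_rel shared K x K components
M = 4
components = rng.gamma(1.0, 0.1, (M, K, K))
weights = rng.dirichlet(np.ones(M), size=n_rel)         # one mixture per relation
R_mats = np.einsum('rm,mjk->rjk', weights, components)  # (n_rel, K, K)

print(triplet_score(E, R_mats, s=3, r=5, o=42))
```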
Recent Advances in the Catalytic Depolymerization of Lignin towards Phenolic Chemicals: A Review
The efficient valorization of lignin could dictate the success of the second-generation biorefinery. Lignin, accounting on average for a third of lignocellulosic biomass, is the most promising candidate for the sustainable production of value-added phenolics. However, the structural alteration induced during lignin isolation often depletes its potential for value-added chemicals. Recently, catalytic reductive depolymerization of lignin has emerged as a promising and effective method for its valorization to phenolic monomers. The present study systematically summarizes the far-reaching, state-of-the-art lignin valorization strategies at different stages, including conventional catalytic depolymerization of technical lignin, emerging reductive catalytic fractionation of protolignin, stabilization strategies to inhibit undesired condensation reactions, and further catalytic upgrading of lignin-derived monomers. Finally, potential challenges for future research on the efficient valorization of lignin, together with possible solutions, are proposed.