A Review of Relational Machine Learning for Knowledge Graphs
Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be “trained” on large knowledge graphs and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on tensor factorization methods and related latent variable models. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. In particular, we discuss Google’s Knowledge Vault project. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
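As a concrete illustration of the first model family, a bilinear (RESCAL-style) tensor-factorization scorer can be sketched in a few lines. The embeddings below are random toys, not trained parameters, and serve only to show how a latent-variable model ranks candidate edges:

```python
import numpy as np

rng = np.random.default_rng(0)

n_entities, rank = 5, 3
# Latent entity embeddings A and a per-relation mixing matrix R
# (a RESCAL-style bilinear model, one of the latent approaches reviewed).
A = rng.normal(size=(n_entities, rank))
R = rng.normal(size=(rank, rank))

def score(s, o):
    """Plausibility score of the triple (s, relation, o): a_s^T R a_o."""
    return A[s] @ R @ A[o]

# Rank all candidate objects for subject 0 -- "predicting new edges".
scores = np.array([score(0, o) for o in range(n_entities)])
ranking = np.argsort(-scores)
```

In a trained model, high-ranked pairs that are absent from the graph are the predicted new facts.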
Coarse-Graining with Equivariant Neural Networks: A Path Towards Accurate and Data-Efficient Models
Machine learning has recently entered into the mainstream of coarse-grained
(CG) molecular modeling and simulation. While a variety of methods for
incorporating deep learning into these models exist, many of them involve
training neural networks to act directly as the CG force field. This has
several benefits, the most significant of which is accuracy. Neural networks
can inherently incorporate multi-body effects during the calculation of CG
forces, and a well-trained neural network force field outperforms pairwise
basis sets generated from essentially any methodology. However, this comes at a
significant cost. First, these models are typically slower than pairwise force
fields even when accounting for specialized hardware which accelerates the
training and integration of such networks. The second, and the focus of this
paper, is the considerable amount of data needed to train such force fields.
It is common to use tens of microseconds of molecular dynamics data to train a
single CG model, which approaches the point of eliminating the CG model's
usefulness in the first place. As we investigate in this work, this data
hunger of neural networks for predicting molecular energies and forces appears
to be caused in large part by the difficulty in
learning force equivariance, i.e., the fact that force vectors should rotate
while maintaining their magnitude in response to an equivalent rotation of the
system. We demonstrate that for CG water, networks that inherently incorporate
this equivariance into their embedding can produce functional models using
datasets as small as a single frame of reference data, which networks without
inherent symmetry equivariance cannot.
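The equivariance property at the heart of this argument, that rotating the input configuration rotates the predicted force vectors while preserving their magnitudes, can be checked numerically. The central force field below is a toy stand-in, not a CG model from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def pairwise_forces(x):
    """Toy central force field: F_i = sum_j (x_j - x_i).
    Any force built from inter-particle displacements is rotation-equivariant."""
    return x.sum(axis=0) - x.shape[0] * x   # vectorised sum_j (x_j - x_i)

x = rng.normal(size=(4, 3))                 # 4 particles in 3D

# A rotation about the z-axis.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

# Equivariance: rotating the input rotates the forces, F(x R^T) = F(x) R^T.
lhs = pairwise_forces(x @ R.T)
rhs = pairwise_forces(x) @ R.T
assert np.allclose(lhs, rhs)
```

A generic neural network must learn this identity from data; an equivariant architecture satisfies it by construction, which is what allows training from as little as a single reference frame.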
Predicting Pair Correlation Functions of Glasses using Machine Learning
Glasses offer a broad range of tunable thermophysical properties that are
linked to their compositions. However, it is challenging to establish a
universal composition-property relation of glasses due to their enormous
composition and chemical space. Here, we address this problem and develop a
metamodel of the composition-atomistic structure relation of a class of glassy
materials via a machine learning (ML) approach. Within this ML framework, an
unsupervised deep learning technique, viz. convolutional neural network (CNN)
autoencoder, and a regression algorithm, viz. random forest (RF), are
integrated into a fully automated pipeline to predict the spatial distribution
of atoms in a glass. The RF regression model predicts the pair correlation
function of a glass in a latent space. Subsequently, the decoder of the CNN
converts the latent space representation to the actual pair correlation
function of the given glass. The atomistic structures of silicate (SiO2) and
sodium borosilicate (NBS) based glasses with varying compositions and dopants
are collected from molecular dynamics (MD) simulations to establish and
validate this ML pipeline. The model is found to predict the atom pair
correlation function for many unknown glasses very accurately. This method is
very generic and can accelerate the design, discovery, and fundamental
understanding of composition-atomistic structure relations of glasses and other
materials.
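The two-stage pipeline, regressing a latent representation from composition and then decoding it into a pair correlation function, can be sketched as below. The linear decoder and nearest-neighbour regressor are hypothetical stand-ins for the paper's CNN decoder and random forest; only the pipeline shape (composition → latent space → g(r)) follows the text:

```python
import numpy as np

rng = np.random.default_rng(2)

latent_dim, pcf_bins = 4, 50
# Stand-in "decoder": a fixed linear map from latent space to g(r) on a
# radial grid (the paper uses a trained CNN autoencoder's decoder).
decoder = rng.normal(size=(latent_dim, pcf_bins))

# Training data: glass compositions (e.g. oxide fractions) with known
# latent representations, here random placeholders for MD-derived data.
train_comps = rng.random(size=(20, 3))
train_latents = rng.normal(size=(20, latent_dim))

def predict_latent(comp):
    """Regress the latent PCF representation from composition
    (1-nearest-neighbour stand-in for the random forest)."""
    i = np.argmin(np.linalg.norm(train_comps - comp, axis=1))
    return train_latents[i]

def predict_pcf(comp):
    """Full pipeline: composition -> latent space -> decoded g(r)."""
    return predict_latent(comp) @ decoder

g_r = predict_pcf(np.array([0.7, 0.2, 0.1]))   # predicted pair correlation
```

Splitting the problem this way means the regressor only has to predict a low-dimensional latent vector rather than the full, finely binned correlation function.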
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Human knowledge provides a formal understanding of the world. Knowledge
graphs that represent structural relations between entities have become an
increasingly popular research direction towards cognition and human-level
intelligence. In this survey, we provide a comprehensive review of knowledge
graphs, covering research topics on 1) knowledge graph representation
learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph,
and 4) knowledge-aware applications, and summarize recent breakthroughs and
prospective directions to facilitate future research. We propose a full-view
categorization and new taxonomies on these topics. Knowledge graph embedding is
organized from four aspects of representation space, scoring function, encoding
models, and auxiliary information. For knowledge acquisition, especially
knowledge graph completion, embedding methods, path inference, and logical rule
reasoning are reviewed. We further explore several emerging topics, including
meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated
collection of datasets and open-source libraries on different tasks. In the
end, we offer a thorough outlook on several promising research directions.
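As one example of the scoring functions such a taxonomy organizes, the translation-based TransE scorer (which models h + r ≈ t in the embedding space) can be written in a few lines. The embeddings here are untrained toys constructed so that one tail is plausible and one is not:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 8

head = rng.normal(size=dim)
rel = rng.normal(size=dim)
tail_true = head + rel + 0.01 * rng.normal(size=dim)   # near head + rel
tail_false = rng.normal(size=dim)                      # unrelated entity

def transe_score(h, r, t):
    """Negative translation distance -||h + r - t||; higher = more plausible."""
    return -np.linalg.norm(h + r - t)

# The plausible tail scores higher than the random one.
assert transe_score(head, rel, tail_true) > transe_score(head, rel, tail_false)
```

Other families in the survey's taxonomy swap this distance for a bilinear product (e.g. DistMult) or a neural encoder, while keeping the same triple-ranking usage.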
How to Retrain Recommender System? A Sequential Meta-Learning Method
Practical recommender systems need to be periodically retrained to refresh the
model with new interaction data. To pursue high model fidelity, it is usually
desirable to retrain the model on both historical and new data, since it can
account for both long-term and short-term user preference. However, a full
model retraining could be very time-consuming and memory-costly, especially
when the scale of historical data is large. In this work, we study the model
retraining mechanism for recommender systems, a topic of high practical value
that has been relatively little explored in the research community.
Our first belief is that retraining the model on historical data is
unnecessary, since the model has been trained on it before. Nevertheless,
normal training on new data only may easily cause overfitting and forgetting
issues, since the new data is of a smaller scale and contains less information
on long-term user preference. To address this dilemma, we propose a new
training method, aiming to abandon the historical data during retraining
through learning to transfer the past training experience. Specifically, we
design a neural network-based transfer component, which transforms the old
model to a new model that is tailored for future recommendations. To learn the
transfer component well, we optimize the "future performance" -- i.e., the
recommendation accuracy evaluated in the next time period. Our Sequential
Meta-Learning(SML) method offers a general training paradigm that is applicable
to any differentiable model. We demonstrate SML on matrix factorization and
conduct experiments on two real-world datasets. Empirical results show that SML
not only achieves significant speed-up, but also outperforms the full model
retraining in recommendation accuracy, validating the effectiveness of our
proposals. We release our code at https://github.com/zyang1580/SML. Comment: Appears in SIGIR 2020.
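The retraining protocol can be sketched with a toy matrix-factorization model. The learned interpolation below is a hypothetical stand-in for the paper's neural transfer component; it only mirrors the core idea of combining old and new models by optimizing next-period ("future") performance:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_mf(R, rank=2, steps=300, lr=0.05):
    """Tiny matrix factorization R ~ P Q^T by gradient descent."""
    P = 0.1 * rng.normal(size=(R.shape[0], rank))
    Q = 0.1 * rng.normal(size=(R.shape[1], rank))
    for _ in range(steps):
        err = R - P @ Q.T
        P, Q = P + lr * err @ Q, Q + lr * err.T @ P
    return P, Q

def mse(R, P, Q):
    return float(np.mean((R - P @ Q.T) ** 2))

# Interaction matrices for three consecutive periods (toy random data).
old, new, next_period = (rng.random(size=(6, 5)) for _ in range(3))

P_old, Q_old = fit_mf(old)      # model carrying long-term preference
P_new, Q_new = fit_mf(new)      # model trained on new data only

# "Transfer": choose the mixing weight that optimizes future performance,
# i.e. error on the next time period (the paper learns a neural transform
# instead of this scalar interpolation).
best_mse, best_a = min(
    (mse(next_period, a * P_old + (1 - a) * P_new,
         a * Q_old + (1 - a) * Q_new), a)
    for a in np.linspace(0, 1, 11))
```

Crucially, neither the transfer step nor its objective ever touches the full historical interaction log, which is where the speed-up over full retraining comes from.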
Predicting Drug-Drug Interactions Using Knowledge Graphs
In recent decades, people have been consuming and combining more drugs than
ever before, increasing the number of Drug-Drug Interactions (DDIs). To predict
unknown DDIs, recently, studies started incorporating Knowledge Graphs (KGs)
since they are able to capture the relationships among entities, providing
better drug representations than using a single drug property. In this paper,
we propose the medicX end-to-end framework that integrates several drug
features from public drug repositories into a KG and embeds the nodes in the
graph using various translation, factorisation and Neural Network (NN) based KG
Embedding (KGE) methods. Ultimately, we use a Machine Learning (ML) algorithm
that predicts unknown DDIs. Among the different translation and
factorisation-based KGE models, we found that the best performing combination
was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network,
which obtained an F1-score of 95.19% on a dataset based on the DDIs found in
DrugBank version 5.1.8. This score is 5.61% better than the state-of-the-art
model DeepDDI. Additionally, we also developed a graph auto-encoder model that
uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%.
These results suggest that GNNs have a stronger ability to mine the underlying
semantics of the KG than the ComplEx model, and that using higher-dimensional
embeddings within the GNN could lead to state-of-the-art performance.