Using Pairwise Occurrence Information to Improve Knowledge Graph Completion on Large-Scale Datasets
Bilinear models such as DistMult and ComplEx are effective methods for
knowledge graph (KG) completion. However, they require large batch sizes, which
becomes a performance bottleneck when training on large-scale datasets due to
memory constraints. In this paper, we use occurrences of entity-relation pairs
in the dataset to construct a joint learning model and to increase the quality
of sampled negatives during training. We show on three standard datasets that
when these two techniques are combined, they give a significant improvement in
performance, especially when the batch size and the number of generated
negative examples are low relative to the size of the dataset. We then apply
our techniques to a dataset containing 2 million entities and demonstrate that
our model outperforms the baseline by 2.8% absolute on Hits@1.
Comment: 8 pages, 3 figures, accepted at EMNLP 2019
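The abstract does not give implementation details, so here is a minimal sketch of one plausible reading of the negative-sampling idea: entity-relation pair occurrence counts bias corruption toward entities frequently seen with the relation, yielding harder, better-typed negatives. The toy triples, the weighting scheme, and all names are illustrative assumptions, not the authors' method.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy KG of (head, relation, tail) triples.
triples = [
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("paris", "located_in", "europe"),
    ("berlin", "located_in", "europe"),
]

# Count how often each entity appears as the tail of each relation.
tail_counts = defaultdict(Counter)
for h, r, t in triples:
    tail_counts[r][t] += 1

def sample_negative_tail(h, r, t):
    """Corrupt the tail of (h, r, t), preferring entities often seen with
    relation r -- assumed here to yield harder, better-typed negatives."""
    candidates = [e for e in tail_counts[r] if e != t]
    if not candidates:
        # Fall back to uniform corruption over all tails in the dataset.
        candidates = list({t2 for _, _, t2 in triples} - {t})
    weights = [tail_counts[r][e] for e in candidates]
    return h, r, random.choices(candidates, weights=weights, k=1)[0]

print(sample_negative_tail("paris", "capital_of", "france"))  # e.g. germany
```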
Knowledge Base Completion: Baseline strikes back (Again)
Knowledge Base Completion has been a very active area recently, where
multiplicative models have generally outperformed additive and other deep
learning methods -- such as GNN-, CNN-, and path-based models. Several recent KBC papers
propose architectural changes, new training methods, or even a new problem
reformulation. They evaluate their methods on standard benchmark datasets --
FB15k, FB15k-237, WN18, WN18RR, and YAGO3-10. Recently, some papers discussed
how 1-N scoring can speed up training and evaluation. In this paper, we show
that simply applying this training regime to a basic model like ComplEx gives
near-SOTA performance on all the datasets -- we call this model COMPLEX-V2. We
also highlight how various multiplicative methods recently proposed in
the literature benefit from this trick and become indistinguishable in terms of
performance on most datasets. This paper calls for a reassessment of their
individual value in light of these findings.
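As a concrete illustration of the 1-N scoring trick the abstract refers to, here is a minimal NumPy sketch for ComplEx: instead of scoring a handful of sampled negatives per triple, one (head, relation) query is scored against every entity in a single matrix product. The random embeddings and sizes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 1000, 50, 64  # illustrative sizes

# Complex-valued entity and relation embeddings.
E = rng.normal(size=(num_entities, dim)) + 1j * rng.normal(size=(num_entities, dim))
W = rng.normal(size=(num_relations, dim)) + 1j * rng.normal(size=(num_relations, dim))

def score_one_to_n(h, r):
    """ComplEx score Re(<e_h, w_r, conj(e_t)>) for ALL candidate tails t
    at once, returning a vector of num_entities scores."""
    hr = E[h] * W[r]               # elementwise product, shape (dim,)
    return np.real(E.conj() @ hr)  # one pass over every entity as tail

scores = score_one_to_n(h=3, r=7)  # scores.shape == (1000,)
```

Ranking the true tail within this score vector is exactly what metrics like Hits@k and MRR are computed from, which is why the same trick speeds up evaluation as well as training.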
BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion
We present the award-winning submission to the WikiKG90Mv2 track of
OGB-LSC@NeurIPS 2022. The task is link prediction on the large-scale knowledge
graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses
a diverse ensemble of Knowledge Graph Embedding models combining five
different scoring functions (TransE, TransH, RotatE, DistMult, ComplEx) and two
different loss functions (log-sigmoid, sampled softmax cross-entropy). Each
individual model is trained in parallel on a Graphcore Bow Pod using
BESS (Balanced Entity Sampling and Sharing), a new distribution framework for
KGE training and inference based on balanced collective communications between
workers. Our final model achieves a validation MRR of 0.2922 and a
test-challenge MRR of 0.2562, winning first place in the competition. The
code is publicly available at:
https://github.com/graphcore/distributed-kge-poplar/tree/2022-ogb-submission
Comment: First place in the WikiKG90Mv2 track of the Open Graph Benchmark
Large-Scale Challenge @NeurIPS 2022
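The abstract describes BESS only at a high level, so the following is a rough sketch of the idea as we read it: entities are randomly sharded across workers, and triples are bucketed by their (head shard, tail shard) pair so that processing a bucket touches the embedding tables of at most two workers, keeping communication balanced. The function names and partitioning details are assumptions, not the published implementation.

```python
import random
from collections import defaultdict

def partition_entities(entities, num_workers, seed=0):
    """Randomly shard entities across workers in (near-)equal parts."""
    rnd = random.Random(seed)
    shuffled = list(entities)
    rnd.shuffle(shuffled)
    return {e: i % num_workers for i, e in enumerate(shuffled)}

def bucket_triples(triples, entity2worker):
    """Group triples by (head shard, tail shard): each bucket only needs
    embeddings held by two workers, bounding the per-step exchange."""
    buckets = defaultdict(list)
    for h, r, t in triples:
        buckets[(entity2worker[h], entity2worker[t])].append((h, r, t))
    return buckets

entities = [f"e{i}" for i in range(12)]
triples = [("e0", "r0", "e5"), ("e3", "r1", "e9"), ("e7", "r0", "e2")]
shards = partition_entities(entities, num_workers=4)
print(bucket_triples(triples, shards))
```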