Neural-symbolic learning for knowledge base completion

Abstract

A query answering task computes the prediction scores of ground queries inferred from a Knowledge Base (KB). Traditional symbolic methods solve this task using 'exact' provers; however, these are not very scalable and are difficult to apply to today's large KBs. Sub-symbolic methods have recently been proposed to address this problem. They must be trained to learn the semantics of the symbolic representation and to use it to make predictions about query answering. Such predictions may rely upon rules that are unknown over the given KB. Not all proposed sub-symbolic systems are capable of inducing rules from the KB, and learning rules that are human-interpretable is even more challenging. Some approaches, e.g., those based on the Neural Theorem Prover (NTP), address this problem, but with limited scalability and limited expressivity of the rules they can induce. We take inspiration from the NTP framework and propose three sub-symbolic architectures that solve the query answering task in a scalable manner while supporting the induction of more expressive rules. Two of these architectures, called Topical NTP (TNTP) and Topic-Subdomain NTP (TSNTP), address scalability: trained representations of predicates and constants are clustered, and the soft-unification step of the backward-chaining proof procedure is controlled by these clusters. The third architecture, called Negation-as-Failure TSNTP (NAF TSNTP), addresses the expressivity of the induced rules by supporting the learning of rules with negation-as-failure. All three architectures make use of additional hyperparameters that encourage rule induction during training. Each architecture is evaluated over benchmark datasets of increasing complexity in terms of KB size, number of predicates and constants, and level of incompleteness of the KB with respect to the test sets. The evaluation measures the accuracy of query answering predictions and the computational cost. The former is reported using two key metrics, AUC_PR and HITS, also adopted by existing sub-symbolic systems that solve the same task, while computational cost is measured as CPU training time. Our systems are compared against existing state-of-the-art sub-symbolic systems, showing that our approaches are in most cases more accurate in solving query answering tasks while being more efficient in computational time. The increased accuracy in some tasks is specifically due to the learning of more expressive rules, thus demonstrating the importance of increased expressivity in rule induction.
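The mechanism shared by TNTP and TSNTP can be illustrated with a small sketch: symbols are soft-unified through their trained embeddings, and clustering restricts which symbol pairs are ever compared during proof search. The snippet below is a minimal illustration under assumptions, not the paper's implementation; names such as `soft_unify` and `cluster_of` are hypothetical, and the RBF-style score is a common NTP-style choice rather than necessarily the exact formulation used here.

```python
# Sketch: NTP-style soft-unification over symbol embeddings, restricted to
# symbols that fall in the same cluster (the cluster-controlled proof search
# described in the abstract). All names and constants are illustrative.
import numpy as np
from sklearn.cluster import KMeans

EMB_DIM = 16
rng = np.random.default_rng(0)

# Toy "trained" embeddings for predicate and constant symbols (assumed given).
symbols = ["parentOf", "fatherOf", "locatedIn", "alice", "bob", "paris"]
embeddings = {s: rng.normal(size=EMB_DIM) for s in symbols}

# Cluster the trained representations; only symbols sharing a cluster are
# considered during soft-unification, which prunes the proof search space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(np.stack([embeddings[s] for s in symbols]))
cluster_of = dict(zip(symbols, labels))

def soft_unify(s1: str, s2: str) -> float:
    """RBF-style soft-unification score in [0, 1]; returns 0 when the two
    symbols are routed to different clusters, skipping the comparison."""
    if cluster_of[s1] != cluster_of[s2]:
        return 0.0
    dist = np.linalg.norm(embeddings[s1] - embeddings[s2])
    return float(np.exp(-dist))

# In an NTP-style prover, a proof score would aggregate (e.g., take the
# minimum of) such unification scores along a proof path.
print(soft_unify("parentOf", "fatherOf"))
```

A usage note: in this sketch the cluster check acts purely as a hard gate on which unifications are attempted, which is where the scalability gain would come from; the per-pair score itself is unchanged from the standard embedding-similarity formulation.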