Knowledge Graph Embedding with Iterative Guidance from Soft Rules
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of
current research. Combining such an embedding model with logic rules has
recently attracted increasing attention. Most previous attempts made a
one-time injection of logic rules, ignoring the interplay between embedding
learning and logical inference. They also focused only on hard rules, which
always hold without exception and usually require extensive manual effort to
create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a
novel paradigm of KG embedding with iterative guidance from soft rules. RUGE
enables an embedding model to learn simultaneously from 1) labeled triples that
have been directly observed in a given KG, 2) unlabeled triples whose labels
are going to be predicted iteratively, and 3) soft rules with various
confidence levels extracted automatically from the KG. In the learning process,
RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and
integrates such newly labeled triples to update the embedding model. Through
this iterative procedure, knowledge embodied in logic rules may be better
transferred into the learned embeddings. We evaluate RUGE in link prediction on
Freebase and YAGO. Experimental results show that: 1) with rule knowledge
injected iteratively, RUGE achieves significant and consistent improvements
over state-of-the-art baselines; and 2) despite their uncertainties,
automatically extracted soft rules are highly beneficial to KG embedding, even
those with moderate confidence levels. The code and data used for this paper
can be obtained from https://github.com/iieir-km/RUGE.
Comment: To appear in AAAI 201
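The iterative loop the abstract describes (query rules for soft labels, then update the embeddings on the newly labeled triples) can be sketched very loosely as follows. The TransE-style scoring function, the single hand-written rule, and all entity names are illustrative assumptions, not RUGE's actual model or grounding procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Toy KG with TransE-style embeddings: a triple (h, r, t) is "true" when h + r is close to t.
entities = {e: rng.normal(size=DIM) for e in ["paris", "france", "berlin", "germany"]}
relations = {r: rng.normal(size=DIM) for r in ["capital_of", "located_in"]}

def score(h, r, t):
    """Soft truth score in (0, 1]: higher when h + r is closer to t."""
    return 1.0 / (1.0 + np.linalg.norm(entities[h] + relations[r] - entities[t]))

# One soft rule with confidence 0.9: capital_of(x, y) => located_in(x, y).
BODY, HEAD, CONF = "capital_of", "located_in", 0.9

labeled = [("paris", "capital_of", "france"), ("berlin", "capital_of", "germany")]
unlabeled = [("paris", "located_in", "france"), ("berlin", "located_in", "germany")]

before = score(*unlabeled[0])
lr = 0.05
for _ in range(200):
    # 1) Query the rule to obtain soft labels for the unlabeled triples.
    soft = [((h, HEAD, t), CONF * score(h, BODY, t)) for (h, _, t) in unlabeled]
    # 2) Update embeddings on observed triples (label 1.0) and soft-labeled ones.
    for (h, r, t), target in [(tr, 1.0) for tr in labeled] + soft:
        d = entities[h] + relations[r] - entities[t]
        u = d / (np.linalg.norm(d) + 1e-9)
        pull = lr * (target - score(h, r, t))  # positive pull shrinks ||h + r - t||
        entities[h] -= pull * u
        relations[r] -= pull * u
        entities[t] += pull * u
after = score(*unlabeled[0])
```

After training, the unlabeled triples' scores have been pulled toward the rule-derived soft labels, which is the transfer effect the paper measures at scale.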
Constructing Knowledge Graph for Cybersecurity Education
There currently exist various challenges in learning cybersecurity knowledge, along with a shortage of experts in the related areas, while the demand for such talent keeps growing. Unlike other topics related to computer systems, such as computer architecture and computer networks, cybersecurity is a multidisciplinary topic involving scattered technologies, and its future direction remains unclear. Constructing a knowledge graph (KG) for cybersecurity education is a first step toward addressing these challenges and improving academic learning efficiency.
With the advancement of big data and Natural Language Processing (NLP) technologies, constructing large KGs and mining concepts from unstructured text using learning methodologies has become possible. NLP-based KGs that capture semantic similarity between concepts have inspired various industrial applications, yet they remain far from complete in domain expertise, including education in computer-science-related fields.
In this research work, a KG in the cybersecurity area has been constructed using machine-learning-based word embedding (i.e., mapping a word or phrase onto a low-dimensional vector) and hyperlink-based concept mining over the full set of words available in the latest Wikipedia dump. Different approaches to corpus training are compared, and performance is evaluated on different similarity tasks. The best-performing trained word vectors, obtained with the Skip-Gram model of Word2Vec, are then applied to construct the needed KG. To improve the efficiency of knowledge learning, a web-based front-end is constructed to visualize the KG, providing convenient browsing of related materials and search for cybersecurity-related concepts and independence relations.
Dissertation/Thesis: Masters Thesis, Computer Science, 201
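The similarity-based linking step behind such a concept KG can be pictured with a toy sketch: connect concepts whose word vectors are close. The hand-made vectors below stand in for trained Skip-Gram Word2Vec vectors, and the threshold is an arbitrary illustrative choice:

```python
import numpy as np

# Hand-made stand-ins for trained Skip-Gram word vectors (3-d for readability).
vectors = {
    "firewall":      np.array([0.9, 0.1, 0.0]),
    "ids":           np.array([0.8, 0.3, 0.1]),  # intrusion detection system
    "phishing":      np.array([0.1, 0.9, 0.2]),
    "spearphishing": np.array([0.2, 0.8, 0.3]),
    "compiler":      np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_edges(vecs, threshold=0.9):
    """Link concept pairs whose embedding similarity clears the threshold."""
    names = sorted(vecs)
    return [(a, b, round(cosine(vecs[a], vecs[b]), 3))
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if cosine(vecs[a], vecs[b]) >= threshold]

edges = build_edges(vectors)  # links (firewall, ids) and (phishing, spearphishing)
```

A real pipeline would train the vectors on a Wikipedia dump (e.g. with gensim's Word2Vec in Skip-Gram mode) instead of hand-writing them, but the graph-construction step is the same thresholded similarity scan.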
Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces
Knowledge graph embedding (KGE) is an increasingly popular technique that aims
to represent entities and relations of knowledge graphs into low-dimensional
semantic spaces for a wide spectrum of applications such as link prediction,
knowledge reasoning and knowledge completion. In this paper, we provide a
systematic review of existing KGE techniques based on representation spaces.
Particularly, we build a fine-grained classification to categorise the models
based on three mathematical perspectives of the representation spaces: (1)
Algebraic perspective, (2) Geometric perspective, and (3) Analytical
perspective. We introduce the rigorous definitions of fundamental mathematical
spaces before diving into KGE models and their mathematical properties. We
further discuss different KGE methods over the three categories, as well as
summarise how spatial advantages work over different embedding needs. By
collating the experimental results from downstream tasks, we also explore the
advantages of mathematical space in different scenarios and the reasons behind
them. We further state some promising research directions from a representation
space perspective, with which we hope to inspire researchers to design their
KGE models as well as their related applications with more consideration of
their mathematical space properties.
Comment: 32 pages, 6 figure
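The representation-space distinction the survey draws can be made concrete with two tiny scoring functions from different spaces: a translational score over real Euclidean vectors (TransE) and a rotational score over complex vectors (RotatE-style). The vectors below are hand-picked toys, not learned embeddings:

```python
import numpy as np

def transe_score(h, r, t):
    """Euclidean view (TransE): relation = translation, score = -||h + r - t||."""
    return -np.linalg.norm(h + r - t)

def rotate_score(h, theta, t):
    """Complex view (RotatE-style): relation = element-wise rotation by angle theta."""
    return -np.linalg.norm(h * np.exp(1j * theta) - t)

# A relation that a translation captures exactly:
h, t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
r = t - h                        # perfect translation => score 0

# A relation that a rotation captures exactly:
hc = np.array([1.0 + 0.0j, 0.0 + 1.0j])
theta = np.array([np.pi / 2, np.pi / 2])
tc = hc * np.exp(1j * theta)     # rotating h by theta lands exactly on t
```

Which relation patterns each space can represent exactly (e.g. rotations compose, so complex spaces handle relation composition and symmetry more naturally) is precisely the kind of mathematical property the survey's taxonomy organizes.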
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation
State-of-the-art methods on conversational recommender systems (CRS) leverage
external knowledge to enhance both items' and contextual words' representations
to achieve high-quality recommendations and response generation. However, the
representations of items and words are usually modeled in two separate
semantic spaces, which leads to a misalignment issue between them.
Consequently, the CRS achieves only sub-optimal ranking performance,
especially when there is insufficient information in the user's input. To
address the limitations of previous works, we propose a new CRS framework
KLEVER, which jointly models items and their associated contextual words in the
same semantic space. Particularly, we construct an item descriptive graph from
the rich items' textual features, such as item description and categories.
Based on the constructed descriptive graph, KLEVER jointly learns the
embeddings of the words and items, towards enhancing both recommender and
dialog generation modules. Extensive experiments on a benchmark CRS dataset
demonstrate that KLEVER achieves superior performance, especially when the
information from the users' responses is lacking.
Comment: 14 pages, 3 figures, 9 table
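The item descriptive graph idea can be sketched in a few lines: link each item to the words of its textual features, so items and words can later be embedded in one shared space. The items and descriptions below are made up for illustration:

```python
# Made-up items and descriptions; KLEVER's graph is built from real item
# textual features such as descriptions and categories.
items = {
    "The Matrix": "sci-fi action hacker simulation",
    "Inception":  "sci-fi heist dream thriller",
}
# Bipartite edges (item, word): items and their descriptive words share one
# graph, so their embeddings can be learned jointly in the same space.
graph = {(item, word) for item, desc in items.items() for word in desc.split()}

# Items become related through shared descriptive words:
shared = {w for (_, w) in graph
          if ("The Matrix", w) in graph and ("Inception", w) in graph}
```

Joint embedding over such a graph is what keeps item and word representations aligned, addressing the two-space misalignment the abstract criticizes.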
Pretrained Embeddings for E-commerce Machine Learning: When it Fails and Why?
The use of pretrained embeddings has become widespread in modern e-commerce
machine learning (ML) systems. In practice, however, we have encountered
several key issues when using pretrained embeddings in a real-world production
system, many of which cannot be fully explained by current knowledge.
Unfortunately, we find that there is a lack of a thorough understanding of how
pre-trained embeddings work, especially their intrinsic properties and
interactions with downstream tasks. Consequently, it becomes challenging to
make interactive and scalable decisions regarding the use of pre-trained
embeddings in practice.
Our investigation leads to two significant discoveries about using pretrained
embeddings in e-commerce applications. Firstly, we find that the design of the
pretraining and downstream models, particularly how they encode and decode
information via embedding vectors, can have a profound impact. Secondly, we
establish a principled perspective of pre-trained embeddings via the lens of
kernel analysis, which can be used to evaluate their predictability,
interactively and scalably. These findings help to address the practical
challenges we faced and offer valuable guidance for successful adoption of
pretrained embeddings in real-world production. Our conclusions are backed by
solid theoretical reasoning, benchmark experiments, and online testing.
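The kernel perspective can be illustrated with a toy experiment: treat the pretrained embeddings as features inducing a linear kernel K = XXᵀ and measure how well kernel ridge regression can fit a downstream target. This is only a loose sketch of the general idea, with synthetic data, not the paper's actual analysis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for pretrained item embeddings: n = 50 items, d = 8 dimensions.
X = rng.normal(size=(50, 8))
w_true = rng.normal(size=8)
y_easy = X @ w_true                      # a downstream target linear in the embedding
y_hard = np.sign(np.sin(10 * X[:, 0]))  # a target the linear kernel cannot express

def kernel_fit_error(X, y, lam=1e-3):
    """Kernel ridge regression with the linear kernel K = X @ X.T.
    Low fitting error => the embedding's kernel can express the target."""
    K = X @ X.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return float(np.mean((K @ alpha - y) ** 2))

easy = kernel_fit_error(X, y_easy)
hard = kernel_fit_error(X, y_hard)
```

The gap between `easy` and `hard` is the kind of predictability signal such an analysis surfaces: it depends only on the kernel induced by the embeddings and the target, so it can be computed before committing to a downstream model.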
Thinking outside the graph: scholarly knowledge graph construction leveraging natural language processing
Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based.
The document-oriented workflows in science publication have reached the limits of adequacy as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiency of peer-review and the reproducibility crisis.
In this form, scientific knowledge remains locked in representations that are inadequate for machine processing.
As long as scholarly communication remains in this form, we cannot take advantage of all the advancements taking place in machine learning and natural language processing techniques.
Such techniques would facilitate the transformation of purely text-based representations into (semi-)structured semantic descriptions that are interlinked in a collection of big federated graphs.
We are in dire need of a new age of semantically enabled infrastructure adept at storing, manipulating, and querying scholarly knowledge.
Equally important is a suite of machine assistance tools designed to populate, curate, and explore the resulting scholarly knowledge graph.
In this thesis, we address the issue of constructing a scholarly knowledge graph using natural language processing techniques.
First, we tackle the issue of developing a scholarly knowledge graph for structured scholarly communication that can be populated and constructed automatically.
We co-design and co-implement the Open Research Knowledge Graph (ORKG), an infrastructure capable of modeling, storing, and automatically curating scholarly communications.
Then, we propose a method to automatically extract information into knowledge graphs.
With Plumber, we create a framework to dynamically compose open information extraction pipelines based on the input text.
Such pipelines are composed from community-created information extraction components in an effort to consolidate individual research contributions under one umbrella.
We further present MORTY, a more targeted approach that leverages automatic text summarization to create, from the scholarly article's text, structured summaries containing all required information.
In contrast to the pipeline approach, MORTY only extracts the information it is instructed to, making it a more valuable tool for various curation and contribution use cases.
Moreover, we study the problem of knowledge graph completion: exBERT performs knowledge graph completion tasks such as relation and entity prediction on scholarly knowledge graphs by means of textual triple classification.
Lastly, we use the structured descriptions collected from manual and automated sources alike with a question answering approach that builds on the machine-actionable descriptions in the ORKG.
We propose JarvisQA, a question answering interface operating on tabular views of scholarly knowledge graphs, i.e., ORKG comparisons.
JarvisQA is able to answer a variety of natural language questions, and retrieve complex answers on pre-selected sub-graphs.
These contributions are key to the broader agenda of studying the feasibility of natural language processing methods on scholarly knowledge graphs, and they lay the foundation for determining which methods can be used in which cases. Our work indicates the challenges and issues of automatically constructing scholarly knowledge graphs and opens up future research directions.
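The textual triple classification idea behind exBERT can be caricatured in a few lines: verbalize a KG triple into a sentence and let a text model judge it. Here a trivial word-overlap scorer stands in for the language model, and the corpus and triples are made up:

```python
# The corpus, triples, and acceptance rule here are all made up for illustration.
CORPUS = set(
    "the open research knowledge graph stores structured scholarly knowledge".split()
)

def verbalize(h, r, t):
    """Turn a KG triple into a plain sentence for a text classifier."""
    return f"{h} {r.replace('_', ' ')} {t}".lower()

def plausible(h, r, t):
    """Stand-in classifier: accept the triple if every word of its
    verbalization occurs in the reference corpus."""
    return set(verbalize(h, r, t).split()) <= CORPUS

accepted = plausible("knowledge graph", "stores", "scholarly knowledge")
rejected = plausible("knowledge graph", "stores", "pizza recipes")
```

Replacing the overlap rule with a fine-tuned transformer over the verbalized triple is the step that turns this caricature into textual triple classification proper.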
MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding
Multimodal named entity recognition (MNER) is a critical step in information
extraction, which aims to detect entity spans and classify them to
corresponding entity types given a sentence-image pair. Existing methods either
(1) obtain named entities with coarse-grained visual clues from attention
mechanisms, or (2) first detect fine-grained visual regions with toolkits and
then recognize named entities. However, they suffer from improper alignment
between entity types and visual regions or error propagation in the two-stage
manner, which ultimately introduces irrelevant visual information into the text. In this
paper, we propose a novel end-to-end framework named MNER-QG that can
simultaneously perform MRC-based multimodal named entity recognition and query
grounding. Specifically, with the assistance of queries, MNER-QG can provide
prior knowledge of entity types and visual regions, and further enhance
representations of both texts and images. To conduct the query grounding task,
we provide manual annotations and weak supervision obtained via
training a highly flexible visual grounding model with transfer learning. We
conduct extensive experiments on two public MNER datasets, Twitter2015 and
Twitter2017. Experimental results show that MNER-QG outperforms the current
state-of-the-art models on the MNER task, and also improves the query grounding
performance.
Comment: 13 pages, 6 figures, published to AAA
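The MRC framing can be sketched as follows: each entity type gets a natural-language query, and a reader extracts answer spans per query. A tiny gazetteer stands in for the reader, and the queries and names are illustrative, not the paper's:

```python
# Illustrative type queries and a toy gazetteer "reader"; the real model reads
# (query, sentence, image) jointly and additionally grounds each type in the image.
QUERIES = {
    "PER": "Find person names mentioned in the text.",
    "LOC": "Find locations mentioned in the text.",
}
GAZETTEER = {"PER": {"Obama"}, "LOC": {"Hawaii"}}

def mrc_extract(sentence):
    """Run one reading pass per type query; return (span, type) pairs."""
    tokens = sentence.split()
    return {(tok, typ)
            for typ in QUERIES          # each query carries entity-type prior knowledge
            for tok in tokens
            if tok in GAZETTEER[typ]}

spans = mrc_extract("Obama was born in Hawaii")
```

Because every query names its entity type explicitly, the type prior travels with the input, which is the "prior knowledge of entity types" the abstract refers to.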
Referring Expression Comprehension: A Survey of Methods and Datasets
Referring expression comprehension (REC) aims to localize a target object in
an image described by a referring expression phrased in natural language.
Different from the object detection task, where queried object labels are
pre-defined, the REC problem can only observe its queries at test time. It is
thus more challenging than a conventional computer vision problem. This task
has attracted a lot of attention from both the computer vision and natural
language processing communities, and several lines of work have been proposed,
from CNN-RNN models and modular networks to complex graph-based models. In this
survey, we
first examine the state of the art by comparing modern approaches to the
problem. We classify methods by their mechanism to encode the visual and
textual modalities. In particular, we examine the common approach of jointly
embedding images and expressions into a common feature space. We also discuss
modular architectures and graph-based models that interface with structured
graph representation. In the second part of this survey, we review the datasets
available for training and evaluating REC systems. We then group results
according to the datasets, backbone models, and settings so that they can be fairly
compared. Finally, we discuss promising future directions for the field, in
particular compositional referring expression comprehension, which requires
longer reasoning chains.
Comment: Accepted to IEEE TM
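The joint-embedding approach the survey highlights can be caricatured as: encode image regions and the expression into one feature space, then rank regions by similarity. The vectors below are hand-picked stand-ins for the outputs of learned visual and language encoders:

```python
import numpy as np

# Stand-ins for encoded image regions and an encoded referring expression,
# all living in the same 2-d feature space (real encoders would produce these).
regions = {
    "left_dog":  np.array([0.9, 0.1]),
    "right_cat": np.array([0.1, 0.9]),
}
expression = np.array([0.85, 0.2])  # encoding of "the dog on the left"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# REC as retrieval: localize the region most similar to the expression.
best = max(regions, key=lambda name: cosine(regions[name], expression))
```

Modular and graph-based models replace this single similarity with structured per-component scores, but the retrieval-over-regions formulation stays the same.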