Information Extraction from Scientific Literature for Method Recommendation
As a research community grows, more and more papers are published each year.
As a result, there is increasing demand for better methods for finding
relevant papers, automatically understanding their key ideas, and recommending
potential methods for a target problem. Despite advances in search engines, it
is still hard to identify new technologies that match a researcher's needs.
Due to the large variety of domains and the extremely limited annotated
resources, there has been relatively little work on leveraging natural
language processing for scientific recommendation. In this proposal, we aim to
make scientific recommendations by extracting scientific terms from a large
collection of scientific papers and organizing the terms into a knowledge
graph. In preliminary work, we trained a scientific term extractor using a
small amount of annotated data and obtained state-of-the-art performance by
leveraging a large amount of unannotated papers through multiple
semi-supervised approaches. We propose to construct a knowledge graph with
minimal use of hand-annotated data, relying only on the extracted terms,
unsupervised relational signals such as co-occurrence, and structured external
resources such as Wikipedia. Latent relations between scientific terms can be
learned from the graph. Recommendations will be made through graph inference
for both observed and unobserved relational pairs.
Comment: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.0607
Text Generation from Knowledge Graphs with Graph Transformers
Generating texts which express complex ideas spanning multiple sentences
requires a structured representation of their content (document plan), but
these representations are prohibitively expensive to manually produce. In this
work, we address the problem of generating coherent multi-sentence texts from
the output of an information extraction system, and in particular a knowledge
graph. Graphical knowledge representations are ubiquitous in computing, but
pose a significant challenge for text generation techniques due to their
non-hierarchical nature, collapsing of long-distance dependencies, and
structural variety. We introduce a novel graph transforming encoder which can
leverage the relational structure of such knowledge graphs without imposing
linearization or hierarchical constraints. Incorporated into an encoder-decoder
setup, we provide an end-to-end trainable system for graph-to-text generation
that we apply to the domain of scientific text. Automatic and human evaluations
show that our technique produces more informative texts which exhibit better
document structure than competitive encoder-decoder methods.
Comment: Accepted as a long paper in NAACL 201
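In drastically simplified form, the key idea, attention restricted to the knowledge graph's edges rather than to a linearized sequence, can be sketched as follows. This is a single-head illustration with no learned projections, not the paper's architecture:

```python
import numpy as np

def graph_attention(node_feats, adj):
    """One self-attention step restricted to graph edges: every node
    attends only to itself and its neighbours, so the graph is never
    linearized into a sequence. Single head, no learned projections."""
    d = node_feats.shape[1]
    scores = node_feats @ node_feats.T / np.sqrt(d)
    mask = adj + np.eye(len(adj))             # allow self-attention
    scores = np.where(mask > 0, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ node_feats

# A 3-node path graph 0 - 1 - 2, with one-hot node features.
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
out = graph_attention(np.eye(3), adj)
# Node 0 has no edge to node 2, so its output gets exactly zero weight
# from node 2's features.
```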
A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
Graphs are an important data representation that appears in a wide diversity
of real-world scenarios. Effective graph analytics gives users a deeper
understanding of what is behind the data, and thus can benefit many useful
applications such as node classification, node recommendation, and link
prediction. However, most graph analytics methods suffer from high computation
and space costs. Graph embedding is an effective yet efficient way to solve the
graph analytics problem. It converts the graph data into a low-dimensional
space in which the graph structural information and graph properties are
maximally preserved. In this survey, we conduct a comprehensive review of the
literature in graph embedding. We first introduce the formal definition of
graph embedding as well as the related concepts. After that, we propose two
taxonomies of graph embedding which correspond to what challenges exist in
different graph embedding problem settings and how existing works address
these challenges in their solutions. Finally, we summarize the applications
that graph embedding enables and suggest four promising future research
directions in terms of computation efficiency, problem settings, techniques and
application scenarios.
Comment: A 20-page comprehensive survey of graph/network embedding covering
over 150 papers up to 2018. It provides a systematic categorization of
problems, techniques and applications. Accepted by IEEE Transactions on
Knowledge and Data Engineering (TKDE). Comments and suggestions are welcome
for continuously improving this survey.
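As a minimal illustration of one family the survey categorizes (matrix-factorization methods), node embeddings can be obtained from a truncated SVD of the adjacency matrix: the top singular vectors give the best low-rank approximation of the graph's structure in Frobenius norm. The toy graph below is invented for the example.

```python
import numpy as np

def embed_graph(adj, dim):
    """Node embeddings from a truncated SVD of the adjacency matrix,
    a classic matrix-factorization embedding. The top-`dim` singular
    vectors preserve as much of `adj` (in Frobenius norm) as any
    rank-`dim` factorization can."""
    u, s, _ = np.linalg.svd(adj, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])  # rows are node embeddings

# A 6-node graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0

emb = embed_graph(adj, dim=2)
print(emb.shape)  # (6, 2): one low-dimensional vector per node
# Nodes 0 and 1 play identical structural roles in this graph, so the
# factorization assigns them (numerically) identical embeddings.
```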
Entity Embeddings with Conceptual Subspaces as a Basis for Plausible Reasoning
Conceptual spaces are geometric representations of conceptual knowledge, in
which entities correspond to points, natural properties correspond to convex
regions, and the dimensions of the space correspond to salient features. While
conceptual spaces enable elegant models of various cognitive phenomena, the
lack of automated methods for constructing such representations has so far
limited their application in artificial intelligence. To address this issue, we
propose a method which learns a vector-space embedding of entities from
Wikipedia and constrains this embedding such that entities of the same semantic
type are located in some lower-dimensional subspace. We experimentally
demonstrate the usefulness of these subspaces as (approximate) conceptual space
representations by showing, among others, that important features can be
modelled as directions and that natural properties tend to correspond to convex
regions.
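The claim that important features can be modelled as directions can be sketched in a few lines: rank entities by their projection onto a feature direction. The 2-D entity vectors and the "budget" direction below are invented for illustration, not learned from Wikipedia as in the paper.

```python
import numpy as np

# Hypothetical 2-D embeddings of three film entities, with one direction
# assumed to capture the salient feature "budget".
entities = {
    "blockbuster_a": np.array([0.9, 0.2]),
    "blockbuster_b": np.array([0.8, 0.4]),
    "indie_film":    np.array([0.1, 0.7]),
}
budget_direction = np.array([1.0, 0.0])  # assumed feature direction

# Modelling a feature as a direction: rank entities by their projection
# onto that direction.
ranked = sorted(entities, key=lambda e: entities[e] @ budget_direction,
                reverse=True)
print(ranked)  # higher-budget films project further along the direction
```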
Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding
Entity alignment is the task of finding entities in two knowledge bases (KBs)
that represent the same real-world object. When facing KBs in different natural
languages, conventional cross-lingual entity alignment methods rely on machine
translation to eliminate the language barriers. These approaches often suffer
from the uneven quality of translations between languages. While recent
embedding-based techniques encode entities and relationships in KBs and do not
need machine translation for cross-lingual entity alignment, a significant
number of attributes remain largely unexplored. In this paper, we propose a
joint attribute-preserving embedding model for cross-lingual entity alignment.
It jointly embeds the structures of two KBs into a unified vector space and
further refines it by leveraging attribute correlations in the KBs. Our
experimental results on real-world datasets show that this approach
significantly outperforms the state-of-the-art embedding approaches for
cross-lingual entity alignment and could be complemented with methods based on
machine translation.
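Once the two KBs are embedded into one unified vector space, the alignment step itself can be as simple as nearest-neighbour search. The sketch below assumes pre-computed unit vectors (the entity names and vectors are hypothetical) and omits the attribute-preserving refinement the paper adds.

```python
import numpy as np

def align(emb_kb1, emb_kb2):
    """Nearest-neighbour alignment in a shared embedding space.

    emb_kb1, emb_kb2: dicts mapping entity names to unit vectors that a
    joint embedding model has placed in one space. Returns the most
    similar KB2 entity for every KB1 entity.
    """
    names2 = list(emb_kb2)
    mat2 = np.stack([emb_kb2[n] for n in names2])
    result = {}
    for name1, v in emb_kb1.items():
        sims = mat2 @ v  # cosine similarity, given unit vectors
        result[name1] = names2[int(np.argmax(sims))]
    return result

# Hypothetical unit vectors for an English and a French KB.
en = {"Paris": np.array([1.0, 0.0]), "London": np.array([0.0, 1.0])}
fr = {"Paris_fr": np.array([0.96, 0.28]), "Londres": np.array([0.28, 0.96])}
print(align(en, fr))  # each English entity maps to its French counterpart
```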
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
In this paper we propose the construction of linguistic descriptions of
images. This is achieved through the extraction of scene description graphs
(SDGs) from visual scenes using an automatically constructed knowledge base.
SDGs are constructed using both vision and reasoning. Specifically, commonsense
reasoning is applied on (a) detections obtained from existing perception
methods on given images, (b) a "commonsense" knowledge base constructed using
natural language processing of image annotations and (c) lexical ontological
knowledge from resources such as WordNet. Amazon Mechanical Turk (AMT)-based
evaluations on Flickr8k, Flickr30k and MS-COCO datasets show that in most
cases, sentences auto-constructed from SDGs obtained by our method give a more
relevant and thorough description of an image than a recent state-of-the-art
image-captioning approach. Our Image-Sentence Alignment Evaluation results
are also comparable to those of recent state-of-the-art approaches.
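The reasoning step over inputs (a)-(c) above can be caricatured in a few lines: pair up detected objects and keep a relation only when the commonsense knowledge base licenses it for those types. The KB entries and detections below are invented for illustration, not the paper's automatically constructed KB.

```python
# Hypothetical commonsense KB of (subject type, relation, object type).
KB = [("person", "rides", "horse"), ("person", "holds", "umbrella")]

def scene_description_graph(detections):
    """Propose SDG triples: keep a KB relation only when both the
    subject and object types were detected in the image. A toy
    stand-in for the paper's commonsense reasoning step."""
    triples = []
    for subj, rel, obj in KB:
        if subj in detections and obj in detections:
            triples.append((subj, rel, obj))
    return triples

print(scene_description_graph(["person", "horse", "tree"]))
# [('person', 'rides', 'horse')]
```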
node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching
Identity stitching, the task of identifying and matching various online
references (e.g., sessions over different devices and timespans) to the same
user in real-world web services, is crucial for personalization and
recommendations. However, traditional user stitching approaches, such as
grouping or blocking, require quadratic pairwise comparisons between a massive
number of user activities, thus posing both computational and storage
challenges. Recent works, which are often application-specific, heuristically
seek to reduce the amount of comparisons, but they suffer from low precision
and recall. To solve the problem in an application-independent way, we take a
heterogeneous network-based approach in which users (nodes) interact with
content (e.g., sessions, websites), and may have attributes (e.g., location).
We propose node2bits, an efficient framework that represents multi-dimensional
features of node contexts with binary hashcodes. node2bits leverages
feature-based temporal walks to encapsulate short- and long-term interactions
between nodes in heterogeneous web networks, and adopts SimHash to obtain
compact, binary representations and avoid the quadratic complexity for
similarity search. Extensive experiments on large-scale real networks show that
node2bits outperforms traditional techniques and existing works that generate
real-valued embeddings by up to 5.16% in F1 score on user stitching, while
taking only up to 1.56% as much storage.
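The SimHash step that gives node2bits its compact binary codes can be sketched as follows: project each feature vector onto random hyperplanes and keep only the signs, so that Hamming distance between codes approximates angular similarity. The feature vectors below are invented stand-ins for the temporal-walk feature histograms the paper actually uses.

```python
import numpy as np

def simhash(features, n_bits, seed=0):
    """SimHash: sign of random projections, giving binary codes whose
    Hamming distance approximates the angle between feature vectors."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    return (features @ planes > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.sum(a != b))

# Three toy node-context feature vectors; the first two are similar.
feats = np.array([[1.0, 2.0, 0.0, 1.0],
                  [1.1, 1.9, 0.1, 0.9],
                  [0.0, 0.1, 3.0, 0.0]])
codes = simhash(feats, n_bits=64)
# Similar nodes end up with close hashcodes, so similarity search needs
# only cheap Hamming comparisons, not quadratic real-valued ones.
assert hamming(codes[0], codes[1]) < hamming(codes[0], codes[2])
```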
Knowledge Graph Alignment using String Edit Distance
In this work, we propose a novel knowledge graph alignment technique based
upon string edit distance that exploits the type information between entities
and can find similarity between relations of any arity.
Comment: Position Paper
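A minimal sketch of the edit-distance signal: match every entity label in one KG to its closest label in the other by Levenshtein distance. The labels below are invented, and the type information the paper also exploits is omitted.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def align_entities(kg1, kg2):
    """Match every KG1 entity label to the closest KG2 label by edit
    distance: a toy version of the alignment signal, without types."""
    return {e1: min(kg2, key=lambda e2: edit_distance(e1, e2)) for e1 in kg1}

kg1 = ["New_York_City", "United_Kingdom"]
kg2 = ["NewYork_City", "United_Kingdom_(UK)", "Tokyo"]
print(align_entities(kg1, kg2))
# {'New_York_City': 'NewYork_City', 'United_Kingdom': 'United_Kingdom_(UK)'}
```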
Collaborative Adversarial Learning for Relational Learning on Multiple Bipartite Graphs
Relational learning aims to make relation inference by exploiting the
correlations among different types of entities. Exploring relational learning
on multiple bipartite graphs has been receiving attention because of its
popular applications such as recommendation. The main challenge on multiple
bipartite graphs is making efficient relation inference with few observed
links. Most existing approaches attempt to solve the sparsity problem by
learning shared representations to integrate knowledge from multi-source data
for shared entities. However, they merely model the correlations from one
aspect (e.g. distribution, representation), and cannot impose sufficient
constraints on different relations of the shared entities. One effective way of
modeling the multi-domain data is to learn the joint distribution of the shared
entities across domains. In this paper, we propose Collaborative Adversarial
Learning (CAL) that explicitly models the joint distribution of the shared
entities across multiple bipartite graphs. The objective of CAL is formulated
from a variational lower bound that maximizes the joint log-likelihoods of the
observations. In particular, CAL consists of distribution-level and
feature-level alignments for knowledge from multiple bipartite graphs. The
two-level alignment acts as two different constraints on different relations of
the shared entities and facilitates better knowledge transfer for relational
learning on multiple bipartite graphs. Extensive experiments on two real-world
datasets have shown that the proposed model outperforms the existing methods.
Comment: 8 pages. It has been accepted by IEEE International Conference on
Knowledge Graphs (ICKG) 202