An Interpretable Knowledge Transfer Model for Knowledge Base Completion
Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness. We propose a novel embedding model, ITransF, to perform knowledge base completion. Equipped with a sparse attention mechanism, ITransF discovers hidden concepts of relations and transfers statistical strength through the sharing of concepts. Moreover, the learned associations between relations and concepts, which are represented by sparse attention vectors, can be interpreted easily. We evaluate ITransF on two benchmark datasets, WN18 and FB15k, for knowledge base completion and obtain improvements on both the mean rank and Hits@10 metrics over all baselines that do not use additional information.
Comment: Accepted by ACL 2017. Minor update.
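To make the mechanism in the abstract concrete, here is a minimal numpy sketch of the sparse-attention idea: a relation-specific projection is composed from a shared stack of concept matrices via a sparse attention vector. The dimensions, the top-k sparsification, and the L1 translation-style score are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch: compose relation projections from shared "concept" matrices
# via sparse attention, as the ITransF abstract describes at a high level.
import numpy as np

rng = np.random.default_rng(0)
dim, n_concepts = 8, 4

D = rng.normal(size=(n_concepts, dim, dim))   # shared concept matrices
h, r, t = rng.normal(size=(3, dim))           # head, relation, tail embeddings

def sparse_attention(logits, k=2):
    """Keep the top-k attention weights, renormalise them, zero the rest."""
    idx = np.argsort(logits)[-k:]
    alpha = np.zeros_like(logits)
    alpha[idx] = np.exp(logits[idx]) / np.exp(logits[idx]).sum()
    return alpha

alpha_h = sparse_attention(rng.normal(size=n_concepts))  # head-side attention
alpha_t = sparse_attention(rng.normal(size=n_concepts))  # tail-side attention

# Translation-style score: lower means the triple (h, r, t) is more plausible.
proj_h = np.tensordot(alpha_h, D, axes=1) @ h
proj_t = np.tensordot(alpha_t, D, axes=1) @ t
score = np.linalg.norm(proj_h + r - proj_t, ord=1)
print(f"triple score: {score:.3f}")
```

Because each attention vector is mostly zeros, inspecting its nonzero entries directly reveals which shared concepts a relation uses, which is the interpretability claim in the abstract.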
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. The semantics of edges and nodes are therefore critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs while considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery.
Comment: 10 pages.
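The core mechanism the abstract describes is a random walk whose next step is biased by a learned edge-type transition matrix. The toy graph, edge types, and matrix values below are my assumptions; in the paper the matrix is fitted by EM and the resulting walks feed a skip-gram (SGD-trained) embedding model.

```python
# Sketch of an edge-type-biased walk in the spirit of edge2vec:
# M[s][u] weights moving onto an edge of type u after an edge of type s.
import random

# Toy heterogeneous graph: node -> list of (neighbour, edge_type).
graph = {
    "geneA": [("protA", 0), ("geneB", 1)],
    "geneB": [("geneA", 1), ("drugX", 2)],
    "protA": [("geneA", 0), ("drugX", 2)],
    "drugX": [("geneB", 2), ("protA", 2)],
}
M = [[0.6, 0.3, 0.1],   # illustrative edge-type transition weights
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]

def edge2vec_walk(start, length, rng=random.Random(0)):
    walk, node, prev_type = [start], start, None
    for _ in range(length):
        nbrs = graph[node]
        # First step is uniform; later steps are biased by the previous edge type.
        weights = [1.0 if prev_type is None else M[prev_type][t] for _, t in nbrs]
        node, prev_type = rng.choices(nbrs, weights=weights, k=1)[0]
        walk.append(node)
    return walk  # walks like this become the "sentences" for skip-gram training

print(edge2vec_walk("geneA", 6))
```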
Purging of untrustworthy recommendations from a grid
In grid computing, trust has massive significance. There is a large body of research proposing models that provide trusted resource-sharing mechanisms. Trust is a belief or perception that various researchers have tried to correlate with some computational model. Trust in an entity can be direct or indirect. Direct trust is the impact of either a first impression of the entity or of some direct interaction with it. Indirect trust arises either from reputation gained or from recommendations received from various recommenders, whether in a particular domain of the grid, in another domain, or outside the grid itself. Unfortunately, malicious indirect trust leads to the misuse of valuable grid resources. This paper proposes a mechanism for identifying and purging untrustworthy recommendations in the grid environment. Through the obtained results, we show how untrustworthy entities can be purged.
Comment: 8 pages, 4 figures, 1 table; published in the International Journal of Next-Generation Networks (IJNGN), Vol.3, No.4, December 2011.
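The abstract does not spell out the purging rule, so the following is only a hedged illustration of the general idea: drop recommendations whose reported trust deviates too far from the consensus. The median-based rule and the threshold are my assumptions, not the paper's mechanism.

```python
# Illustrative outlier-style purge of recommendations (hypothetical rule).
from statistics import median

recommendations = {"rec1": 0.82, "rec2": 0.78, "rec3": 0.15, "rec4": 0.80}

def purge_untrustworthy(recs, max_dev=0.3):
    """Keep recommendations within max_dev of the median reported trust."""
    centre = median(recs.values())
    return {r: v for r, v in recs.items() if abs(v - centre) <= max_dev}

print(purge_untrustworthy(recommendations))  # rec3's outlying score is purged
```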
Inferring Missing Entity Type Instances for Knowledge Base Completion: New Dataset and Methods
Most previous work on knowledge base (KB) completion has focused on the problem of relation extraction. In this work, we focus on the task of inferring missing entity type instances in a KB, a fundamental task for KB completion that has received little attention. Due to the novelty of this task, we construct a large-scale dataset and design an automatic evaluation methodology. Our knowledge base completion method uses information within the existing KB and external information from Wikipedia. We show that individual methods trained with a global objective that considers unobserved cells from both the entity and the type side give consistently higher-quality predictions compared to baseline methods. We also perform a manual evaluation on a small subset of the data to verify the effectiveness of our knowledge base completion methods and the correctness of our proposed automatic evaluation method.
Comment: North American Chapter of the Association for Computational Linguistics - Human Language Technologies, 2015.
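The task can be pictured as filling in an entity-by-type matrix: score every unobserved (entity, type) cell and rank the candidates. The bilinear scoring form and the toy shapes below are illustrative assumptions, not the authors' exact model.

```python
# Sketch: rank unobserved (entity, type) cells of a KB matrix by a
# bilinear score, the style of global objective the abstract alludes to.
import numpy as np

rng = np.random.default_rng(1)
n_entities, n_types, dim = 5, 3, 4
E = rng.normal(size=(n_entities, dim))      # entity embeddings
T = rng.normal(size=(n_types, dim))         # type embeddings
observed = {(0, 1), (2, 0), (4, 2)}         # known (entity, type) facts

scores = E @ T.T                            # a score for every cell of the matrix
candidates = [((e, t), scores[e, t])
              for e in range(n_entities) for t in range(n_types)
              if (e, t) not in observed]    # only unobserved cells are candidates
for (e, t), s in sorted(candidates, key=lambda x: -x[1])[:3]:
    print(f"entity {e} may have type {t} (score {s:.2f})")
```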
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted in K-CAP 2017: Knowledge Capture Conference.
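The abstract names two ingredients, context-based candidate retrieval and graph algorithms, without detailing them. The sketch below is a rough, hedged rendering of that pipeline shape: retrieve KB candidates per mention, then run a graph algorithm over the candidate graph to pick a coherent assignment. The toy candidates, the edges, and the use of PageRank are my assumptions, not MAG's actual components.

```python
# Hypothetical candidate-graph disambiguation in the spirit of the abstract.
import networkx as nx

mentions = {"Paris": ["Paris_France", "Paris_Texas"], "Seine": ["Seine_River"]}

G = nx.Graph()
# Connect candidates of different mentions that are related in the KB.
G.add_edge("Paris_France", "Seine_River")
G.add_node("Paris_Texas")               # an unrelated candidate stays isolated

rank = nx.pagerank(G)                   # coherence signal from the graph
for mention, cands in mentions.items():
    best = max(cands, key=lambda c: rank.get(c, 0.0))
    print(f"{mention} -> {best}")
```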
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73--96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web.
Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A. Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting Entities from the Web Using Unique Identifiers. WebDB workshop, 2015.
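One property of unique identifiers that supports the noise filtering the abstract mentions is their built-in checksum: a GTIN-13/ISBN-13 check digit, for instance, rejects most random 13-digit strings. This standard checksum is a known fact, though whether IBEX uses exactly this filter is not stated in the abstract.

```python
# Validate the standard GTIN-13/ISBN-13 check digit (weights alternate 1, 3).
def valid_gtin13(code: str) -> bool:
    """True iff code is 13 digits with a correct GTIN-13/ISBN-13 checksum."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    checksum = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return checksum % 10 == 0

print(valid_gtin13("9780306406157"))  # True: a valid ISBN-13
print(valid_gtin13("9780306406158"))  # False: last digit corrupted
```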
Generating Preview Tables for Entity Graphs
Users are tapping into massive, heterogeneous entity graphs for many
applications. It is challenging to select entity graphs for a particular need,
given abundant datasets from many sources and the oftentimes scarce information
for them. We propose methods to produce preview tables for compact presentation
of important entity types and relationships in entity graphs. The preview
tables assist users in attaining a quick and rough preview of the data. They
can be shown in a limited display space for a user to browse and explore,
before she decides to spend time and resources to fetch and investigate the
complete dataset. We formulate several optimization problems that look for
previews with the highest scores according to intuitive goodness measures,
under various constraints on preview size and distance between preview tables.
The optimization problem under distance constraint is NP-hard. We design a
dynamic-programming algorithm and an Apriori-style algorithm for finding
optimal previews. Results from experiments, comparisons with related work, and user studies demonstrate the accuracy of the scoring measures and the efficiency of the discovery algorithms.
Comment: This is the camera-ready version of a SIGMOD16 paper. There might be tiny differences in layout, spacing and linebreaking, compared with the version in the SIGMOD16 proceedings, since we must submit TeX files and use arXiv to compile the files.
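The underlying selection problem can be stated very compactly: pick a preview (a subset of entity types) maximising a goodness score under a size constraint. The scores and the brute-force search below are illustrative only; the paper's contribution is the dynamic-programming and Apriori-style algorithms that avoid this enumeration, and its distance-constrained variant is NP-hard.

```python
# Toy statement of the preview-selection problem: best subset under a size cap.
from itertools import combinations

type_scores = {"Person": 9.0, "Film": 7.5, "Award": 4.0, "Genre": 2.5}

def best_preview(scores, max_size):
    """Exhaustively pick the subset of entity types with the highest total score."""
    candidates = (frozenset(c) for k in range(1, max_size + 1)
                  for c in combinations(scores, k))
    return max(candidates, key=lambda s: sum(scores[t] for t in s))

print(sorted(best_preview(type_scores, max_size=2)))  # ['Film', 'Person']
```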
WikiM: Metapaths based Wikification of Scientific Abstracts
In order to disseminate the exponentially growing body of knowledge produced in the form of scientific publications, it would be best to design mechanisms that connect it with an already existing rich repository of concepts -- Wikipedia. Not only does this make scientific reading simple and easy (by connecting the concepts used in scientific articles to their Wikipedia explanations), but it also improves the overall quality of the article. In this paper, we present a novel metapath-based method, WikiM, to efficiently wikify scientific abstracts -- a topic that has rarely been investigated in the literature. One of the prime motivations for this work comes from the observation that wikified abstracts of scientific documents help a reader decide better than plain abstracts whether (s)he would be interested in reading the full article. We perform mention extraction mostly through traditional tf-idf measures coupled with a set of smart filters. The entity linking heavily leverages the rich citation and author publication networks. Our observation is that various metapaths defined over these networks can significantly enhance the overall performance of the system. For mention extraction and entity linking, we outperform most of the competing state-of-the-art techniques by a large margin, arriving at precision values of 72.42% and 73.8% respectively over a dataset from the ACL Anthology Network. In order to establish the robustness of our scheme, we wikify three other datasets and obtain precision values of 63.41%-94.03% and 67.67%-73.29% for the mention extraction and entity linking phases, respectively.
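The mention-extraction step the abstract describes, ranking a document's terms by tf-idf against a background corpus, can be sketched in a few lines. The toy corpus, the cut-off of three terms, and the omission of the paper's "smart filters" are all simplifying assumptions.

```python
# Sketch: tf-idf-ranked candidate mentions from an abstract.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "we train word embeddings with a skip gram model",
    "graph neural networks aggregate neighbourhood information",
    "entity linking maps mentions to knowledge base entries",
]
abstract = "entity linking with graph based embeddings"

vec = TfidfVectorizer()
vec.fit(corpus)                               # idf statistics from the background
row = vec.transform([abstract]).toarray()[0]  # tf-idf weights for the abstract
terms = vec.get_feature_names_out()
top = sorted(zip(row, terms), reverse=True)[:3]
print([t for score, t in top if score > 0])   # candidate mentions to be linked
```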
Toward Entity-Aware Search
As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning -- entity as input and entity as output -- we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far show clear promise for entity-aware search, in its usefulness, effectiveness, efficiency and scalability.
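To illustrate the shift the abstract argues for, the toy below scores entity instances (here, phone numbers) rather than pages, weighting each instance by its proximity to a query keyword within the page. The proximity heuristic is a common choice I am assuming for illustration; it is not the EntityRank model itself.

```python
# Toy entity-aware retrieval: rank entity instances, not pages.
import re
from collections import defaultdict

pages = {
    "p1": "for support call 555-1234 any weekday",
    "p2": "sales office 555-9876 is closed; support moved",
}
query = "support"

scores = defaultdict(float)
for text in pages.values():
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"\d{3}-\d{4}", tok):      # an entity instance (phone number)
            for j, other in enumerate(tokens):
                if other == query:                  # reward nearby query keywords
                    scores[tok] += 1.0 / (1 + abs(i - j))

for number, s in sorted(scores.items(), key=lambda x: -x[1]):
    print(number, round(s, 3))                      # 555-1234 ranks first
```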