5,102 research outputs found
Context-aware Path Ranking for Knowledge Base Completion
Knowledge base (KB) completion aims to infer missing facts from existing ones
in a KB. Among various approaches, path ranking (PR) algorithms have received
increasing attention in recent years. PR algorithms enumerate paths between
entity pairs in a KB and use those paths as features to train a model for
missing fact prediction. Due to their good performances and high model
interpretability, several methods have been proposed. However, most existing
methods suffer from scalability (high RAM consumption) and feature explosion
(trains on an exponentially large number of features) problems. This paper
proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems
by introducing a selective path exploration strategy. C-PR learns global
semantics of entities in the KB using word embedding and leverages the
knowledge of entity semantics to enumerate contextually relevant paths using
bidirectional random walk. Experimental results on three large KBs show that
the path features (fewer in number) discovered by C-PR not only improve
predictive performance but also are more interpretable than existing baselines
Fighting with the Sparsity of Synonymy Dictionaries
Graph-based synset induction methods, such as MaxMax and Watset, induce
synsets by performing a global clustering of a synonymy graph. However, such
methods are sensitive to the structure of the input synonymy graph: sparseness
of the input dictionary can substantially reduce the quality of the extracted
synsets. In this paper, we propose two different approaches designed to
alleviate the incompleteness of the input dictionaries. The first one performs
a pre-processing of the graph by adding missing edges, while the second one
performs a post-processing by merging similar synset clusters. We evaluate
these approaches on two datasets for the Russian language and discuss their
impact on the performance of synset induction methods. Finally, we perform an
extensive error analysis of each approach and discuss prominent alternative
methods for coping with the problem of the sparsity of the synonymy
dictionaries.Comment: In Proceedings of the 6th Conference on Analysis of Images, Social
Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science
(LNCS
Russian Lexicographic Landscape: a Tale of 12 Dictionaries
The paper reports on quantitative analysis of 12 Russian dictionaries at three levels: 1) headwords: The size and overlap of word lists, coverage of large corpora, and presence of neologisms; 2) synonyms: Overlap of synsets in different dictionaries; 3) definitions: Distribution of definition lengths and numbers of senses, as well as textual similarity of same-headword definitions in different dictionaries. The total amount of data in the study is 805,900 dictionary entries, 892,900 definitions, and 84,500 synsets. The study reveals multiple connections and mutual influences between dictionaries, uncovers differences in modern electronic vs. traditional printed resources, as well as suggests directions for development of new and improvement of existing lexical semantic resources
The Effectiveness of Concept Based Search for Video Retrieval
In this paper we investigate how a small number of high-level concepts\ud
derived for video shots, such as Sport. Face.Indoor. etc., can be used effectively for ad hoc search in video material. We will answer the following questions: 1) Can we automatically construct concept queries from ordinary text queries? 2) What is the best way to combine evidence from single concept detectors into final search results? We evaluated algorithms for automatic concept query formulation using WordNet based concept extraction, and we evaluated algorithms for fast, on-line combination of concepts. Experimental results on data from the TREC Video 2005 workshop and 25 test users show the following. 1) Automatic query formulation through WordNet based concept extraction can achieve comparable results to user created query concepts and 2) Combination methods that take neighboring shots into account outperform more simple combination methods
- …