354 research outputs found
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.Comment: In Proceedings of the Web Conference (WWW) 201
Multilingual Knowledge Base Completion by Cross-lingual Semantic Relation Inference
International audienceIn the present paper, we propose a simple en-dogenous method for enhancing a multilingual knowledge base through the cross-lingual semantic relation inference. It can be run on multilingual resources prior to semantic representation learning. Multilingual knowledge bases may integrate preexisting structured resources available for resource-rich languages. We aim at performing cross-lingual inference on them to improve the low resource language by creating semantic relationships
Unsupervised does not mean uninterpretable : the case for word sense induction and disambiguation
This dataset contains the models for interpretable Word Sense Disambiguation (WSD) that were employed in Panchenko et al. (2017; the paper can be accessed at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability___FINAL__1_.pdf).
The files were computed on a 2015 dump from the English Wikipedia. Their contents:
Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz
This file contains 3 inventories (coarse, medium fine)
Language Model (3-gram): wiki_text.3.arpa.gz
This file contains all n-grams up to n=3 and can be loaded into an index
Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz
This file contains weighted word--context-feature combinations and includes their count and an LMI significance score
Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz
This file contains a DT of context features. The context feature similarities can be used for context expansion
For further information, consult the paper and the companion page: http://jobimtext.org/wsd/
Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., and Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'2017). Valencia, Spain. Association for Computational Linguistics
Formulaic language
The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present the current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective
- …