
    Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations

    To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception – its software and the complete set of representations – is available at https://github.com/SapienzaNLP/conception
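    The abstract evaluates the representations on Semantic Word Similarity. As a rough illustration only, the sketch below scores two words by the cosine similarity of their closest pair of sparse concept vectors. The max-over-senses strategy, the dictionary-based vector format and the toy dimension labels are assumptions made for this example; they are not Conception's documented method or file format.

```python
import math
from typing import Dict, List

# Sparse concept vector: dimension label -> weight. The labels are toy
# stand-ins for human-readable concept dimensions (an assumption).
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(senses_a: List[Vector], senses_b: List[Vector]) -> float:
    """Assumed strategy: similarity of the closest pair of concepts."""
    if not senses_a or not senses_b:
        return 0.0
    return max(cosine(a, b) for a in senses_a for b in senses_b)


# Hypothetical concept vectors for the senses of two words.
bank = [
    {"money": 0.9, "institution": 0.8},  # financial sense
    {"water": 0.9, "land": 0.5},         # river-side sense
]
shore = [{"water": 0.8, "land": 0.7}]

print(round(word_similarity(bank, shore), 3))
```

    Taking the maximum over sense pairs lets an ambiguous word such as "bank" match "shore" through its river-side sense even though its financial sense is unrelated.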

    Spreading semantic information by Word Sense Disambiguation

    This paper presents an unsupervised approach to resolving semantic ambiguity that integrates the Personalized PageRank algorithm with word-sense frequency information. Natural Language Processing applications such as Machine Translation or Recommender Systems are likely to benefit from our approach, which incorporates semantic information and obtains the appropriate word sense with support from two sources: a multidimensional network that integrates a set of different resources (i.e. WordNet, WordNet Domains, WordNet Affect, SUMO and Semantic Classes), and the word-sense frequencies and word-sense collocations from the SemCor corpus. Our results were analyzed and compared against those of several well-known studies on the SensEval-2, SensEval-3 and SemEval-2013 datasets. Across several experiments, our procedure produced the best results in the unsupervised category, taking the SensEval campaign rankings as reference. This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government, Ministerio de Educación, Cultura y Deporte and ASAP - Ayudas Fundación BBVA a equipos de investigación científica 2016 (FUNDACIONBBVA2-16PREMIO) through the projects TIN2015-65100-R, TIN2015-65136-C2-2-R, PROMETEOII/2014/001, GRE16-01: “Plataforma inteligente para recuperación, análisis y representación de la información generada por usuarios en Internet” (Intelligent platform for the retrieval, analysis and representation of user-generated information on the Internet) and PR16_SOC_0013.
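    A minimal sketch of the general idea, assuming a toy sense graph: Personalized PageRank is computed by power iteration with restarts on the senses of the context words, and each candidate sense's share of the rank is then linearly interpolated with its share of a sense-frequency prior. The interpolation weight `alpha`, the node names and the frequency values are hypothetical; the paper's actual way of combining PageRank scores with SemCor frequencies may differ.

```python
from typing import Dict, List

# Undirected sense graph as adjacency lists (toy data, hypothetical node names).
Graph = Dict[str, List[str]]


def personalized_pagerank(graph: Graph, seeds: Dict[str, float],
                          damping: float = 0.85,
                          iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank whose teleport distribution is `seeds`."""
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in graph}
        dangling = 0.0
        for node, neighbours in graph.items():
            if not neighbours:
                dangling += rank[node]  # nodes without edges: mass teleports
                continue
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                new_rank[nb] += share
        for n in graph:
            new_rank[n] += damping * dangling * restart[n]
        rank = new_rank
    return rank


def disambiguate(candidates: List[str], graph: Graph, seeds: Dict[str, float],
                 frequency: Dict[str, float], alpha: float = 0.5) -> str:
    """Interpolate normalised PPR scores with a sense-frequency prior.

    The linear interpolation with weight `alpha` is an assumption of this sketch.
    """
    rank = personalized_pagerank(graph, seeds)
    rank_total = sum(rank[c] for c in candidates) or 1.0
    freq_total = sum(frequency.get(c, 0.0) for c in candidates) or 1.0

    def score(sense: str) -> float:
        return (alpha * rank[sense] / rank_total
                + (1.0 - alpha) * frequency.get(sense, 0.0) / freq_total)

    return max(candidates, key=score)


# Toy graph: context senses "money#n" and "deposit#v" link to the financial sense.
graph: Graph = {
    "bank#finance": ["money#n", "deposit#v"],
    "bank#river": ["river#n"],
    "money#n": ["bank#finance", "deposit#v"],
    "deposit#v": ["bank#finance", "money#n"],
    "river#n": ["bank#river"],
}
seeds = {"money#n": 1.0, "deposit#v": 1.0}            # senses of the context words
frequency = {"bank#finance": 0.3, "bank#river": 0.7}  # made-up frequency prior
print(disambiguate(["bank#finance", "bank#river"], graph, seeds, frequency))
```

    In this toy case the graph evidence overrides the frequency prior; setting `alpha=0.0` falls back to a pure most-frequent-sense choice.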

    SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain

    Domain portability and adaptation of NLP components and Word Sense Disambiguation systems present new challenges. The difficulties that supervised systems face in adapting might change the way we assess the strengths and weaknesses of supervised and knowledge-based WSD systems. Unfortunately, all existing evaluation datasets for specific domains are lexical-sample corpora. This task presented all-words datasets on the environment domain for WSD in four languages (Chinese, Dutch, English, Italian). Eleven teams participated with supervised and knowledge-based systems, mainly on the English dataset. The results show that in all languages the participants were able to beat the most frequent sense heuristic as estimated from general corpora. The most successful approaches used some supervision, in the form of hand-tagged examples from the domain.
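    The most frequent sense (MFS) heuristic assigns every occurrence of a lemma the sense it carries most often in a sense-tagged corpus. The sketch below uses made-up sense labels and counts, not the task data, to show how an MFS estimated on general text can underperform on a specific domain, and how a few in-domain tagged examples can change the prediction.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Tagged = Tuple[str, str]  # (lemma, sense label)


def train_mfs(tagged: Iterable[Tagged]) -> Dict[str, str]:
    """Map each lemma to the sense it is tagged with most often."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for lemma, sense in tagged:
        counts[lemma][sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def accuracy(model: Dict[str, str], gold: List[Tagged]) -> float:
    """Fraction of gold instances whose sense the model predicts."""
    if not gold:
        return 0.0
    hits = sum(1 for lemma, sense in gold if model.get(lemma) == sense)
    return hits / len(gold)


# Made-up data: a general corpus vs. an environment-domain sample.
general = [("bank", "bank#finance")] * 3 + [("bank", "bank#river")]
domain_train = [("bank", "bank#river")] * 2
domain_test = [("bank", "bank#river"), ("bank", "bank#river"),
               ("bank", "bank#finance")]

print(accuracy(train_mfs(general), domain_test))       # general-corpus MFS
print(accuracy(train_mfs(domain_train), domain_test))  # in-domain MFS
```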

    Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration

    Recent studies treat Word Sense Disambiguation (WSD) as a single-label classification problem in which one is asked to choose only the best-fitting sense for a target word, given its context. However, gold data labelled by expert annotators suggest that maximizing the probability of a single sense may not be the most suitable training objective for WSD, especially if the sense inventory of choice is fine-grained. In this paper, we approach WSD as a multi-label classification problem in which multiple senses can be assigned to each target word. Not only does our simple method bear a closer resemblance to how human annotators disambiguate text, but it can also be seamlessly extended to exploit structured knowledge from semantic networks to achieve state-of-the-art results in English all-words WSD.
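    To illustrate the difference between the two training objectives, the sketch below compares a single-label softmax cross-entropy with a multi-label binary cross-entropy over the candidate senses of one target word. The logits are made up; the example only shows that, when two senses are both gold, the single-label loss rewards suppressing one of them while the multi-label loss does not. It is not the paper's model or its knowledge-integration step.

```python
import math
from typing import List, Set


def softmax_ce(logits: List[float], gold_index: int) -> float:
    """Single-label objective: -log softmax(logits)[gold_index]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[gold_index]


def multilabel_bce(logits: List[float], gold: Set[int]) -> float:
    """Multi-label objective: mean binary cross-entropy, one sigmoid per sense."""
    total = 0.0
    for i, z in enumerate(logits):
        y = 1.0 if i in gold else 0.0
        # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + e^-|z|)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Three candidate senses; annotators accepted senses 0 and 1 as correct.
both_high = [3.0, 3.0, -3.0]  # model scores both gold senses highly
one_high = [3.0, -3.0, -3.0]  # model commits to sense 0 only

# Single-label training (gold = sense 0) prefers one_high ...
print(softmax_ce(both_high, 0), softmax_ce(one_high, 0))
# ... whereas multi-label training prefers both_high.
print(multilabel_bce(both_high, {0, 1}), multilabel_bce(one_high, {0, 1}))
```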

    Word Domain Disambiguation via Word Sense Disambiguation

    Word subject domains have been widely used to improve the performance of word sense disambiguation algorithms. However, comparatively little effort has been devoted so far to the disambiguation of word subject domains. The few existing approaches have focused on the development of algorithms specific to word domain disambiguation. In this paper we explore an alternative approach where word domain disambiguation is achieved via word sense disambiguation. Our study shows that this approach yields very strong results, suggesting that word domain disambiguation can be addressed in terms of word sense disambiguation with no need for special purpose algorithms.
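    A minimal sketch of this pipeline, assuming a sense-to-domain lookup table in the spirit of WordNet Domains: first disambiguate the word, then read the domain off the chosen sense. Simplified Lesk (gloss/context word overlap) is swapped in as the WSD step purely for brevity; the glosses, sense identifiers, domain labels and the `default` label are hypothetical, and this abstract does not say which WSD system the authors used.

```python
from typing import Dict, List


def simplified_lesk(context: List[str], glosses: Dict[str, str]) -> str:
    """Pick the sense whose gloss shares the most words with the context."""
    ctx = {w.lower() for w in context}

    def overlap(sense: str) -> int:
        return len(ctx & set(glosses[sense].lower().split()))

    return max(glosses, key=overlap)  # ties go to the first sense listed


def word_domain(context: List[str], glosses: Dict[str, str],
                sense_to_domain: Dict[str, str], default: str = "generic") -> str:
    """Word domain disambiguation via WSD: disambiguate, then look up the domain."""
    sense = simplified_lesk(context, glosses)
    return sense_to_domain.get(sense, default)


# Hypothetical glosses and sense-to-domain table for "bank".
glosses = {
    "bank#finance": "a financial institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water such as a river",
}
domains = {"bank#finance": "economy", "bank#river": "geography"}

print(word_domain("they fished from the river bank".split(), glosses, domains))
print(word_domain("she deposits money at the bank".split(), glosses, domains))
```

    Any WSD system can replace `simplified_lesk` here; the domain step only needs the selected sense identifier.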