275 research outputs found

    Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences

    Get PDF
    Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective—noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-senseper-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance

    Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

    No full text
    Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD using inductive logic programming to learn theories from first-order logic representations that allows corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. Is it important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources

    The interaction of knowledge sources in word sense disambiguation

    Get PDF
    Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus.Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems

    SemEval-2007 Task 16: evaluation of wide coverage knowledge resources

    Get PDF
    This task tries to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use Senseval-3 and SemEval-2007 English Lexical Sample tasks as evaluation bechmarks to evaluate the relative quality of each resource. Furthermore, trying to be as neutral as possible with respect the knowledge bases studied, we apply systematically the same disambiguation method to all the resources. A completely different behaviour is observed on both lexical data sets (Senseval-3 and SemEval-2007).Peer ReviewedPostprint (author’s final draft

    KnowNet: A proposal for building highly connected and dense knowledge bases from the web

    Get PDF
    This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate nowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge-base which connects large sets of semantically-related concepts is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge KnowNet contains outperform any other resource when is empirically evaluated in a common multilingual framework.Peer ReviewedPreprint (author's version

    Evaluating large-scale knowledge resources across languages

    Get PDF
    This paper presents an empirical evaluation in a multilingual scenario of the semantic knowledge present on publicly available large-scale knowledge resources. The study covers a wide range of manually and automatically derived large-scale knowledge resources for English and Spanish. In order to establish a fair and neutral comparison, the knowledge resources are evaluated using the same method on two Word Sense Disambiguation tasks (Senseval-3 English and Spanish Lexical Sample Tasks). First, this study empirically demonstrates that the combination of the knowledge contained in these resources surpass the most frequent sense classi er for English. Second, we also show that this large-scale topical knowledge acquired from one language can be successfully ported to other languages.Peer ReviewedPostprint (author’s final draft

    Verb similarity: comparing corpus and psycholinguistic data

    Get PDF
    Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of the events, and word associations, which are related to the speakers' organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.La similitud, que desempeña un papel clave en campos como la ciencia cognitiva, la psicolingüística y el procesamiento del lenguaje natural, es un concepto amplio y multifacético. En este trabajo analizamos cómo dos enfoques que pertenecen a diferentes perspectivas, la visión del corpus y la visión psicolingüística, articulan la semejanza entre los sentidos verbales en español. Específicamente, comparamos la similitud entre los sentidos verbales basados en su estructura argumental, que se capta a través de roles semánticos, con su similitud definida por las asociaciones de palabras. Abordamos la cuestión de si la estructura del argumento verbal, que refleja la expresión de los acontecimientos, y las asociaciones de palabras, que están relacionadas con la organización de los hablantes del léxico mental, forman similitud entre los verbos de una manera congruente, un tema que no ha sido explorado previamente. Mientras que encontramos correlaciones significativas entre las similitudes de los sentidos verbales obtenidas de estos dos enfoques, nuestros hallazgos también resaltan algunas discrepancias entre ellos y la importancia del grado de abstracción de la anotación del corpus y las representaciones psicolingüísticas.La similitud, que exerceix un paper clau en camps com la ciència cognitiva, la psicolingüística i el processament del llenguatge natural, és un concepte ampli i multifacètic. En aquest treball analitzem com dos enfocaments que pertanyen a diferents perspectives, la visió del corpus i la visió psicolingüística, articulen la semblança entre els sentits verbals en espanyol. Específicament, comparem la similitud entre els sentits verbals basats en la seva estructura argumental, que es capta a través de rols semàntics, amb la seva similitud definida per les associacions de paraules. Abordem la qüestió de si l'estructura de l'argument verbal, que reflecteix l'expressió dels esdeveniments, i les associacions de paraules, que estan relacionades amb l'organització dels parlants del lèxic mental, formen similitud entre els verbs d'una manera congruent, un tema que no ha estat explorat prèviament. Mentre que trobem correlacions significatives entre les similituds dels sentits verbals obtingudes d'aquests dos enfocaments, les nostres troballes també ressalten algunes discrepàncies entre ells i la importància del grau d'abstracció de l'anotació del corpus i les representacions psicolingüístiques

    D6.1: Technologies and Tools for Lexical Acquisition

    Get PDF
    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs)
    corecore