54 research outputs found

    Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations

    An important trend in recent work on lexical semantics has been the development of learning methods capable of extracting semantic information from text corpora. The majority of these methods are based on the distributional hypothesis of meaning and acquire semantic information by identifying distributional patterns in texts. In this article, we present a distributional analysis method for extracting nominalization relations from monolingual corpora. The acquisition method makes use of distributional and morphological information to select nominalization candidates. We explain how the learning is performed on a dependency annotated corpus and describe the nominalization results. Furthermore, we show how these results served to enrich an existing lexical resource, the WOLF (Wordnet Libre du Français). We present the techniques that we developed in order to integrate the new information into WOLF, based on both its structure and content. Finally, we evaluate the validity of the automatically obtained information and the correctness of its integration into the semantic resource. The method proved to be useful for boosting the coverage of WOLF and presents the advantage of filling verbal synsets, which are particularly difficult to handle due to the high level of verbal polysemy.

    A Survey of Paraphrasing and Textual Entailment Methods

    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

    Enhancing factoid question answering using frame semantic-based approaches

    FrameNet is used to enhance the performance of semantic QA systems. FrameNet is a linguistic resource that encapsulates Frame Semantics and provides scenario-based generalizations over lexical items that share similar semantic backgrounds. Doctor of Philosophy

    Le système WoDiS - WOlf & DIStributions pour la substitution lexicale

    In this paper we describe the WoDiS system, as entered in the SemDis-TALN 2014 lexical substitution shared task. Substitution candidates are generated from the WOLF (WordNet Libre du Français) and are clustered according to the structure of the synsets containing them to reflect the different senses of the target word. These senses are represented in a vector space specific to the target word, based on distributional data extracted from a corpus. This vector space is then mapped to the context with simple topical similarity metrics used in document classification. To overcome the data sparseness problem when representing the less frequent senses, we apply a lexical expansion method at the level of the sense clusters, which makes it possible to extract more relevant contexts and to compensate for the bias present in corpus-based distributional vectors. Our system ranked fourth (out of nine submitted systems) in the final evaluation.

    Native and non-native processing of morphologically complex words in Italian

    The present work focuses on the organization of the mental lexicon in native and non-native speakers and investigates whether words are connected in the mind in terms of morphological criteria, i.e., through a network of associations established when a co-occurrence of form and meaning is found. Psycholinguistic research on native lexical access has demonstrated that morphology indeed underlies the organization of the mental lexicon, even though controversies about the locus of this level of organization remain. On the other hand, research in the field of second language acquisition has only recently turned to such issues and its findings so far have been controversial. Specifically, the debate centers on whether native and non-native speakers share the same processing systems. According to recent proposals (Heyer & Clahsen 2015), this would not be the case and L2 processing would be more affected by formal rather than morphological criteria. In this light, the present work aims to verify the impact of formal characteristics in native and non-native lexical access, focusing on the processing of formally transparent versus non-transparent words in Italian. Two morphological phenomena are investigated by means of four psycholinguistic experiments involving a lexical decision task combined with the masked priming paradigm. Experiments 1 & 2 compare the processing of allomorphic vs non-allomorphic derivatives, to investigate whether formal alterations impair the appreciation of the relationship between two morphologically related words. Experiments 3 & 4 focus on the lack of base autonomy found in so-called bound stems, i.e., stems which cannot occur in isolation, and aim to determine whether the processing of free and bound stems differs.
    The results of Experiments 1 and 2 indicate that allomorphic variation does not influence the associations established among related words in native speakers, in line with the predictions that can be formulated within usage-based perspectives on language. Non-native speakers, on the other hand, seem to be more pervasively affected by the phonological/orthographic properties of words, but not to the point that transparent morphological relations can be reduced to mere form overlap shared by morphological relatives. Likewise, stem autonomy was not found to affect the way words containing bound and free stems are processed by native speakers, at least under certain conditions, suggesting that boundedness is not an issue influencing the establishment of morphological relationships among words. Non-native speakers, however, were found to be sensitive to the isolability of the stem, in a way that suggests that free bases may be more salient morphological units for them, as opposed to bound stems, which are seemingly more closely associated with orthographic strings resembling each other. Taken together, the findings of the present work suggest a model of the native mental lexicon based on words and morphological schemas emerging from the relationships established among them, despite phonological variations and stem boundedness. While it is unclear whether such a system of connections and schemas is equally strong in the non-native lexicon, morphological relationships still appear to drive lexical organization. Crucially, however, such organization is modulated by form, as demonstrated by the effects of phonological variations and lack of base autonomy.

    Looking Beyond the Canonical Formulation and Evaluation Paradigm of Prepositional Phrase Attachment

    Prepositional phrase attachment has long been considered one of the most difficult tasks in automated syntactic parsing of natural language text. In this thesis, we examine several aspects of what has become the dominant view of PP attachment in natural language processing with an eye toward extending this view to a more realistic account of the problem. In particular, we take issue with the manner in which most PP attachment work is evaluated, and the degree to which traditional assumptions and simplifications no longer allow for realistically meaningful assessments. We also argue for looking beyond the canonical subset of attachment problems, where almost all attention has been focused, toward a fuller view of the task, both in terms of the types of ambiguities addressed and the contextual information considered.

    The semantic transparency of English compound nouns

    What is semantic transparency, why is it important, and which factors play a role in its assessment? This work approaches these questions by investigating English compound nouns. The first part of the book gives an overview of semantic transparency in the analysis of compound nouns, discussing its role in models of morphological processing and differentiating it from related notions. After a chapter on the semantic analysis of complex nominals, it closes with a chapter on previous attempts to model semantic transparency. The second part presents new empirical work on semantic transparency, introducing two different sets of statistical models for compound transparency. In particular, two semantic factors were explored: the semantic relations holding between compound constituents and the role of different readings of the constituents and the whole compound, operationalized in terms of meaning shifts and in terms of the distribution of specific readings across constituent families. All semantic annotations used in the book are freely available.

