
    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools that can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are covered: subcategorization frames (SCFs), selectional preferences (SPs), lexical-semantic classes (LCs) for both nouns and verbs, and multi-word expressions (MWEs).

    Computational syntax of Hungarian: from chunk parsing to verbal subcategorization

    We present the creation of two resources for Hungarian NLP applications: a rule-based shallow parser and a database of verbal subcategorization frames. Hungarian, as a non-configurational language with rich morphology, presents specific challenges for NLP at the level of morphological and syntactic processing. While efficient and precise morphological analyzers are already available, Hungarian is under-resourced with respect to syntactic analysis. Our work aimed to overcome this problem by providing resources for syntactic processing. Because Hungarian encodes grammatical functions non-configurationally rather than through constituent order, its syntactic processing has to rely on morphological features. The broader interest of our undertaking is to propose representations and methods that are adapted to these specific characteristics and at the same time in line with state-of-the-art research methodologies. More concretely, we attempt to adapt current results in argument realization and lexical semantics to the task of labeling sentence constituents according to their syntactic function and semantic role in Hungarian. Syntax and semantics are not completely independent modules in linguistic analysis and language processing: it has been known for decades that the semantic properties of words affect their syntactic distribution. Within the syntax-semantics interface, the field of argument realization deals with the (partial or complete) prediction of verbal subcategorization from semantic properties. Research on verbal lexical semantics and semantically motivated mapping has concentrated on predicting the syntactic realization of arguments, taking for granted (either explicitly or implicitly) that the distinction between arguments and adjuncts is known, and that the syntactic realization of adjuncts is governed by productive syntactic rules rather than lexical properties.
However, apart from the correlation between verbal aspect or Aktionsart and time adverbials (e.g. Vendler, 1967, or Kiefer, 1992 for Hungarian), the distribution of adjuncts across verbs or verb classes has not received significant attention, especially within the lexical semantics framework. We claim that, contrary to this widely shared presumption, adjuncts are often not fully productive. We therefore propose a gradual notion of productivity, defined in relation to Levin-type lexical semantic verb classes (Levin, 1993; Levin and Rappaport Hovav, 2005). The definition we propose for the argument-adjunct dichotomy is based on evidence from Hungarian and exploits the idea that lexical semantics not only influences complement structure but is the key to the argument-adjunct distinction and to the realization of adjuncts.
Computational linguistics is a research field concerned with methods and perspectives for the formal (statistical or symbolic) modeling of natural language. Like theoretical linguistics, computational linguistics is a strongly modular discipline: the levels of linguistic analysis include segmentation, morphological analysis, disambiguation, and syntactic and semantic analysis. While a number of tools already exist for lower-level processing (morphological analysis, part-of-speech tagging), Hungarian can be considered an under-resourced language with respect to syntactic and semantic analysis. The work described in this thesis aims to fill this gap by creating resources for the syntactic processing of Hungarian: namely, a chunk parser and a lexical database of verbal subcategorization frames. The first part of the research presented here concerns the creation of a shallow parser (chunker) for Hungarian.
The output of the shallow parser is designed to serve as input for further processing that annotates the dependency relations between the predicate and its arguments and adjuncts. The parser is implemented in NooJ (Silberztein, 2004) as a cascade of grammars. The second research objective was to propose a lexical representation for argument structure in Hungarian. This representation must be able to handle the wide range of phenomena that escape the traditional argument-adjunct dichotomy (e.g. partially productive structures, or mismatches between syntactic and semantic predictability). We drew on recent research on argument realization and chose a framework that meets our criteria and is adaptable to a non-configurational language. We took Levin's (1993) semantic classification as a model. We adapted the notions underlying this classification, namely those of semantic components and syntactic alternations, as well as its methodology of exploring and describing the behavior of predicates, to the task of building a lexical representation of verbs in a non-configurational language. The first step was to define coding rules and to build a large lexical database of verbs and their complements. We then undertook two experiments to enrich this lexicon with lexical-semantic information in order to formalize relevant syntactic and semantic generalizations about the underlying predicate classes. The first approach we tested consisted of a manual classification of verbs according to their complement structure, with semantic roles assigned to these complements.
We sought answers to the following questions: which semantic components are relevant for defining a semantic classification of Hungarian predicates? What are the class-specific syntactic implications? And, more generally, what is the nature of the class-specific alternations of Hungarian verbs? In the final phase of the research, we studied the potential of automatic acquisition for extracting verb classes from corpora. We performed unsupervised clustering, based on distributional data, to obtain a relevant semantic classification of Hungarian verbs. We also tested the unsupervised clustering method on French data.
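
The unsupervised, distribution-based verb clustering mentioned at the end of this abstract can be pictured roughly as follows: each verb is represented by the relative frequencies of the subcategorization frames it occurs with, and the resulting profiles are grouped with k-means. This is a minimal sketch under stated assumptions; the verbs, frame labels, counts, and the choice of k-means are all illustrative, not the thesis's actual data or algorithm.

```python
# Sketch: cluster verbs by the distribution of their subcategorization
# frames. All verbs, frames, and counts below are hypothetical.
import numpy as np

frames = ["NP", "NP_NP", "NP_PP", "S"]
verbs = ["give", "put", "say", "think", "hand"]
counts = np.array([
    [40, 30, 25, 5],   # give
    [35, 5, 55, 5],    # put
    [20, 2, 8, 70],    # say
    [15, 1, 4, 80],    # think
    [38, 28, 30, 4],   # hand
], dtype=float)

# Normalize rows so clustering reflects each verb's frame *distribution*,
# not its raw corpus frequency.
profiles = counts / counts.sum(axis=1, keepdims=True)

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign points to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(profiles, k=2)
for verb, lab in zip(verbs, labels):
    print(verb, lab)
```

With these toy counts, the sentential-complement verbs ("say", "think") separate from the object-taking verbs, mirroring the idea that distributional profiles recover semantically coherent verb classes.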

    German particle verbs: Compositionality at the syntax-semantics interface

    Particle verbs are a type of multi-word expression composed of a base verb and a particle. The meaning of the particle verb is often, but not always, derived from the meaning of the base verb, sometimes in quite complex ways. In this work, we computationally assess the degree of compositionality of German particle verbs using distributional semantic methods. Our results demonstrate that particle verb compositionality can be predicted with statistical significance. Furthermore, we investigate properties of German particle verbs that are relevant to their compositionality: the particular subcategorization behavior of particle verbs and their corresponding base verbs, and the question of to what extent the verb particles can be attributed a meaning of their own, which they contribute to the particle verb.
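
One common distributional setup for the kind of compositionality assessment described above treats a particle verb as more compositional the closer its context vector lies to that of its base verb. The sketch below is an assumption-laden illustration: the toy co-occurrence vectors are invented, not corpus-derived, and the thesis's actual features and models may differ.

```python
# Sketch: predict particle verb compositionality as the cosine similarity
# between the particle verb's context vector and its base verb's vector.
# The four-dimensional toy vectors below are hypothetical.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy co-occurrence vectors over a fixed set of context words.
vectors = {
    "laden":    np.array([8.0, 1.0, 0.5, 0.2]),   # 'to load'
    "aufladen": np.array([7.0, 1.2, 0.8, 0.3]),   # 'to load up' (transparent)
    "geben":    np.array([0.5, 6.0, 2.0, 0.1]),   # 'to give'
    "aufgeben": np.array([0.2, 0.5, 1.0, 7.0]),   # 'to give up' (opaque)
}

comp_aufladen = cosine(vectors["aufladen"], vectors["laden"])
comp_aufgeben = cosine(vectors["aufgeben"], vectors["geben"])
print(f"aufladen: {comp_aufladen:.2f}  aufgeben: {comp_aufgeben:.2f}")
```

The transparent pair ends up far more similar than the opaque pair, which is the signal such models exploit when ranking particle verbs by compositionality.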

    Context Effects in Language Production: Models of Syntactic Priming in Dialogue Corpora

    Institute for Communicating and Collaborative Systems
    This thesis addresses the cognitive basis of syntactic adaptation, which biases speakers to repeat their own syntactic constructions and those of their conversational partners. I address two types of syntactic adaptation: short-term priming and long-term adaptation. I develop two metrics for syntactic adaptation within a speaker and between speakers in dialogue: one for short-term priming effects that decay quickly, and one for long-term adaptation over the course of a dialogue. Both methods estimate adaptation in large datasets of transcribed human-human dialogue annotated with syntactic information. Two such corpora in English are used: Switchboard, a collection of spontaneous phone conversations, and HCRC Map Task, a set of task-oriented dialogues in which participants describe routes on a map to one another. I find both priming and long-term adaptation in both corpora, confirming well-known experimental results (e.g., Bock, 1986b). I extend prior work by showing that syntactic priming effects not only apply to selected syntactic constructions that are alternative realizations of the same semantics, but also hold when a broad variety of syntactic phrase-structure rules is considered. Each rule represents a cognitive decision during syntactic processing. I show that the priming effect for a rule is inversely proportional to its frequency. With this methodology, I test predictions of the Interactive Alignment Model (IAM; Pickering and Garrod, 2004). The IAM claims that linguistic and situation-model agreement between interlocutors in dialogue is the result of a cascade of resource-free, mechanistic priming effects at various linguistic levels. I examine task-oriented dialogue in Map Task, which provides a measure of task success through the deviance of the communicated routes on the maps.
I find that long-term syntactic adaptation predicts communicative success, and does so earlier than lexical adaptation. This result is applied in a machine-learning-based model that estimates task success from the dialogue, capturing 14 percent of the variance in Map Task. Short-term syntactic priming differs qualitatively from long-term adaptation in that it does not predict task success, providing evidence against learning as the single cognitive basis of adaptation effects. I obtain further evidence for the correlation between semantic activity and syntactic priming through a comparison of the Map Task and Switchboard corpora, showing that short-term priming is stronger in task-oriented dialogue than in spontaneous conversation. This difference is evident for priming both between and within speakers, which suggests that priming is a mechanistic rather than a strategic effect. I then turn to the level at which syntactic priming influences language production. I establish that the effect applies to structural syntactic decisions rather than to all surface sequences of lexical categories. To do so, I identify pairs of part-of-speech categories that consistently cross constituent boundaries defined by the phrase-structure analyses of the sentences. I show that such distituents are less sensitive to priming than pairs occurring within constituents; thus, syntactic priming is sensitive to syntactic structure. The notion of constituent structure differs among syntactic models. Combinatory Categorial Grammar (CCG; Steedman, 2000) formalizes flexible constituent structure, accommodating varying degrees of incrementality in syntactic sentence planning. I examine whether priming effects support the predictions of CCG using the Switchboard corpus, which has been annotated with CCG syntax. I confirm the syntactic priming effect for lexical and non-lexical CCG categories, which encode partially satisfied subcategorization frames.
I then show that both incremental and normal-form constituent structures exhibit priming, arguing for language-production accounts that support flexible incrementality. The empirical results are reflected in a cognitive model of syntactic realization in language production. The model assumes that language production is subject to the same principles and constraints as any other form of cognition and follows the ACT-R framework (Anderson et al., 2004). Its syntactic process implements my empirical results on priming and is based on CCG. Syntactic planning can take place incrementally and non-incrementally. The model is able to generate simple sentences that vary syntactically, similar to the materials used in the experimental priming literature. Syntactic adaptation emerges from preferential, sped-up memory retrieval of the syntactic categories that describe linearization and subcategorization requirements. Long-term adaptation is explained as a form of learning, while short-term priming results from a combination of learning and spreading activation from semantic and lexical material. Simulations show that the model produces the adaptation effects and their inverse-frequency interaction, as well as the cumulativity of long-term adaptation.
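
The repetition-based priming metrics this abstract describes can be caricatured in a few lines: for each phrase-structure rule, compare how often it recurs immediately after an utterance that used it against its overall rate. This is a deliberately simplified sketch; the thesis's actual metrics (e.g. decay-sensitive regression models) are more elaborate, and the toy dialogue, with utterances as sets of rules, is hypothetical.

```python
# Sketch: a rule is "primed" if its rate of occurrence right after an
# utterance containing it exceeds its baseline rate. The dialogue and
# rule labels below are invented for illustration.
dialogue = [
    {"NP->Det N"},
    {"NP->Det N", "VP->V NP PP"},
    {"NP->Det N", "VP->V NP PP"},
    {"NP->Det N"},
    {"NP->Det N"},
    {"NP->Det N"},
]

def priming_boost(rule, utterances):
    """P(rule in u_t | rule in u_{t-1}) minus baseline P(rule in u_t)."""
    base = sum(rule in u for u in utterances) / len(utterances)
    primed = [rule in utterances[t]
              for t in range(1, len(utterances))
              if rule in utterances[t - 1]]
    if not primed:
        return 0.0
    return sum(primed) / len(primed) - base

boost_rare = priming_boost("VP->V NP PP", dialogue)      # infrequent rule
boost_freq = priming_boost("NP->Det N", dialogue)        # ubiquitous rule
print(f"rare: {boost_rare:.2f}  frequent: {boost_freq:.2f}")
```

Note that the ubiquitous rule shows no boost at all: its baseline is already at ceiling. That is the intuition behind the inverse-frequency interaction the abstract reports, where rare rules show the strongest priming.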

    Examining inter-sentential influences on predicted verb subcategorization

    This study investigated the influences of prior discourse context and cumulative syntactic priming on readers' predictions of verb subcategorization. An additional aim was to determine whether cumulative syntactic priming has the same degree of influence following coherent discourse contexts as following series of unrelated sentences. Participants (N = 40) read sentences in a self-paced, sentence-by-sentence procedure. Half of these sentences comprised a coherent discourse context intended to increase the expectation of a sentential-complement (S) completion; the other half consisted of scrambled sentences. The trials in both conditions varied according to the proportion of verbs that resolved to an S (either 6S or 2S). Following each condition, participants read temporarily ambiguous sentences that resolved to an S. Reading times across the disambiguating and post-disambiguating regions were measured. No significant main effects or interactions were found for either region; however, the lack of significant findings may have been due to low power. In a follow-up analysis, data from each gender were analyzed separately. For the data contributed by males, there were no significant findings. For the data contributed by females, the effect of coherence was significant (by participants but not by items) across the post-disambiguating region, and there was a marginally significant interaction (p = .05) between coherence and frequency across this region, suggesting that discourse-level information may differentially influence the local sentence processing of female and male participants.

    Information and Incrementality in Syntactic Bootstrapping

    Some words are harder to learn than others. For instance, action verbs like "run" and "hit" are learned earlier than propositional attitude verbs like "think" and "want." One reason "think" and "want" might be learned later is that, whereas we can see and hear running and hitting, we can't see or hear thinking and wanting. Children nevertheless learn these verbs, so a route other than the senses must exist. There is mounting evidence that this route involves, in large part, inferences based on the distribution of syntactic contexts a propositional attitude verb occurs in---a process known as "syntactic bootstrapping." This fact makes the domain of propositional attitude verbs a prime proving ground for models of syntactic bootstrapping. With this in mind, this dissertation has two goals: on the one hand, it aims to construct a computational model of syntactic bootstrapping; on the other, it aims to use this model to investigate the limits on the amount of information about propositional attitude verb meanings that can be gleaned from syntactic distributions. I show throughout the dissertation that these goals are mutually supportive. In Chapter 1, I set out the main problems that drive the investigation. In Chapters 2 and 3, I use both psycholinguistic experiments and computational modeling to establish that there is a significant amount of semantic information carried in both participants' syntactic acceptability judgments and syntactic distributions in corpora. To investigate the nature of this relationship I develop two computational models: (i) a nonnegative model of (semantic-to-syntactic) projection and (ii) a nonnegative model of syntactic bootstrapping. In Chapter 4, I use a novel variant of the Human Simulation Paradigm to show that the information carried in syntactic distribution is actually utilized by (simulated) learners. 
In Chapter 5, I present a proposal for how to solve a standing problem in how syntactic bootstrapping accounts for certain kinds of cross-linguistic variation. And in Chapter 6, I conclude with future directions for this work.
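
One way to picture a nonnegative model over syntactic distributions of the kind this abstract mentions is to factorize a verb-by-syntactic-frame count matrix X ≈ WH with nonnegative factors, so each verb (a row of W) becomes a mixture of latent components whose frame signatures are the rows of H. The sketch below uses generic multiplicative-update NMF (Lee and Seung, 2001) on invented counts; it is an illustrative stand-in, not the dissertation's actual projection or bootstrapping model.

```python
# Sketch: nonnegative factorization of a hypothetical verb-by-frame
# count matrix. Frames: [NP object, S complement, infinitive, NP NP].
import numpy as np

rng = np.random.default_rng(1)

X = np.array([
    [30.0, 2.0, 1.0, 25.0],   # give-like verb
    [28.0, 1.0, 2.0, 22.0],   # hand-like verb
    [3.0, 40.0, 30.0, 1.0],   # think-like verb
    [2.0, 35.0, 28.0, 2.0],   # want-like verb
])

def nmf(X, k, iters=200):
    """NMF via multiplicative updates: X ~= W @ H, W and H nonnegative."""
    eps = 1e-9
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(X, k=2)
error = np.linalg.norm(X - W @ H)
```

With two latent components, the attitude-verb rows and the transfer-verb rows load on different components, which is the sense in which syntactic distributions carry recoverable semantic information.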