D6.1: Technologies and Tools for Lexical Acquisition
This report describes the technologies and tools to be used for lexical acquisition in PANACEA. It covers existing technologies and tools that can be built on and improved within PANACEA, as well as new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the lexical resources to be produced. Four main areas of lexical acquisition are covered: subcategorization frames (SCFs), selectional preferences (SPs), lexical-semantic classes (LCs), for both nouns and verbs, and multi-word expressions (MWEs)
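Subcategorization frame acquisition of the kind surveyed above typically collects argument patterns per verb from parsed corpora and filters out adjuncts. The sketch below illustrates the general idea on hypothetical toy parses with made-up dependency labels; it is not the PANACEA pipeline itself.

```python
from collections import Counter

# Hypothetical dependency-parsed sentences: (verb, [dependent labels]).
parses = [
    ("give", ["nsubj", "obj", "iobj"]),
    ("give", ["nsubj", "obj", "iobj"]),
    ("give", ["nsubj", "obj"]),
    ("sleep", ["nsubj"]),
    ("sleep", ["nsubj", "advmod"]),  # advmod is an adjunct, filtered below
]

# Labels treated as arguments for frame-building (an assumption here).
ARGUMENT_LABELS = {"nsubj", "obj", "iobj", "ccomp", "xcomp"}

def extract_scfs(parses):
    """Count argument-label patterns per verb, ignoring adjunct labels."""
    scfs = {}
    for verb, deps in parses:
        frame = tuple(sorted(d for d in deps if d in ARGUMENT_LABELS))
        scfs.setdefault(verb, Counter())[frame] += 1
    return scfs

scfs = extract_scfs(parses)
print(scfs["give"].most_common(1))  # the ditransitive frame dominates
```

A real acquisition system would add frequency-based filtering to separate genuine frames from parser noise, which is where most of the engineering effort lies.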
Syntaxe computationnelle du hongrois : de l'analyse en chunks à la sous-catégorisation verbale
We present the creation of two resources for Hungarian NLP applications: a rule-based shallow parser and a database of verbal subcategorization frames. Hungarian, as a non-configurational language with rich morphology, presents specific challenges for NLP at the level of morphological and syntactic processing. While efficient and precise morphological analyzers are already available, Hungarian is under-resourced with respect to syntactic analysis. Our work aims to overcome this problem by providing resources for syntactic processing. Hungarian is characterized by rich morphology and a non-configurational encoding of grammatical functions. These features imply that the syntactic processing of Hungarian has to rely on morphological features rather than on constituent order. The broader interest of our undertaking is to propose representations and methods that are adapted to these specific characteristics while remaining in line with state-of-the-art research methodologies. More concretely, we attempt to adapt current results in argument realization and lexical semantics to the task of labeling sentence constituents according to their syntactic function and semantic role in Hungarian. Syntax and semantics are not completely independent modules in linguistic analysis and language processing: it has been known for decades that the semantic properties of words affect their syntactic distribution. Within the syntax-semantics interface, the field of argument realization deals with the (partial or complete) prediction of verbal subcategorization from semantic properties. Research on verbal lexical semantics and semantically motivated mapping has concentrated on predicting the syntactic realization of arguments, taking for granted (either explicitly or implicitly) that the distinction between arguments and adjuncts is known, and that the syntactic realization of adjuncts is governed by productive syntactic rules rather than lexical properties.
However, apart from the correlation between verbal aspect or Aktionsart and time adverbs (e.g. Vendler, 1967, or Kiefer, 1992, for Hungarian), the distribution of adjuncts among verbs or verb classes has not received significant attention, especially within the lexical semantics framework. We claim that, contrary to this widely shared presumption, adjuncts are often not fully productive. We therefore propose a gradual notion of productivity, defined in relation to Levin-type lexical semantic verb classes (Levin, 1993; Levin and Rappaport Hovav, 2005). The definition we propose for the argument-adjunct dichotomy is based on evidence from Hungarian and exploits the idea that lexical semantics not only influences complement structure but is the key to the argument-adjunct distinction and the realization of adjuncts.

Computational linguistics is a research field concerned with the methods and perspectives of the formal (statistical or symbolic) modeling of natural language. Like theoretical linguistics, computational linguistics is a strongly modular discipline: the levels of linguistic analysis include segmentation, morphological analysis, disambiguation, and syntactic and semantic parsing. While a number of tools already exist for lower-level processing (morphological analysis, part-of-speech tagging), Hungarian can be considered an under-resourced language with respect to syntactic and semantic analysis. The work described in this thesis aims to fill this gap by creating resources for the syntactic processing of Hungarian: in particular, a chunk parser and a lexical database of verbal subcategorization frames. The first part of the research presented here focuses on the creation of a shallow parser (chunker) for Hungarian.

The output of the shallow parser is designed to serve as input for further processing that annotates the dependency relations between a predicate and its arguments and adjuncts. The deep parser is implemented in NooJ (Silberztein, 2004) as a cascade of grammars. The second research objective was to propose a lexical representation for argument structure in Hungarian. This representation has to handle the wide range of phenomena that escape the traditional argument-adjunct dichotomy (e.g. partially productive structures, mismatches between syntactic and semantic predictability). We drew on recent research on argument realization and chose a framework that meets our criteria and is adaptable to a non-configurational language, taking Levin's (1993) semantic classification as a model. We adapted the notions underlying this classification, namely semantic components and syntactic alternations, as well as its methodology of exploring and describing predicate behavior, to the task of building a lexical representation of verbs in a non-configurational language. The first step was to define coding rules and to build a large lexical database of verbs and their complements. We then carried out two experiments to enrich this lexicon with lexical-semantic information in order to formalize relevant syntactic and semantic generalizations about the underlying predicate classes. The first approach we tested consisted of a manual classification of verbs according to their complement structure and the assignment of semantic roles to those complements.

We sought answers to the following questions: which semantic components are relevant for defining a semantic classification of Hungarian predicates? What are the syntactic implications specific to these classes? And, more generally, what is the nature of the class-specific verbal alternations in Hungarian? In the final phase of the research, we studied the potential of automatic acquisition for extracting verb classes from corpora. We performed unsupervised clustering, based on distributional data, to obtain a relevant semantic classification of Hungarian verbs. We also tested the unsupervised clustering method on French data
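The unsupervised classification step can be illustrated schematically: verbs are grouped by the similarity of their subcategorization-frame distributions. The sketch below uses hypothetical Hungarian-style frame counts, Jensen-Shannon divergence, and a simple threshold-based grouping as a stand-in for the clustering method actually used in the thesis.

```python
from collections import Counter
import math

# Toy verb-by-frame counts (hypothetical data, case-label frame names).
verbs = {
    "ad":    Counter({"NOM_ACC_DAT": 50, "NOM_ACC": 30}),  # 'give'
    "kuld":  Counter({"NOM_ACC_DAT": 45, "NOM_ACC": 35}),  # 'send'
    "fut":   Counter({"NOM": 70, "NOM_ILL": 10}),          # 'run'
    "setal": Counter({"NOM": 60, "NOM_ILL": 15}),          # 'walk'
}

def to_probs(counts):
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence between two frame distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0) + q.get(k, 0)) for k in keys}
    def kl(a, b):
        return sum(v * math.log2(v / b[k]) for k, v in a.items() if v > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Single-link grouping with a divergence threshold (illustrative only).
probs = {v: to_probs(c) for v, c in verbs.items()}
clusters = []
for v in probs:
    for cl in clusters:
        if any(jsd(probs[v], probs[u]) < 0.2 for u in cl):
            cl.append(v)
            break
    else:
        clusters.append([v])
print(sorted(sorted(cl) for cl in clusters))
```

On this toy data the ditransitive verbs and the motion verbs fall into separate groups, mirroring the intuition that frame distributions reflect Levin-style semantic classes.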
German particle verbs: Compositionality at the syntax-semantics interface
Particle verbs represent a type of multi-word expression composed of a base verb and a particle. The meaning of the particle verb is often, but not always, derived from the meaning of the base verb, sometimes in quite complex ways. In this work, we computationally assess the degree of compositionality of German particle verbs using distributional semantic methods. Our results demonstrate that particle verb compositionality can be predicted with statistical significance. Furthermore, we investigate properties of German particle verbs that are relevant for their compositionality: the particular subcategorization behavior of particle verbs and their corresponding base verbs, and the question of to what extent the verb particles can be attributed meanings of their own that they contribute to the particle verb
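One common way to operationalize such compositionality predictions is to compare the distributional vector of a particle verb with that of its base verb: the more similar the contexts, the more compositional the particle verb is assumed to be. The sketch below uses cosine similarity over hypothetical toy vectors and a made-up context vocabulary, not the study's corpus data.

```python
import math

# Hypothetical co-occurrence vectors over a tiny context vocabulary
# ["Tür", "Buch", "Idee", "schnell"]; real models use large corpora.
vectors = {
    "laufen":    [0.1, 0.0, 0.0, 0.9],   # base verb 'run'
    "anlaufen":  [0.2, 0.1, 0.3, 0.6],   # particle verb 'start up'
    "machen":    [0.3, 0.4, 0.2, 0.1],   # base verb 'make'
    "aufmachen": [0.9, 0.2, 0.0, 0.0],   # particle verb 'open'
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Higher base-verb similarity serves as a proxy for compositionality.
for pv, base in [("anlaufen", "laufen"), ("aufmachen", "machen")]:
    print(f"{pv}: {cosine(vectors[pv], vectors[base]):.2f}")
```

In this toy setup "anlaufen" comes out as more compositional than "aufmachen"; a real evaluation would correlate such scores with human compositionality ratings.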
Context Effects in Language Production: Models of Syntactic Priming in Dialogue Corpora
Institute for Communicating and Collaborative Systems
This thesis addresses the cognitive basis of syntactic adaptation, which biases speakers
to repeat their own syntactic constructions and those of their conversational
partners. I address two types of syntactic adaptation: short-term priming and long-term
adaptation.
I develop two metrics for syntactic adaptation within a speaker and between
speakers in dialogue: one for short-term priming effects that decay quickly, and
one for long-term adaptation over the course of a dialogue. Both methods estimate
adaptation in large datasets consisting of transcribed human-human dialogue annotated
with syntactic information. Two such corpora in English are used: Switchboard,
a collection of spontaneous phone conversations, and HCRC Map Task, a set
of task-oriented dialogues in which participants describe routes on a map to one
another. I find both priming and long-term adaptation in both corpora, confirming
well-known experimental results (e.g., Bock, 1986b). I extend prior work by showing
that syntactic priming effects not only apply to selected syntactic constructions
that are alternative realizations of the same semantics, but still hold when a broad
variety of syntactic phrase structure rules are considered. Each rule represents a
cognitive decision during syntactic processing. I show that the priming effect for a
rule is inversely proportional to its frequency.
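A repetition-based priming measure of this general kind can be sketched as follows; the dialogue data, rule labels, and the simple lag-based rate are illustrative stand-ins for the regression models actually fitted in the thesis.

```python
# Toy per-utterance streams of phrase-structure rule identifiers
# (hypothetical); the real estimates come from Switchboard and
# HCRC Map Task annotations.
dialogue = [
    ["NP->DT_NN", "VP->V_NP"],
    ["NP->DT_NN", "PP->P_NP"],
    ["VP->V_NP", "NP->DT_NN"],
    ["PP->P_NP", "NP->DT_NN"],
]

def repetition_rate(utts, lag):
    """Fraction of rule uses in utterance i that repeat a rule from utterance i - lag.

    Short-term priming shows up as a repetition rate at small lags that
    exceeds the chance rate implied by overall rule frequencies.
    """
    hits = total = 0
    for i in range(lag, len(utts)):
        prev = set(utts[i - lag])
        for rule in utts[i]:
            total += 1
            hits += rule in prev
    return hits / total if total else 0.0

for lag in (1, 2, 3):
    print(lag, round(repetition_rate(dialogue, lag), 3))
```

A serious analysis would compare these rates against frequency-matched baselines and fit a decay curve over lag, rather than reading the raw rates directly.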
With this methodology, I test predictions of the Interactive Alignment Model
(IAM, Pickering and Garrod, 2004). The IAM claims that linguistic and situation model
agreement between interlocutors in dialogue is the result of a cascade of
resource-free, mechanistic priming effects on various linguistic levels. I examine
task-oriented dialogue in Map Task, which provides a measure of task success
through the deviance of the communicated routes on the maps. I find that long-term
syntactic adaptation predicts communicative success, and it does so earlier
than lexical adaptation. The result is applied in a machine-learning based model
that estimates task success based on the dialogue, capturing 14 percent of the variance
in Map Task. Short-term syntactic priming differs qualitatively from long-term
adaptation, as it does not predict task success, providing evidence against
learning as a single cognitive basis of adaptation effects.
I obtain further evidence for the correlation between semantic activity and syntactic
priming through a comparison of the Map Task and Switchboard corpora,
showing that short-term priming is stronger in task-oriented dialogue than in spontaneous conversation. This difference is evident for priming between and within
speakers, which suggests that priming is a mechanistic rather than strategic effect.
I turn to an investigation of the level at which syntactic priming influences language
production. I establish that the effect applies to structural syntactic decisions
as opposed to all surface sequences of lexical categories. To do so, I identify pairs of
part-of-speech categories which consistently cross constituent boundaries defined
by the phrase structure analyses of the sentences. I show that such distituents are
less sensitive to priming than pairs occurring within constituents. Thus, syntactic
priming is sensitive to syntactic structure.
The notion of constituent structure differs among syntactic models. Combinatory
Categorial Grammar (CCG, Steedman, 2000) formalizes flexible constituent
structure, accounting for varying degrees of incrementality in syntactic sentence planning.
I examine whether priming effects can support the predictions of CCG using
the Switchboard corpus, which has been annotated with CCG syntax. I confirm the
syntactic priming effect for lexical and non-lexical CCG categories, which encode
partially satisfied subcategorization frames. I then show that both incremental and
normal-form constituent structures exhibit priming, arguing for language production
accounts that support flexible incrementality.
The empirical results are reflected in a cognitive model of syntactic realization
in language production. The model assumes that language production is subject
to the same principles and constraints as any other form of cognition and follows
the ACT-R framework (Anderson et al., 2004). Its syntactic process implements
my empirical results on priming and is based on CCG. Syntactic planning can take
place incrementally and non-incrementally. The model is able to generate simple
sentences that vary syntactically, similar to the materials used in the experimental
priming literature.
Syntactic adaptation emerges due to a preferential and sped-up memory retrieval
of syntactic categories describing linearization and subcategorization requirements.
Long-term adaptation is explained as a form of learning, while short-term
priming is the result of a combination of learning and spreading activation
from semantic and lexical material. Simulations show that the model produces the
adaptation effects and their inverse frequency interaction, as well as cumulativity
of long-term adaptation
Learning for semantic parsing using statistical syntactic parsing techniques
Natural language understanding is a sub-field of natural language processing that builds automated systems to understand natural language. It is such an ambitious task that it is sometimes referred to as an AI-complete problem, implying that its difficulty is equivalent to solving the central problem of artificial intelligence: making computers as intelligent as people. Despite its complexity, natural language understanding remains a fundamental problem in natural language processing in terms of both its theoretical and empirical importance. In recent years, startling progress has been made at different levels of natural language processing tasks, which provides a great opportunity for deeper natural language understanding. In this thesis, we focus on the task of semantic parsing, which maps a natural language sentence into a complete, formal meaning representation in a meaning representation language. We present two novel, state-of-the-art learned syntax-based semantic parsers that use statistical syntactic parsing techniques, motivated by the following two reasons. First, syntax-based semantic parsing is theoretically well founded in computational semantics. Second, adopting a syntax-based approach allows us to directly leverage the enormous progress made in statistical syntactic parsing. The first semantic parser, Scissor, adopts an integrated syntactic-semantic parsing approach, in which a statistical syntactic parser is augmented with semantic parameters to produce a semantically augmented parse tree (SAPT). This integrated approach makes both syntactic and semantic information available during parsing, yielding an accurate combined syntactic-semantic analysis. The performance of Scissor is further improved by using discriminative reranking to incorporate non-local features. The second semantic parser, SynSem, exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation.
This pipeline approach allows semantic parsing to conveniently leverage the most recent progress in statistical syntactic parsing. We report experimental results on two real applications: an interpreter for coaching instructions in robotic soccer and a natural-language database interface, showing that the improvement of Scissor and SynSem over other systems comes mainly on long sentences, where the knowledge of syntax, given in the form of annotated SAPTs or syntactic parses from an existing parser, helps semantic composition. SynSem also significantly improves results with limited training data, and is shown to be robust to syntactic errors
Examining inter-sentential influences on predicted verb subcategorization
This study investigated the influences of prior discourse context and cumulative syntactic priming on readers' predictions for verb subcategorizations. An additional aim was to determine whether cumulative syntactic priming has the same degree of influence following coherent discourse contexts as following series of unrelated sentences. Participants (N = 40) read sentences using a self-paced, sentence-by-sentence procedure. Half of these sentences comprised a coherent discourse context intended to increase the expectation for a sentential-complement (S) completion. The other half consisted of scrambled sentences. The trials in both conditions varied according to the proportion of verbs that resolved to an S (either 6S or 2S). Following each condition, participants read temporarily ambiguous sentences that resolved to an S. Reading times across the disambiguating and post-disambiguating regions were measured. No significant main effects or interactions were found for either region; however, the lack of significant findings may have been due to low power. In a follow-up analysis, data from each gender were analyzed separately. For the data contributed by males, there were no significant findings. For the data contributed by females, the effect of coherence was significant (by participants but not by items) across the post-disambiguating region, and there was a marginally significant interaction (p = .05) between coherence and frequency across this region, suggesting that discourse-level information may differentially influence the local sentence processing of female and male participants
Information and Incrementality in Syntactic Bootstrapping
Some words are harder to learn than others. For instance, action verbs like "run" and "hit" are learned earlier than propositional attitude verbs like "think" and "want." One reason "think" and "want" might be learned later is that, whereas we can see and hear running and hitting, we can't see or hear thinking and wanting. Children nevertheless learn these verbs, so a route other than the senses must exist. There is mounting evidence that this route involves, in large part, inferences based on the distribution of syntactic contexts a propositional attitude verb occurs in---a process known as "syntactic bootstrapping." This fact makes the domain of propositional attitude verbs a prime proving ground for models of syntactic bootstrapping.
With this in mind, this dissertation has two goals: on the one hand, it aims to construct a computational model of syntactic bootstrapping; on the other, it aims to use this model to investigate the limits on the amount of information about propositional attitude verb meanings that can be gleaned from syntactic distributions. I show throughout the dissertation that these goals are mutually supportive.
In Chapter 1, I set out the main problems that drive the investigation. In Chapters 2 and 3, I use both psycholinguistic experiments and computational modeling to establish that a significant amount of semantic information is carried both in participants' syntactic acceptability judgments and in syntactic distributions in corpora. To investigate the nature of this relationship, I develop two computational models: (i) a nonnegative model of (semantic-to-syntactic) projection and (ii) a nonnegative model of syntactic bootstrapping. In Chapter 4, I use a novel variant of the Human Simulation Paradigm to show that the information carried in syntactic distributions is actually utilized by (simulated) learners. In Chapter 5, I present a proposal for how to solve a standing problem in how syntactic bootstrapping accounts for certain kinds of cross-linguistic variation. And in Chapter 6, I conclude with future directions for this work
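The informativity of syntactic distributions for verb learning can be illustrated with a minimal sketch: a naive Bayes classifier that labels an unseen verb as an attitude or action verb from toy frame counts. The data, labels, and classifier are hypothetical stand-ins, not the nonnegative models developed in the dissertation.

```python
import math
from collections import defaultdict

# Toy verb-by-frame counts (hypothetical): attitude verbs favor
# sentential complements ("S"), action verbs favor "NP" frames.
training = {
    "think": ("attitude", {"S": 40, "NP": 2}),
    "want":  ("attitude", {"S": 30, "NP": 8}),
    "hit":   ("action",   {"S": 1,  "NP": 50}),
    "run":   ("action",   {"S": 2,  "NP": 40}),
}

def class_frame_probs(data):
    """Pool frame counts per class and normalize to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for cls, frames in data.values():
        for f, c in frames.items():
            counts[cls][f] += c
    return {cls: {f: c / sum(fc.values()) for f, c in fc.items()}
            for cls, fc in counts.items()}

def classify(frames, probs):
    """Pick the class maximizing the log-likelihood of the frame counts."""
    best, best_ll = None, -math.inf
    for cls, fp in probs.items():
        # Tiny floor probability for unseen frames (crude smoothing).
        ll = sum(c * math.log(fp.get(f, 1e-6)) for f, c in frames.items())
        if ll > best_ll:
            best, best_ll = cls, ll
    return best

probs = class_frame_probs(training)
# An unseen verb whose distribution favors sentential complements:
print(classify({"S": 10, "NP": 1}, probs))  # prints "attitude"
```

The point of the sketch is only that frame distributions alone carry class information; the dissertation's models additionally decompose those distributions into interpretable semantic factors.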