    One of a kind. The processing of indefinite one-anaphora in spoken Danish

    It is a hallmark of natural language use that the way we talk about something reflects how it is represented in the mind of our conversation partner. This thesis studies the use and cognitive processing of referring expressions like ‘one’ in comparison with other expression types in spoken Danish. The cognitive status of referents in other people’s minds can be understood in terms of referential givenness. The common view of givenness is that it constitutes a one-dimensional scale or continuum of cognitive prominence. In opposition to this view, the present thesis assumes that givenness is partly composed of the dynamic referential features of accessibility and identifiability, two fundamental dimensions of givenness that are essentially free to vary independently of each other. A small corpus study of spoken Danish shows no differences in referent accessibility between indefinite and definite pronominal forms. Furthermore, referents of definite and indefinite forms clearly tend to differ with respect to identifiability. An experimental eye-tracking study provides evidence that there is no difference in the time course of the initiation of anaphoric reference resolution processes between expressions differing in definiteness marking and lexical explicitness. It is also shown, however, that the referential commitment of pronouns, both indefinite and definite, lags behind that of full noun phrases. Finally, an exploratory dyadic eye-tracking study suggests that the moment-by-moment activation of referents in both speaker and listener varies as a function of lexical explicitness in indefinite forms. This result is consistent with the assumption that givenness differences associated with accessibility marking generalize to indefinite forms. All of these findings support the new view of givenness proposed in the thesis. 
The dyadic eye-tracking methodology eventually arrived at in the thesis demonstrates that it is possible to study language processing in unscripted, relatively natural dialogue in both speaker and listener simultaneously, and that interesting results can be obtained that are well worth the effort.

    The use of English referring expressions by Chinese children living in Britain

    This thesis examined the English referring expressions used by Chinese children living in Britain and by English children matched to them on English language ability. Two adult groups (one Chinese and one English) served as controls. Two experiments were conducted one year apart, involving 166 participants in total. In the experiments, participants described stories presented in pictures to listeners who could (E1) or could not (E2) see the pictures. The stories in E1 described two protagonists of different genders; those in E2 described two of the same gender. Predictions concerned the use of appropriate referring expressions on first mention of novel entities and on second mention of familiar entities; whether a thematic subject strategy was used; whether Chinese children's choice of specific referring expressions (Bare Nouns, Demonstratives, and Zero Anaphors) was influenced by their first language; and which factors (First Language, English Language Ability, Cognitive Ability, and Age) were significant predictors of the children's use of English referring expressions. The main results were as follows. Both groups of children used definite references on second mention more frequently than they used indefinite references on first mention. There were hardly any transcripts showing use of a thematic subject strategy. Instead, participants used either an explicit strategy, in which full explicit noun phrases were used throughout, or a strategy in which the subject slot is reserved for the current topic, which may change as the discourse proceeds. English parents predominantly used this second strategy. Regression analyses showed that cognitive ability was the best predictor of first-mention indefinites in both experiments and of second-mention definites in E1, where definite articles were appropriate for identifying the referent. English language ability was the best predictor of second-mention definites in both experiments. 
These results were discussed in relation to previous studies and the notion of mental models. It was concluded that the Chinese children did not use an interlanguage containing information about specific words or phrases. The major effect of the first language may be on discourse-level strategies, but this effect appeared only with the parents.

    Knowledge acquisition for coreference resolution

    This thesis addresses the problem of statistical coreference resolution. Theoretical studies describe coreference as a complex linguistic phenomenon, affected by various factors. State-of-the-art statistical approaches, in contrast, rely on rather simple knowledge-poor modeling. This thesis aims at bridging the gap between theory and practice. We use insights from linguistic theory to identify relevant linguistic parameters of co-referring descriptions. We consider different types of information, from the most shallow name-matching measures to deeper syntactic, semantic, and discourse knowledge. We empirically assess the validity of the investigated theoretical predictions against corpus data. Our data-driven evaluation experiments confirm that various linguistic parameters suggested by theoretical studies interact with coreference and may therefore provide valuable information for resolution systems. At the same time, our study raises several issues concerning the coverage of theoretical claims, and thus provides feedback to linguistic theory. We use the investigated knowledge sources to build a linguistically informed statistical coreference resolution engine. This framework allows us to combine the flexibility and robustness of a machine learning-based approach with a wide variety of data from different levels of linguistic description. 
Our evaluation experiments with different machine learners show that our linguistically informed model, on the one hand, outperforms algorithms based on a single knowledge source and, on the other hand, yields the best result on the MUC-7 data reported in the literature (F-score of 65.4% with the SVM-light learning algorithm). The learning curves for our classifiers show no signs of convergence. This suggests that our approach provides a good basis for further experimentation: one can obtain even better results by annotating more material or by using the existing data more intelligently. Our study shows that statistical approaches to the coreference resolution task may and should benefit from linguistic theories: even imperfect knowledge, extracted from raw text data with off-the-shelf, error-prone NLP modules, helps achieve significant improvements.

    Linguistic parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009.
    This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repetition of previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution and plays an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules governing subject NP deletion and the referential constraints in Brazilian Portuguese, in order to allow correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized in the Xerox Incremental Parser (XIP) (Ait-Mokhtar et al., 2002: 121-144) to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at the Spoken Language Laboratory (L2F). With this rule-based approach we expected to improve the performance of the Portuguese grammar, specifically by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface. 
Because of the complexity of the task, the scope of this dissertation had to be limited to (a) subject NP deletion, (b) within sentence boundaries, and (c) with an explicit antecedent; in addition, (d) rules were formalized based solely on the output of the shallow parser (chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated, and the results are presented and discussed.

    Choosing referring expressions

    This thesis focuses on the issue of how language users refer to an entity during discourse production, by investigating the representations and processes that underlie the choice between pronouns and repeated noun phrases. Past research has shown that the use of pronouns (relative to more explicit expressions) is affected by the referent’s salience in the prior linguistic context, but much less is known about how non-linguistic context affects the referent’s salience and the choice of expression. Recent research has suggested that the referent’s non-linguistic salience has no effect on the choice between pronouns and names (Arnold & Griffin, 2007). One of the major findings of the research reported in this thesis is that the referent's salience in the visual context plays an important role in the form of reference: pronouns were less frequent (relative to repeated noun phrases) when the competitor was present in the visual context than when it was absent. My second major finding is that similarity-based interference affects the choice of referring expressions. Pronouns are less frequent when discourse entities are similar in terms of their inherent conceptual properties as well as their extrinsic properties, suggesting that the more similar the competitor is to the referent, the stronger the interference, and the lower the pronoun usage. My third major finding is that, contrary to many linguistic theories that assume that speakers choose referring expressions that are optimally helpful for their addressee (Ariel, 1990; Clark & Marshall, 1981; Givón, 1983), speakers do not choose expressions by adopting the addressee's discourse model: pronouns are more frequent when the referent is salient to the speaker, not to the addressee. 
I argue that the explicitness of referring expressions is affected by the degree of conceptual access that is needed to initiate production processes: the more conceptual access is needed, the more elaborate the expressions that tend to be produced.
EThOS - Electronic Theses Online Service, United Kingdom.

    L2 Influence on L1: Chinese subject realisation in Chinese-English bilinguals

    This study aims to investigate the influence of the second language (L2) on the use of the first language (L1) in late bilinguals within an L1-dominant environment. Cross-linguistic influence (Kellerman & Smith, 1986) has usually been studied in the forward direction: how bilinguals’ L1 influences the acquisition and use of their L2. The other direction (i.e., the influence of L2 on L1), on the other hand, has not been sufficiently investigated. The current study looks at Chinese-speaking learners who acquire their L2 English through instruction in an L1-dominant environment. It does so by examining ‘subject realisation’, an area where Chinese and English exhibit substantial typological contrasts: Chinese allows both overt and null arguments under certain discourse-pragmatic conditions, whereas subjects in English are, under most circumstances, obligatorily expressed (Huang, 1984). It is therefore hypothesized that long-term learning and regular use of English as an L2 would increase the use of overt subjects in the bilinguals’ first language, Chinese, with a consequent decrease in null subjects in their L1. In addition, following Grosjean (1998), the interaction between the bilinguals’ two languages is expected to be stronger when they produce language in the so-called ‘bilingual mode’, i.e., when both languages are highly activated, than in a ‘monolingual mode’, i.e., when only one language is predominantly activated. This ‘language mode’ factor leads naturally to a further hypothesis: fewer null subjects are realised in speech produced by Chinese-English bilinguals in bilingual mode than in monolingual mode.

    Anaphoric resolution of zero pronouns in Chinese in translation and reading comprehension

    The primary aim of the thesis is to investigate some of the processes of reading Chinese text by comparing and analysing approximately 100 parallel translations of four texts from Chinese into English. The translations are answers to A Level examination questions. The focus of the investigation is the interpretation of the zero pronoun, a common phenomenon in Chinese, which often requires explicitation when translated into English. The secondary aim is to show how translation gives evidence of comprehension, as shown by the variation in the interpretation of zero pronouns. The thesis reviews relevant psycholinguistic research into reading, particularly the reading of Chinese text. This is followed by reviews of relevant research into translation as a reading activity, and a discussion of its role in language teaching and testing. The core of the thesis is the discussion of the zero pronoun in Chinese, including discussion of anaphoric choice (the writer's decision on when to use zero in preference to an explicit anaphoric form) and of anaphoric resolution (how a reader decides what a zero pronoun refers to). Anaphoric resolution may be problematic for less experienced readers of Chinese because Chinese lacks the rich morphological inflection which, in other languages, provides the reader with information. Some of the key ideas on anaphoric choice and resolution are then applied to the analysis of the data in the parallel translations. It would appear that the factors in Chinese texts which affect the comprehension of zero pronouns are antecedent distance, topic persistence, abstraction, multiplicity of arguments and the meaning of the verb. Characteristics of the reader which may affect comprehension of the zero pronoun include personal schemata, which may lead to elaborative inferences. 
On the basis of the data, I suggest that mark schemes could be devised on a scalar system encompassing optimal solution, proximal solution and nonsolution, which might help to solve the problem of variability in marking translation. A by-product of the thesis, and an avenue for further research, is the apparent close relationship between idea units, clause length, punctuation breaks and antecedent distance in Chinese texts on the one hand, and saccade length and working memory capacity in the reader of Chinese on the other.

    The Importance of Semantics: Visual World Studies on Drawing Inferences and Resolving Anaphors

    The present thesis investigated the importance of semantics in generating inferences during discourse processing. Three aspects of semantics, namely gender stereotypes, implicit causality information and proto-role properties, were used to investigate whether semantics is activated elaboratively during discourse comprehension and what its relative importance is in backward inferencing compared to discourse/structural cues. Visual world eye-tracking studies revealed that semantics plays an important role in both backward and forward inferencing: gender stereotypes and implicit causality information are activated elaboratively during online discourse comprehension. Moreover, gender stereotypes, implicit causality and proto-role properties of verbs are all used in backward inferencing. Importantly, the studies demonstrated that semantic cues are weighed against discourse/structural cues. When the structural cues consist of a combination of cues that have each been independently shown to be important in backward inferencing, semantic effects may be masked, whereas when the structural cues consist of a combination of fewer prominent cues, semantics can have an earlier effect than structural factors in pronoun resolution. In addition, the type of inference matters: during anaphoric inferencing semantics has a prominent role, while discourse/structural salience attains more prominence during non-anaphoric inferencing. Finally, semantics plays a strong role in inviting new inferences that revise earlier inferences, even when the additional inference is not needed to establish coherence in the discourse. The findings are generally in line with Mental Model approaches. Two extended model versions are presented that incorporate the current findings into the earlier literature. 
These models allow both forward and backward inferencing to occur at any given moment during the course of processing; they also allow semantic and discourse/structural cues to contribute to both of these processes. However, while Mental Model 1 does not assume interactions between semantic and discourse/structural factors in forward inferencing, Mental Model 2 does assume such a link.
Transferred from Doria.