20 research outputs found

    Signalling conditional relations

    We investigate how discourse relations and their subtypes are signalled, extending the set of discourse signals from connectives and lexical cue phrases to the wide range of semantic, syntactic, and orthographic signals of the RST Signalling Corpus (Das, Debopam & Maite Taboada. 2018. RST signalling corpus. Language Resources and Evaluation 52. 149–184). This extension requires re-evaluating previous predictions on discourse signalling, in particular those of the causality-by-default hypothesis (Sanders, Ted. 2005. Coherence, causality and cognitive complexity in discourse. In M. Aurnague, M. Bras, A. Le Draoulec & L. Vieu (eds.), Proceedings/Actes SEM-05, first international symposium on the exploration and modelling of meaning, 105–114. Biarritz), the hypothesis of uniform information density (Frank, Austin & Florian Jaeger. 2008. Speaking rationally: Uniform information density as an optimal strategy for language production. In Proceedings of the 30th annual meeting of the Cognitive Science Society, 933–938. https://escholarship.org/uc/item/7d08h6j4 (accessed 18 May 2022)), and the hypothesis that discourse is continuous by preference (Segal, Erwin, Judith Duchan & Paula Scott. 1991. The role of interclausal connectives in narrative structuring. Discourse Processes 14. 27–54; Murray, John. 1997. Connectives and narrative text. Memory and Cognition 25. 227–236). We evaluate the predictions of these theories on the conditional relations in the RST Discourse Treebank (Carlson, Lynn, Daniel Marcu & Mary Ellen Okurowski. 2002. RST Discourse Treebank. LDC2002T07. Philadelphia: Linguistic Data Consortium), using causal relations as a control group. Informativity and continuity are operationalized in terms of semantic complexity and Givón's dimensions of deictic shift (Givón, Talmy. 1993. English grammar: A function-based introduction, vol. 2. Amsterdam: John Benjamins). Our results show that the hypotheses make accurate predictions only for the relation groups in their entirety but not for the observed in-group variation, in particular the low amount of marking for the hypothetical subtype of conditional relations. We attribute this difference to the distribution of intra- and inter-sentential occurrences across the conditional subtypes: intra-sentential relations are consistently more marked than inter-sentential ones, and hypothetical relations are special in that they appear predominantly inter-sententially.

    Discourse markers in English and European Portuguese translations: establishing functional equivalents and types of omission

    Based on the translations of a bidirectional English-Portuguese parallel corpus, this paper examines some English discourse markers (henceforth ‘DMs’, such as well, you know, I mean). The goal is twofold. Firstly, the analysis of the translations establishes functional equivalents of the English DMs in European Portuguese, thus complementing the existing studies on the translation of DMs in parallel corpora. Secondly and most importantly, this paper aims to approach the phenomenon of DM omission, frequently observed in translations, from an empirical rather than a theoretical point of view. In particular, the study focuses on the omission of DMs in the target languages. The corpus analysis resulted in the identification of the three most common types of omission: DM deletion (i.e. a simple deletion or omission of the DM in the target language), partial DM deletion (i.e. when one of the two DMs in the original language is dropped, so that only one of them is translated in the target language), and DM addition (i.e. when there is no DM in the original language, but the translator has added one).

    An Exploratory Analysis of TED Talks in English and Lithuanian, Portuguese and Turkish Translations

    CC BY 4.0. This paper contributes to the question of how discourse relations are realised in TED talks. Drawing on an annotated, multilingual discourse corpus of TED talk transcripts, we examine discourse relations in English and in Lithuanian, Portuguese and Turkish translations by concentrating on three aspects: the degree of explicitness in discourse relations, the extent to which explicit and implicit relations are encoded inter- or intra-sententially, and whether top-level discourse relation senses employed in English differ in the target languages. The study shows that while the target languages differ from English in the first two dimensions, they do not display considerable differences in the third dimension. The paper thus reveals variations in the realisation of discourse relations in translated transcripts of a spoken genre in three languages and offers some methodological insights for dealing with the issues surrounding discourse relations.

    The online processing of causal and concessive discourse connectives

    While there is a substantial amount of evidence for language processing being a highly incremental and predictive process, we still know relatively little about how top-down discourse-based expectations are combined with bottom-up information such as discourse connectives. The present article reports on three experiments investigating this question using different methodologies (visual world paradigm and ERPs) in two languages (German and English). We find support for highly incremental processing of causal and concessive discourse connectives, causing anticipation of upcoming material. Our visual world study shows that anticipatory looks depend on the discourse connective; furthermore, the German ERP study revealed an N400 effect on a gender-marked adjective preceding the target noun when the target noun was inconsistent with the expectations elicited by the combination of context and discourse connective. Moreover, our experiments reveal that the facilitation of downstream material based on earlier connectives comes at the cost of reversing original expectations, as evidenced by a P600 effect on the concessive relative to the causal connective.

    A Psycholinguistic Model for the Marking of Discourse Relations

    Discourse relations can either be explicitly marked by discourse connectives (DCs), such as therefore and but, or implicitly conveyed in natural language utterances. How speakers choose between the two options is not well understood. In this study, we propose a psycholinguistic model that predicts whether or not speakers will produce an explicit marker given the discourse relation they wish to express. Our model is based on two information-theoretic frameworks: (1) the Rational Speech Acts model, which models the pragmatic interaction between language production and interpretation by Bayesian inference, and (2) the Uniform Information Density theory, which holds that speakers adjust linguistic redundancy to maintain a uniform rate of information transmission. Specifically, our model quantifies the utility of using or omitting a DC based on the expected surprisal of comprehension, the cost of production, and the availability of other signals in the rest of the utterance. Experiments based on the Penn Discourse Treebank show that our approach outperforms the state of the art at predicting the presence of DCs (Patterson and Kehler, 2013), in addition to giving an explanatory account of the speaker’s choice.

    Scolding the child who threw the scissors: Shaping discourse expectations by restricting referents

    Coherence relations are often assumed to hold between clauses, but restrictive relative clauses (RCs) are usually not granted discourse segment status because they are syntactically and conceptually integrated in their matrix clauses. This paper investigates whether coherence relations can be inferred between restrictive RCs and their matrix clauses. Three experiments provide converging evidence that restrictive RCs can indeed play a role at the discourse level and should not categorically be excluded from receiving discourse segment status in discourse annotation practices. At the same time, the studies provide new insights into implicit causality verb biases, specifically about next-mention biases in concessive coherence relations, and expectations about discourse structure, upcoming referents, and upcoming coherence relations.

    establishing functional equivalents and types of omission

    UIDB/03213/2020 UIDP/03213/2020 PD/BD/105766/2014. Based on the translations of a bidirectional English-Portuguese parallel corpus, this paper examines some English discourse markers (henceforth ‘DMs’, such as well, you know, I mean). The goal is twofold. Firstly, the analysis of the translations establishes functional equivalents of the English DMs in European Portuguese, thus complementing the existing studies on the translation of DMs in parallel corpora. Secondly and most importantly, this paper aims to approach the phenomenon of DM omission, frequently observed in translations, from an empirical rather than a theoretical point of view. In particular, the study focuses on the omission of DMs in the target languages. The corpus analysis resulted in the identification of the three most common types of omission: DM deletion (i.e. a simple deletion or omission of the DM in the target language), partial DM deletion (i.e. when one of the two DMs in the original language is dropped, so that only one of them is translated in the target language), and DM addition (i.e. when there is no DM in the original language, but the translator has added one).

    Inducing Discourse Resources Using Annotation Projection

    An important aspect of natural language understanding and generation involves the recognition and processing of discourse relations. Building applications such as text summarization, question answering and natural language generation requires human language technology beyond the level of the sentence. To address this need, large-scale discourse-annotated corpora such as the Penn Discourse Treebank (PDTB; Prasad et al., 2008a) have been developed. Manually constructing discourse resources (e.g. discourse-annotated corpora) is expensive, both in terms of time and expertise. As a consequence, such resources are only available for a few languages. In this thesis, we propose an approach that automatically creates two types of discourse resources from parallel texts: 1) PDTB-style discourse-annotated corpora and 2) lexicons of discourse connectives. Our approach is based on annotation projection, where linguistic annotations are projected from a source language to a target language in parallel texts. Our work has made both theoretical and practical contributions to the field of discourse analysis. From a theoretical perspective, we have proposed a method to refine the naive method of discourse annotation projection by filtering annotations that are not supported by parallel texts. Our approach is based on the intersection between statistical word-alignment models and can automatically identify 65% of unsupported projected annotations. We have also proposed a novel approach for annotation projection that is independent of statistical word-alignment models. This approach is more robust to longer discourse connectives than approaches based on statistical word-alignment models. From a practical perspective, we have automatically created the Europarl ConcoDisco corpora from English-French parallel texts of the Europarl corpus (Koehn, 2009). In the Europarl ConcoDisco corpora, around 1 million occurrences of French discourse connectives are automatically aligned to their translation. From the French side of \parcorpus, we have extracted our first significant resource, the FrConcoDisco corpora. To our knowledge, the FrConcoDisco corpora are the first PDTB-style discourse-annotated corpora for French in which French discourse connectives are annotated with the discourse relations that they signal. The FrConcoDisco corpora are significant in size, as they contain more than 25 times more annotations than the PDTB. To evaluate the FrConcoDisco corpora, we showed how they can be used to train a classifier for the disambiguation of French discourse connectives with high performance. The second significant resource that we automatically extracted from parallel texts is ConcoLeDisCo, a lexicon of French discourse connectives mapped to PDTB discourse relations. While ConcoLeDisCo is useful by itself, as we showed in this thesis, it can also be used to improve the coverage of manually constructed lexicons of discourse connectives such as LEXCONN (Roze et al., 2012).