15 research outputs found

    Optimizing scientific communication : the role of relative clauses as markers of complexity in English and German scientific writing between 1650 and 1900

    Get PDF
    The aim of this thesis is to show that both scientific English and German have become increasingly optimized for scientific communication from 1650 to 1900 by adapting the usage of relative clauses as markers of grammatical complexity. While the lexico-grammatical changes in terms of features and their frequency distribution in scientific writing during this period are well documented, in the present work we are interested in the underlying factors driving these changes and how they affect efficient scientific communication. As the scientific register emerges and evolves, it continuously adapts to the changing communicative needs posed by extra-linguistic pressures arising from the scientific community and its achievements. We assume that, over time, scientific language maintains communicative efficiency by balancing lexico-semantic expansion with a reduction in (lexico-)grammatical complexity on different linguistic levels. This is based on the idea that linguistic complexity affects processing difficulty and, in turn, communicative efficiency. To achieve optimization, complexity is adjusted on the level of lexico-grammar, which is related to expectation-based processing cost, and syntax, which is linked to working memory-based processing cost. We conduct five corpus-based studies comparing English and German scientific writing to general language. The first two investigate the development of relative clauses in terms of lexico-grammar, measuring the paradigmatic richness and syntagmatic predictability of relativizers as indicators of expectation-based processing cost. The results confirm that both levels undergo a reduction in complexity over time. The other three studies focus on the syntactic complexity of relative clauses, investigating syntactic intricacy, locality, and accessibility. Results show that intricacy and locality decrease, leading to lower grammatical complexity and thus mitigating memory-based processing cost. However, accessibility is not a factor of complexity reduction over time. Our studies reveal a register-specific diachronic complexity reduction in scientific language both in lexico-grammar and syntax. The cross-linguistic comparison shows that English is more advanced in its register-specific development while German lags behind due to a later establishment of the vernacular as a language of scientific communication.This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 110

    Cross-lingual Incongruences in the Annotation of Coreference

    Get PDF
    In the present paper, we deal with incongruences in English-German multilingual coreference annotation and present automated methods to discover them. More specifically, we automatically detect full coreference chains in parallel texts and analyse discrepancies in their annotations. In doing so, we wish to find out whether the discrepancies rather derive from language typological constraints, from the translation or the actual annotation process. The results of our study contribute to the referential analysis of similarities and differences across languages and support evaluation of cross-lingual coreference annotation. They are also useful for cross-lingual coreference resolution systems and contrastive linguistic studies

    Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German

    Get PDF
    We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.--19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing.This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102

    ParCorFull: a Parallel Corpus Annotated with Full Coreference

    Get PDF
    ParCorFull is a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual natural language processing (NLP) technologies face -- translation of coreference across languages. Our corpus contains parallel texts for the language pair English-German, two major European languages. Despite being typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. Our parallel corpus covers the genres of planned speech (public lectures) and newswire. It is richly annotated for coreference in both languages, including annotation of both nominal coreference and reference to antecedents expressed as clauses, sentences and verb phrases. This resource supports research in the areas of natural language processing, contrastive linguistics and translation studies on the mechanisms involved in coreference translation in order to develop a better understanding of the phenomenon

    Cross-lingual Incongruences in the Annotation of Coreference

    Get PDF
    In the present paper, we deal with incongruences in English-German multilingual coreference annotation and present automated methods to discover them. More specifically, we automatically detect full coreference chains in parallel texts and analyse discrepancies in their annotations. In doing so, we wish to find out whether the discrepancies rather derive from language typological constraints, from the translation or the actual annotation process. The results of our study contribute to the referential analysis of similarities and differences across languages and support evaluation of cross-lingual coreference annotation. They are also useful for cross-lingual coreference resolution systems and contrastive linguistic studies

    Relativizers as markers of grammatical complexity: A diachronic, cross-register study of English and German

    No full text
    In this paper, we investigate grammatical complexity as a register feature of scientific English and German. Specifically, we carry out a diachronic comparison between general and scientific discourse in the two languages from the 17th to the 19th century, using relativizers as proxies for grammatical complexity. We ground our study in register theory (Halliday and Hasan, 1985), assuming that language use reflects contextual factors, which contribute to the formation of registers (Quirk et al., 1985; Biber et al., 1999; Teich et al., 2016). Our findings show a clear tendency towards grammatical simplification in scientific discourse in both languages with English spearheading the trend early on and German following later

    Coreference Strategies in English-German Translation

    No full text
    We present a study focusing on variation of coreferential devices in English original TED talks and news texts and their German translations. Using exploratory techniques we contemplate a diverse set of coreference devices as features which we assume indicate language-specific and register-based variation as well as potential translation strategies. Our findings reflect differences on both dimensions with stronger variation along the lines of register than between languages. By exposing interactions between text type and cross-linguistic variation, they can also inform multilingual NLP applications, especially machine translation

    Coreference Strategies in English-German Translation

    No full text
    We present a study focusing on variation of coreferential devices in English original TED talks and news texts and their German translations. Using exploratory techniques we contemplate a diverse set of coreference devices as features which we assume indicate language-specific and register-based variation as well as potential translation strategies. Our findings reflect differences on both dimensions with stronger variation along the lines of register than between languages. By exposing interactions between text type and cross-linguistic variation, they can also inform multilingual NLP applications, especially machine translation

    Cross-lingual Inconguences in the Annotation of Coreference

    No full text
    In the present paper, we deal with incongruences in English-German multilingual coreference annotation and present automated methods to discover them. More specifically, we automatically detect full coreference chains in parallel texts and analyse discrepancies in their annotations. In doing so, we wish to find out whether the discrepancies rather derive from language typological constraints, from the translation or the actual annotation process. The results of our study contribute to the referential analysis of similarities and differences across languages and support evaluation of cross-lingual coreference annotation. They are also useful for cross-lingual coreference resolution systems and contrastive linguistic studies
    corecore