
    Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

    We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28], which represents text as a graph of discourse entities linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, whose application to IR complements recent work on integrating document cohesiveness or comprehensibility into ranking [5, 56].
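
    The two ideas in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of bigrams, and the rule that entities are linked when they share a sentence are all assumptions made for illustration. The first function computes the Shannon entropy of entity n-grams (under the abstract's reasoning, higher entropy suggests lower coherence); the other two build a simple entity graph and compute its average clustering coefficient.

```python
import math
from collections import Counter
from itertools import combinations


def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical entity n-gram distribution.

    `entities` is the document's discourse entities in order of appearance.
    Entity extraction itself is assumed to have been done elsewhere.
    """
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entity_graph(sentences):
    """Undirected entity graph as an adjacency dict.

    Assumption: two entities are linked if they occur in the same sentence.
    The abstract mentions other relations (distance, adjacency) not modelled here.
    """
    adj = {}
    for sent in sentences:
        for entity in sent:
            adj.setdefault(entity, set())
        for a, b in combinations(set(sent), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with < 2 neighbours count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```

    A document that keeps returning to the same entity pairs yields low n-gram entropy, while one that introduces a new entity at every step yields high entropy. To use such scores for reranking, as the abstract describes, they would still have to be combined with a retrieval score, and that step is not shown here.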

    Comprehensible legal texts – utopia or a question of wording? On processing rephrased German court decisions

    This paper presents a study on the comprehensibility of rephrased syntactic structures in German court decisions. While a number of studies use psycholinguistic methods to investigate the comprehensibility of original legal texts, we are not aware of any study examining the effect that resolving complex structures has on comprehensibility. Our study combines three methodological steps. First, we analyse an annotated corpus of court decisions, press releases and newspaper reports on these decisions in order to detect the complex structures that distinguish the decisions from the other text types. Second, these structures are rephrased into two increasingly simple versions. Finally, all versions are subjected to a self-paced reading experiment. The findings suggest that rephrasing greatly enhances comprehensibility for the lay reader.

    Issues on topics

    The present volume contains papers that bear mainly on issues concerning the concept of topic. This concept is very broad and diverse, and different views are expressed in this volume. Some authors concentrate on the status of topics and non-topics in so-called topic-prominent languages (i.e. Chinese); others focus on the syntactic behavior of topical constituents in specific European languages (German, Greek, Romance languages). The last contribution tries to bring together the concept of discourse topic (a non-syntactic notion) and the concept of sentence topic, i.e. the type of topic with which all the preceding papers are concerned.

    Lexical and contextual cue effects in discourse expectations: Experimenting with German 'zwar...aber' and English 'true/sure...but'

    Existing literature shows that readers and listeners rapidly adjust their expectations about likely discourse continuations through discourse markers, as well as through other linguistic and extra-linguistic cues. However, it is unclear (i) whether the facilitative effects of various (extra-)linguistic cues differ in quality and (ii) whether the effects interact with one another in any principled manner. We conducted two self-paced reading experiments on concessive constructions in German and English in which optional lexical and/or contextual cues appeared ahead of the concessive discourse marker. The results demonstrate that readers can use both types of cues to anticipate the upcoming discourse relation. Our study thus provides novel evidence for expectation-driven accounts of discourse processing and elucidates the functions of discourse signals. Furthermore, the results show that the role a given type of cue plays is subject to cross-linguistic variation.

    Information structure and grammaticalization. Discourse markers and utterance position in Catalan and German

    This paper explores the relation between the position of discourse markers and the instructions they provide on the information structure of utterances. We assume that, alongside other types of indications, discourse markers come to encode, during their grammaticalization process, instructions about the informative status of the discourse constituents on which they operate and about their relevance for text progression. Our aim is to account for these indications and to show the benefits of using a model of discourse units for the description of markers. For this purpose, we adopt the Basel model for discourse segmentation, which regards text as a pragmatic unit consisting of hierarchically organized information units. The study concludes that metalinguistic operations such as reformulation in written language can be better explained on the grounds of the dynamics governing text construction and organization.

    Teaching the form-function mapping of German ‘prefield’ elements using Concept-Based Instruction

    This publication is freely accessible with the permission of the rights owner under an Alliance licence and a national licence (funded by the DFG, German Research Foundation), respectively. Empirical findings in Second Language Acquisition suggest that the basic structure of German declarative sentences, described in terms of topological fields, poses certain challenges to learners of German as a foreign language. The problem of multiple prefield elements, resulting in ungrammatical verb-third sentences, figures most prominently in the literature. While the so-called V2 constraint is usually treated as a purely formal feature of German syntax in both the empirical and the pedagogical literature, the present paper adopts a usage-based perspective, viewing language as an inventory of form-function mappings. Basic functions of prefield elements have already been identified in research on textual grammar and information structure. This paper presents results from a pilot study with Japanese elementary learners of German as a foreign language, in which the form-function mapping of German prefield elements was explicitly taught following the guidelines of an approach called Concept-Based Instruction. The findings indicate that, with a focus on the form-function mapping, it is in fact possible to explicitly teach these rather abstract regularities of German to beginning learners. The participants’ language production exhibits a prefield variation pattern similar to that of L1 German speakers; at the same time, the learners produce very few ungrammatical verb-third sentences. Peer reviewed.