9 research outputs found

    A Corpus-Based Analysis of Cohesion in L2 Writing by Undergraduates in Ecuador

    To examine the nature of cohesion in L2 writing, the present study addressed three research questions: (1) What types of cohesion relations occur in L2 writing at the sentence, paragraph, and whole-text levels? (2) What is the relationship between lexico-grammatical cohesion features and teachers’ judgements of writing quality? (3) Do the expectations of cohesion suggested by the CEFR match what is found in student writing? To answer these questions, cohesion was analysed in a corpus of 240 essays and 240 emails written by college-level students learning English as a foreign language in Ecuador. Each text carried a score reflecting teachers’ judgements of writing quality aligned to the upper-intermediate level (B2) of the Common European Framework of Reference for learning, teaching, and assessing English as a foreign language. The analysis considered the lexical and grammatical items that L2 students used to build relationships of meaning within sentences, paragraphs, and the entire text. Using Natural Language Processing tools (e.g., TAACO, TextInspector, NVivo), the analysis focused on determining which cohesion features (e.g., word repetition/overlap, semantic similarity, connective words) predicted the teachers’ judgements of writing quality in the collected essays and emails. The findings indicate that L2 writing is characterised by word overlap and synonyms occurring at the paragraph level and, to a lesser degree, by cohesion between sentences and across the entire text (e.g., connective words). Whilst these cohesion features positively and negatively predicted the teachers’ scores, the findings require cautious interpretation, as many factors beyond cohesion features must also have influenced the allocation of scores in L2 writing.

    Topics, Presuppositions, and Theticity: An Empirical Study of Verb-Subject Clauses in Albanian, Greek, and Serbo-Croat

    Verb-Subject order is often claimed to be the surface expression of thetic utterances, which are supposed to be ontologically different from the classical Aristotelian categoric type: thetic utterances are not divided into two parts (subject and predicate, topic and comment), but represent the information they convey as a cognitive whole. The purpose of the present study is to offer a detailed description of clauses with this word order in Albanian, Greek, and Serbo-Croat, in which the verb-subject strategy is very prominent, and, based on these data, to reexamine the postulates of the theory of two basic utterance types. The results may be summarized in two claims: (1) The equation "VS = thetic" does not hold, because subject postponement is a distinctive feature of at least three constructions, which I label Inversion, VsX-Construction, and vS-Construction. Of these, only the last resembles what is usually called thetic. (2) The existence of a non-categoric utterance type does not automatically follow from the existence of the vS-Construction, since this construction also displays a specific kind of topic-comment articulation, explainable in terms of certain word order and intonation rules of the three languages in question.

    Topics, presuppositions, and theticity: An empirical study of verb-subject clauses in Albanian, Greek, and Serbo-Croat

    LoC Class: PG9522, LoC Subject Headings: Albanian language--Clauses, Greek language/Modern--Clauses, Serbo-Croatian language--Clause

    The use of and-coordination in terms of its syntactic (a)symmetry in argumentative essays : a corpus-based study of three university learner groups in MICUSP and NUCLE.

    Previous studies found that EL learners overuse "and" as an additive connector in sentence-initial position (Bolton, Hung, & Nelson, 2002) and underuse "and" as a coordinator (Leung, 2005). More generally, the and-coordinator has often been overlooked in corpus research and in English teaching because of its seeming simplicity. To test previous findings about the and-coordinator and to examine the influence of English proficiency on the use of "and" in academic writing, three learner corpora, MICUSP-NNS (advanced level), MICUSP-NS (advanced level), and NUCLE-NNS (upper-intermediate), were compared with regard to the use of (a)symmetric structures of and-coordination. Each corpus contains 31 argumentative essays written by 31 university students.

    Language and Literacy Development in Children Learning English as an Additional Language: a Longitudinal Cohort and Vocabulary Intervention Study

    Children learning English as an Additional Language (EAL) are a growing but understudied population of learners in English primary schools. As EAL learners vary in their amount of exposure to English, they often begin formal education with relatively lower levels of English language proficiency than their monolingual peers. Little is known about the English language and literacy developmental trajectories of EAL learners in England, and particularly, the extent to which the two groups of learners converge or diverge over time. Additionally, no studies to date have assessed the efficacy of explicit, targeted vocabulary instruction in this group of learners in the run-up to the end of primary school. The present study comprised a longitudinal cohort study of 48 EAL learners and 33 monolingual peers who were assessed at three time points between Year 4 (age 8-9) and Year 5 (age 9-10) on a battery of English language and literacy measures. All EAL learners had received English-medium education since at least Year 1 (age 5-6). Relative to their monolingual peers, EAL learners showed strengths in rapid naming, single-word reading efficiency, and spelling, but weaknesses in vocabulary knowledge, expressive syntax, and passage reading accuracy. Where they exhibited weaknesses, EAL learners generally did not make sufficient progress to close the gaps with their monolingual peers. A subgroup of nine EAL learners with English vocabulary weaknesses also participated in a short-term vocabulary intervention. Working one-to-one with speech and language therapy students, children showed significant gains in receptive and productive knowledge of target vocabulary, which were maintained six months later. Together, the results indicate that regular classroom instruction may be insufficient for EAL learners to close gaps with their monolingual peers in certain domains of oral language, but that targeted vocabulary instruction may be an effective means of achieving this end.

    Discourse analysis of Arabic documents and application to automatic summarization

    Within a discourse, texts and conversations are not just a juxtaposition of words and sentences. Rather, they are organized in a structure in which discourse units are related to each other so as to ensure both coherence and cohesion. Discourse structure has proven useful in many NLP applications, including machine translation, natural language generation, automatic summarization, and language technology in general. The usefulness of discourse in NLP applications depends mainly on the availability of powerful discourse parsers. To build such parsers and improve their performance, several resources have been manually annotated with discourse information within different theoretical frameworks. Most available resources are in English. Recently, several efforts have been undertaken to develop manually annotated discourse resources for other languages such as Chinese, German, Turkish, Spanish, and Hindi. Surprisingly, discourse processing in Modern Standard Arabic (MSA) has received less attention, despite the fact that MSA is a language with more than 422 million speakers in 22 countries. Computational processing of Arabic has received great attention in the literature for over twenty years. Several resources and tools have been built to deal with Arabic non-concatenative morphology and Arabic syntax, from shallow to deep parsing. However, little work exists at the discourse level. As far as we know, the sole effort towards Arabic discourse processing is the Leeds Arabic Discourse Treebank, which extends the Penn Discourse TreeBank model to MSA. This thesis falls within the field of Arabic natural language processing, specifically the discourse analysis of Arabic texts, and aims to study the contribution of semantic and discourse analysis to the automatic summarization of Arabic documents. We propose to go beyond the annotation of explicit relations that link adjacent units by completely specifying the semantic scope of each discourse relation, making transparent an interpretation of the text that takes into account the semantic effects of discourse relations. In particular, we propose the first effort towards a semantically driven approach to Arabic texts following Segmented Discourse Representation Theory (SDRT), which offers a logical framework for the semantic representation of sentences as well as a graph representation of text structure in which discourse relations are semantic rather than intentional in nature. This theory has been studied for English, French, and German, but never for Arabic. Our main contributions are: (1) A study of the feasibility of building recursive and complete discourse structures for Arabic texts. In particular, we propose (a) an annotation scheme for the full discourse coverage of Arabic texts, in which each constituent is linked to other constituents; a document is then represented by a directed acyclic graph, which captures explicit and implicit relations as well as complex discourse phenomena such as long-distance attachments, long-distance discourse pop-ups, and crossed dependencies; (b) a novel discourse relation hierarchy, in which we study rhetorical relations from a semantic point of view, focusing on their effect on meaning rather than on how they are lexically triggered by discourse connectives, which are often ambiguous, especially in Arabic; and (c) a thorough quantitative analysis (in terms of discourse connectives, relation frequencies, proportion of implicit relations, etc.) and qualitative analysis (inter-annotator agreement and error analysis) of the annotation campaign. (2) An automatic discourse parser, in which we investigate both the automatic segmentation of Arabic texts into elementary discourse units and the automatic identification of explicit and implicit Arabic discourse relations. (3) An application of our discourse parser to Arabic text summarization. We compare tree-based and graph-based discourse representations for producing indicative summaries and show that full discourse coverage of a document is definitely a plus.

    Studi di Linguistica Slava. Volume dedicato a Lucyna Gebert

    Studi di linguistica slava
