5 research outputs found

    Design and Annotation of the First Italian Corpus for Text Simplification

    In this paper, we present the design and construction of the first Italian corpus for automatic and semi-automatic text simplification. In line with current approaches, we propose a new annotation scheme specifically conceived to identify the types of changes an original sentence undergoes when it is manually simplified. The scheme has been applied to two aligned Italian corpora, containing original texts with corresponding simplified versions, selected as representative of two different manual simplification strategies and addressing different target reader populations. Each corpus was annotated with the operations foreseen in the annotation scheme, covering different levels of linguistic description. Annotation results were analysed with the final aim of capturing the peculiarities of, and differences between, the simplification strategies pursued in the two corpora.

    The Corpus of Basque Simplified Texts (CBST)

    In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines, and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be further classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque. Itziar Gonzalez-Dios's work was funded by a Ph.D. grant from the Basque Government and a postdoctoral grant for new doctors from the Vice-rectory of Research of the University of the Basque Country (UPV/EHU). We are very grateful to the translator and teacher who simplified the texts. We also want to thank Dominique Brunato, Felice Dell'Orletta and Giulia Venturi for their help with the Italian annotation scheme and their suggestions when analysing the corpus, and Oier Lopez de Lacalle for his help with the statistical analysis. We also want to express our gratitude to the anonymous reviewers for their comments and suggestions. This research was supported by the Basque Government (IT344-10) and the Spanish Ministry of Economy and Competitiveness, EXTRECM Project (TIN2013-46616-C2-1-R).
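As a rough illustration of how the eight macro-operations named in the abstract could be represented and counted per simplification approach, here is a minimal Python sketch. The names `MacroOperation`, `tally_operations` and the toy annotations are assumptions made for this example; they are not the corpus's actual annotation format, tooling or data.

```python
from collections import Counter
from enum import Enum


class MacroOperation(Enum):
    """The eight macro-operations listed in the abstract."""
    DELETE = "delete"
    MERGE = "merge"
    SPLIT = "split"
    TRANSFORMATION = "transformation"
    INSERT = "insert"
    REORDERING = "reordering"
    NO_OPERATION = "no operation"
    OTHER = "other"


def tally_operations(annotations):
    """Count macro-operations per simplification approach.

    `annotations` is an iterable of (approach, MacroOperation) pairs,
    a hypothetical flat format chosen only for this sketch.
    """
    counts = {}
    for approach, operation in annotations:
        counts.setdefault(approach, Counter())[operation] += 1
    return counts


# Made-up toy annotations, NOT data from the corpus.
toy_annotations = [
    ("structural", MacroOperation.SPLIT),
    ("structural", MacroOperation.DELETE),
    ("intuitive", MacroOperation.SPLIT),
    ("intuitive", MacroOperation.SPLIT),
    ("intuitive", MacroOperation.NO_OPERATION),
]
counts = tally_operations(toy_annotations)
for approach, counter in counts.items():
    print(approach, {op.value: n for op, n in counter.items()})
```

Keeping the operations in an `Enum` means a misspelled label fails early, which can help when two annotated versions of the same sentence are compared.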

    Prominent linguistic features of pedagogical texts to provide consideration for authentic text simplification

    Teaching materials are significant items that are unique and specific, so their selection should match students’ proficiency. This research aimed (1) to describe lexical density, readability, nominalizations, and modifiers in pedagogical texts used as teaching materials, (2) to reveal the functional roles of these linguistic features in texts for pedagogical purposes, and (3) to suggest considerations for simplifying authentic texts. This research employed qualitative content analysis. The data sources were 18 pedagogical texts from senior high school textbooks published by the Indonesian Ministry of Education. Human instruments and a text analyser for automatic computation were used for the analysis under Systemic Functional Linguistics (SFL) pilots. This research found that the appropriate text lexical density for senior high school students corresponds to a fairly difficult construction. Nominalizations within the texts are unavoidable, and process nominalization is frequently used. Nominalizations and modifiers affect sentence complexity: nominalizations serve to condense information, collocate words, create cohesiveness, interfere with conciseness, and act as trans-categorization, while modifiers add explicitness to nouns. The suggested simplification considerations are using lexical density and readability algorithms, de-nominalization, measuring modifiers, and splitting the substance of modifiers to increase text accessibility.
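The abstract does not say which lexical density formula the text analyser used. The sketch below assumes one common definition, content words divided by total words, and uses a tiny hand-made function-word list. Both the formula and the `FUNCTION_WORDS` set are assumptions for illustration; a clause-based measure, as sometimes used in SFL work, would give different values.

```python
import re

# Tiny illustrative function-word list (an assumption, far from complete).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "these", "those", "he", "she", "they", "we",
    "i", "you", "not", "as", "if", "than", "then", "so",
}


def lexical_density(text):
    """Return the share of content words among all word tokens (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


# Six tokens, three of them content words (cat, sat, mat).
print(lexical_density("The cat sat on the mat"))  # 0.5
```

A real analysis would normally rely on part-of-speech tagging rather than a fixed word list to decide which tokens count as content words.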

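    Capire i documenti in L2: dall'analisi della comprensibilità di un corpus di testi istituzionali per stranieri alla sperimentazione di approcci didattici e linguistici (Understanding documents in L2: from the comprehensibility analysis of a corpus of institutional texts for foreigners to the testing of teaching and linguistic approaches)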

    The thesis analyses the comprehensibility and readability of a corpus of Italian institutional texts addressed to foreigners and tests effective drafting and teaching solutions. It is well known that Italian institutional language (especially in the varieties used by public administrations and in law) often tends to be needlessly complicated, particularly for foreigners, who must deal with numerous bureaucratic and administrative procedures in order to reside in Italy; the research aims to ease their integration by facilitating access to documents. To gather as much information as possible on the elements that define the comprehensibility and readability of this type of text for foreign users, a corpus of institutional texts addressed to migrants (ISTR) was created and analysed computationally. At the same time, 101 foreign students were tested on their comprehension of several institutional texts addressed to them. The data analysis made it possible to draw up a list of the linguistic markers of difficulty in institutional language addressed to foreigners and of the cognitive, sociolinguistic and emotional factors that come into play during comprehension. Two strategies for improving comprehension were then selected: text simplification and the design of a training course on Italian institutional language. Statistical analysis (dependent t-test and ANOVA) of the data and comparison between the student groups show that both text simplification (df = 59, p-value = 1.066e-09) and attendance of the training course improve comprehension of institutional texts (F value = 4.56, p-value = 0.037 *). The results show that combining institutional efforts (drafting texts of controlled difficulty for foreigners) with educational ones (creating dedicated language courses) can effectively ease the integration of migrants into the socio-political fabric of host communities.
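For readers unfamiliar with the dependent (paired) t-test mentioned in the abstract, the following sketch computes the test statistic and its degrees of freedom using only the Python standard library. The function name `paired_t_statistic` and the scores are made up for illustration and are not the thesis data. In a paired design the degrees of freedom equal the number of pairs minus one, so the reported df = 59 would be consistent with 60 paired scores, although the abstract does not state this.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for a dependent t-test on paired scores."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up comprehension scores for four hypothetical readers.
original_scores = [4, 5, 6, 5]
simplified_scores = [6, 6, 9, 7]
t, df = paired_t_statistic(original_scores, simplified_scores)
print(f"t = {t:.3f}, df = {df}")
```

Turning the statistic into a p-value requires the Student t distribution, which the standard library does not provide; a statistics package would normally be used for that step.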