52,845 research outputs found

Measuring text complexity for Italian as a second language learning purposes

Author: Alfredo Milani
Filippo Santarelli
Luciana Forti
Luisa Piersanti
Stefania Spina
Valentino Santucci
Publication date: 01/01/2019

Crossref

Open Access Repository

Building a Corpus of 2L English for Automatic Assessment: the CLEC Corpus

Author: Calderón López María Isabel
Merino Ferradá María del Carmen
Noya Gallardo María del Carmen
Zarco Tejada María Ángeles
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

In this paper we describe the CLEC corpus, an ongoing project set up at the University of Cádiz with the purpose of building a large corpus of English as a 2L, classified according to CEFR proficiency levels and designed to train statistical models for automatic proficiency assessment. The goal of this corpus is twofold: on the one hand, it will be used as a data resource for the development of automatic text classification systems and, on the other, it has been used as a means of teaching innovation techniques.

Elsevier - Publisher Connector

Repositorio de Objetos de Docencia e Investigación de la Universidad de Cádiz

Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

Author: Bu Yi
Ding Ying
Lu Chao
Schnaars Matthew
Torvik Vetle
Wang Jie
Zhang Chengzhi
Publication date: 12/09/2018

Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity. Comment: 6 figures

arXiv.org e-Print Archive

IUScholarWorks Open
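
The abstract above names sentence length and lexical diversity among its complexity measurements without giving formulas. As a rough illustration only, the sketch below computes two common proxies: mean sentence length in tokens and the type-token ratio. The regex tokenizer and the function names (`tokenize`, `type_token_ratio`, `mean_sentence_length`) are assumptions made for this example, not the measures or tooling used in the paper.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/apostrophes as tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Lexical diversity proxy: distinct tokens divided by total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(text: str) -> float:
    """Syntactic complexity proxy: average number of tokens per sentence."""
    # Split on sentence-final punctuation and drop fragments that contain no tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = "The cat sat. The dog sat on the mat."
    print(type_token_ratio(sample))
    print(mean_sentence_length(sample))
```

The type-token ratio is sensitive to text length, so comparisons are only meaningful between samples of similar size.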

Artificial Sequences and Complexity Measures

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza
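
The entry above describes a compression-based distance between pairs of character sequences. The sketch below is not the authors' measure: it swaps in the normalized compression distance (NCD), a related and widely used quantity, computed with Python's standard `zlib` compressor. The helper names and sample strings are assumptions chosen for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps compress y, close to 1 otherwise."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical sample texts, repeated so the compressor has material to work with.
    english = b"the quick brown fox jumps over the lazy dog " * 20
    italian = b"la volpe veloce salta sopra il cane pigro " * 20
    print(ncd(english, english))  # a text compared with itself
    print(ncd(english, italian))  # texts in two different languages
```

On real corpora the choice of compressor and the length of the samples both affect the result, so values from different setups are not directly comparable.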

A linguistically-driven methodology for detecting impending and unfolding emergencies from social media messages

Author: Musacchio Maria Teresa
Panizzon Raffaella
Zhang X.
Zorzi Virginia
Publication date: 01/01/2016

Natural disasters have demonstrated the crucial role of social media before, during and after emergencies (Haddow & Haddow 2013). Within our EU project Slándáil, we aim to ethically improve the use of social media in enhancing the response of disaster-related agencies. To this end, we have collected corpora of social and formal media to study newsroom communication of emergency management organisations in English and Italian. Currently, emergency management agencies in English-speaking countries use social media to differing degrees, whereas the Italian national Protezione Civile only uses Twitter at the moment. Our method is developed with a view to identifying communicative strategies and detecting sentiment in order to distinguish warnings from actual disasters and major from minor disasters. Our linguistic analysis relies on human raters to classify messages as alerts/warnings or as emergency response and mitigation messages, based on the terminology used and the sentiment expressed. Results of the linguistic analysis are then used to train an application by tagging messages and detecting disaster- and/or emergency-related terminology and emotive language, in order to simulate human rating and forward information to an emergency management system.

Archivio istituzionale della ricerca - Università di Trieste

Archivio istituzionale della ricerca - Università di Padova

Measuring complexity with zippers

Author: Baronchelli Andrea
Caglioti Emanuele
Loreto Vittorio
Publication venue: 'IOP Publishing'
Publication date: 01/01/2005

Physics concepts have often been borrowed and independently developed by other fields of science. In this perspective a significant example is that of entropy in Information Theory. The aim of this paper is to provide a short and pedagogical introduction to the use of data compression techniques for the estimate of entropy and other relevant quantities in Information Theory and Algorithmic Information Theory. We consider in particular the LZ77 algorithm as case study and discuss how a zipper can be used for information extraction. Comment: 10 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

City Research Online

CERN Document Server

Archivio della ricerca- Università di Roma La Sapienza
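
The entry above concerns estimating entropy with a zipper. As a minimal sketch, assuming Python's `zlib` (DEFLATE, which combines LZ77 matching with Huffman coding) as a stand-in for the LZ77 zipper discussed in the paper, the compressed size per input byte gives a crude upper estimate of the entropy rate. The function name `bits_per_byte` and the sample inputs are assumptions for this example.

```python
import random
import zlib


def bits_per_byte(data: bytes) -> float:
    """Compressed bits per input byte: a rough upper estimate of the entropy rate."""
    if not data:
        return 0.0
    return 8 * len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # A highly regular sequence should need far fewer bits per byte than noise.
    regular = b"ab" * 5000
    rng = random.Random(0)  # fixed seed so the run is reproducible
    noise = bytes(rng.getrandbits(8) for _ in range(10000))
    print(bits_per_byte(regular))
    print(bits_per_byte(noise))
```

The estimate includes the compressor's fixed overhead, so it is only informative for inputs that are long relative to that overhead.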

The relation between pitch and gestures in a story-telling task

Author: Brugnerotto S.
Busà Maria Grazia
Publication venue: University of Nantes, France
Publication date: 01/01/2015

Anecdotal evidence suggests that both pitch range and gestures contribute to the perception of speakers’ liveliness in speech. However, the relation between speakers’ pitch range and gestures has received little attention. It is possible that variations in pitch range might be accompanied by variations in gestures, and vice versa. In second language speech, the relation between pitch range and gestures might also be affected by speakers’ difficulty in speaking the L2. In this pilot study we compare global pitch range and gesture rate in the speech of 3 native Italian speakers, telling the same story once in Italian and twice in English as part of an in-class oral presentation task. The hypothesis tested is that contextual factors, such as speakers’ nervousness with the task, cause speakers to use a narrow pitch range and limited gestures, whereas greater ease with the task, due to its repetition, causes speakers to use a wider pitch range and more gestures. This experimental hypothesis is partially confirmed by the results of this study.

Archivio istituzionale della ricerca - Università di Padova