84,463 research outputs found

Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

Author: Bu Yi
Ding Ying
Lu Chao
Schnaars Matthew
Torvik Vetle
Wang Jie
Zhang Chengzhi
Publication date: 12/09/2018

Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity. Comment: 6 figures
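To make two of the measure families named in this abstract concrete, here is a minimal Python sketch of mean sentence length (a syntactic measure) and type-token ratio (one common lexical diversity measure). The abstract does not say which formulas the authors used, so the regex-based sentence splitter, the tokenizer, and the function names are all illustrative assumptions, not the paper's method.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def mean_sentence_length(text: str) -> float:
    """Average number of tokens per sentence; 0.0 for empty input."""
    sents = split_sentences(text)
    if not sents:
        return 0.0
    return sum(len(tokenize(s)) for s in sents) / len(sents)


def type_token_ratio(text: str) -> float:
    """Distinct tokens divided by total tokens; 0.0 for empty input."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    return len(set(toks)) / len(toks)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran away."
    print(mean_sentence_length(sample))
    print(type_token_ratio(sample))
```

Type-token ratio is sensitive to text length, so comparisons across articles of different sizes would normally need a length-normalized variant.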

arXiv.org e-Print Archive

IUScholarWorks Open

Detecting Hate Speech in Social Media

Author: Malmasi Shervin
Zampieri Marcos
Publication date: 26/12/2017

In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed. Comment: Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 467-472. Varna, Bulgaria
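The three feature types named in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' system: the function names are hypothetical, the abstract gives no n-gram orders or skip distances, and the skip-bigram definition used here (word pairs with at most `max_skip` words between them) is one common convention.

```python
from typing import List, Tuple


def char_ngrams(text: str, n: int) -> List[str]:
    """All contiguous character substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous word sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_skip_bigrams(tokens: List[str], max_skip: int) -> List[Tuple[str, str]]:
    """Ordered word pairs with at most max_skip words between them.

    With max_skip=0 this reduces to ordinary bigrams.
    """
    pairs = []
    for i in range(len(tokens)):
        # j may be at most max_skip + 1 positions to the right of i.
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


if __name__ == "__main__":
    print(char_ngrams("hate", 2))
    print(word_ngrams(["a", "b", "c"], 2))
    print(word_skip_bigrams(["a", "b", "c"], 1))
```

In a supervised setup such as the one described, counts of these features would typically be fed to a classifier as a sparse vector per post.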

arXiv.org e-Print Archive

Crossref

From Statistical to Geolinguistic Data: Mapping and Measuring Linguistic Diversity

Author: Monica Barni

This paper describes a new methodology for mapping and measuring linguistic diversity in a territory. The Centro di eccellenza della ricerca Osservatorio linguistico permanente dell’italiano diffuso fra stranieri e delle lingue immigrate in Italia at the Università per Stranieri di Siena has created three methods: (1) the Toscane favelle model, a procedural application that passes from quantitative statistical data to a demolinguistic paradigm; (2) the Monterotondo-Mentana model, in which quantitative and qualitative data are surveyed using traditional tools (questionnaires, audio and video recordings) as well as advanced technologies; and (3) the Esquilino model, in which digital maps are created that present the distribution of immigrant languages through the presence of signs in the linguistic landscape. The final objective is to combine the data surveyed by the three methods to obtain a “speaking” territory, in which each point surveyed identifies the languages spoken and the various linguistic manifestations. Keywords: Language Contact, Linguistic Diversity, Immigrant Languages, Geolinguistic Data, New Methodologies in Sociolinguistic Research

Research Papers in Economics

Conservation and use of genetic resources of underutilized crops in the Americas - A continental analysis

Author: Galluzzi Gea
López Noriega Isabel
Publication venue: 'MDPI AG'
Publication date: 01/01/2014

Latin America is home to dramatically diverse agroecological regions which harbor a high concentration of underutilized plant species, whose genetic resources hold the potential to address challenges such as sustainable agricultural development, food security and sovereignty, and climate change. This paper examines the status of an expert-informed list of underutilized crops in Latin America and analyses how the most common features of underuse apply to them. The analysis pays special attention to whether and how existing international policy and legal frameworks on biodiversity and plant genetic resources effectively support the conservation and sustainable use of underutilized crops. Results show that not all minor crops are affected by the same degree of neglect, and that the aspects under which any crop is underutilized vary greatly, calling for specific analyses and interventions. We also show that current international policy and legal instruments have so far provided limited stimulus and funding for the conservation and sustainable use of the genetic resources of these crops. Finally, the paper proposes an analytical framework for identifying and evaluating a crop’s underutilization, in order to define the most appropriate type and levels of intervention (international, national, local) for improving its status.

Multidisciplinary Digital Publishing Institute

CiteSeerX

Directory of Open Access Journals

CGSpace

#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

Author: Bagasheva A.
Caleffi P.-M.
Cassell J.
Cook P.
Croft W.
Cunha E.
Eisenstein J.
Giegerich H. J.
Hacken P.
Hong L.
Hu Y.
Lee C.-y.
Lerman K.
Lin Y.-R.
Lui M.
Léturgie A.
Medler D. A.
Milroy J.
Nguyen T.
Owoputi O.
Ritter A.
Weng L.
Yang J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2015

Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify with 77.07% accuracy whether a pair of hashtags compounding in the near future (i.e., 2 months after compounding) will become popular. At longer times T = 6, 10 months the accuracies are 77.52% and 79.13% respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases. Comment: 14 pages, 4 figures, 9 tables, published in CSCW (Computer-Supported Cooperative Work and Social Computing) 2016, in Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016)

arXiv.org e-Print Archive

Crossref

Language identification with suprasegmental cues: A study based on speech resynthesis

Author: Mehler Jacques
Ramus Franck
Publication date: 01/01/1999

This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the proposed methodology appears to be well suited to studying language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered.

CogPrints Cognitive Sciences Eprint Archive

Diasporic Indigeneity: Indigenizing Indigenous Immigrants and Nativizing Native Nations

Author: Fox Tree Erich
Publication venue: Scholars Commons @ Laurier
Publication date: 23/11/2015

Wilfrid Laurier University

Formulaic Sequences as Fluency Devices in the Oral Production of Native Speakers of Polish

Author: Aijmer
Anderson
Biber
Boersma
Caie
Chambers
Corrigan
Cowie
Dahlmann
De Jong
Dechert
Erman
Ewa Guz
Fillmore
Forsberg
Freed
Gatbonton
Goldman
Guillot
Housen
Hunston
Knutsson
Kormos
Kuiper
Lennon
Levelt
Meunier
Moon
Nattinger
Nonnative
Olofsson
Pawley
Peters
Raupach
Read
Renouf
Schmitt
Segalowitz
Sinclair
Skehan
Tavakoli
Towell
Weinert
Wiktorsson
Wood
Wray
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2014

In this paper we attempt to determine the nature and strength of the relationship between the use of formulaic sequences and productive fluency of native speakers of Polish. In particular, we seek to validate the claim that speech characterized by a higher incidence of formulaic sequences is produced more rapidly and with fewer hesitation phenomena. The analysis is based on monologic speeches delivered by 45 speakers of L1 Polish. The data include both the recordings and their transcriptions annotated for a number of objective fluency measures. In the first part of the study the total number of formulaic sequences is established for each sample. This is followed by determining a set of temporal measures of the speakers’ output (speech rate, articulation rate, mean length of runs, mean length of pauses, phonation time ratio). The study provides some preliminary evidence of the fluency-enhancing role of formulaic language. Our results show that the use of formulaic sequences is positively and significantly correlated with speech rate, mean length of runs and phonation time ratio. This suggests that a higher concentration of formulaic material in output is associated with faster speech, longer stretches of speech between pauses and an increased amount of time filled with speech.
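The five temporal measures listed in this abstract can be computed from an annotated sample as in the Python sketch below. The definitions follow common usage in fluency research (syllables per second over total time for speech rate, over speaking time only for articulation rate, and so on) and are assumptions here, since the abstract does not give the study's exact formulas; the `Run` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """A stretch of speech between two pauses (hypothetical annotation unit)."""
    syllables: int
    duration_s: float  # seconds of actual speech


def fluency_measures(runs: List[Run], pauses_s: List[float]) -> Dict[str, float]:
    """Temporal fluency measures under the assumed definitions."""
    phonation_s = sum(r.duration_s for r in runs)
    if not runs or phonation_s <= 0:
        raise ValueError("at least one run with positive duration is required")
    pause_total_s = sum(pauses_s)
    total_s = phonation_s + pause_total_s
    syllables = sum(r.syllables for r in runs)
    return {
        # syllables per second, pauses included
        "speech_rate": syllables / total_s,
        # syllables per second, pauses excluded
        "articulation_rate": syllables / phonation_s,
        # mean syllables per run
        "mean_length_of_runs": syllables / len(runs),
        # mean pause duration in seconds
        "mean_length_of_pauses": pause_total_s / len(pauses_s) if pauses_s else 0.0,
        # share of total time filled with speech
        "phonation_time_ratio": phonation_s / total_s,
    }


if __name__ == "__main__":
    sample = fluency_measures([Run(10, 2.0), Run(20, 4.0)], [1.0, 1.0])
    for name, value in sample.items():
        print(name, value)
```

Correlating a per-speaker count of formulaic sequences with each of these values would correspond to the kind of analysis the abstract reports.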

Crossref

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)