Search CORE

52,671 research outputs found

Language identification with suprasegmental cues: A study based on speech resynthesis

Author: Mehler Jacques
Ramus Franck
Publication venue
Publication date: 01/01/1999
Field of study

This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well-suited to study language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered

CogPrints Cognitive Sciences Eprint Archive

Byte-based Language Identification with Deep Convolutional Networks

Author: Bjerva Johannes
Publication venue
Publication date: 28/10/2016
Field of study

We report on our system for the shared task on discriminating between similar languages (DSL 2016). The system uses only byte representations in a deep residual network (ResNet). The system, named ResIdent, is trained only on the data released with the task (closed training). We obtain 84.88% accuracy on subtask A, 68.80% accuracy on subtask B1, and 69.80% accuracy on subtask B2. A large difference in accuracy on development data can be observed with relatively minor changes in our network's architecture and hyperparameters. We therefore expect fine-tuning of these parameters to yield higher accuracies.Comment: 7 pages. Adapted reviewer comments. arXiv admin note: text overlap with arXiv:1609.0705

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Presenting GECO : an eyetracking corpus of monolingual and bilingual sentence reading

Author: Cop Uschi
Dirix Nicolas
Drieghe Denis
Duyck Wouter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

This paper introduces GECO, the Ghent Eye-tracking Corpus, a monolingual and bilingual corpus of eye-tracking data of participants reading a complete novel. English monolinguals and Dutch-English bilinguals read an entire novel, which was presented in paragraphs on the screen. The bilinguals read half of the novel in their first language, and the other half in their second language. In this paper we describe the distributions and descriptive statistics of the most important reading time measures for the two groups of participants. This large eye-tracking corpus is perfectly suited for both exploratory purposes as well as more directed hypothesis testing, and it can guide the formulation of ideas and theories about naturalistic reading processes in a meaningful context. Most importantly, this corpus has the potential to evaluate the generalizability of monolingual and bilingual language theories and models to reading of long texts and narratives

Southampton (e-Prints Soton)

Ghent University Academic Bibliography

Archivsystem Ask23