20,366 research outputs found
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Reflecting on English Lingua Franca today: Expanding scenarios and growing dilemmas. An overview with introductory notes
Introduction to the volume
Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods
Diachronic and/or synchronic variation? The acquisition of sociolinguistic competence in L2 French.
A majority of the early research in Second Language Acquisition focused on diachronic variation in the learnersâ interlanguage (IL), that is, differences in the IL linked to a supposed increase in knowledge between two points in time (cf. Tarone 1988). The last decade has seen an increase in studies combining a diachronic perspective with a synchronic one, that is, where variation in production is seen as the consequence of individual differences among learners (gender, extraversion, learning strategies, attitudes, motivation, sociobiographical variables linked to the language learning experience and the use of the target language (TL)). In this perspective, non-native-like patterns are not automatically assumed to be the result of incomplete knowledge, but other possible causes are taken into consideration such as temporary inaccessibility of information in stressful situations or even a conscious decision by the L2 user to deviate from the TL norm
An Empirical Evaluation of Zero Resource Acoustic Unit Discovery
Acoustic unit discovery (AUD) is a process of automatically identifying a
categorical acoustic unit inventory from speech and producing corresponding
acoustic unit tokenizations. AUD provides an important avenue for unsupervised
acoustic model training in a zero resource setting where expert-provided
linguistic knowledge and transcribed speech are unavailable. Therefore, to
further facilitate zero-resource AUD process, in this paper, we demonstrate
acoustic feature representations can be significantly improved by (i)
performing linear discriminant analysis (LDA) in an unsupervised self-trained
fashion, and (ii) leveraging resources of other languages through building a
multilingual bottleneck (BN) feature extractor to give effective cross-lingual
generalization. Moreover, we perform comprehensive evaluations of AUD efficacy
on multiple downstream speech applications, and their correlated performance
suggests that AUD evaluations are feasible using different alternative language
resources when only a subset of these evaluation resources can be available in
typical zero resource applications.Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201
- âŠ