    Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

    Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding the separate components of a typical system, namely the acoustic (AM), pronunciation (PM) and language (LM) models, into a single neural network. In this work, we look at one such sequence-to-sequence model, listen, attend and spell (LAS), and explore the possibility of training a single model to serve different English dialects. This simplifies the process of training multi-dialect systems, since separate AMs, PMs and LMs are no longer needed for each dialect. We show that simply pooling the data from all dialects into one LAS model falls behind the performance of a model fine-tuned on each dialect. We then look at incorporating dialect-specific information into the model, both by inserting a dialect symbol at the end of the original grapheme target sequence and by feeding a 1-hot representation of the dialect into all layers of the model. Experimental results on seven English dialects show that our proposed system is effective in modeling dialect variations within a single LAS model, outperforming LAS models trained individually on each of the seven dialects by 3.1 to 16.5% relative.
    Comment: submitted to ICASSP 201
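    The two dialect-conditioning mechanisms in this abstract can be illustrated with a minimal sketch. It assumes hypothetical dialect tags ("d1" to "d7"), because the abstract does not name the seven dialects or specify how the symbols are spelled. It also assumes the 1-hot vector is concatenated to each layer's input, which is one plausible reading of "feeding ... into all layers" and not necessarily the authors' exact scheme. The relative-improvement helper assumes the reported 3.1 to 16.5% figures are relative WER reductions.

```python
from typing import List

# Hypothetical dialect tags for illustration only; the abstract does not
# name the seven dialects or say how the dialect symbols are written.
DIALECTS: List[str] = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]


def add_dialect_target(graphemes: List[str], dialect: str) -> List[str]:
    """Append a dialect symbol such as '<d3>' after the grapheme targets."""
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    return graphemes + [f"<{dialect}>"]


def one_hot(dialect: str) -> List[float]:
    """1-hot dialect vector with one slot per dialect."""
    vec = [0.0] * len(DIALECTS)
    vec[DIALECTS.index(dialect)] = 1.0
    return vec


def condition_layer_input(features: List[float], dialect: str) -> List[float]:
    """Assumed scheme: concatenate the 1-hot vector to a layer's input features."""
    return features + one_hot(dialect)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

    For example, `add_dialect_target(list("cat"), "d2")` yields `['c', 'a', 't', '<d2>']`, so the decoder learns to emit the dialect symbol after the transcript. The 1-hot path, by contrast, gives every layer the dialect identity as an explicit input.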

    The origin of the Japanese and Korean accent systems

    S.R. Ramsey writes (1979: 162): "The patterning of tone marks in Old Kyoto texts divides the vocabulary into virtually the same classes as those arrived at by comparing the accent distinctions found in the modern dialects. This means that the Old Kyoto dialect had a pitch system similar to that of proto-Japanese. The standard language of the Heian period may not actually be the ancestor of all the dialects of Japan, but at least as far as the accent system is concerned, it is close enough to the proto system to be used as a working model. The significance of this fact is important: It means that each of the dialects included in the comparison has as much to tell, at least potentially, as any other dialect about Old Kyoto accent."

    Parallel grammaticalizations in Tibeto-Burman : evidence of Sapir's 'Drift'

    In chapters seven and eight of his book Language, Sapir talked about what he called ‘drift’, the changes that a language undergoes through time [...]. Dialects of a language are formed when that language is broken into different segments that no longer move along the same exact drift. Even so, the general drift of a language has its deep and its shallow currents: those features that distinguish closely related dialects belong to the rapid, shallow currents, while the deeper, slower currents may remain consistent between the dialects for millennia. It is this latter type that Sapir felt was ‘fundamental to the genius of the language’ (p. 172), and he said that ‘The momentum of the more fundamental, the pre-dialectal, drift is often such that languages long disconnected will pass through the same or strikingly similar phases’ (p. 172).

    Dialectal variation in German 3-verb clusters : looking for the best analysis

    German dialects vary in which of the possible orders of the verbs in a 3-verb cluster they allow. In an ongoing empirical investigation that I am conducting together with Tanja Schmid, University of Stuttgart (Schmid and Vogel 2004), we have already found that each of the six logically possible permutations of the 3-verb cluster in (1) occurs in German dialects.
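    The "six logically possible permutations" are simply the 3! = 6 linear orders of three verbs. The short sketch below enumerates them. It assumes the common convention of labelling the verbs by selection (V1 selects V2, which selects V3) and writing an order as "1-2-3". The labels are hypothetical, since example (1) is not reproduced here.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical labels: V1 selects V2, which selects V3.
VERBS: Tuple[str, ...] = ("V1", "V2", "V3")


def cluster_orders(verbs: Tuple[str, ...] = VERBS) -> List[str]:
    """Return every linear order of the verbs as strings like '1-2-3'."""
    # Strip the leading 'V' from each label and join the digits with hyphens.
    return ["-".join(v[1:] for v in order) for order in permutations(verbs)]
```

    `cluster_orders()` returns `['1-2-3', '1-3-2', '2-1-3', '2-3-1', '3-1-2', '3-2-1']`. The empirical question is which of these six orders a given dialect accepts.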

    Phonetic convergence in temporal organization during shadowed speech

    The goal of this study was to examine phonetic convergence (imitating the phonetic characteristics of another talker) in several measures of temporal organization during shadowed speech across different American English dialects. Participants from the Northern and Midland American English dialect regions, plus several "Mobile" talkers, first read 72 sentences to establish a baseline for temporal organization. They then repeated the same 72 sentences after Northern, Midland, and Southern model talkers. Measures of temporal organization (%V, ΔC, ΔV, rPVI-C, and nPVI-V) were calculated for the read sentences, the shadowed sentences, and the model talker sentences. Statistical analysis compared each shadower's distance from the model talker sentences before shadowing (read sentences) and during shadowing (shadowed sentences). It revealed significant convergence toward the model dialects by all three shadowing groups for ΔV, and significant divergence away from the model talkers by Mobile talkers for nPVI-V. Although the divergence by Mobile talkers was unexpected, both results support previous studies claiming that social perception is a major contributing factor in convergence and divergence. These results are also consistent with previous findings of variation across dialects in temporal organization and, in addition, provide evidence for variation across dialects in convergence in temporal organization.
    The Ohio State University College of Arts and Sciences Undergraduate Research Scholarship. No embargo. Academic Major: Linguistic
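    The rhythm measures named in this abstract have widely used definitions, and the sketch below follows them. %V is the vocalic share of total duration. ΔV and ΔC are the standard deviations of vocalic and consonantal interval durations. The raw PVI averages absolute differences between successive intervals, and the normalized PVI scales each difference by the pair's mean and multiplies by 100. The abstract does not say which exact variants the study used, so the choice of population standard deviation and the simple convergence score (read-to-model distance minus shadowed-to-model distance) are assumptions.

```python
from statistics import pstdev
from typing import Sequence


def percent_v(vowels: Sequence[float], consonants: Sequence[float]) -> float:
    """%V: share of total utterance duration that is vocalic, in percent."""
    return 100.0 * sum(vowels) / (sum(vowels) + sum(consonants))


def delta(intervals: Sequence[float]) -> float:
    """ΔV or ΔC: standard deviation of interval durations (population SD assumed)."""
    return pstdev(intervals)


def rpvi(intervals: Sequence[float]) -> float:
    """Raw PVI (e.g. rPVI-C): mean absolute difference of successive intervals."""
    diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
    return sum(diffs) / len(diffs)


def npvi(intervals: Sequence[float]) -> float:
    """Normalized PVI (e.g. nPVI-V): successive differences divided by the pair mean, x100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:])]
    return 100.0 * sum(terms) / len(terms)


def convergence(read: float, shadowed: float, model: float) -> float:
    """Assumed score: positive if shadowing moved the value toward the model (convergence),
    negative if it moved away (divergence)."""
    return abs(read - model) - abs(shadowed - model)
```

    With durations in milliseconds, `rpvi([50, 150, 100])` returns `75.0`. `convergence(read=60.0, shadowed=55.0, model=50.0)` returns `5.0`, which under this scoring counts as convergence toward the model talker.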

    Maintaining Indigenous Language Through Understanding the Philosophy and Culture

    Overseas Chinese, the third-largest ethnic group in Indonesia and one of the large minority groups in other Southeast Asian countries, speak various dialects in their daily lives. These dialects are their indigenous languages, inherited from their ancestors. Most of them speak Fukien (Hokkian) or Hakka, and some can even speak both. They prefer speaking these dialects to speaking Mandarin. The Chinese cultural values and philosophy that parents teach and children continuously learn within the family help maintain the indigenous language. Overseas Chinese still use the language with family members and peers who share the same cultural background. This paper discusses in detail the efforts the Overseas Chinese ‘Fukien’ and ‘Hakka’ communities in Medan have made to maintain their dialects, and how these efforts are strongly related to and influenced by Chinese philosophy and culture.

    Pre-pausal devoicing and glottalisation in varieties of the south-western Arabian peninsula

    A wide range of modern Arabic dialects exhibit devoicing in pre-pausal (utterance-final) position. These include Cairene [20], Gulf Arabic, San’ani [8], [18], Manaxah [19], Central Highland Yemeni dialects [1], Rijal Alma‘ (Asiri p.c.), Central Sudanese (Dickins p.c.), Çukurova [15], Kinderib [9], and E. Fayyum [2]. In some dialects, pausal devoicing is reported to be accompanied by aspiration (e.g. Cairene, [19]), in others by glottalisation (e.g. Fayyum, [2]; Manaxah, [18]; San’ani, [8], [18]). As preliminary work for a study of pausal phenomena in the south-western Arabian Peninsula, we examine data from two Arabic dialects, San’ani (SA), spoken in the Old City of San’a, Yemen, and the Asiri dialect of Rijal Alma‘ (RA), and from Mehriyōt, an eastern dialect of the Modern South Arabian language Mehri, spoken in Yemen. We begin by presenting a summary of pausal phenomena in SA. We then consider the behaviour of final oral stops (velar, coronal and labial), final coronal fricatives, final nasals and liquids, and final vowels. Initial comparison with data from RA and Mehriyōt indicates that utterance-final devoicing is more advanced in SA than in the other varieties and involves a greater range of segment types. The first set of pausal examples was extracted from Watson’s recordings of spontaneous SA monologues in the Semitic Spracharchiv. The main speaker is a young, semi-educated woman. Those forms that exist as lexemes in RA, plus lexemes involving similar pre-pausal segments in comparable syllable types, were recorded utterance-finally by Yahya Asiri, a native speaker of RA. Pausal forms for Mehriyōt were extracted from the late Alexander Sima’s recordings of spontaneous speech in the Semitic sound archive [16]. The Mehriyōt speaker is a low- to semi-educated, early middle-aged man. Data were analysed using the phonetic analysis programme PRAAT (www.praat.org).