
    Robust language recognition via adaptive language factor extraction

    This paper presents a technique to adapt an acoustically based language classifier to the background conditions and speaker accents. This adaptation improves language classification on a broad spectrum of TV broadcasts. The core of the system consists of an iVector-based setup in which language and channel variabilities are modeled separately. The subsequent language classifier (the backend) operates on the language factors, i.e. those features in the extracted iVectors that explain the observed language variability. The proposed technique adapts the language variability model to the background conditions and to the speaker accents present in the audio. The effect of the adaptation is evaluated on a 28-hour corpus composed of documentaries and monolingual as well as multilingual broadcast news shows. Consistent improvements in the automatic identification of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types.

    Prosodic modules for speech recognition and understanding in VERBMOBIL

    Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, uses only speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. The recognition rates in spontaneous dialogs are up to 82.5% for accents and up to 91.7% for phrase boundaries.

    Prosody, focus, and focal structure : some remarks on methodology

    Prosody falls between several established fields such as phonetics, phonology, syntax, and dialogue structure. It is therefore prone to misconceptions: often its relevance is overestimated, and often it is underestimated. The traditional method in linguistics in general, and in phonology in particular, is the construction and evaluation of sometimes rather complex examples based on the intuition of the linguist. In experimental phonetics, this intuition is replaced by more or less naive, and thus non-expert, subjects and by inferential statistics, but the examples, i.e. the experimental material, are often rather complex as well. It is a truism that in both cases conclusions are drawn on an "as if" basis: as if a final proof had been found that phenomenon A really exists regularly in language B. In fact, it can only be proven that phenomenon A can sometimes be detected in the production of some speakers of a variety of language B. This dilemma matters if prosody has to be put into practice, e.g. in automatic speech and language processing. In this field, large speech databases are already available for English and will be available for other languages, such as German, in the near future. At least in the beginning, the problems that can, hopefully, be solved with the help of such databases might look trivial and thus not interesting: a step backwards and not forwards. "As if" statements (concerning, e.g., narrow vs. broad focus) and problems that are trivial at face value (concerning, e.g., the relationship between phrasing units and accentuation and the ontology of sentence accent) will be illustrated with my own material. I will argue that such trivial problems have to be dealt with in the beginning, and that they can constitute the very basis for the proper treatment of more far-reaching and complex problems.

    The acquisition of English L2 prosody by Italian native speakers: experimental data and pedagogical implications

    This paper investigates Yes-No question intonation patterns in English L2, Italian L1, and English L1. The aim is to test the hypothesis that L2 learners may show different acquisition strategies for different dimensions of intonation, particularly the phonological and phonetic components. The study analyses the nuclear intonation contours of 4 target English words and 4 comparable Italian words consisting of sonorant segments, stressed on the semi-final or final syllable, and occurring in Yes-No questions in sentence-final position (e.g., Will you attend the memorial?, Hai sentito la Melania?). The words were contained in mini-dialogues of question-answer pairs, and were read 5 times by 4 Italian speakers (Padova area, North-East Italy) and 3 English female speakers (London area, UK). The results show that: 1) different intonation patterns may be used to realize the same grammatical function; 2) different developmental processes are at work, including transfer of L1 categories and the acquisition of L2 phonological categories. These results suggest that the phonetic dimension of L2 intonation may be more difficult to learn than the phonological one.

    Strategies for focal accent detection in spontaneous speech

    In this paper a new method for the detection of focus is developed. The speech data consist of German spontaneous speech from several speakers. At present the algorithm uses only the fundamental frequency (F0) values. By computing a nonlinear reference line through significant anchor points in the F0 course, points of highest prominence are determined. The global recognition rate is 78.5% and the mean recognition rate is 66.6%.
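    The abstract above reports a global and a mean recognition rate without defining them. The sketch below shows one common reading, which is an assumption here and not taken from the paper: "global" as the fraction of all items labelled correctly, and "mean" as the unweighted average of the per-class rates. The function name `recognition_rates` and the toy labels are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def recognition_rates(reference, predicted):
    """Return (global_rate, mean_rate) for two equal-length label sequences.

    Assumed definitions (not confirmed by the abstract):
      global_rate = correct items / all items
      mean_rate   = unweighted average of the per-class rates
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")

    total = defaultdict(int)    # items per reference class
    correct = defaultdict(int)  # correctly labelled items per reference class
    for ref, hyp in zip(reference, predicted):
        total[ref] += 1
        if ref == hyp:
            correct[ref] += 1

    global_rate = sum(correct.values()) / len(reference)
    mean_rate = sum(correct[c] / total[c] for c in total) / len(total)
    return global_rate, mean_rate


# Toy data: a rare "focus" class and a frequent "none" class.
reference = ["focus"] * 2 + ["none"] * 8
predicted = ["focus", "none"] + ["none"] * 8
global_rate, mean_rate = recognition_rates(reference, predicted)
print(f"global: {global_rate:.3f}, mean: {mean_rate:.3f}")
```

    Under this reading, the two figures diverge when classes are unbalanced: a detector that favours the frequent class scores well globally but lower on the class-wise mean, which would be consistent with the gap between the two figures reported above.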

    CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

    Despite the recent advancements in Automatic Speech Recognition (ASR), the recognition of accented speech still remains a dominant problem. In order to create more inclusive ASR systems, research has shown that the integration of accent information, as part of a larger ASR framework, can lead to the mitigation of accented speech errors. We address multilingual accent classification through the ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures, which have been proven to perform well on a variety of speech-related downstream tasks. We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). Furthermore, we establish a new state of the art for English accent classification, with accuracy as high as 95%. We also study the internal categorization of the Wav2Vec 2.0 embeddings through t-SNE, noting that there is a level of clustering based on phonological similarity. (Our recipe is open-source in the SpeechBrain toolkit, see: https://github.com/speechbrain/speechbrain/tree/develop/recipes) Comment: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 202

    What's in the "pure" prosody?

    Detectors for accents and phrase boundaries have been developed which derive prosodic features from the speech signal and its fundamental frequency to support other modules of a speech understanding system in an early analysis stage, or in cases where no word hypotheses are available. The Gaussian distribution classifiers underlying the detectors were trained on 50 minutes and tested on 30 minutes of spontaneous speech, yielding recognition rates of 74% for accents and 86% for phrase boundaries. Since this material was prosodically hand-labelled, the question was which labels for phrase boundaries and accentuation were guided only by syntactic or semantic knowledge, and which ones are really prosodically marked. Therefore a small test subset has been resynthesized in such a way that comprehensibility was lost but the prosodic characteristics were kept. This subset has been re-labelled by 11 listeners with nearly the same accuracy as the detectors.