    Current trends in multilingual speech processing

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years, and the field is now receiving renewed interest owing to two strong driving forces. First, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application in the speech recognition community, but additional issues arise when such features are used in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS), as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus provided by both government and industry for technologies that help break down domestic and international language barriers, which are also barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies, at the heart of which lies multilingual speech processing.

    Temporal Parameters of Spontaneous Speech in Forensic Speaker Identification in Case of Language Mismatch: Serbian as L1 and English as L2

    The purpose of the research is to examine whether forensic speaker identification is possible using temporal parameters (articulation rate, speaking rate, degree of hesitancy, percentage of pauses, and average pause duration) when the questioned and suspect samples are in different languages. The corpus includes 10 female native speakers of Serbian who are proficient in English. The parameters are tested using a Bayesian likelihood ratio formula on 40 same-speaker and 360 different-speaker pairs, including estimation of error rates (ER), equal error rates (EER) and the Overall Likelihood Ratio. One-way ANOVA is performed to determine whether inter-speaker variability is higher than intra-speaker variability across languages. The most successful discriminant is degree of hesitancy, with ER of 42.5%/28% (EER: 33%), followed by average pause duration, with ER of 35%/45.56% (EER: 40%).
Although the research features a closed-set comparison, which is uncommon in forensic reality, the results are still relevant for forensic phoneticians working on criminal cases or acting as expert witnesses. This study is pioneering both in the forensic comparison of Serbian and English and in the forensic testing of temporal parameters on bilingual speakers. Further research should compare two stress-timed or two syllable-timed languages to test whether they are more comparable in the temporal aspects of speech.
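The error rates reported above can be illustrated with a short sketch. It assumes that each paired ER figure gives the same-speaker and different-speaker error rates respectively, and that an error is counted when a likelihood ratio (LR) points the wrong way relative to the neutral value of 1. The functions compute both rates and approximate the EER by trying every observed LR as a threshold. The function names and the example LR values are hypothetical, and the study's own Bayesian likelihood ratio formula is not reproduced here.

```python
def error_rates(same_lrs, diff_lrs, threshold=1.0):
    """Return (same-speaker ER, different-speaker ER) as percentages.

    A same-speaker comparison is an error when its LR falls below the
    threshold (the evidence wrongly favours different speakers); a
    different-speaker comparison is an error when its LR exceeds it.
    An LR exactly equal to the threshold is counted as neither.
    """
    same_errors = sum(lr < threshold for lr in same_lrs)
    diff_errors = sum(lr > threshold for lr in diff_lrs)
    return (100.0 * same_errors / len(same_lrs),
            100.0 * diff_errors / len(diff_lrs))


def approximate_eer(same_lrs, diff_lrs):
    """Approximate the equal error rate (percent).

    Every observed LR is tried as a decision threshold; the threshold at
    which the two error rates are closest is kept, and their mean is
    returned. Real evaluations usually interpolate more finely.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(same_lrs) | set(diff_lrs)):
        same_er, diff_er = error_rates(same_lrs, diff_lrs, threshold=t)
        gap = abs(same_er - diff_er)
        if gap < best_gap:
            best_gap, eer = gap, (same_er + diff_er) / 2
    return eer


# Hypothetical LRs for illustration only (not data from the study).
same = [5.0, 2.0, 0.5, 3.0]
diff = [0.1, 0.2, 1.5, 0.3, 0.05]
print(error_rates(same, diff))      # (25.0, 20.0)
print(approximate_eer(same, diff))  # 10.0
```

With so few comparisons the threshold sweep is coarse, which is why the EER here is only an approximation; with 40 and 360 pairs, as in the study, the steps between candidate thresholds become much finer.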

    Cross-Lingual Neural Network Speech Synthesis Based on Multiple Embeddings

    The paper presents a novel architecture and method for speech synthesis in multiple languages, in the voices of multiple speakers and in multiple speaking styles, even when speech from a particular speaker in the target language was not present in the training data. The method applies neural network embeddings to combinations of speaker and style IDs, and also to phones in particular phonetic contexts, without any prior linguistic knowledge of their phonetic properties. This enables the network not only to efficiently capture similarities and differences between speakers and speaking styles, but also to establish appropriate relationships between phones belonging to different languages, and ultimately to produce synthetic speech in the voice of a given speaker in a language that he/she has never spoken. The validity of the proposed approach has been confirmed through experiments with models trained on speech corpora of American English and Mexican Spanish. It has also been shown that the proposed approach supports the use of neural vocoders, i.e., that they are able to produce good-quality synthetic speech even in languages they were not trained on.
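One rough way to picture the input side of such a model is shown below: speaker-style combinations and phones are each mapped to vectors, and the vectors are concatenated before entering the acoustic network. This Python/NumPy sketch assumes a phone inventory shared across languages and a simple one-ID-per-(speaker, style) scheme. The class and function names, table sizes and IDs are all hypothetical, context-dependent phone units are reduced to plain integer IDs, the embeddings are untrained, and the acoustic network and vocoder are omitted.

```python
import numpy as np


class EmbeddingTable:
    """Lookup table mapping integer IDs to dense vectors.

    In a trained model these vectors would be learned jointly with the
    acoustic network; here they are only randomly initialised.
    """

    def __init__(self, num_ids: int, dim: int, rng: np.random.Generator):
        self.weights = rng.normal(scale=0.1, size=(num_ids, dim))

    def __call__(self, ids) -> np.ndarray:
        return self.weights[np.asarray(ids)]


def speaker_style_id(speaker: int, style: int, num_styles: int) -> int:
    """Hypothetical scheme: one ID per (speaker, style) combination."""
    return speaker * num_styles + style


def build_inputs(phone_ids, speaker, style, phone_table, combo_table, num_styles):
    """Return a (T, phone_dim + combo_dim) input matrix for an acoustic model.

    phone_ids index one phone inventory shared by all languages, so the
    network is free to place similar phones from different languages
    close together in embedding space.
    """
    phones = phone_table(phone_ids)                                    # (T, phone_dim)
    combo = combo_table(speaker_style_id(speaker, style, num_styles))  # (combo_dim,)
    combo = np.broadcast_to(combo, (len(phone_ids), combo.shape[0]))   # repeat per phone
    return np.concatenate([phones, combo], axis=1)


rng = np.random.default_rng(0)
NUM_SPEAKERS, NUM_STYLES, NUM_PHONES = 4, 3, 80  # hypothetical sizes
phone_table = EmbeddingTable(NUM_PHONES, 16, rng)
combo_table = EmbeddingTable(NUM_SPEAKERS * NUM_STYLES, 8, rng)

# Speaker 2 in style 1, reading a hypothetical sequence of phone IDs.
x = build_inputs([5, 17, 42, 3], speaker=2, style=1,
                 phone_table=phone_table, combo_table=combo_table,
                 num_styles=NUM_STYLES)
print(x.shape)  # (4, 24)
```

Because the speaker-style vector is independent of the phone IDs, any speaker-style combination can in principle be paired with phones from either language, which is the property cross-lingual synthesis relies on.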

    Articulatory features for conversational speech recognition

    Regularized Subspace Gaussian Mixture Models for Speech Recognition

    Phonetic Temporal Neural Model for Language Identification

    Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, most existing neural LID methods have largely overlooked phonetic information, even though it has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID: an LSTM-RNN LID system that takes as input phonetic features produced by a phone-discriminative DNN, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and carries compact information about all phones. Experiments on the Babel database and the AP16-OLR database demonstrate that the phonetic temporal neural approach is very effective and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions. Comment: Submitted to TASL
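The pipeline described above, with phonetic features from a phone-discriminative DNN feeding an LSTM-RNN that outputs language scores, can be sketched as a forward pass in Python with NumPy. This is an illustration under stated assumptions, not the authors' model. The DNN is replaced by a single random layer (`phonetic_features`, hypothetical), all weights are untrained, the classifier reads only the final hidden state, and all dimensions are arbitrary.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def phonetic_features(acoustic, w):
    """Stand-in for a trained phone-discriminative DNN.

    Maps each acoustic frame to a phonetic feature vector with one random
    tanh layer; a real system would use features learned for phone
    classification.
    """
    return np.tanh(acoustic @ w)


class LSTMLanguageClassifier:
    """One-layer LSTM over frame-level features, then a softmax over languages."""

    def __init__(self, input_dim, hidden_dim, num_languages, rng):
        self.h_dim = hidden_dim
        # Input, forget, candidate and output gate weights stacked in one matrix.
        self.w = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(scale=0.1, size=(num_languages, hidden_dim))
        self.b_out = np.zeros(num_languages)

    def __call__(self, frames):
        H = self.h_dim
        h, c = np.zeros(H), np.zeros(H)
        for x in frames:                      # one LSTM step per frame
            z = self.w @ np.concatenate([x, h]) + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g                 # update cell state
            h = o * np.tanh(c)                # update hidden state
        logits = self.w_out @ h + self.b_out  # classify from the last hidden state
        e = np.exp(logits - logits.max())     # numerically stable softmax
        return e / e.sum()


rng = np.random.default_rng(0)
acoustic = rng.normal(size=(100, 40))         # 100 frames of 40-dim acoustic features
w_phone = rng.normal(scale=0.1, size=(40, 32))
feats = phonetic_features(acoustic, w_phone)  # (100, 32) phonetic features
model = LSTMLanguageClassifier(32, 64, num_languages=3, rng=rng)
probs = model(feats)                          # one probability per language
```

Feeding the raw `acoustic` frames to a classifier built with `input_dim=40` roughly corresponds to the acoustic-feature baseline the abstract compares against; in this sketch the only difference is what the LSTM sees.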

    Lingual articulation in children with developmental speech disorders

    This thesis presents thirteen research papers published between 1987 and 1997, together with a summary and discussion of their contribution to the field of developmental speech disorders. The publications collectively constitute a body of work with two overarching themes. The first is methodological: all the publications report articulatory data on tongue movements recorded using the instrumental technique of electropalatography (EPG). The second is the clinical orientation of the research: throughout, the EPG data are interpreted to inform the theory and practice of speech pathology. Most of the publications are original experimental studies of lingual articulation in children with developmental speech disorders. At the same time, the publications cover a broad range of theoretical and clinical issues relating to lingual articulation, including articulation in normal speakers, the clinical applications of EPG, data analysis procedures, articulation in second language learners, and the effect of oral surgery on articulation. The contribution of the publications to the field of developmental speech disorders of unknown origin, also known as phonological impairment or functional articulation disorder, is summarised and discussed. In total, EPG data from fourteen children are reported. The collective results of the publications do not support the cognitive/linguistic explanation of developmental speech disorders. Instead, the EPG findings are marshalled to build the case that specific deficits in speech motor control can account for many of the diverse speech error characteristics identified by perceptual analysis in previous studies. Some of the children studied had relatively discrete speech motor deficits, involving, for example, an apparently isolated difficulty with tongue tip/blade groove formation for sibilant targets. 
Articulatory difficulties of this 'discrete' or specific type are consistent with traditional views of functional articulation disorder. EPG studies of tongue control in normal adults provided insights into a different type of speech motor control deficit observed in many of the children studied. Unlike the children with discrete articulatory difficulties, other children produced abnormal EPG patterns for a wide range of lingual targets. These abnormal gestures were characterised by broad, undifferentiated tongue-palate contact, accompanied by variable approach and release phases. These 'widespread', undifferentiated gestures are interpreted as a previously undescribed form of speech motor deficit, resulting from difficulty in controlling the tongue tip/blade system independently of the tongue body. Undifferentiated gestures were found to produce variable percepts depending on the target and the timing of the particular gesture, and may manifest as perceptually acceptable productions, phonological substitutions or phonetic distortions. It is suggested that discrete and widespread speech motor deficits reflect different stages along a developmental or severity continuum, rather than distinct subgroups with different underlying deficits. All the children studied showed speech motor control deficits of varying degrees along this continuum. It is argued that the unique anatomical properties of the tongue, combined with the high spatial and temporal accuracy required to coordinate the tongue tip/blade and the tongue body, put lingual control specifically at risk in young children. The EPG findings question the validity of assumptions about the presence or absence of speech motor control deficits when such assumptions are based entirely on non-instrumental assessment procedures. 
A novel account of the sequence in which children with normal speech development acquire alveolar stop articulation is proposed, based on the EPG data from the children with developmental speech disorders. It is suggested that broad, undifferentiated gestures may occur in young children with normal speech, and that adult-like lingual control develops gradually through processes of differentiation and integration. Finally, the EPG findings are discussed in relation to two recent theoretical frameworks: psycholinguistic models and a dynamic systems approach to speech acquisition.