16,488 research outputs found

    Bio-inspired broad-class phonetic labelling

    Get PDF
    Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM).Through the present paper a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing is described. The methodology is based in the automatic detection of formants and formant trajectories after a careful separation of the vocal and glottal components of speech and in the operation of CF (Characteristic Frequency) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus. Examples of phonetic class labeling are given and the applicability of the method to Speech Processing is discussed

    Synthesis using speaker adaptation from speech recognition DB

    Get PDF
    This paper deals with the creation of multiple voices from a Hidden Markov Model based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using Hidden Markov Models (HMM) and speaker adaptation techniques. Training data for building a Speaker-Independent (SI) model were selected from both a general purpose speech synthesis database (FestCat;) and a database design ed for training Automatic Speech Recognition (ASR) systems (Catalan SpeeCon database). The SpeeCon database was also used to adapt the SI model to different speakers. Using an ASR designed database for TTS purposes provided many different amateur voices, with few minutes of recordings not performed in studio conditions. This paper shows how speaker adaptation techniques provide the right tools to generate multiple voices with very few adaptation data. A subjective evaluation was carried out to assess the intelligibility and naturalness of the generated voices as well as the similarity of the adapted voices to both the original speaker and the average voice from the SI model.Peer ReviewedPostprint (published version

    A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

    Get PDF
    We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s

    Temporal Parameters of Spontaneous Speech in Forensic Speaker Identification in Case of Language Mismatch: Serbian as L1 and English as L2

    Get PDF
    Celem badania jest analiza możliwości identyfikacji mówcy kryminalistycznego i sądowego podczas zadawania pytań w różnych językach, z wykorzystaniem parametrów temporalnych. (wskaźnik artykulcji, wskaźnik mowy, stopień niezdecydowania, odsetek pauz, średnia czas trwania pauzy). Korpus obejmuje 10 mówców kobiet z Serbii, które znają język angielksi na poziomie zaawwansowanym. Patrametry są badane z wykorzystaniem beayesowskiego wzoru wskaźnika prawdopodobieństwa w 40 parach tcyh samych mówców i w 230 parach różnych mówców, z uwzględnieniem szacunku wskaźnika błędu, równiego wskaźnika błędu i Całościowego Wskaźnika Prawdopodobieństwa. badanie ma charakter pionierski w zakresie językoznawstwa sądowego i kryminalistycznego por1) ónawczego w parze jezyka serbskiego i angielskiego, podobnie, jak analiza parametrów temporalnych mówców bilingwalnych. Dalsze badania inny skoncentrować się na porównaniu języków z rytmem akcentowym i z rytmem sylabicznym. The purpose of the research is to examine the possibility of forensic speaker identification if question and suspect sample are in different languages using temporal parameters (articulation rate, speaking rate, degree of hesitancy, percentage of pauses, average pause duration). The corpus includes 10 female native speakers of Serbian who are proficient in English. The parameters are tested using Bayesian likelihood ratio formula in 40 same-speaker and 360 different-speaker pairs, including estimation of error rates, equal error rates and Overall Likelihood Ratio. One-way ANOVA is performed to determine whether inter-speaker variability is higher than intra- speaker variability across languages. The most successful discriminant is degree of hesitancy with ER of 42.5%/28%, (EER: 33%), followed by average pause duration with ER 35%/45.56%, (EER: 40%). Although the research features a closed-set comparison, which is not very common in forensic reality, the results are still relevant for forensic phoneticians working on criminal cases or as expert witnesses. This study pioneers in forensically comparing Serbian and English as well as in forensically testing temporal parameters on bilingual speakers. Further research should focus on comparing two stress-timed or two syllable-timed languages to test whether they will be more comparable in terms of temporal aspects of speech.

    Identyfikacja parametrów czasowych mowy spontanicznej mówców kryminalistycznych w przypadku niedopasowania językowego: język serbski jako L1 i język angielski jako L2

    Get PDF
    The purpose of the research is to examine the possibility of forensic speaker identification if question and suspect sample are in different languages using temporal parameters (articulation rate, speaking rate, degree of hesitancy, percentage of pauses, average pause duration). The corpus includes 10 female native speakers of Serbian who are proficient in English. The parameters are tested using Bayesian likelihood ratio formula in 40 same-speaker and 360 different-speaker pairs, including estimation of error rates, equal error rates and Overall Likelihood Ratio. One-way ANOVA is performed to determine whether inter-speaker variability is higher than intra- speaker variability across languages. The most successful discriminant is degree of hesitancy with ER of 42.5%/28%, (EER: 33%), followed by average pause duration with ER 35%/45.56%, (EER: 40%). Although the research features a closed-set comparison, which is not very common in forensic reality, the results are still relevant for forensic phoneticians working on criminal cases or as expert witnesses. This study pioneers in forensically comparing Serbian and English as well as in forensically testing temporal parameters on bilingual speakers. Further research should focus on comparing two stress-timed or two syllable-timed languages to test whether they will be more comparable in terms of temporal aspects of speech. Celem badania jest analiza możliwości identyfikacji mówcy kryminalistycznego i sądowego podczas zadawania pytań w różnych językach, z wykorzystaniem parametrów temporalnych. (wskaźnik artykulcji, wskaźnik mowy, stopień niezdecydowania, odsetek pauz, średnia czas trwania pauzy). Korpus obejmuje 10 mówców kobiet z Serbii, które znają język angielksi na poziomie zaawwansowanym. Patrametry są badane z wykorzystaniem beayesowskiego wzoru wskaźnika prawdopodobieństwa w 40 parach tcyh samych mówców i w 230 parach różnych mówców, z uwzględnieniem szacunku wskaźnika błędu, równiego wskaźnika błędu i Całościowego Wskaźnika Prawdopodobieństwa. badanie ma charakter pionierski w zakresie językoznawstwa sądowego i kryminalistycznego por1) ónawczego w parze jezyka serbskiego i angielskiego, podobnie, jak analiza parametrów temporalnych mówców bilingwalnych. Dalsze badania inny skoncentrować się na porównaniu języków z rytmem akcentowym i z rytmem sylabicznym.

    Production and perception of speaker-specific phonetic detail at word boundaries

    Get PDF
    Experiments show that learning about familiar voices affects speech processing in many tasks. However, most studies focus on isolated phonemes or words and do not explore which phonetic properties are learned about or retained in memory. This work investigated inter-speaker phonetic variation involving word boundaries, and its perceptual consequences. A production experiment found significant variation in the extent to which speakers used a number of acoustic properties to distinguish junctural minimal pairs e.g. 'So he diced them'—'So he'd iced them'. A perception experiment then tested intelligibility in noise of the junctural minimal pairs before and after familiarisation with a particular voice. Subjects who heard the same voice during testing as during the familiarisation period showed significantly more improvement in identification of words and syllable constituents around word boundaries than those who heard different voices. These data support the view that perceptual learning about the particular pronunciations associated with individual speakers helps listeners to identify syllabic structure and the location of word boundaries
    corecore