45 research outputs found

    Evaluation of preprocessors for neural network speaker verification

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Cross-lingual acoustic model adaptation for speaker-independent speech recognition

    Good-quality speech recognition requires the recognition system to adapt to each speaker's voice and speaking style. Most speech recognition systems are developed for linguistically homogeneous user groups. However, as user groups increasingly consist of people from differing linguistic backgrounds, there is a growing demand for efficient multilingual speech recognition that takes into account not only varying dialects and accents but also different languages. This thesis investigated how acoustic models for English and Finnish can be combined to create a multilingual speech recognition system. It also investigated how speaker adaptation performs in these combined systems, both within a language and across languages, where speech data from one language is used to improve recognition of the same speaker speaking another language. Recognition systems were trained on large Finnish and English speech corpora and tested on both monolingual and bilingual material. The results show that the thresholds for safely merging the Finnish and English model sets are so low that merging can hardly be justified as a way of improving recognizer efficiency. The results also show that recognition of native Finnish can be improved by using non-native English speech data from the same speaker. This works in one direction only: recognition of non-native English could not be significantly improved with the help of native Finnish speech data.

    Max Planck Institute for Psycholinguistics: Annual report 1996

    Evaluation of Foreign Accent Using Synthetic Speech (Perception).

    A meaningful sentence loaded with appropriate phonemic and syllabic forms was synthesized as a standard stimulus, and 60 accented versions of the sentence were made to simulate varying degrees of a moderate and a strong Spanish accent by manipulating the following Spanish cues singly and in combination: (1) fundamental frequency, (2) voice onset time (VOT) for syllable-initial voiceless stops, (3) duration of medial stressed vowels, (4) F1, F2, and F3 for full vowels, and (5) F1, F2, and F3 for reduced vowels. Two tapes for each level of accent were prepared on which 30 accented stimulus sentences were each paired with the standard sentence in four randomized sequences. Forty-two English speakers rated how different each accented sentence was from the standard sentence on a 10-point scale; they also gave a confidence rating on a 5-point scale for each item. It was demonstrated that synthesized sentences can be reliably rated for cue modifications indicative of a moderate Spanish accent in English. Statistical analysis revealed that an increase in the number of cues (from 1 to 2, to 3, to 4, to 5) resulted in the perception of increased accentedness in both the moderate- and strong-accent conditions. In addition, subjects' confidence in their judgments increased along with the number of cues. A factor analysis showed that the suprasegmental cue, fundamental frequency (intonation), was the most perceptually prominent cue signalling a moderate Spanish accent in English. The segmental cue, stressed vowel quality, was the next most prominent. The presence of these cues also increased the subjects' confidence in their ratings of stimuli. The two strongest accent-bearing cues signalling the strong accent were both segmental, stressed vowel quality and VOT, but the strong-accent data were judged to be generally unreliable, possibly because of errors in their generation.

    Discriminative classifiers for speaker recognition

    Speaker Recognition, Speaker Verification, Sparse Kernel Logistic Regression, Support Vector Machine. Magdeburg, Univ., Fak. für Elektrotechnik und Informationstechnik, Diss., 2008, by Marcel Kat

    Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"
