
    Impact of frame rate on automatic speech-text alignment for corpus-based phonetic studies

    Phonetic segmentation is the basis for many phonetic and linguistic studies. Because manual segmentation is a lengthy and tedious task, automatic procedures relying on acoustic Hidden Markov Models have been developed over the years. Many studies have been conducted, and refinements developed, for corpus-based speech synthesis, where the technology is mainly used in a speaker-dependent context and applied to good-quality speech signals. In a different research direction, automatic speech-text alignment is also used for phonetic and linguistic studies on large speech corpora. In that case, speaker-independent acoustic models are mandatory, and the speech quality may be poorer. The acoustic models rely on a 10 ms shift between frames, and their topology imposes strong minimum duration constraints. This paper focuses on the acoustic analysis frame rate and gives a first insight into its impact on corpus-based phonetic studies.
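The minimum duration constraint mentioned in the abstract follows directly from the model topology: a left-to-right HMM in which every state must emit at least one frame cannot represent a phone shorter than the number of states times the frame shift. A minimal sketch of that arithmetic, assuming the common 3-state topology (the state count and the alternative frame shifts are illustrative, not values taken from the paper):

```python
# Sketch: minimum phone duration implied by a left-to-right HMM,
# assuming each state must emit at least one acoustic frame.
# The 3-state topology and the frame shifts below are illustrative.

def min_phone_duration_ms(num_states: int, frame_shift_ms: float) -> float:
    """The shortest path through the model visits every state once,
    so it consumes exactly num_states frames."""
    return num_states * frame_shift_ms

# With the standard 10 ms shift, a 3-state model cannot produce a
# phone shorter than 30 ms; halving the shift halves that floor.
for shift_ms in (10.0, 5.0, 2.5):
    print(shift_ms, min_phone_duration_ms(3, shift_ms))
```

This is why the frame rate matters for phonetic studies: very short phones (e.g. flaps or reduced vowels) can be forced to an artificial minimum length by the 10 ms analysis step alone.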

    A METHOD FOR AUTOMATIC ANALYSIS OF SPEECH TEMPO

    This paper describes a method for analysing speed of speech, or tempo, using speech recordings from Croatian TV news channels with subtitles. A feed-forward neural network, trained on about 160 seconds of recorded speech, was used for phoneme classification. To determine individual word positions, a speech-to-text alignment component was created that finds approximate alignments between the subtitle text and the phonemes classified by the neural network. The alignment component exploits the fact that the neural network recognizes some groups of phonemes more accurately than others. Preliminary results showed an average alignment offset of one to about three phonemes, depending on the recording quality, speaker, and content.
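The alignment step described above can be sketched as a weighted dynamic-programming alignment, where a match contributes a score proportional to how reliably the classifier recognizes that phoneme class. Everything below (phoneme symbols, precision weights, and the exact scoring scheme) is an illustrative assumption, not the paper's actual implementation:

```python
# Sketch: precision-weighted global alignment between the expected
# phoneme sequence (derived from subtitles) and the classifier output.
# Matches score the per-class precision weight; mismatches and gaps
# cost 1. All weights and sequences here are hypothetical.

def align(expected, recognized, precision):
    """Needleman-Wunsch-style alignment score; higher is better."""
    n, m = len(expected), len(recognized)
    # score[i][j] = best score aligning expected[:i] with recognized[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -float(i)          # leading gaps in recognized
    for j in range(1, m + 1):
        score[0][j] = -float(j)          # leading gaps in expected
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if expected[i - 1] == recognized[j - 1]:
                # reliable phoneme classes anchor the alignment harder
                diag = score[i - 1][j - 1] + precision.get(expected[i - 1], 0.5)
            else:
                diag = score[i - 1][j - 1] - 1.0
            score[i][j] = max(diag,
                              score[i - 1][j] - 1.0,   # gap in recognized
                              score[i][j - 1] - 1.0)   # gap in expected
    return score[n][m]

# Hypothetical weights: vowels assumed better classified than consonants.
precision = {"a": 0.9, "e": 0.9, "s": 0.5, "t": 0.6}
print(align(list("sat"), list("sat"), precision))   # perfect match
print(align(list("sat"), list("st"), precision))    # one dropped phoneme
```

Weighting matches by class precision means that a run of well-recognized vowels, for instance, can pin down word boundaries even when weaker consonant classifications around them are noisy.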