Increasing the Naturalness of Machine Speech with an Automatic, Speech-Signal-Based Stress Labelling Algorithm
Making machine-generated speech sound as natural as possible remains an important research area. Among many other factors, prosody strongly influences perceived naturalness, so a precisely annotated corpus, from which accurate generative models can be built by machine learning, is a basic requirement. Manual labelling of a corpus is costly and time-consuming, even when restricted to prosodic units and stress. Moreover, international experience confirms that the judgment of expert labellers is itself subjective: the overlap between stress annotations produced by different experts rarely exceeds 80%. For these reasons, automatic labelling procedures are often used. Stress labelling is most commonly derived from the text transcript, which, however, yields lower accuracy than human annotation. As an alternative, this work implements a stress labelling algorithm based on the speech signal. To verify the resulting stress labels, we train six HMM-TTS systems (three male and three female voices) and then compare the systems in subjective listening tests (CMOS).
Using Phonological Phrase Segmentation to Improve Automatic Keyword Spotting for the Highly Agglutinating Hungarian Language
This paper investigates the use of prosody to improve keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be performed effectively using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on confidence scores computed as a ratio of acoustic scores obtained in two ways: first, by decoding with a universal background model; and second, by decoding with a keyword model embedded into filler models. Prosody is used to perform an automatic phonological phrase alignment for speech, which has proven useful for automatic partial word boundary detection in fixed-stress languages. Several features derived from the phonological phrase alignment are investigated for rescoring the baseline confidence scores, both in a rule-based and in a data-driven manner. Results show that, at relevant operating points of the system, a false alarm reduction of 10% to 40% can be reached at the same miss probability rates.
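The confidence score described in this abstract can be illustrated with a minimal sketch. It assumes that both decoding passes return log-domain acoustic scores for the same speech segment, so the ratio of scores becomes a difference of log-likelihoods. The per-frame normalization, the threshold value, and the function names `keyword_confidence` and `is_detection` are illustrative assumptions, not details taken from the paper.

```python
def keyword_confidence(keyword_loglik: float,
                       background_loglik: float,
                       num_frames: int) -> float:
    """Log-domain ratio of two acoustic scores for one candidate segment.

    keyword_loglik: score from decoding with the keyword model embedded
        in filler models (assumed to be a log-likelihood).
    background_loglik: score from decoding the same segment with the
        universal background model (assumed to be a log-likelihood).
    num_frames: segment length in frames; dividing by it is an assumed
        normalization so that long and short segments are comparable.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # A ratio of likelihoods is a difference of log-likelihoods.
    return (keyword_loglik - background_loglik) / num_frames


def is_detection(confidence: float, threshold: float = 0.0) -> bool:
    """Accept the candidate if its confidence reaches the threshold.

    The threshold is a hypothetical tuning parameter: raising it trades
    false alarms for misses, which sets the system's operating point.
    """
    return confidence >= threshold


# Hypothetical scores for a 100-frame candidate segment.
score = keyword_confidence(-4200.0, -4300.0, 100)
print(score, is_detection(score, threshold=0.5))
```

Sweeping the threshold over a labelled test set would trace out the trade-off between false alarms and misses. Rescoring with prosodic features, as the abstract describes, would adjust `score` before this thresholding step.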