656 research outputs found

    Croatian Speech Recognition

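    Application of Automatic Cross-Lingual Acoustic Modelling to HMM Speech Synthesis for Under-Resourced Language Databases (Primjena automatskog međujezičnog akustičnog modeliranja na HMM sintezu govora za oskudne jezične baze)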

    Human-computer interaction (HCI) can now also be achieved through voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of adapting VUI-capable applications to a new language is acquiring target-language speech data. Such data are then used to develop VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy data-acquisition process is to design and develop automatic algorithms that can extract similar target-language acoustics from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. The phoneme mapping is then used to develop an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated subjectively and compared with an expert-knowledge cross-language method and with a baseline speech synthesis built only from the under-resourced data. The results reveal that combining data from the well-resourced and under-resourced languages using the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.

    Development of a Croatian Acoustic Model Using the HTK Toolkit (Razvoj akustičkog modela hrvatskog jezika pomoću alata HTK)

    This paper presents the development of an acoustic model of the Croatian language for automatic speech recognition (ASR). Continuous speech recognition is performed by means of hidden Markov models (HMMs) implemented in the HMM Toolkit (HTK). To adapt the HTK to Croatian, a novel algorithm for Croatian language transcription (CLT) has been developed. It is based on phonetic assimilation rules that are applied within uttered words. Phonetic questions for state tying of different triphone models have also been developed. An automated system for training and evaluating acoustic models has been developed and integrated with a new graphical user interface (GUI). The targeted applications of this ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Adaptability of the model to a closed set of speakers is important for such applications, and this paper investigates the applicability of the HTK tool in typical scenarios. Robustness of the tool to a new language was tested in matched conditions by training an English model in parallel as a baseline. Ten native Croatian speakers participated in the experiments. Encouraging results were achieved and are reported for the developed Croatian acoustic model.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    A Croatian Weather Domain Spoken Dialog System Prototype

    Speech and language technologies have already been in use in IT for some time. Because of their great impact and fast growth, it is necessary to introduce these technologies for the Croatian language. In this paper we propose a solution for developing a domain-oriented spoken dialog system for Croatian. We chose the weather domain because it has a limited vocabulary, easily accessible data, and high applicability. The Croatian weather dialog system provides information about the weather in different regions of Croatia. The modules of the spoken dialog system perform automatic word recognition, semantic analysis, dialog management, response generation and text-to-speech synthesis. This is a first attempt to develop such a system for the Croatian language, and some new approaches are presented.

    Croatian HMM-based Speech Synthesis

    The paper describes the development of a trainable speech synthesis system, based on hidden Markov models. An approach to speech signal generation using a source-filter model is presented. Inputs into the synthesis system are speech utterances and their phone level transcriptions. A method using context-dependent acoustic models and Croatian phonetic rules for speech synthesis is proposed. Croatian HMM-based speech synthesis experiments are presented and generated speech results are discussed

    The SP2 SCOPES Project on Speech Prosody

    This is an overview of a Joint Research Project within the Scientific co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, in the course of the following two years, the four partners aim to collaborate on the subject of speech prosody and advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. Through the four intertwined research plans, synergies are expected to emerge that will build a foundation for submitting strong joint proposals for EU funding.

    Implementation of Bipolar Adjective Pairs in Analysis of Urban Acoustic Environments

    Four different acoustic environments with different loudness levels and spectral distributions were recorded and played back to two groups of listeners, a control group and an experimental group. The questionnaire used in this research relies on the semantic differential method, implemented by defining adjective pairs of opposite meaning, where each pair describes a sound characteristic of a particular acoustic environment. In analyzing the results, psychological research methodology was used to determine the statistically significant bipolar adjectives that can appropriately evaluate a given acoustic environment and thus serve as a starting point for standardizing questionnaires and methodology in soundscape research.