26 research outputs found

    An Overview of the Slovenian Spoken Dialog System

    In this paper we present the modules of the Slovenian spoken dialog system, developed within the joint multilingual speech recognition and understanding project “Spoken Queries in European Languages” (SQEL-Copernicus-1634). The system can handle spontaneous speech and provide the user with correct information in the domain of air flight information retrieval. The major modules of the system perform word recognition, linguistic analysis, dialog management and speech synthesis. Results with respect to word accuracy, semantic accuracy and dialog success rate are also given.

    Analytic Assessment of Telephone Transmission Impact on ASR Performance Using a Simulation Model

    This paper addresses the impact of telephone transmission channels on automatic speech recognition (ASR) performance. A real-time simulation model is described and implemented which allows impairments encountered in traditional as well as modern (mobile, IP-based) networks to be generated flexibly and efficiently. The model is based on input parameters that are known to telephone network planners; thus, it can be applied without measuring specific network characteristics. It can be used for an analytic assessment of the impact of channel impairments on ASR performance, for producing training material with defined transmission characteristics, or for testing spoken dialogue systems in realistic network environments. The present paper investigates the first point. Two speech recognizers integrated into a spoken dialogue system for information retrieval are assessed under controlled amounts of transmission degradation. The measured ASR performance degradation is compared to speech quality degradation in human-human communication. It turns out that different behavior can be expected for some impairments. This fact has to be taken into account both in telephone network planning and in speech and language technology development.
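    The simulation model described above generates controlled channel impairments. As a rough illustration only (not the authors' implementation), a minimal sketch of two classic telephone-channel degradations, band-limiting and additive noise, might look like this; the band edges (300-3400 Hz) and the SNR parameter are illustrative assumptions:

```python
import numpy as np

def simulate_telephone_channel(signal, fs=16000, snr_db=20.0, seed=0):
    """Band-limit a signal to the traditional telephone band and add
    white noise at a requested SNR (illustrative sketch only)."""
    # Band-limit to 300-3400 Hz via FFT masking
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[(freqs < 300.0) | (freqs > 3400.0)] = 0.0
    band = np.fft.irfft(spec, n=len(signal))
    # Add white noise scaled to the requested signal-to-noise ratio
    rng = np.random.default_rng(seed)
    sig_power = np.mean(band ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), len(signal))
    return band + noise
```

    A real network simulator would also model codec distortion, packet loss, and echo; this sketch covers only the two impairments that are simplest to parameterize.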

    Crowd-supervised training of spoken language systems

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-166).
    Spoken language systems are often deployed with static speech recognizers. Only rarely are parameters in the underlying language, lexical, or acoustic models updated on-the-fly. In the few instances where parameters are learned in an online fashion, developers traditionally resort to unsupervised training techniques, which are known to be inferior to their supervised counterparts. These realities make the development of spoken language interfaces a difficult and somewhat ad-hoc engineering task, since models for each new domain must be built from scratch or adapted from a previous domain. This thesis explores an alternative approach that makes use of human computation to provide crowd-supervised training for spoken language systems. We explore human-in-the-loop algorithms that leverage the collective intelligence of crowds of non-expert individuals to provide valuable training data at very low cost for actively deployed spoken language systems. We also show that in some domains the crowd can be incentivized to provide training data for free, as a byproduct of interacting with the system itself. Through the automation of crowdsourcing tasks, we construct and demonstrate organic spoken language systems that grow and improve without the aid of an expert. Techniques that rely on collecting data remotely from non-expert users, however, are subject to the problem of noise. This noise can sometimes be heard in audio collected from poor microphones or muddled acoustic environments. Alternatively, noise can take the form of corrupt data from a worker trying to game the system: for example, a paid worker tasked with transcribing audio may leave transcripts blank in hopes of receiving a speedy payment.
    We develop strategies to mitigate the effects of noise in crowd-collected data and analyze their efficacy. This research spans a number of different application domains of widely deployed spoken language interfaces, but maintains the common thread of improving the speech recognizer's underlying models with crowd-supervised training algorithms. We experiment with three central components of a speech recognizer: the language model, the lexicon, and the acoustic model. For each component, we demonstrate the utility of a crowd-supervised training framework. For the language model and lexicon, we explicitly show that this framework can be used hands-free, in two organic spoken language systems.
    By Ian C. McGraw, Ph.D.
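    The abstract mentions paid workers gaming the system by submitting blank transcripts. A minimal, hypothetical sketch of one mitigation (not the thesis's actual algorithm): discard empty submissions and keep only transcripts that a majority of workers agree on:

```python
from collections import Counter

def filter_crowd_transcripts(transcripts, min_workers=3):
    """Aggregate redundant crowd transcripts of one utterance.
    Returns the majority transcript, or None if the data is too
    noisy or too sparse to trust (illustrative sketch only)."""
    # Drop blank submissions (a common form of worker gaming)
    valid = [t.strip().lower() for t in transcripts if t.strip()]
    if len(valid) < min_workers:
        return None
    # Require a strict majority to agree on the same text
    text, count = Counter(valid).most_common(1)[0]
    if count / len(valid) < 0.5:
        return None
    return text
```

    Real crowd-supervision pipelines typically go further, weighting workers by their historical accuracy rather than treating every vote equally.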

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario in which the target language is available in written form on the Internet and the mapping between speech and writing is close, up to the difficult scenario in which no written form of the target language exists.
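    One basic ingredient of rapid pronunciation-dictionary generation is mapping graphemes to phonemes. As an illustrative sketch only (the dissertation's actual methods are not reproduced here), a greedy longest-match grapheme-to-phoneme converter over a hypothetical rule table:

```python
def g2p(word, rules):
    """Greedy longest-match grapheme-to-phoneme conversion.
    `rules` maps grapheme strings to phoneme lists; longer
    graphemes are tried first at each position."""
    phones, i = [], 0
    while i < len(word):
        for length in range(min(3, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += length
                break
        else:
            i += 1  # skip graphemes with no rule
    return phones
```

    For a language with a close speech-to-writing mapping, a small hand-written rule table of this kind can bootstrap a usable dictionary; for opaque orthographies, data-driven G2P models are needed instead.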

    A study on reusing resources of speech synthesis for closely-related languages

    This thesis describes research on building a text-to-speech (TTS) framework that can compensate for the lack of linguistic information in under-resourced languages by using existing resources from another language. It describes the adaptation process required when such limited resources are used. The main languages involved in this research are Malay and Iban. The thesis includes a study on grapheme-to-phoneme mapping and the substitution of phonemes. A set of substitution matrices is presented that shows the phoneme confusion, in terms of perception, among respondents. The experiments conducted study intelligibility as well as perception based on the context of utterances. A study of phonetic prosody is then presented and compared to the Klatt duration model, to find similarities between cross-language duration models, if any exist. A comparative study of an Iban native speaker with an Iban polyglot TTS built from Malay resources is then presented, to confirm that the prosody of Malay can be used to generate Iban synthesised speech. The central hypothesis of this thesis is that by using the resources of a closely-related language, natural-sounding speech can be produced. The aim of this research was to show that, by adhering to the indigenous language's characteristics, it is possible to build a polyglot synthesised speech system even with insufficient speech resources.
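    The substitution matrices described above suggest a simple lookup strategy: when a source-language phoneme is missing from the target inventory, choose the perceptually closest available phoneme. A hypothetical sketch (the phoneme names and confusion scores below are invented for illustration, not taken from the thesis):

```python
def substitute_phoneme(phone, inventory, confusion):
    """Map a phoneme to the perceptually closest phoneme available
    in the target inventory, using confusion scores in [0, 1]
    (illustrative sketch only)."""
    if phone in inventory:
        return phone  # no substitution needed
    # Rank candidate substitutes by their confusion score
    candidates = confusion.get(phone, {})
    scored = [(score, p) for p, score in candidates.items() if p in inventory]
    return max(scored)[1] if scored else None
```

    In practice the confusion scores would come from perception experiments with respondents, as the thesis describes, rather than being hand-assigned.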

    A journey of sound along the electroacoustic wires: art and new technologies in Latin America

    The history of Latin American electroacoustic music is long, interesting, and prolific, but little known, even regionally. Many composers born or living in Latin America have been very active in this field, in some countries for more than 50 years, but the availability of information and of recordings of electroacoustic music in the region has posed serious problems for educators, composers, performers, researchers, students, and the general public. Given this situation, the following question became the starting point of my thesis: how did the tradition of musical creation with electroacoustic media develop in Latin America? To answer it, I adopted a historical approach using an ethnographic methodology (characterized by long-term immersion in the field, personal contacts with composers, and my participation in and concern for the evolution of the arts employing new technologies in Latin America) throughout my research. Having begun working in electroacoustic music in the mid-1970s in my native Argentina, I found it very difficult to obtain information on activities in this field in neighbouring countries, and even in my own city. Although difficult, it was nevertheless possible to find recordings by composers living in Europe or North America, but harder to find those made by local or regional composers. In various Latin American countries, universities, state agencies, and large private foundations had occasionally taken the initiative to support research in art and the use of new media, but most had ceased their activities before developing the resources to document the processes and preserve the results.
    Every recording and piece of information I have gathered since the mid-1970s was obtained by contacting each composer directly. Over time I built up a modest but growing personal archive comprising concert programme notes, books, newsletters, magazines and journals, scores, letters, e-mails, and recordings on reels, analogue cassettes, and a few 33-rpm vinyl records. I decided to share these treasures with colleagues and students and to explore ways of making them accessible to as many people as possible. A few years ago, UNESCO asked me to write reports on Latin American electroacoustic music and media arts. The texts from that research helped disseminate information about the work of many Latin American artists. To make the musical works accessible to the public as well, and to safeguard the material, I sought a place where the preservation of the recordings was not only important but possible. I considered the Daniel Langlois Foundation for Art, Science, and Technology in Montreal the ideal home for my project. My continuous work over nearly 28 months as researcher in residence at the Daniel Langlois Foundation allowed me to digitize and convert recordings from different formats, edit them where necessary, and enter into the Foundation's database all the information about the pieces (title, composer, year of composition, instrumentation, programme notes, production studio, version, duration, composer's biography, etc.). As of January 2006, 2,152 digital audio files are archived at the Foundation's Centre for Research and Documentation (CR+D). As a complement to this doctoral thesis, I have developed a collection of musical recordings now available to the public.
    This collection consists of the results of my research (texts, musical works, some historical scores and photographs, interviews) and is published on the Daniel Langlois Foundation's website. The archive includes pieces for fixed media as well as mixed works for acoustic instruments or voice with fixed media or live interactive electronic systems (1,722 compositions). It also contains audio and audiovisual recordings of interviews with composers and technical innovators, together with photographs, videos, and a few very rare scores. Much of the textual information in the music-file database is accessible through the Foundation's website; the complete information (e.g. programme notes) and all the recordings are accessible at the CR+D, and a short selection of pieces can also be auditioned on the website. Most of the composers represented in this archive and in this dissertation were born in Latin American countries; a few, though not originally from the region, pursued at least part of their musical careers in Latin America. This thesis contains information on composers linked to 18 Latin American countries: Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, the Dominican Republic, Ecuador, El Salvador, Guatemala, Mexico, Panama, Paraguay, Peru, Puerto Rico, Uruguay, and Venezuela. The archive contains recordings by composers from all of these countries. I hope this text will encourage exploration of the marvellous and rather unknown musical universe created by hundreds of Latin American composers over recent decades.
    AUTHOR'S KEYWORDS: electroacoustic music, Latin America, art and new technologies, ethics, memory, culture, context, pioneers, interdisciplinarity

    Unit selection and waveform concatenation strategies in Cantonese text-to-speech

    Oey Sai Lok. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.
    Chapter 1. Introduction --- p.1
    Chapter 1.1 An overview of Text-to-Speech technology --- p.2
    Chapter 1.1.1 Text processing --- p.2
    Chapter 1.1.2 Acoustic synthesis --- p.3
    Chapter 1.1.3 Prosody modification --- p.4
    Chapter 1.2 Trends in Text-to-Speech technologies --- p.5
    Chapter 1.3 Objectives of this thesis --- p.7
    Chapter 1.4 Outline of the thesis --- p.9
    References --- p.11
    Chapter 2. Cantonese Speech --- p.13
    Chapter 2.1 The Cantonese dialect --- p.13
    Chapter 2.2 Phonology of Cantonese --- p.14
    Chapter 2.2.1 Initials --- p.15
    Chapter 2.2.2 Finals --- p.16
    Chapter 2.2.3 Tones --- p.18
    Chapter 2.3 Acoustic-phonetic properties of Cantonese syllables --- p.19
    References --- p.24
    Chapter 3. Cantonese Text-to-Speech --- p.25
    Chapter 3.1 General overview --- p.25
    Chapter 3.1.1 Text processing --- p.25
    Chapter 3.1.2 Corpus based acoustic synthesis --- p.26
    Chapter 3.1.3 Prosodic control --- p.27
    Chapter 3.2 Syllable based Cantonese Text-to-Speech system --- p.28
    Chapter 3.3 Sub-syllable based Cantonese Text-to-Speech system --- p.29
    Chapter 3.3.1 Definition of sub-syllable units --- p.29
    Chapter 3.3.2 Acoustic inventory --- p.31
    Chapter 3.3.3 Determination of the concatenation points --- p.33
    Chapter 3.4 Problems --- p.34
    References --- p.36
    Chapter 4. Waveform Concatenation for Sub-syllable Units --- p.37
    Chapter 4.1 Previous work in concatenation methods --- p.37
    Chapter 4.1.1 Determination of concatenation point --- p.38
    Chapter 4.1.2 Waveform concatenation --- p.38
    Chapter 4.2 Problems and difficulties in concatenating sub-syllable units --- p.39
    Chapter 4.2.1 Mismatch of acoustic properties --- p.40
    Chapter 4.2.2 Allophone problem of Initials /z/, Id and /s/ --- p.42
    Chapter 4.3 General procedures in concatenation strategies --- p.44
    Chapter 4.3.1 Concatenation of unvoiced segments --- p.45
    Chapter 4.3.2 Concatenation of voiced segments --- p.45
    Chapter 4.3.3 Measurement of spectral distance --- p.48
    Chapter 4.4 Detailed procedures in concatenation points determination --- p.50
    Chapter 4.4.1 Unvoiced segments --- p.50
    Chapter 4.4.2 Voiced segments --- p.53
    Chapter 4.5 Selected examples in concatenation strategies --- p.58
    Chapter 4.5.1 Concatenation at Initial segments --- p.58
    Chapter 4.5.1.1 Plosives --- p.58
    Chapter 4.5.1.2 Fricatives --- p.59
    Chapter 4.5.2 Concatenation at Final segments --- p.60
    Chapter 4.5.2.1 V group (long vowel) --- p.60
    Chapter 4.5.2.2 D group (diphthong) --- p.61
    References --- p.63
    Chapter 5. Unit Selection for Sub-syllable Units --- p.65
    Chapter 5.1 Basic requirements in unit selection process --- p.65
    Chapter 5.1.1 Availability of multiple copies of sub-syllable units --- p.65
    Chapter 5.1.1.1 Levels of "identical" --- p.66
    Chapter 5.1.1.2 Statistics on the availability --- p.67
    Chapter 5.1.2 Variations in acoustic parameters --- p.70
    Chapter 5.1.2.1 Pitch level --- p.71
    Chapter 5.1.2.2 Duration --- p.74
    Chapter 5.1.2.3 Intensity level --- p.75
    Chapter 5.2 Selection process: availability check on sub-syllable units --- p.77
    Chapter 5.2.1 Multiple copies found --- p.79
    Chapter 5.2.2 Unique copy found --- p.79
    Chapter 5.2.3 No matched copy found --- p.80
    Chapter 5.2.4 Illustrative examples --- p.80
    Chapter 5.3 Selection process: acoustic analysis on candidate units --- p.81
    References --- p.88
    Chapter 6. Performance Evaluation --- p.89
    Chapter 6.1 General information --- p.90
    Chapter 6.1.1 Objective test --- p.90
    Chapter 6.1.2 Subjective test --- p.90
    Chapter 6.1.3 Test materials --- p.91
    Chapter 6.2 Details of the objective test --- p.92
    Chapter 6.2.1 Testing method --- p.92
    Chapter 6.2.2 Results --- p.93
    Chapter 6.2.3 Analysis --- p.96
    Chapter 6.3 Details of the subjective test --- p.98
    Chapter 6.3.1 Testing method --- p.98
    Chapter 6.3.2 Results --- p.99
    Chapter 6.3.3 Analysis --- p.101
    Chapter 6.4 Summary --- p.107
    References --- p.108
    Chapter 7. Conclusions and Future Works --- p.109
    Chapter 7.1 Conclusions --- p.109
    Chapter 7.2 Suggested future works --- p.111
    References --- p.113
    Appendix 1 Mean pitch level of Initials and Finals stored in the inventory --- p.114
    Appendix 2 Mean durations of Initials and Finals stored in the inventory --- p.121
    Appendix 3 Mean intensity level of Initials and Finals stored in the inventory --- p.124
    Appendix 4 Test word used in performance evaluation --- p.127
    Appendix 5 Test paragraph used in performance evaluation --- p.128
    Appendix 6 Pitch profile used in the Text-to-Speech system --- p.131
    Appendix 7 Duration model used in Text-to-Speech system --- p.13
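    Chapters 4 and 5 of the outline above concern spectral-distance measurement and unit selection. As a generic illustration of the idea only (not this thesis's procedure), the join cost between two candidate units can be taken as the Euclidean distance between their boundary spectral frames, with the selected unit minimising that cost; the frame representation (e.g. cepstral vectors) is an assumption here:

```python
import numpy as np

def join_cost(unit_a_end, unit_b_start):
    """Euclidean spectral distance between the last frame of one
    unit and the first frame of the next (illustrative sketch)."""
    return float(np.linalg.norm(np.asarray(unit_a_end) - np.asarray(unit_b_start)))

def select_unit(prev_end_frame, candidates):
    """Among candidate units (each with a 'start_frame' vector),
    pick the one with the smallest join cost to the previous unit."""
    return min(candidates, key=lambda c: join_cost(prev_end_frame, c["start_frame"]))
```

    A full unit-selection synthesiser would combine such a join cost with target costs for pitch, duration, and intensity, the acoustic parameters the thesis's Chapter 5 examines.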

    Electronic musical instruments as interactive exhibits in museums

    Whilst recent museum exhibitions have explored electronic musical instruments, the interpretational focus has been on materiality rather than the sounds produced. Similarly, whilst authors have ‘followed the instruments’ to find the people who used and designed them, those who create and shape their sounds remain comparatively hidden. To address this problem, this thesis introduces sound genealogy, a methodology for following the evolution of a sound through material networks and people, as an interpretational framework to support exhibition teams in explicitly connecting sounds to instrument interfaces using multi-sensory interactive exhibits. Adopting this methodology will improve visitors' experiences of music and sound content, helping them connect sounds from their lived experiences to the instruments associated with them; it demonstrates how material networks can influence a sound's popularity and musical value over time, whilst drawing attention to the people involved in the design and use of both sounds and instruments. Chapter one positions this research within contemporary exhibition practices and analyses the methodologies and literature that define the scope for the discussions that follow. The involvement of the UK's Science Museum Group institutions is also highlighted. Chapters two to four present three case-study insights based on observations of objects and their sounds, and the use of representative exhibits, in North American, European, and British museums. These case studies were chosen to represent a range of instrument categories (synthesizers, samplers, drum machines) and interpretational foci (interface, sound, function). Interview data obtained from exhibition team members highlights the strategies and challenges in co-creating positive exhibit experiences for diverse audiences.
    Evidence from these case studies also supports the analysis, in chapters five and six, of theories and concepts from museum studies, science and technology studies, and sound studies. This helps to position, and advocate for, the adoption of a sound genealogy methodology in demonstrating the value of sound through interactivity. Additionally, the anticipation and management of visitor behaviours is considered in the context of successfully attaining learning and entertainment goals. Finally, chapters seven and eight document the creation and evaluation of an original interactive exhibit by the author, supported by the sound genealogy methodology.