    Vocal Attractiveness Of Statistical Speech Synthesisers

    This work was funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant agreement 213845 (the EMIME project). Our previous analysis of speaker-adaptive HMM-based speech synthesis methods suggested two possible reasons why average voices can obtain higher subjective scores than any individual adapted voice: 1) model adaptation degrades speech quality in proportion to the distance ‘moved’ by the transforms, and 2) psychoacoustic effects relating to the attractiveness of the voice. This paper follows on from that analysis and aims to separate these effects. Our latest perceptual experiments focus on attractiveness, using average voices and speaker-dependent voices without model transformation. They show that using several speakers to create a voice improves smoothness (measured by Harmonics-to-Noise Ratio), reduces the final voice's distance from the average voice in log F0-F1 space, and hence makes it more attractive at the segmental level. However, this effect is weakened or overridden at the supra-segmental or sentence level.
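    The abstract does not define its distance measure, so the following is only a minimal sketch under stated assumptions: each voice is summarised by a mean F0 and mean F1 in Hz, both are log-transformed, the "average voice" is taken as the centroid of the speakers in that space, and distance is plain Euclidean. The speaker names and values are hypothetical. In this toy example, a voice pooled from two speakers lands closer to the average voice than either speaker alone, which mirrors the direction of the effect the abstract reports.

```python
import math
from statistics import mean

# Hypothetical per-speaker mean F0 and F1 values in Hz (illustrative only,
# not measurements from the paper).
SPEAKERS = {
    "A": (200.0, 500.0),
    "B": (220.0, 550.0),
    "C": (180.0, 450.0),
}


def to_log_space(f0_hz: float, f1_hz: float) -> tuple[float, float]:
    """Map mean F0 and F1 (Hz) to natural-log coordinates."""
    return (math.log(f0_hz), math.log(f1_hz))


def centroid(points) -> tuple[float, float]:
    """Mean position of a set of voices in log F0-F1 space."""
    points = list(points)
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def distance(p, q) -> float:
    """Euclidean distance between two points in log F0-F1 space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


points = {name: to_log_space(*hz) for name, hz in SPEAKERS.items()}
average_voice = centroid(points.values())

# Assumption: a voice pooled from speakers B and C is modelled simply as the
# mean of their log-domain coordinates.
pooled_bc = centroid([points["B"], points["C"]])

for name, p in points.items():
    print(f"speaker {name}: {distance(p, average_voice):.4f}")
print(f"pooled B+C: {distance(pooled_bc, average_voice):.4f}")
```

    Working in the log domain treats equal frequency ratios as equal distances, which is why it is assumed here; a real analysis might instead use a perceptual scale such as Bark or ERB.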

    Analysis of Speaker Clustering Strategies for HMM-Based Speech Synthesis

    This paper describes a method for speaker clustering, applied to building average voice models for speaker-adaptive HMM-based speech synthesis that are a good basis for adapting to specific target speakers. Our main hypothesis is that using perceptually similar speakers to build the average voice model will be better than using unselected speakers, even if the amount of data available from perceptually similar speakers is smaller. We measure the perceived similarities among a group of 30 female speakers in a listening test and then apply multiple linear regression to predict these listener judgements of speaker similarity, and thus to identify similar speakers automatically. We then compare a variety of average voice models trained on speakers who were perceptually judged to be similar to the target speaker, on speakers selected by the multiple linear regression, or on a large global set of unselected speakers. We find that the average voice model trained on perceptually similar speakers performs better than the global model, even though the latter is trained on more data, confirming our main hypothesis. However, the average voice model using speakers selected automatically by the multiple linear regression does not reach the same level of performance. Index Terms: Statistical parametric speech synthesis, hidden Markov models, speaker adaptation
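    The regression-based selection step can be sketched as ordinary least squares that maps acoustic differences between two speakers to a rated similarity, followed by ranking candidate speakers by predicted similarity to the target. This is an illustration only: the abstract does not list its predictors, so the two features used here (an F0 difference and a mel-cepstral distance) and every name and number below are hypothetical. The training targets are generated from an exact linear rule so the fitted coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_mlr(features, targets):
    """Multiple linear regression with an intercept, via the normal equations."""
    rows = [[1.0, *f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, f):
    """Predicted similarity for one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], f))


def select_similar(coef, candidates, k):
    """Return the k candidate speakers with the highest predicted similarity."""
    ranked = sorted(candidates, key=lambda name: predict(coef, candidates[name]), reverse=True)
    return ranked[:k]


# Hypothetical training pairs: (F0 difference, mel-cepstral distance) -> rating.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5)]
train_y = [5.0 - 2.0 * a - 1.0 * b for a, b in train_x]
coef = fit_mlr(train_x, train_y)

# Hypothetical acoustic differences between each candidate and the target.
candidates = {"s01": (0.2, 0.3), "s02": (1.5, 0.1), "s03": (0.1, 1.2), "s04": (0.4, 0.2)}
print(select_similar(coef, candidates, 2))
```

    Solving the normal equations directly keeps the sketch dependency-free; with many correlated predictors a QR-based or regularised solver would be the safer choice.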

    Roles of the Average Voice in Speaker-adaptive HMM-based Speech Synthesis

    In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for whom the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from within the same corpus. This paper investigates these fluctuations in quality and concludes that as the mel-cepstral distance from the average voice becomes larger, the MOS naturalness scores generally become worse. Although this negative correlation is not strong, it suggests a way to improve the training and adaptation strategies. We also draw comparisons between our findings and the work of other researchers on “vocal attractiveness.”
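    The relationship described above can be illustrated by computing a mel-cepstral distance from the average voice for each speaker and correlating it with MOS. The abstract does not say which distance variant was used, so this sketch assumes the common mel-cepstral distortion formula in dB, MCD = (10 / ln 10) · sqrt(2 · Σ_{d≥1} (c_d − c'_d)²), which skips the energy coefficient c0. A real evaluation would average MCD over time-aligned frames; here each voice is a single hypothetical frame, and the MOS values are invented for illustration.

```python
import math


def mel_cepstral_distortion(c_ref, c_test):
    """Mel-cepstral distortion in dB between two frames, skipping c0 (energy)."""
    if len(c_ref) != len(c_test):
        raise ValueError("cepstral vectors must have the same order")
    sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_test[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical single-frame mel-cepstra and MOS scores, for illustration only.
average_voice = [1.0, 0.5, -0.2, 0.1]
speakers = {
    "s1": ([1.2, 0.55, -0.18, 0.12], 3.6),
    "s2": ([0.8, 0.7, -0.3, 0.05], 3.2),
    "s3": ([1.0, 0.9, 0.1, -0.1], 2.9),
}
distances = [mel_cepstral_distortion(average_voice, c) for c, _ in speakers.values()]
mos = [score for _, score in speakers.values()]
r = pearson(distances, mos)
print(f"r = {r:.3f}")
```

    A negative r corresponds to the trend the paper reports (larger distance, lower naturalness); with real data the correlation would be computed over many speakers and its strength reported alongside a significance test.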

    Music in the international market: differences and distribution: the case of Italy and China

    Historically, there has been limited transmission of musical ideas between Italy and China. When music travels between cultures, it is subject to change and transformation, and this cultural exchange is the foundation of popular music as we know it today. In this dissertation, we first analyse what makes music enjoyable for people through an analysis of genre. We then perform a comparative analysis of the two countries' regional music genres and analyse the similarities between them. Through this, we can understand the similarities between the two markets and identify possible modes of entry for Italian musicians into the Chinese market. The motivation for this analysis is to ascertain whether there is space for Italian musicians to find an audience in China. By understanding the similarities between the countries, we can find elements within Italian musicians’ product that will reduce their alienation within the Chinese market.

    Augmentative communication device design, implementation and evaluation

    The ultimate aim of this thesis was to design and implement an advanced software-based Augmentative Communication Device (ACD), or Voice Output Communication Aid (VOCA), for non-vocal individuals with learning disabilities by applying current psychological models, theories, and experimental techniques. By taking account of potential users' cognitive and linguistic abilities, a symbol-based device (Easy Speaker) was produced which outputs naturalistic digitised human speech and sound and makes use of a photorealistic symbol set. To increase the size of the available symbol set, a hypermedia-style dynamic screen approach was employed. The relevance of the hypermedia metaphor to models of knowledge representation and language processing was explored. Laboratory-based studies suggested that potential users could learn to operate the software productively and became faster and more efficient over time when performing set conversational tasks. Studies with unimpaired individuals supported the notion that digitised speech was less cognitively demanding to decode, or listen to. With highly portable, touch-based, PC-compatible systems beginning to appear, it is hoped that the otherwise silent will be able to use the software as their primary means of communication with the speaking world. Extensive field trials over a six-month period with a prototype device, in collaboration with users' caregivers, strongly suggested this might be the case. Off-device improvements were also noted, suggesting that Easy Speaker, or similar software, has the potential to be used as a communication training tool. Such training would be likely to improve overall communicative effectiveness. To conclude, a model for successful ACD development was proposed.

    Designing Sound for Social Robots: Advancing Professional Practice through Design Principles

    Sound is one of the core modalities social robots can use to communicate with the humans around them in rich, engaging, and effective ways. While a robot's auditory communication happens predominantly through speech, a growing body of work demonstrates the various ways non-verbal robot sound can affect humans, and researchers have begun to formulate design recommendations that encourage using the medium to its full potential. However, formal strategies for successful robot sound design have not yet emerged, current frameworks and principles are largely untested, and no effort has been made to survey creative robot sound design practice. In this dissertation, I combine creative practice, expert interviews, and human-robot interaction studies to advance our understanding of how designers can best ideate, create, and implement robot sound. First, I map out a design space that combines established sound design frameworks with insights from interviews with robot sound design experts. I then systematically traverse this space across three robot sound design explorations, investigating (i) the effect of artificial movement sound on how robots are perceived, (ii) the benefits of applying compositional theory to robot sound design, and (iii) the role and potential of spatially distributed robot sound. Finally, I implement the designs from the prior chapters in the humanoid robot Diamandini and deploy it as a case study. Based on a synthesis of the data collection and design practice conducted across the thesis, I argue that the creation of robot sound is best guided by four design perspectives: fiction (sound as a means to convey a narrative), composition (sound as its own separate listening experience), plasticity (sound as something that can vary and adapt over time), and space (spatial distribution of sound as a separate communication channel).
    The conclusion of the thesis presents these four perspectives and proposes eleven design principles across them, supported by detailed examples. This work contributes an extensive body of design principles, process models, and techniques, providing researchers and designers with new tools to enrich the way robots communicate with humans.

    Attitudes towards Euskera: using the matched-guise technique among school children in the Basque Country

    SIGLE. Available from the British Library Document Supply Centre (DSC:D90325 / BLDSC), United Kingdom.

    Blind date: mate selection in visually impaired and sighted populations

    EThOS (Electronic Theses Online Service), United Kingdom.

    A virtual musical instrument exhibit for a science centre

    Virtual reality is a technology rapidly gaining interest from research and commercial groups around the world, but its introduction into New Zealand has been slow. Most of the general public have no concept of virtual reality, and only a few research institutes have begun virtual reality programmes of any sort. This is partly due to the high cost of 'off the shelf' virtual reality systems, which is usually beyond the budget of many organisations. The complexity of the software, and the knowledge required to create and manipulate it, also make it a daunting prospect for many. This work describes the development of an economical system for demonstrating virtual reality, and some of its concepts and applications, to the general public in the form of an educational science centre exhibit. The system creates virtual musical instruments overlaid onto the real world, and the user experiences these instruments as if they physically existed.

    Spoken dialogue systems: architectures and applications

    171 p. Technology and technological devices have become habitual and omnipresent. Humans need to learn to communicate with all kinds of devices. Until recently, humans needed to learn how the devices express themselves in order to communicate with them, but the tendency has recently shifted towards making communication with these devices more intuitive. The ideal way to communicate with devices would be the natural means of communication between humans: speech. Humans have long been investigating and designing systems that use this type of communication, giving rise to so-called Spoken Dialogue Systems. In this context, the primary goal of the thesis is to show how these systems can be implemented. Additionally, the thesis serves as a review of the state of the art regarding architectures and toolkits. Finally, the thesis is intended to serve future system developers as a guide for their construction. For that