
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Auditory communication in domestic dogs: vocal signalling in the extended social environment of a companion animal

    Domestic dogs produce a range of vocalisations, including barks, growls, and whimpers, which are shared with other canid species. The source–filter model of vocal production can be used as a theoretical and applied framework to explain how and why the acoustic properties of some vocalisations are constrained by physical characteristics of the caller, whereas others are more dynamic, influenced by transient states such as arousal or motivation. This chapter thus reviews how and why particular call types are produced to transmit specific types of information, and how such information may be perceived by receivers. As domestication is thought to have caused a divergence in the vocal behaviour of dogs as compared to the ancestral wolf, evidence of both dog–human and human–dog communication is considered. Overall, it is clear that domestic dogs have the potential to acoustically broadcast a range of information, which is available to conspecific and human receivers. Moreover, dogs are highly attentive to human speech and are able to extract speaker identity, emotional state, and even some types of semantic information.

    Speech Synthesis Based on Hidden Markov Models

    The Sound Manifesto

    Computing practice today depends on visual output to drive almost all user interaction. Other senses, such as audition, may be totally neglected, or used tangentially, or used in highly restricted specialized ways. We have excellent audio rendering through D-A conversion, but we lack rich general facilities for modeling and manipulating sound comparable in quality and flexibility to graphics. We need co-ordinated research in several disciplines to improve the use of sound as an interactive information channel. Incremental and separate improvements in synthesis, analysis, speech processing, audiology, acoustics, music, etc. will not alone produce the radical progress that we seek in sonic practice. We also need to create a new central topic of study in digital audio research. The new topic will assimilate the contributions of different disciplines on a common foundation. The key central concept that we lack is sound as a general-purpose information channel. We must investigate the structure of this information channel, which is driven by the co-operative development of auditory perception and physical sound production. Particular audible encodings, such as speech and music, illuminate sonic information by example, but they are no more sufficient for a characterization than typography is sufficient for a characterization of visual information. Comment: To appear in the conference on Critical Technologies for the Future of Computing, part of SPIE's International Symposium on Optical Science and Technology, 30 July to 4 August 2000, San Diego, CA.

    Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications

    At a time when the Internet of Things (IoT) market segment tops the charts in various business reports, the field of medicine is expected to benefit greatly from the explosion of wearables and internet-connected sensors that surround us, acquiring and communicating unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one's health and wellness. However, IoT-driven healthcare must overcome many barriers: 1) there is an increasing demand for data storage on cloud servers, where the analysis of medical big data becomes increasingly complex; 2) the data, when communicated, are vulnerable to security and privacy issues; 3) communicating the continuously collected data is not only costly but also energy hungry; 4) operating and maintaining the sensors directly from the cloud servers are non-trivial tasks. This book chapter defines Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and a queryable local database. The centerpiece of Fog Computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of various medical data, such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signals for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection. Comment: 29 pages, 30 figures, 5 tables. 
Keywords: Big Data, Body Area Network, Body Sensor Network, Edge Computing, Fog Computing, Medical Cyberphysical Systems, Medical Internet-of-Things, Telecare, Tele-treatment, Wearable Devices. Chapter in Handbook of Large-Scale Distributed Computing in Smart Healthcare (2017), Springer.

    Singing synthesis with an evolved physical model

    A two-dimensional physical model of the human vocal tract is described. Such a system promises increased realism and control in the synthesis of both speech and singing. However, the parameters describing the shape of the vocal tract while in use are not easily obtained, even using medical imaging techniques, so instead a genetic algorithm (GA) is applied to the model to find an appropriate configuration. Realistic sounds are produced by this method. An analysis of these sounds and of the reliability of the technique (its convergence properties) is provided.