    Intonation in a text-to-speech conversion system

    Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech

    This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken-language corpora. The annotation schemas are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of tagging schema in each domain is based largely on the insights the schemas provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to the corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is an interesting point of comparison with English because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch-range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation that corpus studies can help to resolve.
Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    VOICE BASED FOR BANKING SYSTEM

    Traditional banking services suffer from difficulties, latency and low quality of service; they are not suitable for people with disabilities and require extra manpower to perform simple banking activities. The goal of this project is to build a voice-recognition-based system that focuses on banking activities and uses voice as the medium for carrying out those activities over a telephony network. The study addressed three fundamental objectives. The first was to develop a two-way interactive banking program that uses voice as its main mechanism for receiving instructions from, and responding to, the user. The second, supporting the first, was to make this voice banking system user-friendly and highly secure: the user must first log on by providing an assigned customer identification number and personal identification number before proceeding to any further actions. This in turn requires a strong database structure that maintains the integrity of the stored data and responds to authorized users only. The third objective was to determine the best programming approach for implementation on a telephony network. To this end, the study examines an architecture in which voice is accepted, manipulated and generated using a combination of two technologies, ColdFusion and VoiceXML. The system's functionality has been demonstrated and is in demand among users, since it provides convenient and easy services using voice alone to transmit instructions. Hence, this approach will attract a large number of customers and, in turn, generate substantial profit for the banking institutions that adopt it. It is hoped that the system will serve as a platform for future developers to host and that it can serve a large number of customers simultaneously and efficiently.
Keywords: voice-based, telephony, combination of programming, architecture

    Building and Designing Expressive Speech Synthesis

    We know there is something special about speech. Our voices are not just a means of communicating. They also give a deep impression of who we are and what we might know. They can betray our upbringing, our emotional state, our state of health. They can be used to persuade and convince, to calm and to excite. As speech systems enter the social domain, they are required to interact with, support and mediate our social relationships with 1) each other, 2) digital information and, increasingly, 3) AI-based algorithms and processes. Socially Interactive Agents (SIAs) are at the forefront of research and innovation in this area. There is an assumption that in the future “spoken language will provide a natural conversational interface between human beings and so-called intelligent systems.” [Moore 2017, p. 283]. A considerable amount of previous research has tested this assumption, with mixed results. However, as Nass and Brave point out, “voice interfaces have become notorious for fostering frustration and failure” [Nass and Brave 2005, p. 6]. It is within this context, between our exceptional and intelligent human use of speech to communicate and interact with other humans and our desire to leverage this means of communication for artificial systems, that the technology often termed expressive speech synthesis uncomfortably falls. Uncomfortably, because it is often overshadowed by issues of interactivity and of the system's underlying intelligence, which emerges from the interaction of many of the components in a SIA. This is especially true of what we might term conversational speech, where decoupling how things are spoken from when and to whom they are spoken can seem an impossible task. It is an even greater challenge when evaluating and characterising full systems that make use of expressive speech.
Furthermore, when designing an interaction with a SIA, we must consider not only how SIAs should speak but how much, and whether they should speak at all. These considerations cannot be ignored. Any speech synthesis used in the context of an artificial agent will have a perceived accent, a vocal style, an underlying emotion and an intonational model. Dimensions like accent and personality (cross-speaker parameters), as well as vocal style, emotion and intonation during an interaction (within-speaker parameters), need to be built into the design of a synthetic voice. Even a default or neutral voice has to account for these same expressive speech synthesis components. Such design parameters strongly influence how effectively a system will interact, how it is perceived and its assumed ability to perform a task or function. To ignore them is to blindly accept a set of design decisions that disregards the complex effect speech has on the user’s successful interaction with a system. Thus expressive speech synthesis is a key design component in SIAs. This chapter explores the world of expressive speech synthesis, aiming to act as a starting point for those interested in the design, building and evaluation of such artificial speech. The debates and literature within this topic are vast and fundamentally multidisciplinary, covering a wide range of disciplines such as linguistics, pragmatics, psychology, speech and language technology, robotics and human-computer interaction (HCI), to name a few. It is not our aim to synthesise these areas but to give the reader a scaffold and a starting point by exploring the critical dimensions and decisions they may need to consider when choosing to use expressive speech. To do this, the chapter explores the building of expressive synthesis, highlighting key decisions and parameters as well as emphasising future challenges in expressive speech research and development.
Yet, before these are expanded upon, we must first try to define what we actually mean by expressive speech.

    Speech Communication

    Contains reports on four research projects. Supported by the U.S. Air Force Cambridge Research Laboratories under Contract F19628-69-C-0044 and by the National Institutes of Health (Grant 5 RO1 NS 04332-08).

    Suprasegmental transcription

    No abstract

    Eyebrow raising in dialogue: discourse structure, utterance function, and pitch accents

    Some studies have suggested a relationship between eyebrow raising and different aspects of the verbal message, but our knowledge about this link is still very limited. If we could establish and characterise a relation between eyebrow raises and the linguistic signal, we could better understand human multimodal communication behaviour. We could also improve the credibility and efficiency of computer-animated conversational agents in multimodal communication systems.
This thesis investigated eyebrow raising in a corpus of task-oriented English dialogues. Using a standard dialogue coding scheme (Conversational Game Analysis, Carletta et al., 1997), the study examined eyebrow raises in connection with discourse structure and utterance function. Supporting the prediction, eyebrow raises were more frequent and longer in the initial utterance of high-level discourse segments than anywhere else in the dialogue (where 'high-level discourse segment' = transaction, and 'utterance' = move, following Carletta et al.). Additionally, eyebrow raises were more frequent in instructions than in requests for or acknowledgements of information. Instructions also had longer eyebrow raising than any other type of utterance. Contrary to the prediction, the start of a lower-level discourse segment (conversational game) did not have more eyebrow raising than any other position in the dialogue, and queries did not have more eyebrow raising than any other type of utterance.
Eyebrow raises were also studied in relation to intonational events, namely pitch accents. Results showed evidence of alignment between the start of a brow raise and the start of a pitch accent. Most pitch accents were not associated with brow raising, but when brow raises occurred they tended to immediately precede a pitch accent in the speech signal.
To investigate what could explain the alignment between the two events, pitch accents aligned with eyebrow raises were compared to all other pitch accents in terms of phonological characteristics (primary vs. secondary pitch accents, and downstep-initial vs. non-initial pitch accents), information structure (given vs. new information in referring expressions, and the last quarter vs. earlier parts of the utterance) and the type of utterance in which they occurred (instruction vs. non-instruction). These comparisons suggested that brow raises may be aligned more frequently with pitch accents in downstep-initial position and in instructions. No differences were found in terms of information structure or between primary and secondary accents.
The results provide evidence of a link between eyebrow raising and spoken language. Eyebrow raises may signal the start of linguistic units such as discourse segments and some prosodic phenomena, they may be related to utterance function, and they are aligned with pitch accents. Possible linguistic functions are proposed, such as structuring and emphasising information in the verbal message.

    Surface Structure, Intonation, and Meaning in Spoken Language

    The paper briefly reviews a theory of intonational prosody and its relation to syntax and to certain oppositions of discourse meaning that have variously been called topic and comment, theme and rheme, given and new, or presupposition and focus. The theory, which is based on Combinatory Categorial Grammar, is presented in full elsewhere. The present paper examines its consequences for the automatic synthesis and analysis of speech.