
    Acoustic correlates of encoded prosody in written conversation

    This thesis presents an analysis of certain punctuation devices such as parenthesis, italics and emphatic spellings with respect to their acoustic correlates in read speech. The class of punctuation devices under investigation is referred to as prosodic markers. The thesis therefore presents an analysis of features of the spoken language which are represented symbolically in text. Hence it is a characterization of aspects of the spoken language which have been transcribed or symbolized in the written medium and then translated back into a spoken form by a reader. The thesis focuses in particular on the analysis of parenthesis, the examination of encoded prominence and emphasis, and also addresses the use of paralinguistic markers which signal attitude or emotion. In an effort to avoid the use of self-constructed or artificial material containing arbitrary symbolic or prosodic encodings, all material used for empirical analysis was taken from examples of electronic written exchanges on the Internet, such as from electronic mail messages and from articles posted on electronic newsgroups and news bulletins. This medium of language, which is referred to here as written conversation, provides a rich source of material containing encoded prosodic markers. These occur in the form of 'smiley faces' expressing attitudes or feelings, words highlighted by a number of means such as capitalization, italics, underscore characters, or asterisks, and in the form of dashes or parentheses, which provide suggestions on how the information in a text or sentence may be structured with regard to its informational content. Chapter 2 investigates in detail the genre of written conversation with respect to its place in an emerging continuum between written and spoken language, concentrating on transcriptional devices and their function as indicators of prosody.
The implications these symbolic representations bear on the task of reading, by humans as well as machines, are then examined. Chapters 3 and 4 turn to the acoustic analysis of parentheticals and emphasis markers respectively. The experimental work in this thesis is based on readings of a corpus of selected materials from written conversation, with the acoustic analysis concentrating on the differences between readings of texts with prosodic markers and readings of the same texts from which prosodic markers have been removed. Finally, the effect of prosodic markers is tested in perception experiments involving both human and resynthesized utterances.

    Development of isiXhosa text-to-speech modules to support e-Services in marginalized rural areas

    Information and Communication Technology (ICT) projects are being initiated and deployed in marginalized areas to help improve the standard of living for community members. This has led to a new field, which is responsible for information processing and knowledge development in rural areas, called Information and Communication Technology for Development (ICT4D). An ICT4D project has been implemented in a marginalized area called Dwesa; this is a rural area situated on the Wild Coast of the former homeland of Transkei, in the Eastern Cape Province of South Africa. In this rural community there are e-Service projects which have been developed and deployed to support the already existent ICT infrastructure. Some of these projects include the e-Commerce platform, e-Judiciary service, e-Health and e-Government portal. Although these projects are deployed in this area, community members face a language and literacy barrier because these services are typically accessed through English textual interfaces. This becomes a challenge because their language of communication is isiXhosa and some of the community members are illiterate. Most of the rural areas are populated by illiterate people who cannot read or write isiXhosa but can speak the language. This problem of illiteracy in rural areas affects both the youth and the elderly. This research seeks to design, develop and implement software modules that can be used to convert isiXhosa text into natural sounding isiXhosa speech. Such an application is called a Text-to-Speech (TTS) system. The main objective of this research is to improve ICT4D eServices’ usability through the development of an isiXhosa Text-to-Speech system. This research is undertaken within the context of Siyakhula Living Lab (SLL), an ICT4D intervention towards improving the lives of rural communities of South Africa in an attempt to bridge the digital divide.
The developed TTS modules were subsequently tested to determine their applicability to improve eServices usability. The results show acceptable levels of usability for the audio utterances produced by the isiXhosa Text-to-Speech system for marginalized areas.

    Synthetic voice design and implementation.

    The limitations of speech output technology emphasise the need for exploratory psychological research to maximise the effectiveness of speech as a display medium in human-computer interaction. Stage 1 of this study reviewed speech implementation research, focusing on general issues for tasks, users and environments. An analysis of design issues was conducted, related to the differing methodologies for synthesised and digitised message production. A selection of ergonomic guidelines was developed to enhance effective speech interface design. Stage 2 addressed the negative reactions of users to synthetic speech in spite of elegant dialogue structure and appropriate functional assignment. Synthetic speech interfaces have been consistently rejected by their users in a wide variety of application domains because of their poor quality. Indeed the literature repeatedly emphasises quality as being the most important contributor to implementation acceptance. In order to investigate this, a converging operations approach was adopted. This consisted of a series of five experiments (and associated pilot studies) which homed in on the specific characteristics of synthetic speech that determine the listeners' varying perceptions of its qualities, and how these might be manipulated to improve its aesthetics. A flexible and reliable ratings interface was designed to display DECtalk speech variations and record listeners' perceptions. In experiment one, 40 participants used this to evaluate synthetic speech variations on a wide range of perceptual scales. Factor analysis revealed two main factors: "listenability" accounting for 44.7% of the variance and correlating with the DECtalk "smoothness" parameter to .57 (p<0.005) and "richness" to .53 (p<0.005); "assurance" accounting for 12.6% of the variance and correlating with "average pitch" to .42 (p<0.005) and "head size" to .42 (p<0.005).
Complementary experiments were then required in order to address appropriate voice design for enhanced listenability and assurance perceptions. With a standard male voice set, 20 participants rated enhanced smoothness and attenuated richness as contributing significantly to speech listenability (p<0.001). Experiment three using a female voice set yielded comparable results, suggesting that further refinements of the technique were necessary in order to develop an effective methodology for speech quality optimization. At this stage it became essential to focus directly on the parameter modifications that are associated with the aesthetically pleasing characteristics of synthetic speech. If a reliable technique could be developed to enhance perceived speech quality, then synthesis systems based on the commonly used DECtalk model might assume some of their considerable yet unfulfilled potential. In experiment four, 20 subjects rated a wide range of voices modified across the two main parameters associated with perceived listenability, smoothness and richness. The results clearly revealed a linear relationship between enhanced smoothness and attenuated richness and significant improvements in perceived listenability (p<0.001 in both cases). Planned comparisons were conducted between the different levels of the parameters and revealed significant listenability enhancements as smoothness was increased, and a similar pattern as richness decreased. Statistical analysis also revealed a significant interaction between the two parameters (p<0.001) and a more comprehensive picture was constructed. In order to expand the focus of and enhance the generality of the research, it was now necessary to assess the effects of synthetic speech modifications whilst subjects were undertaking a more realistic task. Passively rating the voices independent of processing for meaning is arguably an artificial task which rarely, if ever, would occur in 'real-world' settings.
In order to investigate perceived quality in a more realistic task scenario, experiment five introduced two levels of information processing load. The purpose of this experiment was firstly to see if a comprehension load modified the pattern of listenability enhancements, and secondly to see if that pattern differed between high and low load. Techniques for introducing cognitive load were investigated and comprehension load was selected as the most appropriate method in this case. A pilot study distinguished two levels of comprehension load from a set of 150 true/false sentences and these were recorded across the full range of parameter modifications. Twenty subjects then rated the voices using the established listenability scales as before, while also performing the additional task of processing each spoken stimulus for meaning and determining the authenticity of the statements. Results indicated that listenability enhancements did indeed occur at both levels of processing, although at the higher level variations in the pattern occurred. A significant difference was revealed between optimal parameter modifications for conditions of high and low cognitive load (p<0.05). The results showed that subjects perceived the synthetic voices in the high cognitive load condition to be significantly less listenable than those same voices in the low cognitive load condition. The analysis also revealed that this effect was independent of the number of errors made. This result may be of general value because conclusions drawn from these findings are independent of any particular parameter modifications that may be exclusively available to DECtalk users. Overall, the study presents a detailed analysis of the research domain combined with a systematic experimental program of synthetic speech quality assessment.
The experiments reported establish a reliable and replicable procedure for optimising the aesthetically pleasing characteristics of DECtalk speech, but the implications of the research extend beyond the boundaries of a particular synthesiser. Results from the experimental program lead to a number of conclusions, the most salient being that not only does the synthetic speech designer have to overcome the general rejection of synthetic voices based on their poor quality by sophisticated customisation of synthetic voice parameters, but that he or she needs to take into account the cognitive load of the task being undertaken. The interaction between cognitive load and optimal settings for synthesis requires direct consideration if synthetic speech systems are going to realise and maximise their potential in human-computer interaction.

    Some problems of designing for augmentative and alternative communication users: an enquiry through practical design activity

    The submission is concerned with, and addresses, problems of designing for people with disabilities, with specific reference to people who are illiterate and cannot speak. People with such disabilities often depend on electronic AAC (Augmentative and Alternative Communication) devices for interpersonal communication. A central theme of the thesis, however, is that such products, and products intended for people with disabilities more generally, have characteristics that inadequately attend to users' needs. Through a combination of practical product development and literature reviews, the thesis demonstrates how improvements to AAC devices can be made through user-participatory, user-centred and more sensitive and perceptive design. Literature reviews in the following subjects are reported: AAC; the operational knowledge base for design and disability; user-participatory design; and wearable computing. At the core of the thesis is the presentation and discussion of an empirical case study, carried out by the researcher, to design and develop the Portland Communication Aid (PCA). The PCA was conceived as an AAC product that would attempt to redress the inadequacies of predecessor devices. The design activity for the PCA is traced in the thesis, from initial concepts and development models through to a working prototype. Key ideas and essential principles of the design are illustrated. Throughout the work on the PCA, many problems associated with designing for people with severe communication disabilities were encountered. These problems, as with their resolutions, comprised matters of both designing (as an activity) and design (as product specification). The thesis contains comprehensive exposure and analysis of these problems and resolutions. In particular, the value of shaping meaning, metaphor, and other product semantics into devices intended for use by people with disabilities is explored. The study provides two substantive conclusions.
First, that both the activity and the outcomes of Industrial Design have a valuable role in the empowerment and rehabilitation of AAC users. And second, that key principles have been identified that will enable designers to better identify, articulate and respond to the needs of people with communication disabilities (and the needs of people with disabilities more generally).

    A forensic phonetic study of the vocal responses of individuals in distress

    The production and perception of emotional speech is of growing importance to forensic speech scientists. They are often asked by instructing parties to provide an opinion as to whether recordings representing a violent attack are genuine, and whether speech material reflects real distress. However, they are prohibited from making statements regarding the psychological states of speakers by the International Association of Forensic Phonetics and Acoustics Code of Practice (IAFPA 2004). This study investigates two principal questions. First, it investigates how distress speech can be manifested acoustically. In so doing it proposes a taxonomy for comparing distress speech across speakers, assists in delimiting the boundaries of the vocal repertoire, and considers the extent to which acoustic measures of distress speech can distinguish between the vocalisations of real victims and actors. Second, it investigates whether listeners can discriminate between genuine and acted distress portrayals, and to what extent familiarity with forensic material increases listeners’ ability. Recordings from authentic criminal cases involving violent attack are compared with re-enactments by trained actors. Acoustic analyses examine F0, intensity, vowel formant frequencies and articulation rate. The recordings are also used as stimuli in a perceptual listening test, comparing the performance of lay listeners, police call takers and forensic practitioners. The findings lend support to the view that assessments of distress should be exercised with extreme caution. On the one hand, acoustic parameters can distinguish between non-distress and distress conditions, but cannot discriminate between acted and authentic distress, and so IAFPA’s refraining from such an assessment is justified.
On the other, listeners who are familiar with authentic distress data, such as police call takers and forensic practitioners, are better able to differentiate between acted and authentic distress than lay listeners. Thus, if an assessment were to be made, the forensic practitioners may be the best group to do so.