    From image to text to speech: The effects of speech prosody on information sequencing in audio description

    Given the extensive body of research on audio description (the verbal-vocal description of visual or audiovisual content for visually impaired audiences), it is striking how little attention has been paid so far to the spoken dimension of audio description and its paralinguistic, prosodic aspects. This article complements previous research into how audio description speech is received by partially sighted audiences by analyzing how it is performed vocally. We study the audio description of pictorial art and examine one aspect of prosody in detail: pitch, and how information is segmented in relation to it. We analyze this relation in a corpus of audio-described pictorial art in Finnish by combining phonetic measurements of pitch with discourse analysis of information segmentation. Previous studies have shown that a high sentence-initial pitch acts as a discourse-structuring device in interpreting; our study shows that the same applies to audio description. In addition, our study suggests a relationship between the magnitude of the pitch rise and the magnitude of the topical transition: when the topical transition is clear, the rise in pitch level between the beginnings of two consecutive spoken sentences is large, and, analogously, when the topical transition is small, the change in sentence-initial pitch level is also rather small.
    Peer reviewed
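The pitch rise between the beginnings of consecutive sentences can be expressed on a logarithmic scale, which makes rises comparable across speakers with different voice ranges. The following is a minimal sketch, assuming the rise is measured in semitones from sentence-initial F0 values in Hz; the abstract does not state the unit used or how onset F0 was extracted (for example with a pitch tracker such as Praat), and the function names and example values are hypothetical.

```python
import math


def semitone_change(f0_prev_hz: float, f0_next_hz: float) -> float:
    """Pitch change in semitones from one sentence onset to the next.

    Assumes both values are positive (voiced) F0 estimates in Hz.
    12 semitones correspond to one octave, i.e. a doubling of F0.
    """
    if f0_prev_hz <= 0 or f0_next_hz <= 0:
        raise ValueError("F0 values must be positive (voiced frames only)")
    return 12.0 * math.log2(f0_next_hz / f0_prev_hz)


def onset_pitch_changes(onset_f0s_hz: list[float]) -> list[float]:
    """Semitone change between each pair of consecutive sentence onsets."""
    return [semitone_change(a, b) for a, b in zip(onset_f0s_hz, onset_f0s_hz[1:])]


if __name__ == "__main__":
    # Hypothetical sentence-initial F0 values (Hz) for four consecutive sentences.
    onsets = [200.0, 200.0, 100.0, 150.0]
    for i, change in enumerate(onset_pitch_changes(onsets), start=1):
        print(f"sentence {i} -> {i + 1}: {change:+.2f} st")
```

Under the study's finding, larger positive values would be expected at clear topic shifts and values near zero where the topic continues; the sketch only measures the rise and does not classify topical transitions.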

    Directional adposition use in English, Swedish and Finnish

    Directional adpositions such as to the left of describe where a Figure is in relation to a Ground. English and Swedish directional adpositions refer to the location of a Figure in relation to a Ground whether both are static or in motion. In contrast, the Finnish directional adpositions edellä (in front of) and jäljessä (behind) describe only the location of a moving Figure in relation to a moving Ground (Nikanne, 2003). When directional adpositions are used, a frame of reference must be assumed in order to interpret their meaning. For example, the meaning of to the left of in English can be based on a relative (speaker- or listener-based) reference frame or an intrinsic (object-based) reference frame (Levinson, 1996). When a Figure and a Ground are both in motion, the Figure can be described as being behind or in front of the Ground even if neither has intrinsic features. As shown by Walker (in preparation), there are good reasons to assume that in this case a motion-based reference frame is involved. Consequently, if Finnish speakers used edellä (in front of) and jäljessä (behind) more frequently when both the Figure and the Ground are in motion, a difference in reference frame use could be expected between Finnish on the one hand and English and Swedish on the other. We asked native English, Swedish and Finnish speakers to select adpositions from a language-specific list to describe the location of a Figure relative to a Ground when both were shown moving on a computer screen, in order to identify any differences between the three groups. In all three languages, speakers predominantly used directional spatial adpositions referring to the lexical concepts TO THE LEFT OF, TO THE RIGHT OF, ABOVE and BELOW. There were no differences between the languages in directional adposition use or reference frame use, including motion-based reference frame use.
We conclude that despite differences in the grammars of the languages involved, and potential differences in reference frame system use, the three languages investigated encode Figure location in relation to Ground location in a similar way when both are in motion.
Levinson, S. C. (1996). Frames of reference and Molyneux’s question: Crosslinguistic evidence. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp. 109-170). Cambridge, MA: MIT Press.
Nikanne, U. (2003). How Finnish postpositions see the axis system. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space. Oxford, UK: Oxford University Press.
Walker, C. (in preparation). Motion encoding in language: The use of spatial locatives in a motion context. Unpublished doctoral dissertation, University of Lincoln, Lincoln, United Kingdom.

    Eesti emotsionaalse kõne korpuse loomine ja emotsioonide taju (Creating the Estonian Emotional Speech Corpus and the perception of emotions)

    The electronic version of the dissertation does not include the publications. The aim of the dissertation was to create a theoretical basis for the Estonian Emotional Speech Corpus and to test the validity of the theoretical positions on the material of the resulting corpus. The study showed how important it is to plan a corpus carefully before creating it and to analyze the result. The findings can be applied both by emotion researchers and by developers of speech corpora. What makes the Estonian corpus unique among emotional speech corpora is that the emotion of each sentence is labeled according to whether it is carried by the sound of the sentence or whether recognition of the emotion from the voice is influenced by the sentence's verbal content. This division makes it possible to study emotions in both speech and writing. The Estonian Emotional Speech Corpus is one of the few speech corpora of elicited, moderately expressed emotions that are documented and publicly available free of charge. The corpus consists of texts read aloud by a so-called ordinary person who was not told which emotion to read them with. Since the emotions of the sentences were determined by listeners in tests, questions of emotion perception are central to the work. The dissertation confirmed that listeners recognize moderately expressed emotions well from the voice of a non-professional reader. The results support the decision to have the sentence emotions determined by native Estonian-speaking adults over 30, since they decode the emotion of a message better than younger listeners. The results also showed that the understanding of emotions is culture-dependent. The results did not confirm an important role for empathy in recognizing emotions from the voice, but they did show a difference between men and women in emotion recognition.
The corpus exists as theoretically designed and currently contains sentences in one female voice, classified as anger, joy, sadness, and neutral (see http://peeter.eki.ee:5000). Because the corpus is easily extensible, it is being developed further in line with new research directions.
The aim of the thesis was to develop a theoretical basis for the Estonian Emotional Speech Corpus and to test the validity of the theoretical starting points on the Corpus material. The Corpus has been completed as designed (see http://peeter.eki.ee:5000). The results reveal the importance of detailed planning and of the design elements of the Corpus. The theoretical starting points of the study are relevant and applicable in real situations, so these results could be taken into account when creating other emotional speech corpora. What makes this Corpus unique among corpora of its kind is that its sentences are labeled according to whether their emotion is carried by the sound of the sentence alone or whether recognition of the emotion from the voice may be influenced by the verbal-semantic content. This classification enables research on emotions both in speech and in writing. The Estonian Emotional Speech Corpus is one of the few freely available, documented corpora of moderately expressed emotions. The Corpus avoids acted emotions because they may be stereotypical and overacted. The sentences were read aloud by a so-called ordinary person who was not told which emotion to use while reading. The Corpus contains 1,234 Estonian sentences that have passed both reading and listening tests. Test takers identified 908 sentences that expressed anger, joy, or sadness, or were neutral.
As the emotions of the sentences in the Corpus were determined by listeners, several questions of emotion perception came to the fore: 1) Can the emotion of a sentence be identified purely from vocal cues, without the speaker being seen? 2) Does age affect the identification of emotion? 3) Is the identification of emotion culturally bound? 4) Does identification depend on the listeners’ empathy? On the first question, the results confirmed that listeners can recognize moderately expressed, non-acted emotions from the voice of a non-professional reader. The results also support the decision that the emotions of the Corpus sentences should be determined by Estonian adults over 30 who are native speakers of Estonian, because they are more likely to have acquired the skills for decoding the culture-specific expression of emotions. Furthermore, the results imply that the understanding of emotions depends on cultural factors and social interactions, including the social norms specific to a culture; the interpretation of emotional messages is therefore learned through social interaction. The research showed that empathy is less important in recognizing emotion from vocal cues than clinical results would suggest. Consequently, in emotion studies for speech technology purposes, there is no need to exclude listeners with low empathy on the grounds that they may fail to recognize the emotions expressed, provided that their low empathy is not due to a mental or developmental disorder. The Corpus continues to be developed according to the requirements of new research directions. As the Corpus is publicly available free of charge, its data can be used to tackle a range of research challenges.

    Fine-tuning SI Quality Criteria: Could Speech Act Theory be of any Use?

    This chapter looks at political rhetoric in the European Parliament, focusing on speech acts and the way they are conveyed by interpreters. Discourse in the European Parliament is a specific genre in which speech acts are an integral rhetorical element. Drawing on an authentic corpus of more than 100 speeches delivered in four languages in the European Parliament, the chapter adopts speech act theory as its theoretical framework and shows how it can complement translation and interpreting theories in a close analysis of simultaneous interpreting (SI) performances. The aim of the analysis was to use authentic data to obtain specific information that could be applied to interpreter training, and to suggest an approach to interpreter quality assessment.

    Facial expression in an assessment

    Peer reviewed

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and of human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of each one's (verbal and non-verbal) behaviour on that of the other(s). Implementing this behaviour in spoken dialogue systems is desirable as a way to make human-machine interaction more natural. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation, so a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human-human or human-computer dialogue, which applies a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings with the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows the behaviour to be modelled statistically (as time series) in a way that is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that complements TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which avoids strict attribution of speaker turns by treating both interlocutors as synchronously active. Both the TAMA and the turn distribution metrics indicate that the correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and they point to possible improvements in SDS turn-taking behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required to describe inter-speaker accommodation sufficiently and to develop an adequate testing platform for evaluating the magnitude of the perceived improvement in human-machine interaction. This thesis therefore constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
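The TAMA idea of averaging each speaker's feature values within frames that lie on one shared time grid can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the frame length (20 s) and step (10 s), the plain unweighted mean of the samples falling in each frame (the thesis may weight values differently, e.g. by duration), and the use of a Pearson correlation between the two aligned series as a simple accommodation indicator are all hypothetical choices, as are the `Sample` type and the function names.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One feature measurement for one speaker (hypothetical structure)."""
    time: float   # seconds from the start of the dialogue
    value: float  # e.g. mean pitch (Hz), intensity (dB) or speech rate


def tama(samples: list[Sample], dialogue_end: float,
         frame_len: float = 20.0, step: float = 10.0) -> list[float]:
    """Moving average of one speaker's feature over fixed, overlapping frames.

    Calling this with the same dialogue_end, frame_len and step for every
    speaker puts all series on one shared time grid (the time alignment).
    Frames in which the speaker produced no samples yield NaN, not zero.
    """
    if dialogue_end < frame_len:
        return []
    n_frames = int((dialogue_end - frame_len) // step) + 1
    series = []
    for i in range(n_frames):
        start = i * step
        in_frame = [s.value for s in samples if start <= s.time < start + frame_len]
        series.append(sum(in_frame) / len(in_frame) if in_frame else math.nan)
    return series


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation over frames where both speakers have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    if len(pairs) < 2:
        raise ValueError("need at least two frames with values for both speakers")
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical pitch samples: both speakers drift upward over 60 s.
    speaker_a = [Sample(float(t), 200.0 + t) for t in range(0, 60, 2)]
    speaker_b = [Sample(float(t + 1), 100.0 + t) for t in range(0, 60, 2)]
    a_series = tama(speaker_a, dialogue_end=60.0)
    b_series = tama(speaker_b, dialogue_end=60.0)
    print(a_series)
    print(b_series)
    print(f"r = {pearson(a_series, b_series):.3f}")
```

Overlapping frames smooth the series while keeping the two speakers directly comparable frame by frame. A high correlation can also arise from a trend shared by both speakers over the dialogue, so on its own it does not establish accommodation; time-series modelling of the aligned series is needed for that.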

    New Approach to Teaching Japanese Pronunciation in the Digital Era - Challenges and Practices

    Pronunciation has been a black hole in the L2 Japanese classroom because of, among other reasons, a lack of class time, teachers’ lack of confidence, and limited awareness of the need to teach pronunciation. The absence of pronunciation instruction is reported to result in fossilized pronunciation errors, communication problems, and learner frustration. To help improve this situation, this paper pursues three goals. First, it discusses the importance, necessity, and effectiveness of teaching the prosodic aspects of Japanese pronunciation from an early stage of acquisition. Second, it shows that Japanese prosody is challenging for learners regardless of their L1 background because of its typological rarity. Third, it introduces a new approach to teaching L2 pronunciation that aims to develop L2 comprehensibility by focusing on essential prosodic features, followed by a discussion of key issues in implementing the approach both inside and outside the classroom in the digital era.

    Sound-Action Symbolism

    Recent evidence has shown links between actions and segmental elements of speech. For instance, close-front vowels are sound-symbolically associated with the precision grip, and front vowels are associated with forward-directed limb movements. This review article presents a variety of such sound-action effects and proposes that they form a category of sound symbolism based on grounding conceptual knowledge of a referent in articulatory and manual action representations. The article further proposes that even some widely known sound symbolism phenomena, such as sound-magnitude symbolism, may be partly based on similar sensorimotor grounding. It also argues that the meaning of suprasegmental speech elements is in many instances similarly grounded in body actions. Sound symbolism, prosody, and body gestures might originate from the same embodied mechanisms that enable a vivid, iconic expression of a referent’s meaning to the recipient.
    Peer reviewed