10 research outputs found

    Emotional persuasion in advertising – analyzing dialectal language, visual images and their interplay in TV commercials

    Emotions are gaining ever more traction in marketing research, and researchers now broadly recognize the benefits of emotional persuasion. Marketing scholars have become interested in emotions as an aspect of consumer behavior because they are important components of consumers’ responses in pre- and post-purchase buying behavior, in consumer satisfaction, and in shaping attitudes to products, services, and brands. The appeal to emotion is also a central topic of advertising research because the practice targets the consumers’ psychological, social, or symbolic needs to evoke an emotional response. This study investigates emotional persuasion in television commercials and provides insights into consumer persuasion from the respondent’s perspective. Advertising seeking to arouse emotions and interest is intended to make the audience process the message more thoroughly, create a vivid and enticing memory of the brand, and ultimately persuade the consumer to purchase the company’s products or services. The purpose of this study is to investigate emotional persuasion in advertising, more specifically how appeals to emotion are mediated in TV commercials. Television advertising is an important part of modern economies and paid media. Multimodal commercials can simultaneously transmit visual and audio stimuli, which makes them especially persuasive in shaping viewers’ emotions. However, there is a dearth of knowledge about how appeals to emotion are mediated through the interplay of language and moving visual components. This dissertation aims to fill this gap by exploring the emotional persuasion of the joint interplay of language in the Swiss-German dialect and moving images in television commercials. By analyzing such language and images, this study provides three interconnected perspectives on emotional persuasion: dialectal language, moving images, and their interplay. Accordingly, this cross-disciplinary study touches on the theoretical fields of marketing, linguistics, and psychology. To date, research results have shown positive outcomes of the use of local dialects in the process of persuasion in advertising. However, this study is among the first to investigate how dialectal language can be used in advertising to appeal emotionally to a fragmented target audience. In addition, this thesis is among the first studies to focus on the filmic mediation of appeals to emotion, that is, the joint interplay of language in the Swiss-German dialect and moving images. The data for the empirical study consist of 32 television commercials in the spoken Swiss-German dialect placed by the Swiss cooperative Migros, which operates in the retail segment and specializes in fast-moving consumer goods. The research is based on a mixed-methods approach, and the empirical part is conducted in two phases by analyzing the commercials quantitatively and qualitatively. In the first phase, content analysis is used as a quantitative method to organize the stream of images and language. In the second phase, the qualitative analysis, the appeals to emotion in the language, the images, and their interplay are investigated. The qualitative analysis of the data is divided into two stages: linguistic analysis and semiotic analysis. The linguistic analysis is conducted to study the emotional appeal of the language in the Swiss-German dialect. The semiotic analysis is conducted to uncover the emotional meanings of the images at the connotative level and the emotional meanings of the images in interplay with the language.
The outcome of the study is a framework of emotionally persuasive advertising comprising emotionally appealing dialectal language, emotionally appealing images, and the interplay of dialectal language and images. The framework can open new perspectives on understanding emotionally appealing advertising. From the managerial point of view, being able to appeal to customers on an emotional level can cut through the noise inherent in advertising, something that is becoming more difficult in today’s message-filled media environment. Since consumers are exposed to numerous commercials, those that carry an emotional appeal can stand out from the crowd. As a practical implication, the framework is applicable to multimodal advertising in several media channels, including online advertising. The framework can help those designing advertising for fragmented target audiences and help marketers respond to the challenges of localization.
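The quantitative content-analysis phase lends itself to a brief illustration. The sketch below (Python; the category labels and coded records are invented placeholders, not the dissertation's actual coding scheme) shows how coded occurrences of language and image categories across the 32 commercials could be tallied before the qualitative linguistic and semiotic stages.

    # Minimal sketch of the quantitative content-analysis phase (hypothetical codes).
    from collections import Counter

    # Each record is one coded unit from a commercial: (commercial_id, modality, category).
    # The categories below are illustrative placeholders only.
    coded_units = [
        (1, "language", "humour"), (1, "image", "family_scene"),
        (2, "language", "warmth"), (2, "image", "product_close_up"),
        (3, "language", "humour"), (3, "image", "family_scene"),
    ]

    # Frequency of each category within each modality.
    counts = Counter((modality, category) for _, modality, category in coded_units)
    for (modality, category), n in sorted(counts.items()):
        print(f"{modality:10s} {category:20s} {n}")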

    Conveying expressivity and vocal effort transformation in synthetic speech with Harmonic plus Noise Models

    This thesis was conducted in the Grup en Tecnologies Mèdia (GTM) of Escola d’Enginyeria i Arquitectura la Salle. The group has a long trajectory in the speech synthesis field and has developed its own Unit-Selection Text-To-Speech (US-TTS) system, which is able to convey multiple expressive styles using multiple expressive corpora, one for each expressive style. Thus, in order to convey aggressive speech, the US-TTS uses an aggressive corpus, whereas for a sensual speech style, the system uses a sensual corpus. Unlike that approach, this dissertation aims to present a new schema for enhancing the flexibility of the US-TTS system so that it can perform multiple expressive styles using a single neutral corpus. The approach followed in this dissertation is based on applying Digital Signal Processing (DSP) techniques to carry out speech modifications that make the synthesized voice express the desired speaking style. For conducting the speech modifications, the Harmonic plus Noise Model (HNM) was chosen for its flexibility in performing signal modifications. Voice Quality (VoQ) has been shown to play an important role in different expressive styles, so low-level VoQ acoustic parameters were first explored for conveying multiple emotions. This study raised several problems that set new objectives for the rest of the thesis, among them finding a single parameter with a strong impact on the expressive style conveyed. Vocal Effort (VE) was selected for conducting expressive speech style modifications due to its salient role in expressive speech. The first approach to working with VE was based on transferring VE between two parallel utterances of the same word with different VE levels, using the Adaptive Pre-emphasis Linear Prediction (APLP) technique. This approach allowed VE to be transferred, but the model presented certain restrictions on its flexibility for generating new intermediate VE levels. Aiming to improve the flexibility and control of the conveyed VE, a new approach using a linear polynomial model of VE was proposed. This model not only allowed VE levels to be transferred between two different utterances, but also made it possible to generate VE levels other than those present in the speech corpus. This is aligned with the general goal of this thesis: allowing US-TTS systems to convey multiple expressive styles with a single neutral corpus. Moreover, the proposed methodology introduces a parameter for controlling the degree of VE in the synthesized speech signal. This opens new possibilities for controlling the synthesis process through simple and intuitive graphical interfaces, as was done in the CreaVeu project, also conducted in the GTM group. The dissertation concludes with a review of the conducted work and a proposal for schema modifications within a US-TTS system to introduce the VE modification blocks designed in this dissertation, allowing the system to synthesize multiple VE levels from a neutral corpus.
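The linear vocal-effort model described above can be pictured with a toy spectral-tilt interpolation. The fragment below (Python/NumPy; the function names, the reference tilt values, and the use of spectral slope as the effort correlate are illustrative assumptions, not the thesis's actual APLP or HNM processing) shows how a single control parameter could move a harmonic amplitude envelope between a soft and a loud reference, including intermediate effort levels absent from the corpus.

    # Toy sketch: controlling vocal effort through a linear spectral-tilt model.
    # Hypothetical parameterization, not the dissertation's actual method.
    import numpy as np

    def fit_tilt(freqs_hz, amps_db):
        """Least-squares slope (dB per kHz) of a harmonic amplitude envelope."""
        slope, _ = np.polyfit(freqs_hz / 1000.0, amps_db, 1)
        return slope

    def apply_effort(freqs_hz, amps_db, tilt_soft, tilt_loud, alpha):
        """Shift the envelope's tilt towards a target interpolated by alpha in [0, 1]:
        0 reproduces the soft reference tilt, 1 the loud one, and intermediate values
        yield effort levels not present in the corpus."""
        target = (1.0 - alpha) * tilt_soft + alpha * tilt_loud
        return amps_db + (target - fit_tilt(freqs_hz, amps_db)) * (freqs_hz / 1000.0)

    # Synthetic harmonic envelope of a "neutral" frame (10 harmonics of a 150 Hz voice).
    freqs = 150.0 * np.arange(1, 11)
    amps = -8.0 * (freqs / 1000.0) + np.random.default_rng(0).normal(0.0, 0.5, freqs.size)

    louder = apply_effort(freqs, amps, tilt_soft=-12.0, tilt_loud=-3.0, alpha=0.8)
    print(np.round(louder - amps, 2))  # per-harmonic gain (dB) applied to the HNM amplitudes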

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and further papers on phonetics, speech and linguistic analysis, pragmatics, and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to the sound and video files linked to the papers (when downloaded).

    Cognitive load theory and listening to accent variations in English

    Accent variability is an emerging field of study in listening to varieties of English. Mutual intelligibility of accent variations, in monolingual as well as multilingual settings, may become challenging for native as well as non-native speakers of English. Within a cognitive load theory (CLT) framework, this thesis examined the accent variability effect and the expertise reversal effect in listening to native and foreign-accented English with groups at different levels of expertise. The three experiments reported in this thesis addressed how accent variability boosted meaningful understanding in listening comprehension, and how instructional design could aid learning in perceptual listening environments so that learners did not become entangled in the novelty of the accents while the learning gains of such instructional procedures were maximised. In Experiment 1, three single-accent conditions and six multiple-accent conditions were used. The accents were Australian English, Chinese-accented English, and Russian-accented English; these three accents were permuted in six combinations to form the six multiple-accent conditions. The results of Experiment 1 did not support the hypotheses: the low expertise learners did not perform better in single-accent conditions, and the high expertise learners did not perform better in multiple-accent conditions. In Experiment 2, Russian-accented English and Australian English were employed. The results partially supported the hypotheses: the single-accent condition was not easier for the low expertise students, whereas the dual-accent condition was easier for the high and very high expertise students. In Experiment 3, the low expertise group listening to Indian-accented English found the accent condition easier than the low expertise group listening to both Indian- and Arabic-accented English, while the high and very high expertise students learned more listening to Arabic- and Indian-accented English than listening to Indian-accented English only. The low expertise individuals were more prone to being challenged by the novelty of the dual-accent conditions. The findings of the experiments were explained in terms of the accent variability effect and the expertise reversal effect within a CLT framework. The instructional design developed in this thesis supported naïve as well as expert English language learners in developing accent-independent global adaptation to English within a CLT framework.
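The expertise reversal effect reported above amounts to a crossover interaction between accent condition and learner expertise. The fragment below (Python; the scores are invented placeholder numbers, not data from the experiments) shows how cell means and a simple interaction contrast could be computed to check for such a crossover.

    # Illustrative check for a crossover (expertise reversal) pattern on invented scores.
    from statistics import mean

    # scores[(expertise, condition)] -> comprehension scores (placeholder values only)
    scores = {
        ("low",  "single-accent"): [7, 6, 8, 7],
        ("low",  "multi-accent"):  [4, 5, 4, 5],
        ("high", "single-accent"): [7, 8, 7, 8],
        ("high", "multi-accent"):  [9, 8, 9, 9],
    }

    cell_means = {cell: mean(vals) for cell, vals in scores.items()}
    for cell, m in sorted(cell_means.items()):
        print(cell, round(m, 2))

    # Interaction contrast: (low: single - multi) - (high: single - multi).
    # A large positive value indicates the reversal pattern (single-accent material
    # helps novices, multiple accents help experts); near zero indicates no reversal.
    contrast = (cell_means[("low", "single-accent")] - cell_means[("low", "multi-accent")]) \
             - (cell_means[("high", "single-accent")] - cell_means[("high", "multi-accent")])
    print("interaction contrast:", round(contrast, 2))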

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratories, and recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions that will be presented across 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers of the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, the Czech Republic, Ukraine, and Slovenia). Furthermore, extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with full open access. In addition to the regular paper sessions, the IberSPEECH2020 scientific program features the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladolid.

    Prosodic and Voice Quality Cross-Language Analysis of Storytelling Expressive Categories Oriented to Text-To-Speech Synthesis

    For ages, the oral interpretation of tales and stories has been a worldwide tradition tied to entertainment, education, and the perpetuation of culture. During the last decades, some works have focused on the analysis of this particular speaking style, which is rich in subtle expressive nuances conveyed by specific acoustic cues. In line with this, there has also been a growing interest in the development of storytelling applications, such as those related to interactive storytelling. This thesis deals with key aspects of audiovisual storytellers: improving the naturalness of expressive synthetic speech by analysing storytelling speech in detail, together with providing better non-verbal language to a speaking avatar by synchronizing that speech with its gestures. To that effect, it is necessary to understand in detail the acoustic characteristics of this particular speaking style and the interaction between speech and gestures. Regarding the acoustic characteristics of storytelling speech, the related literature has dealt with its acoustic analysis in terms of prosody, while it has only been suggested that voice quality may play an important role in modelling its subtleties. In this thesis, the role of both prosody and voice quality in indirect storytelling speech is analysed across languages to identify the main expressive categories it is composed of, together with the acoustic parameters that characterize them.
To do so, an annotation methodology is proposed for this particular speaking style at the sentence level, based on storytelling discourse modes (narrative, descriptive, and dialogue) and further introducing narrative sub-modes. Following this annotation methodology, the indirect speech of a story oriented to a young audience (covering the Spanish, English, French, and German versions) is analysed in terms of prosody and voice quality through statistical and discriminant analyses, after classifying the sentence-level utterances of the story into their corresponding expressive categories. The results confirm the existence of storytelling categories carrying subtle expressive nuances across the considered languages, beyond the narrators' personal styles. In this sense, evidence is presented suggesting that such storytelling expressive categories are conveyed with subtler speech nuances than basic emotions, based on comparing their acoustic patterns to those obtained from emotional speech data. The analyses also show that both prosody and voice quality contribute almost equally to the discrimination among storytelling expressive categories, which are conveyed with similar acoustic patterns across languages. It is also worth noting the strong agreement observed in the selection of the expressive category per utterance across the narrators, even though, to our knowledge, no prior indications were given to them. In order to translate all these expressive categories to a corpus-based Text-To-Speech system, a speech corpus would have to be recorded for each category. However, building ad-hoc speech corpora for each and every specific expressive style is a very daunting task. In this work, we introduce an alternative based on an analysis-oriented-to-synthesis methodology designed to derive rule-based models from a small but representative set of utterances, which can be used to generate storytelling speech from neutral speech. The experiments conducted on increasing suspense as a proof of concept show the viability of the proposal in terms of naturalness and storytelling resemblance. Finally, concerning the interaction between speech and gestures, an analysis of synchrony and emphasis is performed, oriented to driving a 3D storytelling avatar. To that effect, strength indicators are defined for speech and gestures. After validating them through perceptual tests, an intensity rule is obtained from their correlation. Moreover, a synchrony rule is derived to determine temporal correspondences between speech and gestures. These analyses have been conducted on aggressive and neutral performances by an actor to cover a broad range of emphatic levels, as a first step towards evaluating the integration of a speaking avatar after the expressive Text-To-Speech system.
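The statistical and discriminant analyses of prosodic and voice-quality features can be pictured with a small sketch. The fragment below (Python with scikit-learn, on synthetic data; the feature set, the class labels, and the use of linear discriminant analysis are illustrative assumptions, not the thesis's exact procedure) compares how well prosody, voice quality, and their combination separate sentence-level expressive categories.

    # Sketch: discriminating storytelling expressive categories from per-utterance
    # prosody and voice-quality features (synthetic data; feature set is illustrative).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    categories = ["narrative", "descriptive", "dialogue"]

    # 8 features per utterance: 4 prosodic (f0 mean, f0 range, speech rate, energy)
    # and 4 voice-quality (jitter, shimmer, HNR, spectral tilt). Each category gets a
    # slightly shifted mean so the classes are separable but overlapping.
    X, y = [], []
    for i, cat in enumerate(categories):
        X.append(rng.normal(loc=i * 0.5, scale=1.0, size=(60, 8)))
        y += [cat] * 60
    X = np.vstack(X)

    prosody_idx, voq_idx = list(range(4)), list(range(4, 8))
    for name, idx in [("prosody", prosody_idx), ("voice quality", voq_idx),
                      ("both", prosody_idx + voq_idx)]:
        acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, idx], y, cv=5).mean()
        print(f"{name:15s} mean CV accuracy: {acc:.2f}")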

    Multimedia Development of English Vocabulary Learning in Primary School

    In this paper, we describe a prototype of a web-based intelligent handwriting education system for autonomous learning of Bengali characters. The Bengali language is used by more than 211 million people in India and Bangladesh. Due to socio-economic limitations, not all of this population has the chance to go to school. This research project aimed to develop an intelligent Bengali handwriting education system. As an intelligent tutor, the system can automatically detect handwriting errors, such as stroke production errors, stroke sequence errors, and stroke relationship errors, and immediately provide feedback so that students can correct themselves. The proposed system can be accessed from a smartphone or iPhone, which allows students to practice their Bengali handwriting anytime and anywhere. Bengali characters are multi-stroke, with extremely long cursive shapes, and exhibit both stroke-order and stroke-direction variability. Due to this structural complexity, recognition speed is a crucial issue when applying traditional online handwriting recognition algorithms to Bengali language learning. In this work, we have adopted a hierarchical recognition approach to improve recognition speed, which makes the system suitable for web-based language learning. We applied a writing-speed-free recognition methodology together with the hierarchical recognition algorithm, which supports learners of all ages, especially children and older adults. The experimental results showed that the proposed hierarchical recognition algorithm provides higher accuracy than a traditional multi-stroke recognition algorithm while accommodating more writing variability.
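The hierarchical recognition idea, cheap pruning followed by finer matching, can be illustrated with a small sketch. The code below (Python; the two-stage structure, stroke-count pruning, and resampled-point distance are illustrative assumptions, not the system's actual recognizer) shows how a candidate set might first be narrowed inexpensively and only then compared with a more expensive point-wise match.

    # Sketch of a two-stage (hierarchical) online handwriting matcher: a cheap first
    # stage prunes templates by stroke count, then a costlier point-wise distance
    # ranks the survivors. Illustrative only.
    import math

    def resample(stroke, n=16):
        """Resample a stroke (list of (x, y) points) to n points by linear interpolation."""
        if len(stroke) == 1:
            return stroke * n
        out = []
        for i in range(n):
            t = i * (len(stroke) - 1) / (n - 1)
            j, frac = int(t), t - int(t)
            x0, y0 = stroke[j]
            x1, y1 = stroke[min(j + 1, len(stroke) - 1)]
            out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
        return out

    def stroke_distance(a, b):
        ra, rb = resample(a), resample(b)
        return sum(math.dist(p, q) for p, q in zip(ra, rb)) / len(ra)

    def recognize(sample_strokes, templates):
        # Stage 1: cheap pruning by number of strokes.
        candidates = [(label, strokes) for label, strokes in templates
                      if len(strokes) == len(sample_strokes)]
        if not candidates:
            candidates = templates  # fall back to the full template set
        # Stage 2: fine-grained matching on the reduced candidate set.
        def cost(item):
            _, strokes = item
            return sum(stroke_distance(s, t) for s, t in zip(sample_strokes, strokes))
        return min(candidates, key=cost)[0]

    # Tiny toy template set (two fake "characters" made of line strokes).
    templates = [
        ("char_A", [[(0, 0), (0, 1)], [(0, 0), (1, 0)]]),
        ("char_B", [[(0, 0), (1, 1)]]),
    ]
    print(recognize([[(0, 0), (0.1, 1.0)], [(0, 0), (1.0, 0.1)]], templates))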

    Greek teachers’ understandings of Typical Language Development and of Language difficulties

    Language is a dynamic learning mechanism for children. Oral language skills are pivotal for all children and should be practiced in schools. However, not all children develop language typically, and some may experience language difficulties at differing levels and degrees of severity. As the concept of inclusion has gained currency in many countries, it is expected that larger numbers of students whose difficulties are not severe enough for admission to a special school will be educated in mainstream classrooms alongside children with typical language development. Thus, teachers are increasingly faced with the challenge of teaching students with differing profiles of needs. However, research has so far paid little attention to teachers’ views and to their preparedness to cope with such challenges. This study was based on a Sequential Exploratory Mixed Methods Design deployed in three consecutive and integrative phases. The first phase involved 18 exploratory interviews with teachers. Its findings informed the second phase, a questionnaire survey with 119 respondents. Contradictory questionnaire results were further investigated in a third phase employing a formal testing procedure with 60 children attending Y1, Y2 and Y3 of primary school. Results showed both strengths and weaknesses in teachers’ awareness of language-related issues and of language difficulties, and gaps in their expertise to meet the needs of children with language difficulties. However, they also provided a different perspective on children’s language needs and on language teaching approaches. This perspective reflected current advances in the understanding of language difficulties and contemporary conceptualizations of inclusion, and opened a new window on how to optimize existing teaching approaches so as to promote language development for all students in class while at the same time supporting the specific needs of children with language difficulties in an inclusive ethos.