
    Laughter detection for on-line human-robot interaction

    This paper presents a study of laugh classification using a cross-corpus protocol, aimed at the automatic detection of laughs in real-time human-machine interaction. Positive and negative laughs are tested with different classification tasks and different acoustic feature sets. F-measure results show an improvement in positive-laugh classification from 59.5% to 64.5% and in negative-laugh recognition from 10.3% to 28.5%. In the context of the Chist-Era JOKER project, positive and negative laugh detection drives the policies of the robot Nao. A measure of engagement will also be provided, drawing in part on the number of positive laughs detected during the interaction.
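    For reference, the F-measure reported above is the harmonic mean of precision and recall. Below is a minimal sketch, assuming a binary laugh/other labelling, of how such a score could be computed for a detector's output; the labels and example predictions are hypothetical, not data from the paper.

    # Minimal sketch: F-measure for a binary laugh detector.
    def f_measure(y_true, y_pred, positive="laugh"):
        tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        if tp == 0:
            return 0.0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical example: 3 of 4 laughs found, 1 false alarm.
    y_true = ["laugh", "laugh", "laugh", "laugh", "other", "other"]
    y_pred = ["laugh", "laugh", "laugh", "other", "laugh", "other"]
    print(f_measure(y_true, y_pred))  # 0.75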

    Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study

    Since children (5-9 years old) are still developing their emotional and social skills, their social interactional behaviors in small groups might differ from adults'. In order to develop a robot able to support children performing collaborative tasks in small groups, a better understanding of how children interact with each other is necessary. We were interested in investigating vocal turn-taking patterns, as we expected these to relate to collaborative and conflict behaviors, particularly in children, as previous literature suggests. To that end, we collected an audiovisual corpus of children performing collaborative tasks together in groups of three. Automatic turn-taking analyses showed that speaker changes with overlap were more common than those without, and that children showed smoother turn-taking patterns, i.e., less frequent and longer-lasting speaker changes, during collaborative than during conflict behaviors.
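    As a rough illustration of the kind of automatic turn-taking analysis described above, the sketch below counts speaker changes with and without overlap from a diarized list of turns. The (speaker, start, end) tuple format is an assumption made for illustration, not the authors' actual pipeline.

    # Count speaker changes with overlap (next speaker starts before the
    # current speaker finishes) vs. without overlap (gap or exact latch).
    # Turns are (speaker, start_sec, end_sec), sorted by start time.
    def classify_speaker_changes(turns):
        with_overlap, without_overlap = 0, 0
        for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
            if spk_a == spk_b:
                continue  # same speaker continuing: not a speaker change
            if start_b < end_a:
                with_overlap += 1
            else:
                without_overlap += 1
        return with_overlap, without_overlap

    turns = [("A", 0.0, 2.1), ("B", 1.8, 3.5), ("C", 4.0, 5.2)]
    print(classify_speaker_changes(turns))  # (1, 1)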

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed for measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing about the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on all metrics. However, underlying these overall findings there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. Further analysis was carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, since many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm in order to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity across all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
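    For context, the four metrics named above have standard published definitions: %V is the percentage of utterance duration that is vocalic, ∆C the standard deviation of consonantal interval durations, and rPVI/nPVI the raw and normalized mean absolute differences between successive intervals (conventionally rPVI for consonantal and nPVI for vocalic intervals). A minimal sketch of those definitions, with hypothetical interval durations:

    # Standard rhythm metrics over vocalic/consonantal interval durations (ms).
    from statistics import pstdev

    def percent_v(vocalic, consonantal):
        return 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

    def delta_c(consonantal):
        return pstdev(consonantal)  # SD of consonantal interval durations

    def rpvi(intervals):
        # Raw PVI: mean absolute difference between successive intervals.
        diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
        return sum(diffs) / len(diffs)

    def npvi(intervals):
        # Normalized PVI: each difference is scaled by the pair mean, x100.
        diffs = [abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:])]
        return 100 * sum(diffs) / len(diffs)

    vocalic = [80, 95, 70, 110]      # hypothetical durations in ms
    consonantal = [60, 150, 90, 75]
    print(percent_v(vocalic, consonantal), delta_c(consonantal),
          rpvi(consonantal), npvi(vocalic))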

    The meaning of kiss-teeth


    From sociolinguistic variation to socially strategic stylisation

    This article investigates the indexical relation between language, interactional stance and social class. Quantitative sociolinguistic analysis of a linguistic variable (the first-person singular possessive) is combined with micro-ethnographic analysis of the way one particular variant, possessive ‘me’ (e.g. Me pencil’s up me jumper), is used by speakers in interaction. The aim of this analysis is to explore: (1) how possessive ‘me’ is implicated in the construction and management of local identities and relationships, and (2) how macro-social categories, such as social class, relate to language choice. The data for this analysis come from an ethnographic study of the language practices of 9- to 10-year-old children in two socially differentiated primary schools in north-east England. A secondary aim of the article is to spotlight the sociolinguistic sophistication of these young children, in particular the working-class participants, who challenge the notion that the speech of working-class children is in any way ‘impoverished’.

    Concatenative speech synthesis: a framework for reducing perceived distortion when using the TD-PSOLA algorithm

    This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems make use of pre-recorded speech segments stored in a speech corpus. At synthesis time, the 'best' segments available to synthesise the new utterances are chosen from the corpus using a process known as unit selection. During the synthesis process, the pitch and duration of these segments may be modified to generate the desired prosody. The TD-PSOLA algorithm provides an efficient and essentially successful solution for performing these modifications, although some perceptible distortion, in the form of 'buzziness', may be introduced into the speech signal. Despite the popularity of the TD-PSOLA algorithm, little formal research has been undertaken to address this recognised problem of distortion. The approach in this thesis was developed to reduce the perceived distortion that is introduced when TD-PSOLA is applied to speech.

    To investigate the occurrence of this distortion, a psychoacoustic evaluation of the effect of pitch modification using the TD-PSOLA algorithm is presented. Subjective experiments in the form of a set of listening tests were undertaken using word-level stimuli that had been manipulated using TD-PSOLA. The data collected from these experiments were analysed for patterns of co-occurrence and correlations to investigate where this distortion may occur. From this, parameters were identified that may have contributed to increased distortion. These parameters concerned the relationship between the spectral content of individual phonemes, the extent of pitch manipulation, and aspects of the original recordings.

    Based on these results, a framework was designed for use in conjunction with TD-PSOLA to minimise the possible causes of distortion. The framework consisted of a novel speech corpus design, a signal processing distortion measure, and a selection process for especially problematic phonemes. Rather than being phonetically balanced, the corpus is balanced to the needs of the signal processing algorithm, containing more of the adversely affected phonemes. The aim is to reduce the potential extent of pitch modification of such segments, and hence to produce synthetic speech with less perceptible distortion. The signal processing distortion measure was developed to allow the prediction of perceptible distortion in pitch-modified speech. Different weightings were estimated for individual phonemes, trained using the experimental data collected during the listening tests. The potential benefit of such a measure for existing unit selection processes in a corpus-based system using TD-PSOLA is illustrated. Finally, the special-case selection process was developed for highly problematic voiced fricative phonemes, to minimise the occurrence of perceived distortion in these segments.

    The success of the framework, in terms of generating synthetic speech with reduced distortion, was evaluated. A listening test showed that the TD-PSOLA-balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated using a typical phonetically balanced corpus. The voiced fricative selection process was also shown to produce pitch-modified versions of these phonemes with less perceived distortion than a standard selection process. A further listening test indicated that the signal processing distortion measure was able to predict the resulting amount of distortion at the sentence level after the application of TD-PSOLA, suggesting that it may be beneficial to include such a measure in existing unit selection processes. The framework was found to be capable of producing speech with reduced perceptible distortion in certain situations, although the effects seen at the sentence level were smaller than those seen in the earlier investigative experiments that used word-level stimuli. This suggests that the effect of the TD-PSOLA algorithm cannot always be easily anticipated, due to the highly dynamic nature of speech, and that the reduction of perceptible distortion in TD-PSOLA-modified speech remains a challenge for the speech community.
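    To make the algorithm under discussion concrete, here is a heavily simplified sketch of the core TD-PSOLA operation: pitch-synchronous, windowed grains centred on analysis pitch marks are re-spaced at the target pitch period and overlap-added. Real systems also handle duration modification, unvoiced segments and grain duplication; this is an illustration of the principle, not the thesis's implementation.

    import numpy as np

    def td_psola_pitch_shift(signal, pitch_marks, factor):
        """Shift pitch by `factor` (>1 raises it) via pitch-synchronous OLA."""
        pitch_marks = np.asarray(pitch_marks)
        period = int(np.median(np.diff(pitch_marks)))  # analysis pitch period
        step = max(1, int(period / factor))            # synthesis mark spacing
        win = np.hanning(2 * period + 1)
        out = np.zeros(len(signal))
        t = int(pitch_marks[0])
        while t < pitch_marks[-1]:
            # Reuse the analysis grain whose mark is closest to synthesis time t.
            m = int(pitch_marks[np.argmin(np.abs(pitch_marks - t))])
            lo, hi = m - period, m + period + 1
            if lo >= 0 and hi <= len(signal) and t - period >= 0 and t + period + 1 <= len(out):
                out[t - period:t + period + 1] += signal[lo:hi] * win
            t += step
        return out

    # Idealized usage on a 100 Hz sine with exact pitch marks.
    sr = 16000
    x = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)
    marks = np.arange(0, sr, sr // 100)
    y = td_psola_pitch_shift(x, marks, 1.2)  # ~20% higher pitch, same duration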

    A cross-cultural investigation of the vocal correlates of emotion

    Universal and culture-specific properties of the vocal communication of human emotion are investigated in this balanced study, which focusses on the encoding and decoding of Happy, Sad, Angry, Fearful and Calm by English and Japanese participants (eight female encoders for each culture, and eight female and eight male decoders for each culture). Previous methodologies and findings are compared. This investigation is novel in its design of symmetrical procedures to facilitate cross-cultural comparison of the results of decoding tests and acoustic analysis; a simulation/self-induction method was used in which participants from both cultures produced, as far as possible, the same pseudo-utterances. All emotions were distinguished beyond chance irrespective of culture, except for Japanese participants’ decoding of English Fearful, which was decoded at a level borderline with chance. Angry and Sad were well recognised both in-group and cross-culturally, and Happy was identified well in-group. Confusions between emotions tended to follow dimensional lines of arousal or valence. Acoustic analysis found significant distinctions between all emotions for each culture, except between the two low-arousal emotions Sad and Calm. Evidence of ‘in-group advantage’ was found for English decoding of Happy, Fearful and Calm, and for Japanese decoding of Happy; there is support for previous evidence of East/West cultural differences in display rules. A novel concept is suggested to account for the finding that Japanese decoders identified Happy, Sad and Angry more reliably from English than from Japanese expressions. Whilst duration, fundamental frequency and intensity all contributed to distinctions between emotions for English, only measures of fundamental frequency were found to significantly distinguish emotions in Japanese. Acoustic cues tended to be less salient in Japanese than in English when compared with the expected cues for high- and low-arousal emotions. In addition, new evidence was found of a cross-cultural influence of vowel quality upon emotion recognition.
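    As a hedged illustration of the three cue families analysed above (duration, fundamental frequency, intensity), the sketch below extracts simple utterance-level statistics. It assumes the librosa library and a hypothetical audio file, and is not the toolchain used in the thesis.

    import numpy as np
    import librosa

    def prosodic_features(path):
        y, sr = librosa.load(path, sr=None)
        f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # F0 track
        rms = librosa.feature.rms(y=y)[0]  # frame-wise RMS as an intensity proxy
        return {
            "duration_s": len(y) / sr,
            "f0_mean_hz": float(np.nanmean(f0)),   # mean over voiced frames
            "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
            "rms_mean": float(rms.mean()),
        }

    # print(prosodic_features("utterance.wav"))  # hypothetical file path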

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The MAVEBA Workshop is held every two years; its proceedings collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The Workshop is sponsored by: Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published collecting selected papers from the conference.