
    Laughter detection for on-line human-robot interaction

    This paper presents a study of laugh classification using a cross-corpus protocol, aimed at the automatic detection of laughs in real-time human-machine interaction. Positive and negative laughs are tested with different classification tasks and different acoustic feature sets. F-measure results show an improvement in positive-laugh classification from 59.5% to 64.5% and in negative-laugh recognition from 10.3% to 28.5%. In the context of the Chist-Era JOKER project, positive and negative laugh detection drives the policies of the robot Nao. A measure of engagement will also be provided, drawing in part on the number of positive laughs detected during the interaction.
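    For reference, the F-measure reported above is the harmonic mean of precision and recall. Below is a minimal sketch, assuming a binary laugh/other labelling, of how such a score could be computed for a detector's output; the labels and example predictions are hypothetical, not data from the paper.

    # Minimal sketch: F-measure for a binary laugh detector.
    def f_measure(y_true, y_pred, positive="laugh"):
        tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        if tp == 0:
            return 0.0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical example: 3 of 4 laughs found, 1 false alarm.
    y_true = ["laugh", "laugh", "laugh", "laugh", "other", "other"]
    y_pred = ["laugh", "laugh", "laugh", "other", "laugh", "other"]
    print(f_measure(y_true, y_pred))  # 0.75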

    Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study

    Since children (5-9 years old) are still developing their emotional and social skills, their social interactional behaviors in small groups might differ from adults'. In order to develop a robot able to support children performing collaborative tasks in small groups, a better understanding of how children interact with each other is necessary. We were interested in investigating vocal turn-taking patterns, as we expected these to relate to collaborative and conflict behaviors, particularly in children, as previous literature suggests. To that end, we collected an audiovisual corpus of children performing collaborative tasks together in groups of three. Automatic turn-taking analyses showed that speaker changes with overlap were more common than those without, and that children showed smoother turn-taking patterns, i.e., less frequent and longer-lasting speaker changes, during collaborative than during conflict behaviors.
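    As a rough illustration of the kind of automatic turn-taking analysis described above, the sketch below counts speaker changes with and without overlap from a diarized list of turns. The (speaker, start, end) tuple format is an assumption made for illustration, not the authors' actual pipeline.

    # Count speaker changes with overlap (next speaker starts before the
    # current speaker finishes) vs. without overlap (gap or exact latch).
    # Turns are (speaker, start_sec, end_sec), sorted by start time.
    def classify_speaker_changes(turns):
        with_overlap, without_overlap = 0, 0
        for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
            if spk_a == spk_b:
                continue  # same speaker continuing: not a speaker change
            if start_b < end_a:
                with_overlap += 1
            else:
                without_overlap += 1
        return with_overlap, without_overlap

    turns = [("A", 0.0, 2.1), ("B", 1.8, 3.5), ("C", 4.0, 5.2)]
    print(classify_speaker_changes(turns))  # (1, 1)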

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed for measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing about the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on all metrics. However, underlying these overall findings there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. Further analysis was carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, since many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm in order to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity across all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
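    For context, the four metrics named above have standard published definitions: %V is the percentage of utterance duration that is vocalic, ∆C the standard deviation of consonantal interval durations, and rPVI/nPVI the raw and normalized mean absolute differences between successive intervals (conventionally rPVI for consonantal and nPVI for vocalic intervals). A minimal sketch of those definitions, with hypothetical interval durations:

    # Standard rhythm metrics over vocalic/consonantal interval durations (ms).
    from statistics import pstdev

    def percent_v(vocalic, consonantal):
        return 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

    def delta_c(consonantal):
        return pstdev(consonantal)  # SD of consonantal interval durations

    def rpvi(intervals):
        # Raw PVI: mean absolute difference between successive intervals.
        diffs = [abs(a - b) for a, b in zip(intervals, intervals[1:])]
        return sum(diffs) / len(diffs)

    def npvi(intervals):
        # Normalized PVI: each difference is scaled by the pair mean, x100.
        diffs = [abs(a - b) / ((a + b) / 2) for a, b in zip(intervals, intervals[1:])]
        return 100 * sum(diffs) / len(diffs)

    vocalic = [80, 95, 70, 110]      # hypothetical durations in ms
    consonantal = [60, 150, 90, 75]
    print(percent_v(vocalic, consonantal), delta_c(consonantal),
          rpvi(consonantal), npvi(vocalic))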

    The meaning of kiss-teeth


    From sociolinguistic variation to socially strategic stylisation

    This article investigates the indexical relation between language, interactional stance and social class. Quantitative sociolinguistic analysis of a linguistic variable (the first-person singular possessive) is combined with micro-ethnographic analysis of the way one particular variant, possessive ‘me’ (e.g. Me pencil’s up me jumper), is used by speakers in interaction. The aim of this analysis is to explore: (1) how possessive ‘me’ is implicated in the construction and management of local identities and relationships, and (2) how macro-social categories, such as social class, relate to language choice. The data for this analysis come from an ethnographic study of the language practices of 9- to 10-year-old children in two socially differentiated primary schools in north-east England. A secondary aim of the article is to spotlight the sociolinguistic sophistication of these young children, in particular the working-class participants, who challenge the notion that the speech of working-class children is in any way ‘impoverished’.

    Concatenative speech synthesis: a framework for reducing perceived distortion when using the TD-PSOLA algorithm

    This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems make use of pre-recorded speech segments stored in a speech corpus. At synthesis time, the 'best' segments available to synthesise the new utterances are chosen from the corpus using a process known as unit selection. During the synthesis process, the pitch and duration of these segments may be modified to generate the desired prosody. The TD-PSOLA algorithm provides an efficient and essentially successful solution for performing these modifications, although some perceptible distortion, in the form of 'buzziness', may be introduced into the speech signal. Despite the popularity of the TD-PSOLA algorithm, little formal research has been undertaken to address this recognised problem of distortion. The approach in this thesis was developed to reduce the perceived distortion that is introduced when TD-PSOLA is applied to speech.

    To investigate the occurrence of this distortion, a psychoacoustic evaluation of the effect of pitch modification using the TD-PSOLA algorithm is presented. Subjective experiments in the form of a set of listening tests were undertaken using word-level stimuli that had been manipulated using TD-PSOLA. The data collected from these experiments were analysed for patterns of co-occurrence and correlations to investigate where this distortion may occur. From this, parameters were identified that may have contributed to increased distortion. These parameters concerned the relationship between the spectral content of individual phonemes, the extent of pitch manipulation, and aspects of the original recordings.

    Based on these results, a framework was designed for use in conjunction with TD-PSOLA to minimise the possible causes of distortion. The framework consisted of a novel speech corpus design, a signal processing distortion measure, and a selection process for especially problematic phonemes. Rather than being phonetically balanced, the corpus is balanced to the needs of the signal processing algorithm, containing more of the adversely affected phonemes. The aim is to reduce the potential extent of pitch modification of such segments, and hence to produce synthetic speech with less perceptible distortion. The signal processing distortion measure was developed to allow the prediction of perceptible distortion in pitch-modified speech. Different weightings were estimated for individual phonemes, trained using the experimental data collected during the listening tests. The potential benefit of such a measure for existing unit selection processes in a corpus-based system using TD-PSOLA is illustrated. Finally, the special-case selection process was developed for highly problematic voiced fricative phonemes, to minimise the occurrence of perceived distortion in these segments.

    The success of the framework, in terms of generating synthetic speech with reduced distortion, was evaluated. A listening test showed that the TD-PSOLA-balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated using a typical phonetically balanced corpus. The voiced fricative selection process was also shown to produce pitch-modified versions of these phonemes with less perceived distortion than a standard selection process. A further listening test indicated that the signal processing distortion measure was able to predict the resulting amount of distortion at the sentence level after the application of TD-PSOLA, suggesting that it may be beneficial to include such a measure in existing unit selection processes. The framework was found to be capable of producing speech with reduced perceptible distortion in certain situations, although the effects seen at the sentence level were smaller than those seen in the earlier investigative experiments that used word-level stimuli. This suggests that the effect of the TD-PSOLA algorithm cannot always be easily anticipated, due to the highly dynamic nature of speech, and that the reduction of perceptible distortion in TD-PSOLA-modified speech remains a challenge for the speech community.
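    To make the algorithm under discussion concrete, here is a heavily simplified sketch of the core TD-PSOLA operation: pitch-synchronous, windowed grains centred on analysis pitch marks are re-spaced at the target pitch period and overlap-added. Real systems also handle duration modification, unvoiced segments and grain duplication; this is an illustration of the principle, not the thesis's implementation.

    import numpy as np

    def td_psola_pitch_shift(signal, pitch_marks, factor):
        """Shift pitch by `factor` (>1 raises it) via pitch-synchronous OLA."""
        pitch_marks = np.asarray(pitch_marks)
        period = int(np.median(np.diff(pitch_marks)))  # analysis pitch period
        step = max(1, int(period / factor))            # synthesis mark spacing
        win = np.hanning(2 * period + 1)
        out = np.zeros(len(signal))
        t = int(pitch_marks[0])
        while t < pitch_marks[-1]:
            # Reuse the analysis grain whose mark is closest to synthesis time t.
            m = int(pitch_marks[np.argmin(np.abs(pitch_marks - t))])
            lo, hi = m - period, m + period + 1
            if lo >= 0 and hi <= len(signal) and t - period >= 0 and t + period + 1 <= len(out):
                out[t - period:t + period + 1] += signal[lo:hi] * win
            t += step
        return out

    # Idealized usage on a 100 Hz sine with exact pitch marks.
    sr = 16000
    x = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)
    marks = np.arange(0, sr, sr // 100)
    y = td_psola_pitch_shift(x, marks, 1.2)  # ~20% higher pitch, same duration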

    A cross-cultural investigation of the vocal correlates of emotion

    Universal and culture-specific properties of the vocal communication of human emotion are investigated in this balanced study, which focusses on the encoding and decoding of Happy, Sad, Angry, Fearful and Calm by English and Japanese participants (eight female encoders for each culture, and eight female and eight male decoders for each culture). Previous methodologies and findings are compared. This investigation is novel in its design of symmetrical procedures to facilitate cross-cultural comparison of the results of decoding tests and acoustic analysis; a simulation/self-induction method was used in which participants from both cultures produced, as far as possible, the same pseudo-utterances. All emotions were distinguished beyond chance irrespective of culture, except for Japanese participants’ decoding of English Fearful, which was decoded at a level borderline with chance. Angry and Sad were well recognised both in-group and cross-culturally, and Happy was identified well in-group. Confusions between emotions tended to follow dimensional lines of arousal or valence. Acoustic analysis found significant distinctions between all emotions for each culture, except between the two low-arousal emotions Sad and Calm. Evidence of ‘in-group advantage’ was found for English decoding of Happy, Fearful and Calm, and for Japanese decoding of Happy; there is support for previous evidence of East/West cultural differences in display rules. A novel concept is suggested to account for the finding that Japanese decoders identified Happy, Sad and Angry more reliably from English than from Japanese expressions. Whilst duration, fundamental frequency and intensity all contributed to distinctions between emotions for English, only measures of fundamental frequency were found to significantly distinguish emotions in Japanese. Acoustic cues tended to be less salient in Japanese than in English when compared with the expected cues for high- and low-arousal emotions. In addition, new evidence was found of a cross-cultural influence of vowel quality upon emotion recognition.
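    As a hedged illustration of the three cue families analysed above (duration, fundamental frequency, intensity), the sketch below extracts simple utterance-level statistics. It assumes the librosa library and a hypothetical audio file, and is not the toolchain used in the thesis.

    import numpy as np
    import librosa

    def prosodic_features(path):
        y, sr = librosa.load(path, sr=None)
        f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # F0 track
        rms = librosa.feature.rms(y=y)[0]  # frame-wise RMS as an intensity proxy
        return {
            "duration_s": len(y) / sr,
            "f0_mean_hz": float(np.nanmean(f0)),   # mean over voiced frames
            "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
            "rms_mean": float(rms.mean()),
        }

    # print(prosodic_features("utterance.wav"))  # hypothetical file path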

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The MAVEBA Workshop is held every two years; its proceedings collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of clinical diagnosis and the classification of vocal pathologies. The Workshop is sponsored by: Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published collecting selected papers from the conference.