41 research outputs found

    Recognition of sign language subwords based on boosted hidden Markov models

    Sign language recognition (SLR) plays an important role in human-computer interaction (HCI), especially for convenient communication between the deaf and hearing communities. How to enhance traditional hidden Markov model (HMM) based SLR is an important issue in the SLR community. How to refine the classifiers' decision boundaries so that they effectively characterize the spread of the training samples is another significant issue. In this paper, a new classification framework is presented that applies the adaptive boosting (AdaBoost) strategy to the continuous HMM (CHMM) training procedure at the subword classification level for SLR. The ensemble of multiple composite CHMMs trained for each subword over the boosting iterations tends to concentrate on the hard-to-classify samples, and so generates a more complex decision boundary than a single HMM classifier. Experimental results on a vocabulary of frequently used Chinese sign language (CSL) subwords show that the proposed boosted CHMM outperforms the conventional CHMM for SLR.
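The way boosting shifts attention to hard-to-classify samples can be illustrated with the sample reweighting step of standard discrete AdaBoost. This is a minimal sketch of the textbook two-class update, not the paper's CHMM training procedure; the function name `adaboost_reweight` and its inputs are hypothetical.

```python
import math

def adaboost_reweight(weights, correct):
    """One round of discrete AdaBoost sample reweighting (textbook form).

    weights: current sample weights, assumed to sum to 1
    correct: booleans, True where the current classifier was right
    Returns (alpha, new_weights), where alpha is the classifier's vote.
    """
    # Weighted error of the current classifier.
    err = sum(w for w, c in zip(weights, correct) if not c)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correctly classified samples, grow the rest.
    raw = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(raw)
    return alpha, [r / total for r in raw]

alpha, new_weights = adaboost_reweight([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified samples together carry half of the total weight, so the next classifier in the ensemble is pushed toward them.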

    Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, while the lip contour itself is extracted using a novel combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that can operate in multiple dimensions, together with a template probability technique that compensates for differences in the way words are uttered in the training set. The performance of the new system has been assessed on recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained with the new approach compare favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information obtained from estimates of lip height, width and their ratio.
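Dynamic time warping over multi-dimensional features can be sketched as follows. This is plain DTW with a Euclidean local distance and nearest-template classification; it does not include the template probability technique described in the abstract. The function names and the (height, width, ratio) tuples are illustrative assumptions.

```python
import math

def dtw_distance(seq_a, seq_b):
    """DTW cost between two sequences of equal-length feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance across all feature dimensions at once.
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def classify(query, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical per-frame features: (lip height, lip width, height/width ratio).
templates = {
    "one": [(1.0, 2.0, 0.5), (2.0, 4.0, 0.5)],
    "two": [(3.0, 3.0, 1.0), (4.0, 2.0, 2.0)],
}
query = [(1.0, 2.0, 0.5), (1.0, 2.0, 0.5), (2.0, 4.0, 0.5)]
label = classify(query, templates)
```

Because the warping path may repeat frames, the slower utterance in `query` still aligns with the "one" template at zero cost.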

    Learning Multi-Boosted HMMs for Lip-Password Based Speaker Verification


    How adults and children interpret disjunction under negation in Dutch, French, Hungarian and Italian: A cross-linguistic comparison

    In English, a sentence like “The cat didn’t eat the carrot or the pepper” typically receives a “neither” interpretation; in Japanese it receives a “not this or not that” interpretation. These two interpretations are in a subset/superset relation, such that the “neither” interpretation (strong reading) asymmetrically entails the “not this or not that” interpretation (weak reading). This asymmetrical entailment raises a learnability problem. According to the Semantic Subset Principle, all language learners, regardless of the language they are exposed to, start by assigning the strong reading, since this interpretation makes such sentences true in the narrowest range of circumstances. If the “neither” interpretation is children’s initial hypothesis, then children acquiring a superset language will be able to revise their initial hypothesis on the basis of positive evidence. The aim of the present study is to test an additional account proposed by Pagliarini, Crain, Guasti (2018) as a possible explanation for the earlier convergence to the adult grammar by Italian children. The hypothesis tested here is that the presence of a lexical form such as recursive né, which unambiguously conveys a “neither” meaning, would lead children to converge earlier on the adult grammar due to a blocking effect of the recursive né form in the inventory of negated disjunction forms in a language. We compared data from Italian (taken from Pagliarini, Crain, Guasti, 2018), French, Hungarian and Dutch. Dutch was tested as the baseline language. French and Hungarian, like Italian, have a lexical form that unambiguously expresses the “neither” interpretation (ni ni and sem sem, respectively). Our results did not support this hypothesis, however, and are discussed in the light of language-specific particularities of the syntax and semantics of negation.

    A novel lip geometry approach for audio-visual speech recognition

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information, specifically using lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate the visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using the combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt to dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual and speech modality is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion. Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of different signal-to-noise ratio conditions using the NOISEX-92 dataset.
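The two statistics named in the decision fusion step, skewness and kurtosis, can be computed from an audio frame as sketched below. This uses the standard population-moment definitions and reports excess kurtosis; how the thesis maps these values to a choice between the audio and visual outcome is not stated in the abstract and is not modelled here. The function name is hypothetical.

```python
def skewness_and_kurtosis(samples):
    """Return (skewness, excess kurtosis) of a sequence of sample values."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((s - mean) ** 2 for s in samples) / n
    if m2 == 0.0:
        raise ValueError("samples are constant; the statistics are undefined")
    m3 = sum((s - mean) ** 3 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian distribution
    return skewness, excess_kurtosis

skew, kurt = skewness_and_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
```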

    Stress and emotion recognition in natural speech in the work and family environments

    Speech stress and emotion recognition and classification technology has the potential to provide significant benefits to national and international industry and to society in general. The accuracy of automatic stress and emotion recognition relies heavily on the discriminative power of the characteristic features. This work introduced and examined a number of new linear and nonlinear feature extraction methods for the automatic detection of stress and emotion in speech. The proposed linear feature extraction methods included features derived from the speech spectrograms (SS-CB/BARK/ERB-AE, SS-AF-CB/BARK/ERB-AE, SS-LGF-OFS, SS-ALGF-OFS, SS-SP-ALGF-OFS and SS-sigma-pi), wavelet packets (WP-ALGF-OFS) and the empirical mode decomposition (EMD-AER). The proposed nonlinear feature extraction methods were based on the results of recent laryngological studies and nonlinear modelling of the phonation process. The proposed nonlinear features included the area under the TEO autocorrelation envelope based on different spectral decompositions (TEO-DWT, TEO-WP, TEO-PWP-S and TEO-PWP-G), as well as features representing the spectral energy distribution of speech (AUSEES) and of the glottal waveform (AUSEEG). The proposed features were compared with features based on the classical linear model of speech production, including F0, formants, MFCC and glottal time/frequency parameters. Two classifiers, GMM and KNN, were tested for consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, anger, anxious, dysphoric and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The nonlinear features clearly outperformed all the linear features. The classification results demonstrated consistency with the nonlinear model of the phonation process, indicating that the harmonic structure and the spectral distribution of the glottal energy provide the most important cues for stress and emotion recognition in speech. The study also investigated whether automatic emotion recognition can determine differences in emotion expression between parents of depressed adolescents and parents of non-depressed adolescents. It also investigated whether there are differences in emotion expression between mothers and fathers in general. The experimental results indicated that parents of depressed adolescents produce stronger, more exaggerated expressions of affect than parents of non-depressed adolescents, and that females in general produce easier-to-discriminate (more exaggerated) expressions of affect than males.
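Assuming TEO in the feature names above denotes the Teager energy operator, its standard discrete form is psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below shows only this operator; the spectral decompositions and the area under the autocorrelation envelope used in the study are not reproduced. The names `teager_energy` and `tone` are illustrative.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator applied to a list of samples.

    Returns psi[n] = x[n]**2 - x[n-1] * x[n+1] for every interior sample,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n + phase) the operator yields the constant
# A**2 * sin(w)**2, which depends on both amplitude and frequency.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
```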

    Teacher agency in synchronous one-to-one Chinese online language teaching : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Applied Linguistics at Massey University, Manawatū, New Zealand

    This study explores the teacher agency of four Chinese language teachers who teach in one-to-one videoconferencing settings. Since these teachers only had limited teaching experience in such a context, four preparatory workshops were designed for the teacher participants before they began teaching. The study seeks to answer three questions: 1) What kinds of competencies did teachers identify as required in their teaching via one-to-one videoconferencing? 2) What kinds of affordances and constraints did teachers perceive in teaching, and how was their agency influenced by these factors? 3) What was the main value of the preparatory workshops from the teachers’ perspective? The study is informed by ecological perspectives and employs a qualitative longitudinal case study approach. The data collected through teaching recordings, stimulated recall interviews, semi-structured interviews and group discussions formed the main data set. The data collected through a teacher questionnaire, written reflection sheets, opinion frames, and text chat on a social media platform formed the supporting data set. The main part of the study, spanning about eight months, comprised three stages. At the first stage, there were four teacher preparatory workshops, each including a lecture and a group discussion. At the second stage, each teacher conducted a series of Chinese learning sessions with a single learner, which were recorded and analysed. At the third stage, semi-structured interviews with individual teachers were conducted. The findings suggest that the teachers identified four important competencies required for online teaching: pedagogical competency, multimedia competency, social-affective competency and the competency of being reflective and reflexive. Different beliefs about teacher roles, perceived social hierarchy, and their relationships with peer teachers and the learners were the factors that enabled or constrained teachers’ actions. The perceived value of the teacher preparatory workshops was in providing opportunities for the teachers to bridge the gap between theories and teaching practice and to explore the pedagogical possibilities. They collectively formed an idealised notion of online teaching as a result of their discussions and this notion influenced their identity and teaching practice. The study concludes with implications for research methodology and a theoretical frame, shedding light on how the factors from the outer world, and teachers’ experience and aspirations could impact the enactment of agency. It is hoped that this study will be valuable for future online language teacher training and research.

    Talking at cross-purposes?: the effect of gender on New Zealand primary schoolchildren's interaction strategies in pair discussions

    This thesis explores one aspect of the relationship between sex and language. Twenty pairs of eleven- and twelve-year-old children were tape-recorded during two discussion tasks. Quantitative and qualitative analyses of the data were carried out to investigate to what extent previously reported sex differences in interactional style could be observed in this group of New Zealand school children. Particular attention was paid to the relationship between such differences and the way in which children learn through talk in peer discussion. Two general hypotheses were tested: (i) that girls would tend to use a more collaborative, polite, and affiliative style of interaction, while boys would tend to use a more competitive, task-oriented style, paying less attention to the processes of interaction, and (ii) that the style of interaction associated with females would be more conducive to effective discussion from a pedagogical point of view. There were no significant sex differences in the use of interruptive forms and overlaps. However, the girls produced more talk relative to the boys in the mixed-sex context; supportive minimal responses were distributed differently, suggesting different norms as to their use and function; and there was a marked sex difference in the use of strategies for expressing disagreement: the boys were over four times more likely than the girls to produce bald, unmodified disagreements (approximately half of their total disagreement responses), while over 90% of the girls' disagreement responses were qualified in some way. These differences in style were linked to the results of the qualitative analysis of the data, which provided clear evidence that the sex composition of the dyads was an important variable in determining the overall quality of discussion, with the girls more likely to facilitate effective, open-ended, elaborated discussion than the boys.

    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. In the last section of the book (Chapters 17 to 22) we present applications related to affective computing.