
    The nature of novel word representations: computer mouse tracking shows evidence of immediate lexical engagement effects in adults

    Simplistically, words are the mental bundling of a form and a referent. However, words also dynamically interact with one another in the cognitive system, and have other so-called ‘lexical properties’. For example, the word ‘dog’ will cue recognition of ‘dock’ via shared phonology, and of ‘cat’ via shared semantics. Researchers have suggested that such lexical engagement between words emerges slowly, and with sleep. However, newer research suggests that this is not the case. Herein, seven experiments investigate this claim. Fast mapping (FM), a developmental word learning procedure, has been reported to promote lexical engagement before sleep in adults. Experiment 1 altered the task parameters and failed to replicate this finding. Experiment 2 attempted a methodological replication – again, no effect was found. It is concluded that the reported effect is not easily replicable. Other findings of pre-sleep lexical engagement were then considered using a novel methodology – computer mouse tracking. Experiments 3 and 4 developed optimal mouse tracking procedures and protocols for studying lexical engagement. Experiment 5 then applied this methodology to novel word learning, and found clear evidence of immediate lexical engagement. Experiment 6 provided evidence that participants were binding the word form to the referent in these pre-sleep lexical representations. Experiment 7 sought to strengthen this finding, but has been postponed due to the COVID-19 pandemic. The results are discussed in the context of the distributed cohort model of speech perception, a complementary learning systems account of word learning, and differing abstractionist and episodic accounts of the lexicon. It is concluded that the results may be most clearly explained by an episodic lexicon, although there is a need to develop hybrid models, factoring in consolidation and abstraction for the efficient storage of representations in the long term.

    Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach

    This article presents our unimodal, privacy-safe and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral and Negative. Recent deep learning models have shown tremendous advances in analyzing interactions between people, predicting human behavior and performing affective evaluation. Nonetheless, their performance comes from individual-based analysis, which means summing up and averaging scores from individual detections, and which inevitably leads to privacy issues. In this research, we investigated a frugal approach towards a model able to capture the global moods from the whole image without using face or pose detection, or any individual-based feature as input. The proposed methodology mixes state-of-the-art and dedicated synthetic corpora as training sources. With an in-depth exploration of neural network architectures for group-level emotion recognition, we built a VGG-based model achieving 59.13% accuracy on the VGAF test set (eleventh place in the challenge). Given that the analysis is unimodal, based only on global features, and that performance is evaluated on a real-world dataset, these results are promising and let us envision extending this model to multimodality for classroom ambiance evaluation, our final target application.
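
    As a rough illustration of the whole-image, individual-free pipeline described above, the sketch below builds a VGG-backbone classifier over entire frames with three outputs (Positive, Neutral, Negative). It is a minimal sketch only: the layer sizes, input resolution, and use of PyTorch are assumptions, not the authors' exact EmotiW 2020 configuration.

        # Minimal sketch (PyTorch, assumed): whole-frame VGG classifier, no face/pose detection.
        import torch
        import torch.nn as nn
        from torchvision import models

        class GroupEmotionVGG(nn.Module):
            def __init__(self, num_classes: int = 3):  # Positive / Neutral / Negative
                super().__init__()
                backbone = models.vgg16(weights=None)   # convolutional trunk over global features only
                self.features = backbone.features
                self.pool = nn.AdaptiveAvgPool2d((7, 7))
                self.classifier = nn.Sequential(
                    nn.Flatten(),
                    nn.Linear(512 * 7 * 7, 1024),       # hidden size is an illustrative choice
                    nn.ReLU(inplace=True),
                    nn.Dropout(0.5),
                    nn.Linear(1024, num_classes),
                )

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # x: batch of whole video frames, shape (B, 3, 224, 224)
                return self.classifier(self.pool(self.features(x)))

        model = GroupEmotionVGG()
        logits = model(torch.randn(2, 3, 224, 224))     # (2, 3) class scores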

    Tandem approach for information fusion in audio visual speech recognition

    Speech is the most frequently preferred medium for humans to interact with their environment, making it an ideal instrument for human-computer interfaces. However, for speech recognition systems to become more prevalent in real-life applications, high recognition accuracy together with speaker independence and robustness to hostile conditions is necessary. One of the main preoccupations for speech recognition systems is acoustic noise. Audio-Visual Speech Recognition systems intend to overcome the noise problem by utilizing visual speech information, generally extracted from the face or, in particular, the lip region. Visual speech information is known to be a complementary source for speech perception and is not impacted by acoustic noise. This advantage brings two additional issues into the task: visual feature extraction and information fusion. There is extensive research on both issues, but an admissible level of success has not been reached yet. This work concentrates on the issue of information fusion and proposes a novel methodology. The aim of the proposed technique is to deploy a preliminary decision stage at the frame level and to feed the Hidden Markov Model with the output posterior probabilities, as in the tandem HMM approach. First, classification is performed for each modality separately. Subsequently, the individual classifiers of each modality are combined to obtain posterior probability vectors corresponding to each speech frame. The purpose of using a preliminary stage is to integrate acoustic and visual data for maximum class separability. Hidden Markov Models are employed as the second stage of modelling because of their ability to handle the temporal evolution of data. The proposed approach is investigated in a speaker-independent scenario for digit recognition in the presence of diverse levels of car noise. The method is compared with a principal information fusion framework in audio-visual speech recognition, Multiple Stream Hidden Markov Models (MSHMM). The results on the M2VTS database show that the novel method achieves comparable performance with less processing time as compared to MSHMM.
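
    To make the tandem idea above concrete, the sketch below shows the two stages: frame-level posteriors from separate audio and visual classifiers are concatenated into tandem feature vectors, which are then modelled by one HMM per digit. The particular frame classifier (an MLP) and HMM implementation (hmmlearn's GaussianHMM) are illustrative assumptions, not the models used in the thesis.

        # Assumed sketch of tandem fusion: per-modality frame posteriors -> concatenation -> per-digit HMMs.
        import numpy as np
        from sklearn.neural_network import MLPClassifier   # stand-in frame-level classifier
        from hmmlearn.hmm import GaussianHMM

        def tandem_features(audio_clf, visual_clf, audio_frames, visual_frames):
            """Stage 1: posterior probabilities per frame from each modality, fused by concatenation."""
            p_audio = audio_clf.predict_proba(audio_frames)      # (T, n_classes)
            p_visual = visual_clf.predict_proba(visual_frames)   # (T, n_classes)
            return np.hstack([p_audio, p_visual])                # (T, 2 * n_classes) tandem features

        def train_digit_hmms(tandem_sequences_by_digit, n_states=5):
            """Stage 2: one HMM per digit, trained on that digit's tandem feature sequences."""
            hmms = {}
            for digit, seqs in tandem_sequences_by_digit.items():
                X, lengths = np.vstack(seqs), [len(s) for s in seqs]
                hmms[digit] = GaussianHMM(n_components=n_states, covariance_type="diag").fit(X, lengths)
            return hmms

        def recognise(hmms, tandem_seq):
            """Decode by picking the digit whose HMM assigns the highest log-likelihood."""
            return max(hmms, key=lambda d: hmms[d].score(tandem_seq))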

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.


    A review of affective computing: From unimodal analysis to multimodal fusion

    Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to provide the building blocks for readers to better understand this challenging and exciting research field.
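
    Since the survey repeatedly contrasts unimodal analysis with multimodal fusion, a small sketch may help fix the two basic fusion families it discusses: feature-level (early) fusion, which concatenates unimodal features before a single classifier, and decision-level (late) fusion, which combines the predictions of independent unimodal models. The feature shapes and the choice of logistic regression are assumptions for illustration only.

        # Assumed sketch of early vs. late fusion over audio, visual, and text features.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def early_fusion_fit(audio_X, visual_X, text_X, y):
            # Feature-level fusion: concatenate unimodal feature vectors, then train one classifier.
            X = np.hstack([audio_X, visual_X, text_X])
            return LogisticRegression(max_iter=1000).fit(X, y)

        def late_fusion_predict(unimodal_models, unimodal_features):
            # Decision-level fusion: average class probabilities from independent unimodal models.
            probs = [m.predict_proba(X) for m, X in zip(unimodal_models, unimodal_features)]
            return np.mean(probs, axis=0).argmax(axis=1)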

    Towards music perception by redundancy reduction and unsupervised learning in probabilistic models

    The study of music perception lies at the intersection of several disciplines: perceptual psychology and cognitive science, musicology, psychoacoustics, and acoustical signal processing, amongst others. Developments in perceptual theory over the last fifty years have emphasised an approach based on Shannon’s information theory and its basis in probabilistic systems, and in particular, the idea that perceptual systems in animals develop through a process of unsupervised learning in response to natural sensory stimulation, whereby the emerging computational structures are well adapted to the statistical structure of natural scenes. In turn, these ideas are being applied to problems in music perception. This thesis is an investigation of the principle of redundancy reduction through unsupervised learning, as applied to representations of sound and music. In the first part, previous work is reviewed, drawing on literature from some of the fields mentioned above, and an argument is presented in support of the idea that perception in general, and music perception in particular, can indeed be accommodated within a framework of unsupervised learning in probabilistic models. In the second part, two related methods are applied to two different low-level representations. Firstly, linear redundancy reduction (Independent Component Analysis) is applied to acoustic waveforms of speech and music. Secondly, the related method of sparse coding is applied to a spectral representation of polyphonic music, which proves to be enough both to recognise that the individual notes are the important structural elements, and to recover a rough transcription of the music. Finally, the concepts of distance and similarity are considered, drawing on ideas about noise, phase invariance, and topological maps. Some ecologically and information-theoretically motivated distance measures are suggested and put into practice in a novel method, using multidimensional scaling (MDS), for visualising geometrically the dependency structure in a distributed representation. Funded by the Engineering and Physical Sciences Research Council.
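
    The two redundancy-reduction methods named in the second part lend themselves to a short sketch: linear ICA learns a basis for windowed waveform frames whose coefficients are as statistically independent as possible, while sparse coding learns a dictionary of spectral atoms with sparse activations (which, for polyphonic music, tend to correspond to individual notes). The frame length, component counts, and use of scikit-learn are assumptions, not the thesis's configuration.

        # Assumed sketch: ICA on waveform frames and sparse coding of spectral frames.
        import numpy as np
        from sklearn.decomposition import FastICA, DictionaryLearning

        def ica_basis_from_waveform(signal, frame_len=256, n_components=64):
            """Learn a linear basis whose coefficients are maximally independent across frames."""
            n_frames = len(signal) // frame_len
            frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
            return FastICA(n_components=n_components, max_iter=1000).fit(frames).components_

        def sparse_code_spectra(spectrogram_frames, n_atoms=128):
            """Learn a dictionary of spectral atoms with sparse per-frame activations."""
            dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars")
            codes = dl.fit_transform(spectrogram_frames)   # sparse activations, one row per frame
            return dl.components_, codes                   # atoms approximate note spectra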