
    Automatic speech-based emotion recognition

    One of the main objectives of affective computing is the study and creation of computer systems that can detect human affect. For speech-based emotion recognition, universal features offering the best performance for all languages have not yet been found. In this thesis, a speech-based emotion recognition system using a novel set of features is created. Support vector machines are used as classifiers in the offline system on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database, and the Serbian emotional speech database. Average emotion recognition rates of 80.21%, 88.6%, 75.42%, and 93.41% are achieved, respectively, with a total of 87 features. The online system, which uses Random Forests as its classifier, consists of two models trained on reduced versions of the first and second databases, with the first model trained only on male samples and the second trained on samples from both genders. The main purpose of the online system was to test the features' usability in real-life scenarios and to explore the effect of gender on speech-based emotion recognition. To test the online system, two female and two male non-native English speakers recorded emotionally spoken sentences, which were used as inputs to the trained models. Averaging over all emotions and speakers per model, the features offer better performance than random guessing, achieving 28% emotion recognition in both models. The average recognition rate for female speakers was 19% in the first model and 29% in the second. For male speakers, the rates were 36% and 28%, respectively. These results show how having more training samples from a particular gender affects the emotion recognition rates of a trained model.
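
    The thesis above combines hand-crafted acoustic features with a support vector machine classifier. The sketch below illustrates that general pipeline with librosa and scikit-learn; the MFCC statistics, kernel choice, and helper names are stand-ins, since the actual 87-feature set is not described here.

    import numpy as np
    import librosa
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def extract_features(wav_path):
        """Map an utterance of arbitrary length to a fixed-length feature vector."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        # Summarize frame-level coefficients with their mean and standard deviation.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def train_emotion_svm(wav_paths, labels):
        """Fit an RBF-kernel SVM on utterance-level features; return model and CV accuracy."""
        X = np.array([extract_features(p) for p in wav_paths])
        y = np.array(labels)
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        accuracy = cross_val_score(clf, X, y, cv=5).mean()
        return clf.fit(X, y), accuracy

    On a labeled corpus such as the Berlin Database of Emotional Speech, calling train_emotion_svm with the file paths and emotion labels would produce the kind of average recognition rate quoted in the abstract.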

    EEG Analysis Method to Detect Unspoken Answers to Questions Using MSNNs

    Brain–computer interfaces (BCIs) facilitate communication between the human brain and computational systems, additionally offering mechanisms for environmental control to enhance human life. The current study focused on the application of BCIs for communication support, especially in detecting unspoken answers to questions. Utilizing a multistage neural network (MSNN) with convolutional and pooling layers, the proposed method comprises a threefold approach: electroencephalogram (EEG) measurement, EEG feature extraction, and answer classification. The EEG signals of the participants were captured as they mentally responded with “yes” or “no” to the posed questions. Feature extraction was achieved through an MSNN composed of three distinct convolutional neural network models. The first model discriminates between EEG signals with and without discernible noise artifacts, whereas the subsequent two models extract features from EEG signals with and without such noise artifacts, respectively. Furthermore, a support vector machine is employed to classify the answers to the questions. The proposed method was validated via experiments using authentic EEG data. The mean and standard deviation values for the sensitivity and precision of the proposed method were 99.6% and 0.2%, respectively. These findings demonstrate that high accuracy can be attained in a BCI by first segregating the EEG signals based on the presence or absence of artifact noise, and they underscore the stability of such classification. Thus, the proposed method offers a prospective advantage in separating EEG signals characterized by noise artifacts for enhanced BCI performance.
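
    As a rough illustration of the staged design described above, the sketch below gates each EEG trial through a small artifact-detection CNN, extracts features with one of two branch CNNs, and hands the pooled features to an SVM for the "yes"/"no" decision. The layer sizes, channel count, and helper names are illustrative assumptions, not the paper's actual MSNN configuration.

    import torch
    import torch.nn as nn
    from sklearn.svm import SVC

    class EEGConvNet(nn.Module):
        """A small 1-D CNN that returns class logits and a pooled feature vector."""
        def __init__(self, n_channels=16, n_out=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
            )
            self.head = nn.Linear(64, n_out)

        def forward(self, x):                 # x: (batch, channels, time)
            f = self.features(x).flatten(1)   # pooled feature vector
            return self.head(f), f

    def classify_answer(trial, artifact_net, clean_net, noisy_net, svm: SVC):
        """Route one EEG trial (channels x time) through the artifact gate, then classify."""
        with torch.no_grad():
            logits, _ = artifact_net(trial.unsqueeze(0))
            is_noisy = logits.argmax(dim=1).item() == 1
            _, feats = (noisy_net if is_noisy else clean_net)(trial.unsqueeze(0))
        # Assumes the SVC was previously fitted on features from labeled trials.
        return svm.predict(feats.numpy())[0]  # e.g. 0 = "no", 1 = "yes"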

    Using minimal number of electrodes for emotion detection using noisy EEG data

    Emotion is an important aspect of interaction between humans. It is fundamental to human experience and rational decision-making. There is great interest in detecting emotions automatically. A number of techniques have been employed for this purpose using channels such as voice and facial expressions. However, these channels are not very accurate because they can be affected by users' intentions. Other techniques use physiological signals along with electroencephalography (EEG) for emotion detection. However, these approaches are not very practical for real-time applications because they ask participants to minimize motion and facial muscle movement, reject EEG data contaminated with artifacts, and rely on a large number of electrodes. In this thesis, we propose an approach that analyzes highly contaminated EEG data produced by a new emotion elicitation technique. We also use a feature selection mechanism, based on neuroscience findings, to extract features that are relevant to the emotion detection task. We reached an average accuracy of 51% for joy, 53% for anger, 58% for fear, and 61% for sadness. We also applied our approach to a smaller number of electrodes, ranging from 4 to 25, and reached an average classification accuracy of 33% for joy, 38% for anger, 33% for fear, and 37.5% for sadness using only 4 or 6 electrodes.
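
    The electrode-reduction idea can be pictured in a few lines of Python: restrict the montage to a handful of channels, compute band-power features per channel, and keep only the most discriminative ones before classification. The channel names, frequency bands, and ANOVA-based selector below are assumptions standing in for the thesis's neuroscience-motivated feature selection.

    import numpy as np
    from scipy.signal import welch
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    CHANNELS = ["F3", "F4", "AF3", "AF4"]                          # assumed 4-electrode subset
    BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # Hz

    def band_powers(epoch, fs=128):
        """epoch: (n_channels, n_samples) -> one mean power value per channel and band."""
        freqs, psd = welch(epoch, fs=fs, nperseg=fs)
        return np.array([psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
                         for lo, hi in BANDS.values()]).ravel()

    def build_classifier(k_best=6):
        # Keep only the k most class-discriminative band-power features, then classify.
        return make_pipeline(SelectKBest(f_classif, k=k_best), SVC(kernel="rbf"))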

    Speech data analysis for semantic indexing of video of simulated medical crises.

    The Simulation for Pediatric Assessment, Resuscitation, and Communication (SPARC) group within the Department of Pediatrics at the University of Louisville was established to enhance the care of children by using simulation-based educational methodologies to improve patient safety and strengthen clinician-patient interactions. After each simulation session, the physician must manually review and annotate the recordings and then debrief the trainees. The physician responsible for the simulation has recorded hundreds of videos and is seeking solutions that can automate the process. This dissertation introduces our system for efficient segmentation and semantic indexing of videos of medical simulations using machine learning methods. It provides the physician with automated tools to review important sections of the simulation by identifying who spoke, when, and with what emotion. Only audio information is extracted and analyzed because the quality of the image recording is low and the visual environment is static for the most part. Our proposed system includes four main components: preprocessing, speaker segmentation, speaker identification, and emotion recognition. The preprocessing consists of first extracting the audio component from the video recording and then extracting various low-level audio features to detect and remove silence segments. We investigate and compare two different approaches for this task: the first is threshold-based and the second is classification-based. The second main component of the proposed system detects speaker change points for the purpose of segmenting the audio stream. We propose two fusion methods for this task. The speaker identification and emotion recognition components of our system are designed to let users browse the video and retrieve shots that identify "who spoke, when, and the speaker's emotion" for further analysis. For this component, we propose two feature representation methods that map audio segments of arbitrary length to a feature vector of fixed dimension. The first is based on soft bag-of-words (BoW) feature representations. In particular, we define three types of BoW based on crisp, fuzzy, and possibilistic voting. The second feature representation is a generalization of the BoW and is based on the Fisher Vector (FV). The FV uses the Fisher kernel principle and combines the benefits of generative and discriminative approaches. The proposed feature representations are used within two learning frameworks. The first is supervised learning and assumes that a large collection of labeled training data is available. Within this framework, we use standard classifiers including K-nearest neighbors (K-NN), support vector machines (SVM), and Naive Bayes. The second framework is based on semi-supervised learning, where only a limited number of labeled training samples is available; here we use an approach based on label propagation. Our proposed algorithms were evaluated using 15 medical simulation sessions. The results were analyzed and compared to those obtained using state-of-the-art algorithms. We show that our proposed speech segmentation fusion algorithms and feature mappings outperform existing methods. We also integrated all proposed algorithms and developed a GUI prototype system for subjective evaluation. This prototype processes a medical simulation video and provides the user with a visual summary of the different speech segments. It also allows the user to browse videos and retrieve scenes that answer semantic queries such as: Who spoke and when? Who interrupted whom? What was the emotion of the speaker? The GUI prototype can also provide summary statistics for each simulation video, for example: For how long did each person speak? What is the longest uninterrupted speech segment? Is there an unusually large number of pauses within the speech segments of a given speaker?
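
    The soft bag-of-words mapping mentioned above can be sketched as follows: frame-level audio features are compared against a learned codebook, and per-frame memberships are accumulated into one fixed-length vector per segment, regardless of segment duration. The fuzzy-membership formula and codebook size below are common choices used for illustration, not necessarily those of the dissertation.

    import numpy as np
    from sklearn.cluster import KMeans

    def fit_codebook(frame_features, n_words=64):
        """frame_features: (n_frames_total, n_dims) pooled over all training audio."""
        return KMeans(n_clusters=n_words, n_init=10).fit(frame_features)

    def soft_bow(segment_frames, codebook, softness=1.0):
        """Map a segment of arbitrary length to a fixed n_words-dimensional vector."""
        d = np.linalg.norm(segment_frames[:, None, :] - codebook.cluster_centers_[None], axis=2)
        memberships = np.exp(-softness * d)                   # fuzzy voting over codewords
        memberships /= memberships.sum(axis=1, keepdims=True)
        return memberships.mean(axis=0)                       # segment-level descriptor

    Crisp voting corresponds to assigning each frame entirely to its nearest codeword, while the Fisher Vector generalizes this histogram by also encoding how the frames deviate from a Gaussian mixture codebook.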

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one-, two-, or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small degree of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    A Multimodal Deep Learning-Based Fault Detection Model for a Plastic Injection Molding Process

    The authors of this work propose a deep learning-based fault detection model that can be implemented in the field of plastic injection molding. Compared to conventional approaches to fault detection in this domain, recent deep learning approaches prove useful for on-site problems involving complex underlying dynamics with a large number of variables. In addition, the advent of advanced sensors that generate data in multiple modalities prompts the need for multimodal learning with deep neural networks to detect faults; such learning can integrate information from the various modalities in an end-to-end fashion. The proposed deep learning-based approach opts for an early fusion scheme, in which the low-level feature representations of the modalities are combined. A case study involving real-world data, obtained from a car parts company and related to a car window side molding process, validates that the proposed model outperforms late fusion methods and conventional models in solving the problem.
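
    A minimal sketch of the early-fusion idea described above is given below: the low-level feature vectors of two modalities are concatenated and fed to a single jointly trained network that predicts normal versus fault. The input dimensions, layer sizes, and modality names are placeholders, not the authors' architecture.

    import torch
    import torch.nn as nn

    class EarlyFusionFaultDetector(nn.Module):
        """Concatenate the low-level features of each modality, then learn jointly."""
        def __init__(self, sensor_dim=32, image_feat_dim=128, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(sensor_dim + image_feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),            # normal vs. fault
            )

        def forward(self, sensor_x, image_x):
            # Early fusion: modalities are combined before any modality-specific decision.
            fused = torch.cat([sensor_x, image_x], dim=1)
            return self.net(fused)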