11 research outputs found

    Deep Residual Adaptive Neural Network Based Feature Extraction for Cognitive Computing with Multimodal Sentiment Sensing and Emotion Recognition Process

    For healthcare frameworks, automatic recognition of patients’ emotions is a valuable facilitator: feedback about patient status and satisfaction levels can be provided automatically to stakeholders in the healthcare industry. Multimodal sentiment analysis of humans is an attractive and active research topic in artificial intelligence (AI) and is a finer-grained classification problem that differs from other classification problems. In cognitive science, inspired by how emotional processing proceeds, both binary and multi-class tasks can be improved by splitting a complex problem into simpler ones that are easier to handle. This article proposes an automated audio-visual emotion recognition model for the healthcare industry. The model uses a Deep Residual Adaptive Neural Network (DeepResANNet) for feature extraction, where scores are computed from the differences between the feature and class values of adjacent instances. Based on the output of feature extraction, positive and negative sub-nets are trained separately by the fusion module, thereby improving accuracy. The proposed method is extensively evaluated on the eNTERFACE’05, BAUM-2 and MOSI databases against three standard methods in terms of various metrics. DeepResANNet achieves 97.9% accuracy, 51.5% RMSE, 42.5% RAE and 44.9% MAE in 78.9 s on eNTERFACE’05. On BAUM-2, the model achieves 94.5% accuracy, 46.9% RMSE, 42.9% RAE and 30.2% MAE in 78.9 s. On MOSI, it achieves 82.9% accuracy, 51.2% RMSE, 40.1% RAE and 37.6% MAE in 69.2 s. Across the three databases, eNTERFACE’05 gives the best accuracy (97.9%), BAUM-2 the lowest error rates (30.2% MAE and 46.9% RMSE), and MOSI the best RAE and the shortest response time (40.1% RAE in 69.2 s).
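    The abstract describes the feature scores only as differences between the feature and class values of adjacent instances. One plausible reading of that description is a Relief-style relevance score, sketched below under that assumption; the function name and nearest-neighbour rule are illustrative, not taken from the paper.

```python
# Hypothetical Relief-style feature scoring: a feature is rewarded when it
# separates nearest neighbours of different classes and penalised when it
# differs between nearest neighbours of the same class.
import numpy as np

def relief_style_scores(X, y):
    """X: (n_samples, n_features) feature matrix, y: (n_samples,) class labels."""
    n_samples, n_features = X.shape
    scores = np.zeros(n_features)
    # Pairwise distances, used to locate each instance's nearest hit and miss.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    for i in range(n_samples):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        if len(same) == 0 or len(diff) == 0:
            continue
        hit = same[np.argmin(dists[i, same])]    # nearest neighbour, same class
        miss = diff[np.argmin(dists[i, diff])]   # nearest neighbour, other class
        # Features that differ across classes gain score; features that differ
        # within a class lose score.
        scores += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return scores / n_samples
```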

    Connecting Deep Neural Networks to Physical, Perceptual, and Electrophysiological Auditory Signals

    Deep neural networks have recently been shown to capture the intricate transformation of signals from sensory profiles to semantic representations that facilitate recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs reflect activation with increasing degrees of complexity that transform the incoming signal into object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights into the contribution of various audio representations that guide sound perception. The analysis contrasts the activation of different layers of the CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, and neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience that is very much dependent on context and sound category.
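    A minimal sketch of how layer-wise activations can be contrasted with an external acoustic or behavioral measure, assuming a PyTorch audio CNN; the model, the use of mean activation as a layer summary, and the per-scene Pearson correlation are illustrative simplifications, not the paper's exact pipeline.

```python
# Record the mean activation of each convolutional layer for a batch of scenes,
# then correlate a layer's activation with an acoustic feature across scenes.
import numpy as np
import torch
import torch.nn as nn

def layer_activation_profile(model: nn.Module, audio_batch: torch.Tensor) -> dict:
    """Return {layer_name: mean activation} for every Conv layer on one input batch."""
    records, hooks = {}, []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv1d, nn.Conv2d)):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: records.__setitem__(
                    name, out.detach().mean().item())))
    with torch.no_grad():
        model(audio_batch)
    for h in hooks:
        h.remove()
    return records

def correlate_with_feature(per_scene_activations, per_scene_feature) -> float:
    """Pearson correlation between one layer's activation and an acoustic feature
    (e.g. RMS energy), each given as one value per scene."""
    a = np.asarray(per_scene_activations)
    f = np.asarray(per_scene_feature)
    return float(np.corrcoef(a, f)[0, 1])
```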

    Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data

    Detecting sentiments in natural language is tricky even for humans, making automated detection even more complicated. This research proposes a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It combines the strengths of deep learning networks with machine learning to deal with two specific semiotic systems, namely the textual (written text) and the visual (still images), and their combination within online content, using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules: the discretization, text analytics, image analytics and decision modules. The input to the model is multimodal content m ∈ {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image; the two are then processed as discrete entities and sent to the respective text analytics and image analytics modules. The text analytics module determines sentiment using a convolution neural network (ConvNet) enriched with the contextual semantics of SentiCircle, and an aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier trained on bag-of-visual-words (BoVW) features predicts the sentiment of the visual content. A Boolean decision module with a logical OR operation is added to the architecture; it validates and categorizes the output into five fine-grained sentiment categories (truth values), namely ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative’ and ‘highly negative.’ The accuracy achieved by the proposed model is nearly 91%, an improvement over the accuracy obtained by the text and image modules individually.
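    The abstract does not spell out the aggregation rule, so the following is only an illustrative decision-level fusion sketch: a text polarity score (standing in for the ConvNet/SentiCircle hybrid) and a coarse image label (standing in for the SVM-BoVW output) are mapped onto the five fine-grained categories. The thresholds and offsets are assumptions, not the paper's values.

```python
# Illustrative decision-level fusion onto five fine-grained sentiment classes.
def fuse(text_polarity: float, image_label: str) -> str:
    """text_polarity: assumed hybrid text score in [-1, 1].
    image_label: 'positive', 'neutral' or 'negative' from the visual branch."""
    image_offset = {'positive': 0.25, 'neutral': 0.0, 'negative': -0.25}[image_label]
    combined = max(-1.0, min(1.0, text_polarity + image_offset))
    if combined <= -0.6:
        return 'highly negative'
    if combined <= -0.2:
        return 'negative'
    if combined < 0.2:
        return 'neutral'
    if combined < 0.6:
        return 'positive'
    return 'highly positive'

# Example: a mildly positive caption paired with a positive image.
print(fuse(0.4, 'positive'))  # -> 'highly positive' with these assumed thresholds
```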

    Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis

    The advent of the Social Web has enabled anyone with an Internet connection to easily create and share their ideas, opinions and content with millions of other people around the world. In pace with a global deluge of videos from billions of computers, smartphones, tablets, university projectors and security cameras, the amount of multimodal content on the Web has been growing exponentially, and with that comes the need for decoding such information into useful knowledge. In this paper, a multimodal affective data analysis framework is proposed to extract user opinion and emotions from video content. In particular, multiple kernel learning is used to combine the visual, audio and textual modalities. The proposed framework outperforms the state-of-the-art model in multimodal sentiment analysis research by a margin of 10–13% accuracy on polarity detection and 3–5% on emotion recognition. The paper also presents an extensive study on decision-level fusion.
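    A minimal sketch of the general idea behind multiple kernel learning for multimodal fusion, assuming scikit-learn and RBF base kernels; the kernel type, weights and gamma values are placeholders rather than the paper's learned parameters.

```python
# One kernel per modality (visual, audio, textual), combined as a weighted sum
# and fed to an SVM with a precomputed kernel.
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(feature_blocks, weights, gammas):
    """feature_blocks: list of (n_samples, d_m) arrays, one per modality;
    weights should sum to 1. Returns the fused (n_samples, n_samples) kernel."""
    return sum(w * rbf_kernel(X, gamma=g)
               for X, w, g in zip(feature_blocks, weights, gammas))

# Usage sketch (hypothetical feature blocks X_vis, X_aud, X_txt and labels y_train):
# K_train = combined_kernel([X_vis, X_aud, X_txt], weights=[0.4, 0.3, 0.3], gammas=[0.1] * 3)
# clf = SVC(kernel='precomputed').fit(K_train, y_train)
# For test data, the kernels must be computed between the test and training blocks.
```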

    Reconocimiento multimodal de emociones mediante el uso de redes neuronales artificiales (Multimodal emotion recognition using artificial neural networks)

    In this project we develop a multimodal recognizer of human emotions using artificial neural networks. To accomplish this, we design and train three models that recognize emotions from facial images, audio and text, separately. We then combine those systems to create a multimodal one that is more robust and precise than its parts. However, the expression of emotions is, to some extent, dependent on culture and language, which is why we propose to specialize our system in the Spanish language. Moreover, to complete this project, a web application is developed in Django that incorporates this multimodal system. The app can detect emotions either live, from the images obtained from the webcam and the audio from the microphone, or from a video uploaded by the user, which is analyzed and returned to the user annotated via download. Likewise, the app allows the user to record and download an annotated session of live detection. Lastly, the applicability of this project is very diverse, from sessions of psychological therapy to personalized music recommendations, through education and marketing.
    Fuentes López, JM. (2019). Reconocimiento multimodal de emociones mediante el uso de redes neuronales artificiales. http://hdl.handle.net/10251/129137 (TFG)
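    A minimal late-fusion sketch in the spirit of the system described above: three unimodal classifiers (face, audio, text) each emit a probability distribution over the same emotion classes, and the multimodal decision averages them. The emotion list, weights and function name are assumptions for illustration.

```python
# Late fusion of three unimodal emotion classifiers by weighted probability averaging.
import numpy as np

EMOTIONS = ['anger', 'disgust', 'fear', 'happiness', 'sadness', 'surprise', 'neutral']

def fuse_probabilities(p_face, p_audio, p_text, weights=(1/3, 1/3, 1/3)) -> str:
    """Each p_* is an array of len(EMOTIONS) class probabilities from one
    unimodal model; returns the fused emotion label."""
    stacked = np.stack([p_face, p_audio, p_text])
    fused = np.average(stacked, axis=0, weights=weights)
    return EMOTIONS[int(np.argmax(fused))]
```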

    Dauruxu: detección de emociones de personas y sus actividades para el apoyo en la evaluación de factores de riesgo psicosocial (Dauruxu: detecting people's emotions and activities to support the assessment of psychosocial risk factors)

    Psychosocial risk assessment has played a dominant role in ensuring the well-being and health of people. However, mechanisms such as interviews and questionnaires are susceptible to biased results because of data that cannot be acquired during the evaluations. This work proposes an architecture to identify activities and emotions that are implicit in current questionnaires and that can potentially be detected by cameras. Through computer vision, features are extracted from video frames and used as predictors for classification tasks. The quantification of indicators based on the detection of activities and emotions will provide additional data to support psychosocial risk assessments.
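    A minimal sketch of the frame-level pipeline described above, assuming OpenCV for frame capture and an arbitrary embedding function as the feature extractor; the work does not prescribe a specific extractor, so names and defaults are illustrative.

```python
# Sample frames from a video and turn each into a feature vector (predictor).
import cv2
import numpy as np

def extract_frame_features(video_path, embed, stride=30):
    """Sample every `stride`-th frame and embed it into a 1-D feature vector.
    `embed` is any function mapping an RGB frame to a numpy array
    (e.g. a pretrained CNN or a pose/landmark descriptor)."""
    cap = cv2.VideoCapture(video_path)
    features, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            features.append(embed(rgb))
        idx += 1
    cap.release()
    return np.stack(features)  # (n_sampled_frames, feature_dim) predictors
```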

    Towards the Use of Interactive Simulation for Effective e-Learning in University Classroom Environment

    In this PhD thesis, the use of interactive simulation in a higher-education e-learning classroom environment was explored, and its effectiveness was experimentally evaluated with university students in a classroom setting. Two case studies were carried out to evaluate the proposed novel interactive simulation e-learning tool. In the first case study, interactive agent-based simulation was used to teach complex adaptive system concepts in the area of ecology, and its effectiveness was measured in a classroom environment: a lab intervention used a novel interactive agent-based simulation (built in NetLogo) to teach complex adaptive systems, such as spatially explicit predator-prey interaction, to undergraduate and postgraduate students at the University of Stirling. The effectiveness of the interactive NetLogo simulation was compared with that of a non-interactive simulation built in the R programming language. The experimental evaluation involved a total of 38 students. The results of this case study demonstrate that students found the interactive agent-based simulation more engaging, effective and user-friendly than the non-interactive simulation. In the second case study, a novel interactive simulation game was developed (in NetLogo) and its effectiveness in teaching and learning complex concepts in the field of marine ecology was demonstrated. This case study makes a twofold contribution. First, it presents a novel interactive simulation game developed specifically for use in undergraduate and postgraduate courses in marine ecology; the game is designed to help learners explore a mathematical model of fishery population growth and understand the principles of sustainable fisheries. Second, it compares two ways of using the interactive simulation game in the classroom: learning from active exploration of the game versus learning from an expert demonstration of it. The case study demonstrated the effectiveness of learning from passive viewing of an expert demonstration over learning from active exploration without expert guidance for teaching complex concepts in sustainable fishery management. A mixed-methods study design, using both quantitative and qualitative methods, was used to compare the learning effectiveness of the two approaches and the students’ preferences. The investigation was carried out by running interventions with a mixture of undergraduate and postgraduate students from the University of Stirling in a classroom environment. A total of 74 participants were recruited at undergraduate and postgraduate level across both case studies. Through these two case studies, this thesis demonstrated the effectiveness of the proposed novel interactive simulation in a university e-learning classroom environment.
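    The abstract does not state the exact fishery model, so the sketch below assumes a standard logistic-growth-with-harvest formulation as a worked example of the dynamics such a simulation game could expose; all parameter values are illustrative.

```python
# Logistic fish stock growth with a constant annual harvest, stepped yearly.
def simulate_fishery(r=0.4, K=1000.0, harvest=60.0, n0=500.0, years=50):
    """dN/dt ~ r*N*(1 - N/K) - harvest, integrated with yearly Euler steps."""
    stock, trajectory = n0, []
    for _ in range(years):
        stock = max(0.0, stock + r * stock * (1 - stock / K) - harvest)
        trajectory.append(stock)
    return trajectory

# For this model the maximum sustainable yield is r*K/4 = 100 per year:
# harvesting below it lets the stock settle at a positive equilibrium,
# harvesting above it drives the stock to collapse.
print(simulate_fishery(harvest=60.0)[-1])   # sustainable: stock persists
print(simulate_fishery(harvest=120.0)[-1])  # over-harvested: stock falls to 0
```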