363 research outputs found

    A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

    No full text
    VIDEO segmentation facilitates e±cient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed seg- mentation by utilizing information about the \facts" of the video. These \facts" come in the form of labels that describe the objects which are captured by the cam- era. This type of solutions was able to achieve good and consistent results for some video genres such as news programs and informational presentations. The content format of this type of videos is generally quite standard, and automated solutions were designed to follow these format rules. For example in [1], the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films. This is because makers of this type of videos utilized different filming techniques to design their videos in order to elicit certain affective response from their targeted audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view to the problem of high level video segmentation. We developed a novel probabilistic method for affective level video content analysis and segmentation. Our method had two stages. In the first stage, a®ective content labels were assigned to video shots by means of a dynamic bayesian 0. Abstract 3 network (DBN). A novel hierarchical-coupled dynamic bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure- arousal-dominance (P-A-D) model of a®ect representation [3]. In principle, this model can represent a large number of emotions. In the second stage, the visual, audio and a®ective information of the video was used to compute a statistical feature vector to represent the content of each shot. Affective level video segmentation was achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detec- tion ability with all the existing works which are related to the field of a®ective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high level video segmentation method [2]. However, it is a very computationally intensive algorithm. To accelerate its computation speed, we developed a modified TAC (modTAC) algorithm which was designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, the segmentation per- formance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study which was based on the work of Gross et al. [4] was conducted. Experiment results will show the feasibility of our proposed method. For the first stage of our proposal, our experiment results will show that an average improvement of as high as 38% was achieved over previous works. As for the second stage, an improvement of as high as 37% was achieved over the TAC algorithm

    Design and application of reconfigurable circuits and systems

    No full text
    Open Acces

    Finding emotional-laden resources on the World Wide Web

    Get PDF
    Some content in multimedia resources can depict or evoke certain emotions in users. The aim of Emotional Information Retrieval (EmIR) and of our research is to identify knowledge about emotional-laden documents and to use these findings in a new kind of World Wide Web information service that allows users to search and browse by emotion. Our prototype, called Media EMOtion SEarch (MEMOSE), is largely based on the results of research regarding emotive music pieces, images and videos. In order to index both evoked and depicted emotions in these three media types and to make them searchable, we work with a controlled vocabulary, slide controls to adjust the emotions’ intensities, and broad folksonomies to identify and separate the correct resource-specific emotions. This separation of so-called power tags is based on a tag distribution which follows either an inverse power law (only one emotion was recognized) or an inverse-logistical shape (two or three emotions were recognized). Both distributions are well known in information science. MEMOSE consists of a tool for tagging basic emotions with the help of slide controls, a processing device to separate power tags, a retrieval component consisting of a search interface (for any topic in combination with one or more emotions) and a results screen. The latter shows two separately ranked lists of items for each media type (depicted and felt emotions), displaying thumbnails of resources, ranked by the mean values of intensity. In the evaluation of the MEMOSE prototype, study participants described our EmIR system as an enjoyable Web 2.0 service

    Continuous Analysis of Affect from Voice and Face

    Get PDF
    Human affective behavior is multimodal, continuous and complex. Despite major advances within the affective computing research field, modeling, analyzing, interpreting and responding to human affective behavior still remains a challenge for automated systems as affect and emotions are complex constructs, with fuzzy boundaries and with substantial individual differences in expression and experience [7]. Therefore, affective and behavioral computing researchers have recently invested increased effort in exploring how to best model, analyze and interpret the subtlety, complexity and continuity (represented along a continuum e.g., from −1 to +1) of affective behavior in terms of latent dimensions (e.g., arousal, power and valence) and appraisals, rather than in terms of a small number of discrete emotion categories (e.g., happiness and sadness). This chapter aims to (i) give a brief overview of the existing efforts and the major accomplishments in modeling and analysis of emotional expressions in dimensional and continuous space while focusing on open issues and new challenges in the field, and (ii) introduce a representative approach for multimodal continuous analysis of affect from voice and face, and provide experimental results using the audiovisual Sensitive Artificial Listener (SAL) Database of natural interactions. The chapter concludes by posing a number of questions that highlight the significant issues in the field, and by extracting potential answers to these questions from the relevant literature. The chapter is organized as follows. Section 10.2 describes theories of emotion, Sect. 10.3 provides details on the affect dimensions employed in the literature as well as how emotions are perceived from visual, audio and physiological modalities. Section 10.4 summarizes how current technology has been developed, in terms of data acquisition and annotation, and automatic analysis of affect in continuous space by bringing forth a number of issues that need to be taken into account when applying a dimensional approach to emotion recognition, namely, determining the duration of emotions for automatic analysis, modeling the intensity of emotions, determining the baseline, dealing with high inter-subject expression variation, defining optimal strategies for fusion of multiple cues and modalities, and identifying appropriate machine learning techniques and evaluation measures. Section 10.5 presents our representative system that fuses vocal and facial expression cues for dimensional and continuous prediction of emotions in valence and arousal space by employing the bidirectional Long Short-Term Memory neural networks (BLSTM-NN), and introduces an output-associative fusion framework that incorporates correlations between the emotion dimensions to further improve continuous affect prediction. Section 10.6 concludes the chapter

    Survey on Emotional Body Gesture Recognition

    Get PDF
    Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as "body language" and comment general aspects as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment static and dynamic body pose estimation methods both in RGB and 3D. We then comment the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g., human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the quantity of labelled data is scarce. There is no agreement on clearly defined output spaces and the representations are shallow and largely based on naive geometrical representations

    Motion and emotion : Semantic knowledge for hollywood film indexing

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Brain Music : Sistema generativo para la creación de música simbólica a partir de respuestas neuronales afectivas

    Get PDF
    gráficas, tablasEsta tesis de maestría presenta una metodología de aprendizaje profundo multimodal innovadora que fusiona un modelo de clasificación de emociones con un generador musical, con el propósito de crear música a partir de señales de electroencefalografía, profundizando así en la interconexión entre emociones y música. Los resultados alcanzan tres objetivos específicos: Primero, ya que el rendimiento de los sistemas interfaz cerebro-computadora varía considerablemente entre diferentes sujetos, se introduce un enfoque basado en la transferencia de conocimiento entre sujetos para mejorar el rendimiento de individuos con dificultades en sistemas de interfaz cerebro-computadora basados en el paradigma de imaginación motora. Este enfoque combina datos de EEG etiquetados con datos estructurados, como cuestionarios psicológicos, mediante un método de "Kernel Matching CKA". Utilizamos una red neuronal profunda (Deep&Wide) para la clasificación de la imaginación motora. Los resultados destacan su potencial para mejorar las habilidades motoras en interfaces cerebro-computadora. Segundo, proponemos una técnica innovadora llamada "Labeled Correlation Alignment"(LCA) para sonificar respuestas neurales a estímulos representados en datos no estructurados, como música afectiva. Esto genera características musicales basadas en la actividad cerebral inducida por las emociones. LCA aborda la variabilidad entre sujetos y dentro de sujetos mediante el análisis de correlación, lo que permite la creación de envolventes acústicos y la distinción entre diferente información sonora. Esto convierte a LCA en una herramienta prometedora para interpretar la actividad neuronal y su reacción a estímulos auditivos. Finalmente, en otro capítulo, desarrollamos una metodología de aprendizaje profundo de extremo a extremo para generar contenido musical MIDI (datos simbólicos) a partir de señales de actividad cerebral inducidas por música con etiquetas afectivas. Esta metodología abarca el preprocesamiento de datos, el entrenamiento de modelos de extracción de características y un proceso de emparejamiento de características mediante Deep Centered Kernel Alignment, lo que permite la generación de música a partir de señales EEG. En conjunto, estos logros representan avances significativos en la comprensión de la relación entre emociones y música, así como en la aplicación de la inteligencia artificial en la generación musical a partir de señales cerebrales. Ofrecen nuevas perspectivas y herramientas para la creación musical y la investigación en neurociencia emocional. Para llevar a cabo nuestros experimentos, utilizamos bases de datos públicas como GigaScience, Affective Music Listening y Deap Dataset (Texto tomado de la fuente)This master’s thesis presents an innovative multimodal deep learning methodology that combines an emotion classification model with a music generator, aimed at creating music from electroencephalography (EEG) signals, thus delving into the interplay between emotions and music. The results achieve three specific objectives: First, since the performance of brain-computer interface systems varies significantly among different subjects, an approach based on knowledge transfer among subjects is introduced to enhance the performance of individuals facing challenges in motor imagery-based brain-computer interface systems. This approach combines labeled EEG data with structured information, such as psychological questionnaires, through a "Kernel Matching CKA"method. We employ a deep neural network (Deep&Wide) for motor imagery classification. The results underscore its potential to enhance motor skills in brain-computer interfaces. Second, we propose an innovative technique called "Labeled Correlation Alignment"(LCA) to sonify neural responses to stimuli represented in unstructured data, such as affective music. This generates musical features based on emotion-induced brain activity. LCA addresses variability among subjects and within subjects through correlation analysis, enabling the creation of acoustic envelopes and the distinction of different sound information. This makes LCA a promising tool for interpreting neural activity and its response to auditory stimuli. Finally, in another chapter, we develop an end-to-end deep learning methodology for generating MIDI music content (symbolic data) from EEG signals induced by affectively labeled music. This methodology encompasses data preprocessing, feature extraction model training, and a feature matching process using Deep Centered Kernel Alignment, enabling music generation from EEG signals. Together, these achievements represent significant advances in understanding the relationship between emotions and music, as well as in the application of artificial intelligence in musical generation from brain signals. They offer new perspectives and tools for musical creation and research in emotional neuroscience. To conduct our experiments, we utilized public databases such as GigaScience, Affective Music Listening and Deap DatasetMaestríaMagíster en Ingeniería - Automatización IndustrialInvestigación en Aprendizaje Profundo y señales BiológicasEléctrica, Electrónica, Automatización Y Telecomunicaciones.Sede Manizale

    Socio-Cognitive and Affective Computing

    Get PDF
    Social cognition focuses on how people process, store, and apply information about other people and social situations. It focuses on the role that cognitive processes play in social interactions. On the other hand, the term cognitive computing is generally used to refer to new hardware and/or software that mimics the functioning of the human brain and helps to improve human decision-making. In this sense, it is a type of computing with the goal of discovering more accurate models of how the human brain/mind senses, reasons, and responds to stimuli. Socio-Cognitive Computing should be understood as a set of theoretical interdisciplinary frameworks, methodologies, methods and hardware/software tools to model how the human brain mediates social interactions. In addition, Affective Computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects, a fundamental aspect of socio-cognitive neuroscience. It is an interdisciplinary field spanning computer science, electrical engineering, psychology, and cognitive science. Physiological Computing is a category of technology in which electrophysiological data recorded directly from human activity are used to interface with a computing device. This technology becomes even more relevant when computing can be integrated pervasively in everyday life environments. Thus, Socio-Cognitive and Affective Computing systems should be able to adapt their behavior according to the Physiological Computing paradigm. This book integrates proposals from researchers who use signals from the brain and/or body to infer people's intentions and psychological state in smart computing systems. The design of this kind of systems combines knowledge and methods of ubiquitous and pervasive computing, as well as physiological data measurement and processing, with those of socio-cognitive and affective computing
    corecore