
    2CET-GAN: Pixel-Level GAN Model for Human Facial Expression Transfer

    Recent studies have used GANs to transfer expressions between human faces. However, existing models have several flaws: they rely on emotion labels, lack continuous expressions, and fail to capture expression details. To address these limitations, we propose a novel CycleGAN- and InfoGAN-based network called 2 Cycles Expression Transfer GAN (2CET-GAN), which can learn continuous expression transfer without using emotion labels. Our experiments show that the network can generate diverse and high-quality expressions and can generalize to unknown identities. To the best of our knowledge, we are among the first to successfully use an unsupervised approach to disentangle expression representation from identities at the pixel level. Comment: 9 pages, 5 figures
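
    As a hedged illustration of the two-cycle idea described above, the sketch below pairs an expression encoder with a generator and enforces cycle consistency plus InfoGAN-style code reconstruction. The names G and E and the loss composition are assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of one direction of a two-cycle expression-transfer
# objective, assuming a generator G(image, expr_code) and an expression
# encoder E(image); shapes and names are illustrative, not the paper's.
import torch.nn.functional as F

def cycle_expression_loss(G, E, x_neutral, x_expressive):
    z = E(x_expressive)                # continuous expression code, no labels
    y = G(x_neutral, z)                # transfer the expression onto the face
    x_rec = G(y, E(x_neutral))         # cycle back to the original expression
    rec = F.l1_loss(x_rec, x_neutral)  # pixel-level cycle consistency
    code = F.l1_loss(E(y), z)          # InfoGAN-style code reconstruction
    return rec + code                  # adversarial terms omitted for brevity
```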

    EmoStim: A Database of Emotional Film Clips with Discrete and Componential Assessment

    Emotion elicitation using emotional film clips is one of the most common and ecologically valid methods in Affective Computing. However, selecting and validating appropriate materials that evoke a range of emotions is challenging. Here we present EmoStim: A Database of Emotional Film Clips, a film library with rich and varied content. EmoStim is designed for researchers interested in studying emotions in relation to either discrete or componential models of emotion. To create the database, 139 film clips were selected from the literature and then annotated by 638 participants through the CrowdFlower platform. We selected 99 film clips based on the distribution of subjective ratings that effectively distinguished between emotions defined by the discrete model. We show that the selected film clips reliably induce a range of specific emotions according to the discrete model. Further, we describe relationships between emotions, the organization of emotions in the componential space, and the underlying dimensions representing emotional experience. The EmoStim database and participant annotations are freely available for research purposes. The database can be used to further enrich our understanding of emotions and to serve as a guide for selecting or creating additional materials. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    Towards a Technology of Nonverbal Communication: Vocal Behavior in Social and Affective Phenomena

    Nonverbal communication is the main channel through which we experience the inner life of others, including their emotions, feelings, moods, social attitudes, etc. This attracts the interest of the computing community because nonverbal communication is based on cues like facial expressions, vocalizations, gestures, and postures that we can perceive with our senses and that can be (and often are) detected, analyzed, and synthesized with automatic approaches. In other words, nonverbal communication can serve as a viable interface between computers and some of the most important aspects of human psychology, such as emotions and social attitudes. As a result, a new computing domain seems to emerge that we can call a "technology of nonverbal communication". This chapter outlines some of the most salient aspects of this potentially new domain and some of its most important perspectives for the future.

    A New 2D Corner Detector for Extracting Landmarks from Brain MR Images

    Point-based registration of images strongly depends on the extraction of suitable landmarks. Recently, various 2D operators have been proposed for the detection of corner points, but most of them are not effective for medical images, which demand high accuracy. In this paper we propose a new automatic corner detector based on the covariance between the small region of support around a central pixel and its rotated counterpart. Since our main target is medical images, we focus in particular on extracting control points from brain MR images, which play an important role in registration accuracy. The approach is further improved by refined localization through the differential edge-intersection approach proposed by Karl Rohr. The method is robust to rotation, translation, and scaling; in comparison with other grayscale methods it gives better results, particularly on brain MR images, and it shows acceptable robustness to distortion, a common occurrence in brain surgery. In the first part of this paper we describe the algorithm, and in the second part we investigate its results on different MR images and its ability to detect corresponding points under elastic deformation and noise. It turns out that this method: 1) detects a larger number of corresponding points than the other operators, 2) performs better on the basis of the statistical measures, and 3) by choosing a suitable region of support, can significantly decrease the number of false detections.
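
    The core scoring step, as described, compares each pixel's region of support with a rotated copy of itself. A minimal NumPy sketch of that step is given below; the 90-degree rotation, the window size, and the use of local maxima as landmarks are our assumptions, not details from the paper.

```python
# Minimal sketch of rotation-covariance cornerness scoring; the exact
# rotation angle and window size used in the paper may differ.
import numpy as np

def cornerness(img, half=3):
    """Covariance between each pixel's patch and its rotated copy."""
    h, w = img.shape
    score = np.zeros((h, w))
    for i in range(half, h - half):
        for j in range(half, w - half):
            patch = img[i - half:i + half + 1, j - half:j + half + 1].astype(float)
            a = patch.ravel()
            b = np.rot90(patch).ravel()          # rotated region of support
            score[i, j] = np.mean((a - a.mean()) * (b - b.mean()))
    return score  # local maxima are candidate control points
```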

    Speech-Gesture GAN: Gesture Generation for Robots and Embodied Agents

    Embodied agents, in the form of virtual agents or social robots, are rapidly becoming more widespread. In human-human interactions, humans use nonverbal behaviours to convey their attitudes, feelings, and intentions. This capability is therefore also required for embodied agents in order to enhance the quality and effectiveness of their interactions with humans. In this paper, we propose a novel framework that can generate sequences of joint angles from speech text and speech audio utterances. Based on a conditional Generative Adversarial Network (GAN), our proposed neural network model learns the relationships between co-speech gestures and both semantic and acoustic features from the speech input. To train our neural network model, we employ a public dataset containing co-speech gestures with corresponding speech audio utterances, captured from a single male native English speaker. The results from both objective and subjective evaluations demonstrate the efficacy of our gesture-generation framework for robots and embodied agents. Comment: RO-MAN'23, 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), August 2023, Busan, South Korea
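
    To make the text-plus-audio conditioning concrete, here is a hedged PyTorch sketch of a conditional generator that maps per-frame semantic and acoustic features, plus noise, to joint angles. Layer sizes, feature dimensions, and the recurrent backbone are illustrative assumptions, not the paper's architecture.

```python
# Illustrative conditional generator: speech features + noise -> joint angles.
# All dimensions below are assumptions for the sketch.
import torch
import torch.nn as nn

class GestureGenerator(nn.Module):
    def __init__(self, text_dim=300, audio_dim=128, noise_dim=16,
                 hidden=256, n_joints=10):
        super().__init__()
        self.rnn = nn.GRU(text_dim + audio_dim + noise_dim, hidden,
                          batch_first=True)
        self.out = nn.Linear(hidden, n_joints)  # joint angles per frame

    def forward(self, text_feat, audio_feat, noise):
        x = torch.cat([text_feat, audio_feat, noise], dim=-1)
        h, _ = self.rnn(x)                      # (batch, time, hidden)
        return self.out(h)                      # (batch, time, n_joints)
```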

    The Voice of Personality: Mapping Nonverbal Vocal Behavior into Trait Attributions

    This paper reports preliminary experiments on the automatic attribution of personality traits based on nonverbal vocal behavioral cues. In particular, the work shows how prosodic features can be used to predict, with an accuracy of up to 75% depending on the trait, the personality assessments performed by human judges on a collection of 640 speech samples. The assessments are based on a short version of the Big Five Inventory, one of the most widely used questionnaires for personality assessment. The judges did not understand the language spoken in the speech samples, so the influence of the verbal content is limited. To the best of our knowledge, this is the first work aimed at automatically inferring traits attributed by judges rather than traits self-reported by subjects.
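
    The pipeline implied by the abstract — prosodic features in, judge-attributed trait labels out — can be sketched as below. The features, the binarization of traits, and the classifier are our placeholders; the paper's exact feature set and learning method may differ.

```python
# Hedged sketch: predict judge-attributed traits from prosodic features.
# X and y are placeholders; real inputs would be pitch/energy/rate
# statistics per sample and high/low trait assessments from the judges.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((640, 12))              # placeholder prosodic feature vectors
y = rng.integers(0, 2, 640)            # placeholder high/low trait labels
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"cross-validated trait-attribution accuracy: {acc:.2f}")
```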

    From Speech to Personality: Mapping Voice Quality and Intonation into Personality Differences

    From a cognitive point of view, personality perception corresponds to capturing individual differences and can be thought of as positioning the people around us in an ideal personality space: the more similar the personality of two individuals, the closer their positions in the space. This work shows that the mutual position of two individuals in the personality space can be inferred from prosodic features. The experiments, based on ordinal regression techniques, were performed over a corpus of 640 speech samples comprising 322 individuals assessed in terms of personality traits by 11 human judges, the largest database of this type in the literature. The results show that the mutual position of two individuals can be predicted with up to 80% accuracy.
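
    One way to read "mutual position inferred from prosodic features" is as pairwise prediction from feature differences. The sketch below uses a plain multinomial logistic model as a stand-in for the ordinal regression the paper employs; the pairing scheme, distance bucketing, and data are placeholders.

```python
# Hedged pairwise sketch: distance bucket in personality space from the
# absolute difference of two speakers' prosodic feature vectors.
# A multinomial logistic model stands in for true ordinal regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feats = rng.random((322, 12))                       # placeholder prosody
pairs = [(i, j) for i in range(50) for j in range(i + 1, 50)]
X = np.array([np.abs(feats[i] - feats[j]) for i, j in pairs])
y = rng.integers(0, 3, len(pairs))                  # near / mid / far buckets
model = LogisticRegression(max_iter=1000).fit(X, y)
```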

    Multi-view video segmentation and tracking for video surveillance

    Tracking moving objects is a critical step for smart video surveillance systems. Despite the increase in complexity, multiple-camera systems offer the undoubted advantages of covering wide areas and handling occlusions by exploiting the different viewpoints. Multiple-camera systems raise several technical problems: installation, calibration, object matching, switching, data fusion, and occlusion handling. In this paper, we address the issue of tracking moving objects in an environment covered by multiple un-calibrated cameras with overlapping fields of view, typical of most surveillance setups. Our main objective is to create a framework that can be used to integrate object-tracking information from multiple video sources. The proposed technique consists of the following steps. We first perform a single-view tracking algorithm on each camera view, and then apply a consistent object-labeling algorithm across all views. Next, we verify the objects in each view separately for inconsistencies. Corresponding objects are extracted through a homography transform from one view to the other and vice versa. Having found the corresponding objects across views, we partition each object into homogeneous regions. In the last step, we apply the homography transform to find the region map of the first view in the second view and vice versa. For each region (in the main frame and the mapped frame) a set of descriptors is extracted to find the best match between the two views based on region-descriptor similarity. This method can deal with multiple objects. Track-management issues such as occlusion and the appearance and disappearance of objects are resolved using information from all views. The method can track both rigid and deformable objects, and this versatility makes it suitable for different application scenarios.
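
    The cross-view correspondence step described above hinges on a homography between overlapping views. A minimal OpenCV sketch of that mapping is shown below; the matched points and the projected centroid are placeholder values, not data from the paper.

```python
# Minimal sketch: estimate a homography from matched points in the
# overlapping area and project a tracked centroid into the other view.
import numpy as np
import cv2

pts1 = np.array([[100, 200], [300, 220], [320, 400], [120, 380]], np.float32)
pts2 = np.array([[90, 210], [280, 225], [310, 410], [110, 390]], np.float32)
H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)

centroid = np.array([[[210.0, 300.0]]], np.float32)  # (1, 1, 2) for OpenCV
mapped = cv2.perspectiveTransform(centroid, H)       # centroid in view 2
```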

    Effective permeability of an immiscible fluid in porous media determined from its geometric state

    Based on the phenomenological extension of Darcy's law, two-fluid flow depends on a relative permeability function of saturation alone that is process/path dependent, with an underlying dependency on pore structure. For applications ranging from fuel cells to underground CO2 storage, it is imperative to determine the effective phase permeability relationships, where the traditional approach is based on the inverse modelling of time-consuming experiments. The underlying reason is that the fundamental upscaling step from pore to Darcy scale, which links the pore structure of the porous medium to the continuum hydraulic conductivities, is not solved. Herein, we develop an Artificial Neural Network (ANN) that relies on fundamental geometrical relationships to determine the mechanical energy dissipation during creeping immiscible two-fluid flow. The developed ANN is based on a prescribed set of state variables chosen with physical insight, and it predicts the effective permeability of 4,500 unseen pore-scale geometrical states with R^2 = 0.98. Comment: 6 pages, 2 figures, and Supporting Material
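
    As a hedged illustration of the approach, a small feed-forward network mapping geometric state variables to effective permeability might look as follows. The number and choice of state variables (e.g. saturation, interfacial area, Euler characteristic) are our assumptions; the paper's prescribed set may differ.

```python
# Illustrative MLP from geometric state variables to effective permeability.
# Input dimensionality and example values are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),   # 4 assumed geometric state variables
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),              # effective permeability of the phase
)
state = torch.tensor([[0.55, 0.12, 0.30, -0.8]])  # placeholder state vector
k_eff = model(state)
```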

    Removing Ocular Artifacts from EEG Signals Using Adaptive Filtering and ARMAX Modeling

    The EEG signal is one of the oldest measures of brain activity and has been used widely for clinical diagnosis and biomedical research. However, EEG signals are highly contaminated with various artifacts, both from the subject and from equipment interference. Among these artifacts, ocular noise is the most important one. Since many applications, such as BCI, require online and real-time processing of the EEG signal, it is ideal if artifact removal is performed in an online fashion. Recently, several methods for online ocular artifact removal have been proposed. One of these methods is ARMAX modeling of the EEG signal. This method assumes that the recorded EEG signal is a combination of EOG artifacts and the background EEG; the background EEG is then recovered via estimation of the ARMAX parameters. The other recently proposed method is based on adaptive filtering. This method uses the EOG signal as the reference input and subtracts EOG artifacts from the recorded EEG signals. In this paper we investigate the efficiency of each method for removing EOG artifacts and compare the two. Our conclusion from this comparison is that the adaptive filtering method achieves better results than ARMAX modeling.
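
    For the adaptive-filtering branch of the comparison, a minimal LMS sketch with the EOG channel as the reference input is given below; the filter order and step size are assumptions for illustration.

```python
# Minimal LMS adaptive filter: estimate the ocular contribution from the
# EOG reference and subtract it from the recorded EEG.
import numpy as np

def lms_eog_removal(eeg, eog, order=5, mu=0.01):
    w = np.zeros(order)                   # adaptive filter weights
    clean = np.zeros_like(eeg, dtype=float)
    for n in range(order, len(eeg)):
        x = eog[n - order:n][::-1]        # recent EOG samples (reference)
        y = w @ x                         # estimated ocular artifact
        e = eeg[n] - y                    # error signal = cleaned EEG sample
        w += 2 * mu * e * x               # LMS weight update
        clean[n] = e
    return clean
```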