14 research outputs found

    SmileNet: Registration-Free Smiling Face Detection In The Wild

    Get PDF

    Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning

    Full text link
    The emotional theory of mind problem in images is an emotion recognition task, specifically asking "How does the person in the bounding box feel?" Facial expressions, body pose, contextual information, and implicit commonsense knowledge all contribute to the difficulty of the task, making it currently one of the hardest problems in affective computing. The goal of this work is to evaluate the emotional commonsense knowledge embedded in recent large vision-language models (CLIP, LLaVA) and large language models (GPT-3.5) on the Emotions in Context (EMOTIC) dataset. In order to evaluate a purely text-based language model on images, we construct "narrative captions" relevant to emotion perception, using a set of 872 physical social-signal descriptions related to 26 emotional categories, along with 224 labels for emotionally salient environmental contexts, sourced from writers' guides for character expressions and settings. We evaluate the use of the resulting captions in an image-to-language-to-emotion task. Experiments using zero-shot vision-language models on EMOTIC show that combining "fast" and "slow" reasoning is a promising way forward to improve emotion recognition systems. Nevertheless, a gap remains in the zero-shot emotional theory of mind task compared to prior work trained on the EMOTIC dataset.
    Comment: 16 pages (including references and appendix), 8 tables, 3 figures
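    The image-to-language-to-emotion pipeline sketched in the abstract lends itself to a short illustration. Below is a minimal sketch of the zero-shot "fast" step, assuming a Hugging Face CLIP checkpoint; the caption strings are invented placeholders standing in for the paper's 872 social-signal descriptions, and the image path is hypothetical.

```python
# Minimal zero-shot sketch: score a person crop against short
# emotion-caption strings with CLIP, then keep the best matches.
# Captions below are illustrative placeholders, not the paper's
# actual 872 social-signal descriptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = [
    "a person smiling with relaxed shoulders",          # e.g. -> happiness
    "a person with furrowed brows and clenched fists",  # e.g. -> anger
    "a person slumped over with a lowered gaze",        # e.g. -> sadness
]

image = Image.open("person_crop.jpg")  # hypothetical bounding-box crop
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image[0]
probs = logits.softmax(dim=-1)

# The top-scoring captions could then be passed to a text-only LLM
# as a "narrative caption" for the slower linguistic reasoning step.
for caption, p in sorted(zip(captions, probs.tolist()), key=lambda x: -x[1]):
    print(f"{p:.2f}  {caption}")
```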

    Exploring remote photoplethysmography signals for deepfake detection in facial videos

    Get PDF
    Abstract. With the advent of deep learning-based facial forgeries, also called "deepfakes", the field of accurately detecting forged videos has become a quickly growing area of research. For this endeavor, remote photoplethysmography (rPPG), the process of extracting biological signals such as the blood volume pulse and heart rate from facial videos, offers an interesting avenue for detecting fake videos that appear utterly authentic to the human eye. This thesis presents an end-to-end system for deepfake video classification using remote photoplethysmography. The minuscule facial pixel colour changes are used to extract the rPPG signal, from which various features are extracted and used to train an XGBoost classifier. The classifier is then tested using various colour-to-blood-volume-pulse methods (OMIT, POS, LGI, and CHROM) and three feature extraction window lengths of two, four, and eight seconds. The classifier was found effective at detecting deepfake videos with an accuracy of 85%, with minimal performance difference found between the window lengths. The GREEN channel signal was found to be important for this classification.
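    As a rough illustration of the classification stage, the sketch below assumes the rPPG trace has already been recovered from the facial pixels; the frame rate, window length, and the simple time- and frequency-domain features are assumptions for the example, not the thesis's exact feature set, and the labels are placeholders.

```python
# Sketch: windowed statistical and spectral features from a
# pre-extracted rPPG trace feed an XGBoost classifier. The feature
# set is an illustrative guess, not the thesis's exact pipeline.
import numpy as np
from scipy.signal import welch
from xgboost import XGBClassifier

FS = 30  # assumed video frame rate (Hz)

def window_features(sig: np.ndarray, win_s: float = 4.0) -> np.ndarray:
    """Split a 1-D rPPG signal into windows and compute per-window features."""
    win = int(win_s * FS)
    feats = []
    for start in range(0, len(sig) - win + 1, win):
        w = sig[start:start + win]
        f, pxx = welch(w, fs=FS, nperseg=min(256, win))
        band = (f >= 0.7) & (f <= 4.0)           # plausible heart-rate band
        peak_hz = f[band][np.argmax(pxx[band])]  # dominant pulse frequency
        feats.append([w.mean(), w.std(),
                      peak_hz, pxx[band].max() / (pxx.sum() + 1e-12)])
    return np.asarray(feats)

# X: stacked window features across videos; y: 1 = deepfake, 0 = real.
rng = np.random.default_rng(0)
X = np.vstack([window_features(rng.standard_normal(FS * 8)) for _ in range(50)])
y = rng.integers(0, 2, size=len(X))  # placeholder labels for the sketch

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X, y)
print(clf.predict(X[:5]))
```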

    "How" and "what" matters: Sampling method affects biodiversity estimates of reef fishes

    Get PDF
    Understanding changes in biodiversity requires the implementation of monitoring programs encompassing different dimensions of biodiversity through varying sampling techniques. In this work, fish assemblages associated with the "outer" and "inner" sides of four marinas, two in the Canary Islands and two in southern Portugal, were investigated using three complementary sampling techniques: underwater visual censuses (UVCs), baited cameras (BCs), and fish traps (FTs). We first investigated the complementarity of these sampling methods to describe species composition. Then, we investigated differences in taxonomic (TD), phylogenetic (PD), and functional diversity (FD) between sides of the marinas according to each sampling method. Finally, we explored the applicability and reproducibility of each sampling technique to characterize fish assemblages according to these metrics of diversity. UVCs and BCs provided complementary information, in terms of the number and abundances of species, while FTs sampled a particular assemblage. Patterns of TD, PD, and FD between sides of the marinas varied depending on the sampling method. UVC was the most cost-efficient technique, in terms of personnel hours, and it is recommended for local studies. However, for large-scale studies, BCs are recommended, as they cover greater spatio-temporal scales at a lower cost. Our study highlights the need to implement complementary sampling techniques to monitor ecological change across various dimensions of biodiversity. The results presented here will be useful for optimizing future monitoring programs.
    Funding: FCT - Foundation for Science and Technology [CCMAR/Multi/04326/2013]
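    The abstract does not name the specific diversity indices used, so purely as an illustration, the sketch below computes one common taxonomic-diversity metric (the Shannon index) from hypothetical species-abundance counts for each of the three sampling methods; all counts are made up.

```python
# Illustrative only: compare one taxonomic-diversity metric
# (Shannon index) across the three sampling methods from
# species-abundance counts. The abstract does not specify which
# indices were used; the counts below are made-up placeholders.
import numpy as np

def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over observed species."""
    counts = np.asarray(abundances, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical abundance vectors (same species order) per method.
samples = {
    "UVC": [12, 5, 0, 3, 9],   # underwater visual census
    "BC":  [7, 0, 4, 6, 2],    # baited camera
    "FT":  [0, 1, 0, 8, 0],    # fish trap
}

for method, counts in samples.items():
    print(f"{method}: H' = {shannon_index(counts):.3f}")
```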