
    How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?

    This paper does not claim technical novelty; rather, it introduces our key discoveries in the form of a data generation protocol, a database, and insights. We aim to address the lack of large-scale datasets in micro-expression (MiE) recognition, caused by the prohibitive cost of data collection, which renders large-scale training less feasible. To this end, we develop a protocol to automatically synthesize large-scale MiE training data that allows us to train improved recognition models for real-world test data. Specifically, we discover three types of Action Units (AUs) that can constitute trainable MiEs. These AUs come from real-world MiEs, early frames of macro-expression videos, and the relationship between AUs and expression categories defined by human expert knowledge. With these AUs, our protocol then employs large numbers of face images of various identities and an off-the-shelf face generator for MiE synthesis, yielding the MiE-X dataset. MiE recognition models are trained or pre-trained on MiE-X and evaluated on real-world test sets, where very competitive accuracy is obtained. Experimental results not only validate the effectiveness of the discovered AUs and the MiE-X dataset but also reveal some interesting properties of MiEs: they generalize across faces, are close to early-stage macro-expressions, and can be manually defined.
    Comment: European Conference on Computer Vision 202
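
    A minimal sketch of what such a synthesis step could look like, assuming a generic AU-conditioned face generator; the AU pools, expression labels, and the `generate_frame` stub are illustrative placeholders, not the authors' implementation:

```python
# Illustrative sketch (not the authors' released code) of an MiE-X-style
# synthesis step. The AU pools, emotion labels and `generate_frame` are
# hypothetical placeholders for the real AU sources and the off-the-shelf
# AU-conditioned face generator mentioned in the abstract.

# Three sources of "trainable" AU combinations, keyed by expression category.
AU_POOLS = {
    "real_mie":    {"happiness": [6, 12], "surprise": [1, 2, 5]},
    "early_macro": {"happiness": [12],    "surprise": [1, 2]},
    "expert_rule": {"happiness": [6, 12], "surprise": [1, 2, 5, 26]},
}

def generate_frame(face_image, au_intensities):
    """Placeholder for an off-the-shelf AU-conditioned face generator call."""
    return {"image": face_image, "aus": dict(au_intensities)}

def synthesize_mie_clip(face_image, source="expert_rule", label="happiness",
                        n_frames=10, peak_intensity=0.3):
    """Render an onset-apex-offset clip using low (micro-scale) AU intensities."""
    aus = AU_POOLS[source][label]
    clip = []
    for t in range(n_frames):
        # Triangular intensity profile: rise to the apex, then decay back.
        scale = peak_intensity * (1 - abs(2 * t / (n_frames - 1) - 1))
        clip.append(generate_frame(face_image, {au: scale for au in aus}))
    return clip, label

clip, label = synthesize_mie_clip("identity_0001.jpg")
print(label, [round(frame["aus"][12], 2) for frame in clip])
```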

    A multiscale hybrid mathematical model of epidermal-dermal interactions during skin wound healing.

    Following injury, skin activates a complex wound healing programme. While the cellular and signalling mechanisms of wound repair have been extensively studied, the principles of epidermal-dermal interactions and their effects on wound healing outcomes are only partially understood. To gain new insight into the effects of epidermal-dermal interactions, we developed a multiscale, hybrid mathematical model of skin wound healing. The model takes into consideration interactions between the epidermis and dermis across the basement membrane via diffusible signals, defined as an activator and an inhibitor. Simulations revealed that epidermal-dermal interactions are critical for proper extracellular matrix deposition in the dermis, suggesting these signals may influence how wound scars form. Our model makes several theoretical predictions. First, basal levels of epidermal activator and inhibitor help to maintain the dermis in a steady state, whereas their absence results in a raised, scar-like dermal phenotype. Second, a wound-triggered increase in activator and inhibitor production by basal epidermal cells, coupled with fast re-epithelialization kinetics, reduces dermal scar size. Third, a high-density fibrin clot leads to a raised, hypertrophic scar phenotype, whereas a low-density fibrin clot leads to a hypotrophic phenotype. Fourth, shallow wounds, compared to deep wounds, result in overall reduced scarring. Taken together, our model predicts an important role for signalling across the dermal-epidermal interface and an effect of fibrin clot density and wound geometry on scar formation. This hybrid modelling approach may also be applicable to other complex tissue systems, enabling the simulation of dynamic processes that would otherwise be computationally prohibitive with fully discrete models due to the large number of variables.
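
    As an illustration of the activator-inhibitor signalling the model builds on, here is a minimal numerical sketch of a generic reaction-diffusion pair; the Gierer-Meinhardt-style kinetics, the parameter values, and the way the wound is represented are assumptions for illustration, not the paper's actual multiscale equations:

```python
# Minimal 1D reaction-diffusion sketch of a diffusible activator/inhibitor
# pair, illustrating the continuum layer that hybrid models of this kind
# typically couple to discrete cells. The Gierer-Meinhardt-like kinetics and
# all parameter values are assumptions, not the equations of the paper.
import numpy as np

def step(a, h, dx=1.0, dt=0.01, Da=0.1, Dh=1.0, rho=1.0, mu=1.0, nu=0.8):
    """One explicit Euler step; periodic boundaries via np.roll for simplicity."""
    lap = lambda u: (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
    da = Da * lap(a) + rho * a**2 / (h + 1e-9) - mu * a   # activator kinetics
    dh = Dh * lap(h) + rho * a**2 - nu * h                # inhibitor kinetics
    return a + dt * da, h + dt * dh

# Basal levels keep the fields near a homogeneous steady state; a "wound" is
# mimicked by a local, wound-triggered increase in activator production.
a, h = np.ones(200), np.ones(200)
a[90:110] += 0.5
for _ in range(5000):
    a, h = step(a, h)
print(f"activator range: {a.min():.2f}-{a.max():.2f}")
```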

    Fusion of wearable and contactless sensors for intelligent gesture recognition

    This paper presents a novel approach to fusing data from multiple sensors using a hierarchical support vector machine algorithm. The method was validated experimentally using an intelligent learning system that combines two different data sources: a contactless sensor, a radar that detects the movements of the hands and fingers, and a wearable sensor, a flexible pressure sensor array that measures the pressure distribution around the wrist. A hierarchical support vector machine architecture was developed to effectively fuse the pressure-sensor and radar data, which differ in sampling rate, data format and gesture information. The proposed method was compared with the classification results obtained from each of the two sensors independently. Datasets from 15 different participants were collected and analyzed in this work. The results show that the radar on its own provides a mean classification accuracy of 76.7%, while the pressure sensors provide an accuracy of 69.0%. However, fusing the pressure sensors' outputs with the radar data using the proposed hierarchical support vector machine algorithm improves the classification accuracy to 92.5%.
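
    One common way to realize such hierarchical fusion is a two-level (stacked) SVM, sketched below with scikit-learn on synthetic stand-in features; the feature dimensions and the exact stacking layout are assumptions, not the paper's architecture:

```python
# Illustrative two-level (stacked) SVM fusion, assuming radar and pressure
# features arrive as separate fixed-length vectors per gesture sample. This is
# a generic sketch of hierarchical SVM fusion, not the authors' exact pipeline.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-gesture feature vectors from each sensor.
n, n_classes = 300, 4
y = rng.integers(0, n_classes, n)
X_radar = rng.normal(size=(n, 64)) + y[:, None] * 0.5
X_press = rng.normal(size=(n, 16)) + y[:, None] * 0.3

# Level 1: one SVM per sensor, each producing class-probability scores.
svm_radar = SVC(probability=True).fit(X_radar[:200], y[:200])
svm_press = SVC(probability=True).fit(X_press[:200], y[:200])

def level1_scores(Xr, Xp):
    # Concatenated per-sensor probabilities hide differences in sampling rate
    # and feature format from the second-level classifier.
    return np.hstack([svm_radar.predict_proba(Xr), svm_press.predict_proba(Xp)])

# Level 2: a fusion SVM trained on the level-1 outputs (out-of-fold
# predictions would be used in practice to avoid leakage).
svm_fusion = SVC().fit(level1_scores(X_radar[:200], X_press[:200]), y[:200])
acc = (svm_fusion.predict(level1_scores(X_radar[200:], X_press[200:])) == y[200:]).mean()
print(f"fused accuracy on the held-out split: {acc:.2f}")
```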

    Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

    In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual understanding and reasoning capabilities. We investigate whether state-of-the-art MLLMs can handle embodied decision-making in an end-to-end manner and whether collaboration between LLMs and MLLMs can enhance decision-making. To address these questions, we introduce a new benchmark called PCA-EVAL, which evaluates embodied decision-making from the perspectives of Perception, Cognition, and Action. Additionally, we propose HOLMES, a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision-making. We compare end-to-end embodied decision-making and HOLMES on our benchmark and find that the GPT4-Vision model demonstrates strong end-to-end embodied decision-making abilities, outperforming GPT4-HOLMES in terms of average decision accuracy (+3%). However, this performance is exclusive to the latest GPT4-Vision model, which surpasses the open-source state-of-the-art MLLM by 26%. Our results indicate that powerful MLLMs like GPT4-Vision hold promise for decision-making in embodied agents, offering new avenues for MLLM research. Code and data are openly available at https://github.com/pkunlp-icler/PCA-EVAL/.
    Comment: FMDM@NeurIPS2023, Code and data: https://github.com/pkunlp-icler/PCA-EVAL
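
    To make the cooperation pattern concrete, here is a heavily simplified, hypothetical sketch of how an LLM might orchestrate multimodal tools in a HOLMES-like loop; all names, the toy tools, and the stopping rule are invented for illustration and do not reflect the released PCA-EVAL code:

```python
# Hypothetical sketch of a HOLMES-style cooperation loop: a text-only LLM
# repeatedly chooses a tool (an MLLM call or a perception API) to gather
# multimodal evidence, then commits to an action. Every function name and the
# tool set are illustrative assumptions, not the released PCA-EVAL code.

def llm(prompt: str) -> str:
    """Placeholder for a text-only LLM call."""
    return "ACTION: stop" if "detections" in prompt else "TOOL: detect_objects"

TOOLS = {
    # Placeholder tools; in the framework these would be MLLM or API calls.
    "describe_image": lambda obs: "caption: a pedestrian is crossing ahead",
    "detect_objects": lambda obs: "detections: [pedestrian, crosswalk]",
}

def holmes_decide(observation, question, max_steps=3):
    context = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(context + "\nChoose a TOOL or output an ACTION.")
        if reply.startswith("ACTION:"):
            return reply.removeprefix("ACTION:").strip()
        tool = reply.removeprefix("TOOL:").strip()
        context += f"\n{tool} -> {TOOLS[tool](observation)}"
    return "no-op"

print(holmes_decide("frame_000.png", "Should the autonomous car brake?"))
```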