How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?
Rather than claiming technical novelty, this paper reports our key
discoveries in the form of a data generation protocol, a database, and the
insights behind them. We aim to address the lack of large-scale datasets in
micro-expression (MiE) recognition, where the prohibitive cost of data
collection makes large-scale training infeasible. To this end, we develop a
protocol to automatically synthesize large-scale MiE training data, allowing
us to train improved recognition models for real-world test data.
Specifically, we discover three
types of Action Units (AUs) that can constitute trainable MiEs. These AUs come
from real-world MiEs, early frames of macro-expression videos, and the
relationship between AUs and expression categories defined by human expert
knowledge. With these AUs, our protocol then employs large numbers of face
images of various identities and an off-the-shelf face generator for MiE
synthesis, yielding the MiE-X dataset. MiE recognition models are trained or
pre-trained on MiE-X and evaluated on real-world test sets, where very
competitive accuracy is obtained. Experimental results not only validate the
effectiveness of the discovered AUs and MiE-X dataset but also reveal some
interesting properties of MiEs: they generalize across faces, are close to
early-stage macro-expressions, and can be manually defined.
Comment: European Conference on Computer Vision 202
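A minimal sketch of the synthesis loop this abstract describes, assuming an AU-conditioned face generator exposing an `animate(image, aus)` call; the AU pool, intensity ramp, and generator interface are illustrative placeholders rather than the paper's exact protocol.

```python
import random
from dataclasses import dataclass

@dataclass
class AUConfig:
    aus: dict          # e.g. {"AU4": 0.3, "AU7": 0.2}; low intensities mimic MiEs
    label: str         # expression category the AU combination maps to

def sample_au_config(au_pool):
    """Pick one AU configuration from the pooled AU sources (real MiEs,
    early macro-expression frames, expert-defined AU-expression relations)."""
    return random.choice(au_pool)

def synthesize_mie_clip(face_image, au_config, generator, n_frames=8):
    """Drive a neutral face with a low-intensity onset-apex-offset AU ramp."""
    clip = []
    for t in range(n_frames):
        scale = 1.0 - abs(2.0 * t / (n_frames - 1) - 1.0)   # 0 -> 1 -> 0 ramp
        frame_aus = {au: v * scale for au, v in au_config.aus.items()}
        clip.append(generator.animate(face_image, frame_aus))  # assumed generator call
    return clip, au_config.label
```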
A multiscale hybrid mathematical model of epidermal-dermal interactions during skin wound healing.
Following injury, skin activates a complex wound healing programme. While cellular and signalling mechanisms of wound repair have been extensively studied, the principles of epidermal-dermal interactions and their effects on wound healing outcomes are only partially understood. To gain new insight into the effects of epidermal-dermal interactions, we developed a multiscale, hybrid mathematical model of skin wound healing. The model takes into consideration interactions between epidermis and dermis across the basement membrane via diffusible signals, defined as activator and inhibitor. Simulations revealed that epidermal-dermal interactions are critical for proper extracellular matrix deposition in the dermis, suggesting these signals may influence how wound scars form. Our model makes several theoretical predictions. First, basal levels of epidermal activator and inhibitor help to maintain the dermis in a steady state, whereas their absence results in a raised, scar-like dermal phenotype. Second, a wound-triggered increase in activator and inhibitor production by basal epidermal cells, coupled with fast re-epithelialization kinetics, reduces dermal scar size. Third, a high-density fibrin clot leads to a raised, hypertrophic scar phenotype, whereas a low-density fibrin clot leads to a hypotrophic phenotype. Fourth, shallow wounds, compared to deep wounds, result in overall reduced scarring. Taken together, our model predicts the important role of signalling across the dermal-epidermal interface and the effect of fibrin clot density and wound geometry on scar formation. This hybrid modelling approach may also be applicable to other complex tissue systems, enabling the simulation of dynamic processes that would otherwise be computationally prohibitive with fully discrete models due to the large number of variables.
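An illustrative 1D activator-inhibitor diffusion sketch in the spirit of the epidermal-dermal signalling described above; the linear production and decay terms and all parameter values are placeholders, not the paper's calibrated multiscale model.

```python
import numpy as np

def simulate_signals(n=100, steps=5000, dx=1.0, dt=0.1,
                     D_act=1.0, D_inh=2.0, k_act=0.05, k_inh=0.05,
                     src_act=0.2, src_inh=0.15):
    """Explicit-Euler diffusion and decay of activator/inhibitor along dermal depth."""
    act = np.zeros(n)   # activator concentration; index 0 = basement membrane
    inh = np.zeros(n)   # inhibitor concentration
    for _ in range(steps):
        # basal epidermal cells act as a source at the basement membrane boundary
        act[0] += src_act * dt
        inh[0] += src_inh * dt
        lap_act = np.zeros_like(act)
        lap_inh = np.zeros_like(inh)
        lap_act[1:-1] = act[:-2] - 2 * act[1:-1] + act[2:]
        lap_inh[1:-1] = inh[:-2] - 2 * inh[1:-1] + inh[2:]
        act += dt * (D_act * lap_act / dx**2 - k_act * act)
        inh += dt * (D_inh * lap_inh / dx**2 - k_inh * inh)
    return act, inh
```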
Fusion of wearable and contactless sensors for intelligent gesture recognition
This paper presents a novel approach to fusing data from multiple sensors using a hierarchical support vector machine algorithm. The method was validated experimentally with an intelligent learning system that combines two different data sources. The sensors comprise a contactless sensor, a radar that detects the movements of the hands and fingers, and a wearable sensor, a flexible pressure sensor array that measures pressure distribution around the wrist. A hierarchical support vector machine architecture was developed to effectively fuse data from the pressure sensors and radar that differ in sampling rate, data format and gesture information. The proposed method was compared with the classification results obtained from each of the two sensors independently. Datasets from 15 different participants were collected and analyzed in this work. The results show that the radar on its own provides a mean classification accuracy of 76.7%, while the pressure sensors provide an accuracy of 69.0%. However, fusing the pressure sensors' outputs with the radar data using the proposed hierarchical support vector machine algorithm improves the classification accuracy to 92.5%.
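A minimal sketch of a two-level SVM fusion scheme in the spirit of the approach above, written with scikit-learn; the per-sensor feature pipelines, kernels, and the step that aligns the two sensors' sampling rates are not specified in the abstract and are assumptions here.

```python
import numpy as np
from sklearn.svm import SVC

def fit_hierarchical_svm(X_radar, X_pressure, y):
    """Level 1: one probabilistic SVM per modality; level 2: SVM over fused probabilities."""
    svm_radar = SVC(kernel="rbf", probability=True).fit(X_radar, y)
    svm_press = SVC(kernel="rbf", probability=True).fit(X_pressure, y)
    fused = np.hstack([svm_radar.predict_proba(X_radar),
                       svm_press.predict_proba(X_pressure)])
    svm_fusion = SVC(kernel="rbf").fit(fused, y)
    return svm_radar, svm_press, svm_fusion

def predict_gesture(models, X_radar, X_pressure):
    svm_radar, svm_press, svm_fusion = models
    fused = np.hstack([svm_radar.predict_proba(X_radar),
                       svm_press.predict_proba(X_pressure)])
    return svm_fusion.predict(fused)
```

In a real pipeline the level-2 SVM would be trained on held-out or cross-validated level-1 probabilities to avoid leakage; in-sample probabilities are used here only to keep the sketch short.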
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
In this study, we explore the potential of Multimodal Large Language Models
(MLLMs) in improving embodied decision-making processes for agents. While Large
Language Models (LLMs) have been widely used due to their advanced reasoning
skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual
understanding and reasoning capabilities. We investigate whether
state-of-the-art MLLMs can handle embodied decision-making in an end-to-end
manner and whether collaborations between LLMs and MLLMs can enhance
decision-making. To address these questions, we introduce a new benchmark
called PCA-EVAL, which evaluates embodied decision-making from the perspectives
of Perception, Cognition, and Action. Additionally, we propose HOLMES, a
multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs
to gather multimodal information for informed decision-making. We compare
end-to-end embodied decision-making and HOLMES on our benchmark and find that
the GPT4-Vision model demonstrates strong end-to-end embodied decision-making
abilities, outperforming GPT4-HOLMES in terms of average decision accuracy
(+3%). However, this performance is exclusive to the latest GPT4-Vision model,
which surpasses the open-source state-of-the-art MLLM by 26%. Our results indicate
that powerful MLLMs like GPT4-Vision hold promise for decision-making in
embodied agents, offering new avenues for MLLM research. Code and data are open
at https://github.com/pkunlp-icler/PCA-EVAL/.
Comment: FMDM@NeurIPS2023, Code and data:
https://github.com/pkunlp-icler/PCA-EVAL
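A rough sketch of a HOLMES-style cooperation loop as described in the abstract, where a text-only LLM queries an MLLM and task APIs before committing to an action; the `llm`, `mllm`, and `apis` objects and the string-based tool protocol are hypothetical, and the real prompts and tool schema live in the linked repository.

```python
def holmes_decide(task_prompt, image, llm, mllm, apis, max_turns=5):
    """Let a text-only LLM gather multimodal evidence, then emit an embodied action."""
    context = [f"Task: {task_prompt}"]
    for _ in range(max_turns):
        reply = llm.chat("\n".join(context))                    # hypothetical LLM interface
        if reply.startswith("DESCRIBE_IMAGE"):
            context.append("Vision: " + mllm.describe(image))   # perception via the MLLM
        elif reply.startswith("CALL_API"):
            name = reply.split()[1]
            context.append(f"{name}: {apis[name]()}")           # structured world state
        elif reply.startswith("ACTION"):
            return reply                                         # final decision
    return "ACTION: no-op"                                       # fall back if no decision
```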