Generative Action Description Prompts for Skeleton-based Action Recognition
Skeleton-based action recognition has recently received considerable
attention. Current approaches to skeleton-based action recognition are
typically formulated as one-hot classification tasks and do not fully exploit
the semantic relations between actions. For example, "make victory sign" and
"thumb up" are two hand-gesture actions whose main difference lies in the
movement of the hands. This information is absent from the categorical one-hot
encoding of action classes but can be recovered from the action description.
Therefore, utilizing action descriptions in training could potentially benefit
representation learning. In this work, we propose a Generative
Action-description Prompts (GAP) approach for skeleton-based action
recognition. More specifically, we employ a pre-trained large-scale language
model as a knowledge engine to automatically generate text descriptions for
the body-part movements of actions, and propose a multi-modal training scheme
in which the text encoder generates feature vectors for the different body
parts and supervises the skeleton encoder for action representation learning.
Experiments show that our proposed GAP method achieves noticeable improvements
over various baseline models without extra computational cost at inference. GAP
achieves new state-of-the-art results on popular skeleton-based action
recognition benchmarks, including NTU RGB+D, NTU RGB+D 120, and NW-UCLA. The
source code is available at https://github.com/MartinXM/GAP. (Accepted by ICCV 2023.)
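The multi-modal scheme described above can be pictured as a per-part alignment loss: the text encoder's feature for each body-part description supervises the corresponding skeleton part feature. The following is a minimal sketch in plain Python, with hypothetical part names and a simple 1 - cosine loss standing in for the paper's actual training objective:

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def part_alignment_loss(skeleton_parts, text_parts):
    """Average (1 - cosine) alignment loss over body parts.

    skeleton_parts / text_parts: dicts mapping a body-part name
    (e.g. "hand", "arm") to its feature vector. The keys and the
    loss form are illustrative, not the paper's exact formulation.
    """
    losses = [1.0 - cosine(skeleton_parts[p], text_parts[p])
              for p in skeleton_parts]
    return sum(losses) / len(losses)
```

Identical (up to scale) skeleton and text features yield zero loss, so minimizing this quantity pulls each skeleton part feature toward the direction of its generated description's feature.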
Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection
Despite the fact that DeepFake forgery detection algorithms have achieved
impressive performance on known manipulations, they often suffer disastrous
performance degradation when generalized to unseen manipulations. Some recent
works show improved generalization but rely on features that are fragile to
image distortions such as compression. To address this, we propose Diff-ID, a concise and
effective approach that explains and measures the identity loss induced by
facial manipulations. When testing on an image of a specific person, Diff-ID
utilizes an authentic image of that person as a reference and aligns them to
the same identity-insensitive attribute feature space by applying a
face-swapping generator. We then visualize the identity loss between the test
and reference images from the differences of the aligned pairs, and design a
custom metric to quantify the identity loss. This metric proves effective in
distinguishing forged images from real ones.
Extensive experiments show that our approach achieves high detection
performance on DeepFake images and state-of-the-art generalization ability to
unknown forgery methods, while also being robust to image distortions.
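Abstracting the face-swapping generator away (it is treated here as having already mapped both images into the same attribute space), the quantification step can be sketched as a simple difference metric over the aligned pair. The function names and the threshold value are illustrative, not the paper's:

```python
def identity_difference(aligned_test, aligned_ref):
    """Mean absolute pixel difference between the aligned pair.

    aligned_test / aligned_ref: equal-shape 2-D lists of pixel
    intensities, assumed already mapped into the same
    identity-insensitive attribute space by the generator.
    """
    total, count = 0.0, 0
    for row_t, row_r in zip(aligned_test, aligned_ref):
        for t, r in zip(row_t, row_r):
            total += abs(t - r)
            count += 1
    return total / count

def is_fake(aligned_test, aligned_ref, threshold=0.1):
    # Flag as forged when the identity loss exceeds a threshold
    # (the threshold value here is illustrative, not from the paper).
    return identity_difference(aligned_test, aligned_ref) > threshold
```

A genuine image of the person should align closely with the reference, so its residual identity difference stays small; a face-swapped image retains a measurable identity gap.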
A Developmental Evolutionary Learning Framework for Robotic Chinese Stroke Writing
The ability of robots to write Chinese strokes, a task recognized as sophisticated, involves complicated kinematic control algorithms. Conventional approaches to robotic writing of Chinese strokes often rely on limited font-generation methods, which restricts the robot's ability to produce high-quality writing. This paper instead proposes a developmental evolutionary learning framework that enables a robot to learn to write fundamental Chinese strokes. The framework first treats the learning process of robotic writing as an easy-to-difficult evolutionary procedure. Then, a developmental learning mechanism called “Lift-constraint, act and saturate”, which stems from developmental robotics, determines how the robot learns tasks ranging from simple to difficult by building on the results of the easier tasks. The developmental constraints, which include altitude adjustments, the number of mutation points, and the number of stroke trajectory points, determine the learning complexity of robotic writing. The developmental algorithm divides the evolutionary procedure into three developmental learning stages. In each stage, the number of stroke trajectory points gradually increases, while the number of mutation points and the adjustment altitudes gradually decrease, so the learning difficulties of the three stages can be categorized as easy, medium, and difficult. Our robot starts with the easy learning task and then gradually progresses to the medium and difficult tasks. Under the developmental constraint setups of each stage, the robot applies an evolutionary algorithm to handle the basic shapes of the Chinese strokes and eventually acquires the ability to write with good quality. The experimental results demonstrate that the proposed framework allows a calligraphic robot to gradually learn to write five fundamental Chinese strokes, and they also reveal a developmental pattern similar to that of humans.
Compared to an evolutionary algorithm without the developmental mechanism, the proposed framework achieves good writing quality more rapidly.
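The staged procedure described above can be sketched as a simple (1+1) evolutionary loop whose constraints are relaxed stage by stage: the trajectory resolution grows while the number of mutation points and the mutation magnitude (standing in for the adjustment altitude) shrink. The stage settings, fitness function, and target stroke below are illustrative assumptions, not the paper's values:

```python
import random

def fitness(traj, target):
    # Lower is better: squared distance to the reference stroke.
    return sum((a - b) ** 2 for a, b in zip(traj, target))

def evolve_stage(traj, target, n_mut_points, step, iters, rng):
    """(1+1) evolutionary search within one developmental stage."""
    best, best_f = traj[:], fitness(traj, target)
    for _ in range(iters):
        cand = best[:]
        for i in rng.sample(range(len(cand)), n_mut_points):
            cand[i] += rng.uniform(-step, step)
        f = fitness(cand, target)
        if f < best_f:
            best, best_f = cand, f
    return best

def developmental_learning(target, rng):
    """Easy -> medium -> difficult stages: trajectory points grow while
    mutation points and step size ("altitude") shrink. The stage
    parameters are illustrative, not taken from the paper."""
    stages = [(4, 3, 0.5), (8, 2, 0.2), (len(target), 1, 0.05)]
    traj = [0.0] * stages[0][0]
    for n_pts, n_mut, step in stages:
        # Lift the constraint: refine the trajectory resolution,
        # reusing what was learned in the previous stage.
        while len(traj) < n_pts:
            traj.append(traj[-1])
        traj = evolve_stage(traj, target[:n_pts], n_mut, step, 300, rng)
    return traj
```

Because each stage is seeded with the previous stage's result, the search in the difficult stage starts near a good coarse solution instead of from scratch, which is the claimed advantage over a flat evolutionary algorithm.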
An On-demand Photonic Ising Machine with Simplified Hamiltonian Calculation by Phase-encoding and Intensity Detection
The photonic Ising machine is a new paradigm of optical computing that
exploits the propagation characteristics of light waves: parallel processing
and low-loss transmission. The solution of combinatorial optimization
problems can thus be accelerated by photonic/optoelectronic devices. In this
work, we propose and demonstrate the Phase-Encoding and Intensity Detection
Ising Annealer (PEIDIA) to solve arbitrary Ising problems on demand. The
PEIDIA is based on the simulated annealing algorithm and requires only one
step of optical linear transformation with a simplified Hamiltonian
calculation. With PEIDIA, the Ising spins are encoded in the phase term of
the optical field, and only intensity detection is required during the
solving process. As a proof of principle, several 20- and 30-dimensional
Ising problems have been solved with a high ground-state probability.
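The simplification can be illustrated in a few lines: spins are encoded as optical phases 0/pi (field values +1/-1), the field passes through one linear transform B, and only the output intensities are read out. If the coupling matrix factors as J = B^T B (an assumption made for this sketch; the paper's decomposition may differ), the summed intensities give the Ising Hamiltonian up to sign. A minimal pure-Python simulation of the annealer:

```python
import math
import random

def intensity_energy(B, spins):
    """Ising energy via one linear transform plus intensity detection.

    Spins are phase-encoded as e^(i*0) = +1 / e^(i*pi) = -1, the field
    passes through matrix B, and the detected intensities |u_k|^2 sum
    to sigma^T (B^T B) sigma. Assumes J = B^T B (sketch assumption)."""
    total = 0.0
    for row in B:
        amp = sum(b * s for b, s in zip(row, spins))  # field amplitude u_k
        total += amp * amp                            # detected intensity
    return -total  # H = -sigma^T J sigma

def simulated_annealing(B, n_spins, steps, rng, t0=2.0, t1=0.01):
    """Single-spin-flip annealing; every energy evaluation uses only
    the simulated intensity detection above."""
    spins = [rng.choice([-1, 1]) for _ in range(n_spins)]
    energy = intensity_energy(B, spins)
    best, best_e = spins[:], energy
    for k in range(steps):
        t = t0 * (t1 / t0) ** (k / steps)   # geometric cooling schedule
        i = rng.randrange(n_spins)
        spins[i] = -spins[i]                # trial flip = phase shift by pi
        new_e = intensity_energy(B, spins)
        if new_e <= energy or rng.random() < math.exp((energy - new_e) / t):
            energy = new_e                  # accept the flip
            if energy < best_e:
                best, best_e = spins[:], energy
        else:
            spins[i] = -spins[i]            # reject: restore the phase
    return best, best_e
```

For a ferromagnetic toy problem with B = [[1, 1]] (so J couples two spins positively), the annealer settles into an aligned ground state, whose energy it measures entirely through intensities.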