
    Generative Action Description Prompts for Skeleton-based Action Recognition

    Skeleton-based action recognition has recently received considerable attention. Current approaches are typically formulated as one-hot classification tasks and do not fully exploit the semantic relations between actions. For example, "make victory sign" and "thumb up" are two hand-gesture actions whose major difference lies in the movement of the hands. This information is absent from the categorical one-hot encoding of action classes but can be recovered from the action descriptions. Therefore, utilizing action descriptions in training could potentially benefit representation learning. In this work, we propose a Generative Action-description Prompts (GAP) approach for skeleton-based action recognition. More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions of the body-part movements of actions, and propose a multi-modal training scheme that uses the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning. Experiments show that our proposed GAP method achieves noticeable improvements over various baseline models without extra computation cost at inference. GAP achieves new state-of-the-art results on popular skeleton-based action recognition benchmarks, including NTU RGB+D, NTU RGB+D 120 and NW-UCLA. The source code is available at https://github.com/MartinXM/GAP. Comment: Accepted by ICCV2

    Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection

    Although DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often suffer severe performance degradation when generalized to an unseen manipulation. Some recent works show improved generalization but rely on features that are fragile to image distortions such as compression. To this end, we propose Diff-ID, a concise and effective approach that explains and measures the identity loss induced by facial manipulations. When testing on an image of a specific person, Diff-ID uses an authentic image of that person as a reference and aligns the two images to the same identity-insensitive attribute feature space by applying a face-swapping generator. We then visualize the identity loss between the test and the reference image from the image differences of the aligned pairs, and design a custom metric to quantify the identity loss. The metric is then shown to be effective in distinguishing forged images from real ones. Extensive experiments show that our approach achieves high detection performance on DeepFake images and state-of-the-art generalization ability to unknown forgery methods, while also being robust to image distortions.

    A Developmental Evolutionary Learning Framework for Robotic Chinese Stroke Writing

    The ability of robots to write Chinese strokes, which is recognized as a sophisticated task, involves complicated kinematic control algorithms. The conventional approaches for robotic writing of Chinese strokes often suffer from limited font generation methods, which limits the ability of robots to perform high-quality writing. This paper instead proposes a developmental evolutionary learning framework that enables a robot to learn to write fundamental Chinese strokes. The framework first considers the learning process of robotic writing as an evolutionary easy-to-difficult procedure. Then, a developmental learning mechanism called “Lift-constraint, act and saturate” that stems from developmental robotics is used to determine how the robot learns tasks ranging from simple to difficult by building on the learning results from the easy tasks. The developmental constraints, which include altitude adjustments, number of mutation points, and stroke trajectory points, determine the learning complexity of robot writing. The developmental algorithm divides the evolutionary procedure into three developmental learning stages. In each stage, the stroke trajectory points gradually increase, while the number of mutation points and adjustment altitudes gradually decrease, allowing the learning difficulties involved in these three stages to be categorized as easy, medium, and difficult. Our robot starts with an easy learning task and then gradually progresses to the medium and difficult tasks. Under various developmental constraint setups in each stage, the robot applies an evolutionary algorithm to handle the basic shapes of the Chinese strokes and eventually acquires the ability to write with good quality. The experimental results demonstrate that the proposed framework allows a calligraphic robot to gradually learn to write five fundamental Chinese strokes and also reveal a developmental pattern similar to that of humans. 
    Compared to an evolutionary algorithm without the developmental mechanism, the proposed framework achieves good writing quality more rapidly.

    An On-demand Photonic Ising Machine with Simplified Hamiltonian Calculation by Phase-encoding and Intensity Detection

    The photonic Ising machine is a new paradigm of optical computing that exploits the characteristics of light-wave propagation: parallel processing and low-loss transmission. The process of solving combinatorial optimization problems can thus be accelerated with photonic/optoelectronic devices. In this work, we propose and demonstrate the Phase-Encoding and Intensity Detection Ising Annealer (PEIDIA) to solve arbitrary Ising problems on demand. The PEIDIA is based on the simulated annealing algorithm and requires only one step of optical linear transformation with simplified Hamiltonian calculation. With PEIDIA, the Ising spins are encoded on the phase term of the optical field and only intensity detection is required during the solving process. As a proof of principle, several 20- and 30-dimensional Ising problems have been solved with high ground state probability.
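    For readers unfamiliar with the underlying algorithm, the following is a minimal software sketch of simulated annealing on an Ising problem. It is not the PEIDIA implementation: the abstract describes evaluating the Hamiltonian optically, whereas this sketch computes it directly. The sign convention H = -sum_{i<j} J_ij s_i s_j, the geometric cooling schedule, and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def ising_energy(J, spins):
    """Energy H = -sum_{i<j} J[i][j] * s_i * s_j (assumed sign convention)."""
    n = len(spins)
    return -sum(
        J[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def simulated_annealing(J, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Single-spin-flip simulated annealing for a symmetric coupling matrix J.

    Returns the lowest-energy spin configuration seen and its energy.
    The cooling schedule and default parameters are illustrative choices.
    """
    rng = random.Random(seed)
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(J, spins)
    best_spins, best_energy = spins[:], energy

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))

        # Propose flipping one randomly chosen spin.
        i = rng.randrange(n)
        local_field = sum(J[i][j] * spins[j] for j in range(n) if j != i)
        delta = 2.0 * spins[i] * local_field  # energy change if spin i flips

        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
            energy += delta
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy

    return best_spins, best_energy
```

    In this sketch the only problem-dependent quantity needed per step is the local field of the flipped spin, which is the part a hardware annealer would replace with its own Hamiltonian evaluation.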