DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection
Increasingly pervasive facial recognition (FR) systems raise serious
concerns about personal privacy, especially for the billions of users who have
publicly shared their photos on social media. Several attempts have been made
to protect individuals from being identified by unauthorized FR systems by
using adversarial attacks to generate encrypted face images. However,
existing methods suffer from poor visual quality or low attack success rates,
which limits their utility. Recently, diffusion models have achieved tremendous
which limit their utility. Recently, diffusion models have achieved tremendous
success in image generation. In this work, we ask: can diffusion models be used
to generate adversarial examples to improve both visual quality and attack
performance? We propose DiffProtect, which utilizes a diffusion autoencoder to
generate semantically meaningful perturbations on FR systems. Extensive
experiments demonstrate that DiffProtect produces more natural-looking
encrypted images than state-of-the-art methods while achieving significantly
higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the
CelebA-HQ and FFHQ datasets.
Comment: Code will be available at https://github.com/joellliu/DiffProtect
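The core idea of adversarial facial privacy protection — perturbing a photo just enough that an FR encoder no longer matches the original identity — can be illustrated with a minimal NumPy-only sketch. Everything below is a hypothetical stand-in: the toy linear `embed` encoder, the finite-difference gradient, and the PGD-style pixel-space update; DiffProtect itself instead perturbs the semantic latent code of a diffusion autoencoder against a real FR network.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(img, W):
    """Toy face-recognition encoder: a fixed linear map followed by L2
    normalization (a hypothetical stand-in for a real FR network)."""
    v = W @ np.asarray(img).ravel()
    return v / np.linalg.norm(v)

def protect(img, W, eps=0.1, step=0.02, iters=30):
    """PGD-style sketch: push the embedding away from the original
    identity while keeping the perturbation inside an eps-ball."""
    orig = embed(img, W)
    adv = img.astype(float).copy()
    for _ in range(iters):
        # finite-difference gradient of cosine similarity to `orig`
        g = np.zeros(adv.size)
        for i in range(adv.size):
            d = np.zeros(adv.size)
            d[i] = 1e-4
            hi = embed(adv.ravel() + d, W) @ orig
            lo = embed(adv.ravel() - d, W) @ orig
            g[i] = (hi - lo) / 2e-4
        adv = adv - step * np.sign(g).reshape(adv.shape)  # lower similarity
        adv = np.clip(adv, img - eps, img + eps)          # stay near original
        adv = np.clip(adv, 0.0, 1.0)                      # valid pixel range
    return adv

img = rng.random((4, 4))           # tiny "face" image
W = rng.standard_normal((8, 16))   # random toy encoder weights
adv = protect(img, W)
sim = embed(adv, W) @ embed(img, W)  # identity similarity after protection
```

The encrypted image stays within a small eps-ball of the original (visual quality), while its embedding drifts away from the original identity (attack success) — the two axes the abstract says DiffProtect improves jointly.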
A deep neural network model of the primate superior colliculus for emotion recognition
Although sensory processing is pivotal to nearly every theory of emotion, the evaluation of the visual input as ‘emotional’ (e.g. a smile as signalling happiness) has been traditionally assumed to take place in supramodal ‘limbic’ brain regions. Accordingly, subcortical structures of ancient evolutionary origin that receive direct input from the retina, such as the superior colliculus (SC), are traditionally conceptualized as passive relay centres. However, mounting evidence suggests that the SC is endowed with the necessary infrastructure and computational capabilities for the innate recognition and initial categorization of emotionally salient features from retinal information. Here, we built a neurobiologically inspired convolutional deep neural network (DNN) model that approximates physiological, anatomical and connectional properties of the retino-collicular circuit. This enabled us to characterize and isolate the initial computations and discriminations that the DNN model of the SC can perform on facial expressions, based uniquely on the information it directly receives from the virtual retina. Trained to discriminate facial expressions of basic emotions, our model matches human error patterns and achieves above-chance, yet suboptimal, classification accuracy analogous to that reported in patients with V1 damage, who rely on retino-collicular pathways for non-conscious vision of emotional attributes. When presented with gratings of different spatial frequencies and orientations never ‘seen’ before, the SC model exhibits spontaneous tuning to low spatial frequencies and reduced orientation discrimination, as can be expected from the prevalence of the magnocellular (M) over parvocellular (P) projections.
Likewise, face manipulation that biases processing towards the M or P pathway affects expression recognition in the SC model accordingly, an effect that dovetails with variations of activity in the human SC measured with ultra-high-field functional magnetic resonance imaging. Lastly, the DNN generates saliency maps and extracts visual features, demonstrating that certain face parts, like the mouth or the eyes, provide more discriminative information than others as a function of the emotional expression (e.g. happiness or sadness). The present findings support the contention that the SC possesses the necessary infrastructure to analyse the visual features that define facial emotional stimuli, even without additional processing stages in the visual cortex or in ‘limbic’ areas.
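The model's spontaneous tuning to low spatial frequencies can be mimicked with a crude Gaussian low-pass filter standing in for the magnocellular bias. The filter, the grating parameters, and the cutoff `sigma` below are illustrative assumptions, not the paper's architecture: they merely show that an M-like low-pass stage retains low-frequency gratings while strongly attenuating high-frequency ones.

```python
import numpy as np

def lowpass(img, sigma):
    """Gaussian low-pass in the Fourier domain: a toy stand-in for the
    magnocellular preference for low spatial frequencies."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    y, x = np.ogrid[-h//2:h - h//2, -w//2:w - w//2]
    mask = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# low- vs high-frequency vertical gratings on a 32x32 canvas
n = 32
xs = np.arange(n)
low = np.tile(np.sin(2 * np.pi * 2 * xs / n), (n, 1))    # 2 cycles/image
high = np.tile(np.sin(2 * np.pi * 12 * xs / n), (n, 1))  # 12 cycles/image

sigma = 4.0
# fraction of grating energy surviving the M-like filter
ret_low = np.linalg.norm(lowpass(low, sigma)) / np.linalg.norm(low)
ret_high = np.linalg.norm(lowpass(high, sigma)) / np.linalg.norm(high)
```

With these (assumed) numbers the 2-cycle grating keeps most of its energy while the 12-cycle grating is nearly erased — the qualitative pattern the abstract reports for the SC model's responses to unseen gratings.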
Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar
Rendering photorealistic and dynamically moving human heads is crucial for
ensuring a pleasant and immersive experience in AR/VR and video conferencing
applications. However, existing methods often struggle to model challenging
facial regions (e.g., mouth interior, eyes, hair/beard), resulting in
unrealistic and blurry results. In this paper, we propose Neural Point-based
Volumetric Avatar (NPVA), a method that adopts the neural point representation as well as the
neural volume rendering process and discards the predefined connectivity and
hard correspondence imposed by mesh-based approaches. Specifically, the neural
points are strategically constrained around the surface of the target
expression via a high-resolution UV displacement map, achieving increased
modeling capacity and more accurate control. We introduce three technical
innovations to improve the rendering and training efficiency: a patch-wise
depth-guided (shading point) sampling strategy, a lightweight radiance decoding
process, and a Grid-Error-Patch (GEP) ray sampling strategy during training. By
design, our NPVA is better equipped to handle topologically changing regions
and thin structures while also ensuring accurate expression control when
animating avatars. Experiments conducted on three subjects from the Multiface
dataset demonstrate the effectiveness of our designs, outperforming previous
state-of-the-art methods, especially in handling challenging facial regions
- …
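The surface-guided placement described above — canonical surface samples offset along their normals by values looked up in a UV displacement map — might be sketched as follows. The nearest-neighbour UV lookup, the scalar per-point displacement, and all shapes are simplifying assumptions for illustration, not the paper's actual pipeline (which uses a high-resolution, predicted displacement map).

```python
import numpy as np

def surface_guided_points(surface_pts, normals, uvs, disp_map):
    """Offset canonical surface points along their unit normals by a
    displacement sampled (nearest-neighbour) from a UV map, so the
    point cloud tracks the target expression's surface."""
    h, w = disp_map.shape
    ui = np.clip(np.round(uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
    vi = np.clip(np.round(uvs[:, 1] * (h - 1)).astype(int), 0, h - 1)
    d = disp_map[vi, ui][:, None]   # per-point scalar displacement
    return surface_pts + d * normals

# three sample points on a flat patch, normals pointing up (+z)
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nrm = np.tile([0.0, 0.0, 1.0], (3, 1))
uv = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
disp = np.full((4, 4), 0.1)         # constant 0.1 displacement map
neural_pts = surface_guided_points(pts, nrm, uv, disp)
```

Constraining points to a thin shell around the expression surface in this way is what gives a point-based representation controllable structure without the fixed connectivity of a mesh.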