Architecture and Circuit Design Optimization for Compute-In-Memory
The objective of the proposed research is to optimize compute-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As peripheral circuits such as the analog-to-digital converter (ADC) introduce significant overhead in CIM inference designs, the research first focuses on circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based, ADC-free in-memory compute scheme. We comprehensively explore the trade-offs among different types of ADCs and investigate a new ADC design especially suited to CIM, which performs an analog shift-add over multiple weight significance bits, improving throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip with fully analog data processing between sub-arrays, which significantly improves hardware performance over conventional CIM designs and achieves near-software classification accuracy on the ImageNet and CIFAR-10/-100 datasets. Second, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of the CIM weight-stationary dataflow, we propose CIM training architectures with a transpose weight mapping strategy. The cell design and periphery circuitry are modified to efficiently support bi-directional compute, and a novel signed-number multiplication solution is proposed to handle the negative inputs in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance of DNN on-chip training based on silicon measurement results.
Ph.D.
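To make the bit-significance shift-add concrete, here is a minimal numerical sketch of how a bit-sliced CIM array accumulates a partial sum per weight bit-plane and combines the partial sums by significance. It is illustrative only: the thesis performs the shift-add in the analog domain before digitization, whereas this model is purely digital, and all function and variable names are invented for the example.

    import numpy as np

    def cim_shift_add_mac(inputs, weights, weight_bits=4):
        """Toy model of a bit-sliced compute-in-memory matrix-vector multiply.

        Each weight is split into binary bit-planes stored in separate columns
        (or sub-arrays); every bit-plane yields a partial column sum, and the
        partial sums are combined by significance via shift-add.
        """
        # Quantize weights to unsigned integers (illustrative; real CIM designs
        # handle signed weights with offsets or differential cells).
        w_int = np.clip(np.round(weights), 0, 2**weight_bits - 1).astype(int)

        total = np.zeros(w_int.shape[1])
        for b in range(weight_bits):
            bit_plane = (w_int >> b) & 1        # one conductance bit-plane
            partial = inputs @ bit_plane        # column-wise accumulation
            total += partial * (2 ** b)         # shift-add by bit significance
        return total

    # Example: 8 binary inputs, 3 output columns, 4-bit weights.
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=8)
    W = rng.integers(0, 16, size=(8, 3)).astype(float)
    assert np.allclose(cim_shift_add_mac(x, W), x @ W)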
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
Speech-driven 3D facial animation synthesis has been a challenging task both in industry and research. Recent methods mostly focus on deterministic deep learning, meaning that given a speech input, the output is always the same. In reality, however, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, the majority of approaches focus on 3D vertex based datasets, and methods compatible with existing facial animation pipelines using rigged characters are scarce. To eliminate these issues, we present FaceDiffuser, a non-deterministic deep learning model for generating speech-driven facial animations, trained on both 3D vertex and blendshape based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves results better than or comparable to the state-of-the-art methods. We also introduce a new in-house dataset based on a blendshape based rigged character. We recommend watching the accompanying supplementary video. The code and the dataset will be made publicly available.
Comment: Pre-print of the paper accepted at ACM SIGGRAPH MIG 2023
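For readers unfamiliar with diffusion-based synthesis, the sketch below shows the generic DDPM-style ancestral sampling loop such a model runs at inference time, conditioned on audio features. The denoiser, feature shapes, and noise schedule are all assumptions for illustration; FaceDiffuser's actual network, schedule, and HuBERT conditioning differ.

    import numpy as np

    def sample_animation(audio_feats, denoiser, n_vertices, T=50, seed=None):
        """Generic DDPM ancestral sampling conditioned on audio features.

        `denoiser` is an assumed trained network eps(x_t, t, audio); starting
        from fresh noise makes repeated runs non-deterministic by design.
        """
        rng = np.random.default_rng(seed)
        frames = audio_feats.shape[0]
        betas = np.linspace(1e-4, 0.02, T)          # linear schedule (assumed)
        alphas = 1.0 - betas
        alpha_bar = np.cumprod(alphas)

        x = rng.standard_normal((frames, n_vertices * 3))
        for t in reversed(range(T)):
            eps = denoiser(x, t, audio_feats)       # predict the injected noise
            x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:                               # add fresh noise except at t=0
                x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        return x.reshape(frames, n_vertices, 3)     # per-frame vertex displacements

    # Toy usage with a dummy denoiser (a real one would be learned).
    anim = sample_animation(np.zeros((10, 768)), lambda x, t, a: np.zeros_like(x),
                            n_vertices=100)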
FLARE: Fast Learning of Animatable and Relightable Mesh Avatars
Our goal is to efficiently learn personalized animatable 3D head avatars from videos, such that the avatars are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and
are highly portable, they lack realism in terms of shape and appearance. Neural
representations, on the other hand, are realistic but lack compatibility and
are slow to train and render. Our key insight is that it is possible to
efficiently learn high-fidelity 3D mesh representations via differentiable
rendering by exploiting highly-optimized methods from traditional computer
graphics and approximating some of the components with neural networks. To that
end, we introduce FLARE, a technique that enables the creation of animatable
and relightable mesh avatars from a single monocular video. First, we learn a
canonical geometry using a mesh representation, enabling efficient
differentiable rasterization and straightforward animation via learned
blendshapes and linear blend skinning weights. Second, we follow
physically-based rendering and factor observed colors into intrinsic albedo,
roughness, and a neural representation of the illumination, allowing the
learned avatars to be relit in novel scenes. Since our input videos are
captured on a single device with a narrow field of view, modeling the
surrounding environment light is non-trivial. Based on the split-sum
approximation for modeling specular reflections, we address this by
approximating the pre-filtered environment map with a multi-layer perceptron
(MLP) modulated by the surface roughness, eliminating the need to explicitly
model the light. We demonstrate that our mesh-based avatar formulation,
combined with learned deformation, material, and lighting MLPs, produces
avatars with high-quality geometry and appearance, while also being efficient
to train and render compared to existing approaches.
Comment: 15 pages. Accepted at ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 2023
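To clarify the split-sum shading step, here is a small sketch of the standard split-sum specular term, with the pre-filtered environment lookup written as a callable that FLARE would replace by a roughness-modulated MLP. `prefiltered_env`, `env_brdf`, and the constant `f0` are assumptions for illustration, not the paper's implementation.

    import numpy as np

    def specular_split_sum(normal, view_dir, roughness, prefiltered_env, env_brdf):
        """Split-sum specular shading as used in real-time PBR.

        The lighting-times-BRDF integral is factored into (1) a pre-filtered
        environment lookup at the reflection direction, blurred by roughness,
        and (2) a BRDF integration term. Both factors are stand-in callables.
        """
        # Reflect the view direction about the surface normal.
        r = 2.0 * np.dot(normal, view_dir) * normal - view_dir
        r /= np.linalg.norm(r)

        L = prefiltered_env(r, roughness)           # MLP(reflection, roughness) in FLARE
        n_dot_v = max(np.dot(normal, view_dir), 1e-4)
        scale, bias = env_brdf(n_dot_v, roughness)  # analytic or LUT BRDF term
        f0 = 0.04                                   # dielectric base reflectance (assumed)
        return L * (f0 * scale + bias)

Replacing the pre-filtered cubemap with an MLP avoids explicitly reconstructing the illumination, which is ill-posed when the input video is captured with a narrow field of view.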
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
Face recognition is a prevailing authentication solution in numerous
biometric applications. Physical adversarial attacks, as an important surrogate, can expose the weaknesses of face recognition systems and evaluate their robustness before deployment. However, most existing physical attacks are either readily detectable or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can
carry out an end-to-end evaluation of adversarial robustness for commercial
systems. It requires that this technique can simultaneously deceive black-box
recognition models and evade defensive mechanisms. To fulfill this, we design
adversarial textured 3D meshes (AT3D) with an elaborate topology on a human
face, which can be 3D-printed and pasted on the attacker's face to evade the
defenses. However, the mesh-based optimization regime computes gradients in a high-dimensional mesh space and can become trapped in local optima with unsatisfactory transferability. To escape the mesh space, we instead perturb the low-dimensional coefficient space of a 3D Morphable Model, which significantly improves black-box transferability while offering faster search and better visual quality. Extensive
experiments in digital and physical scenarios show that our method effectively
explores the security vulnerabilities of multiple popular commercial services,
including three recognition APIs, four anti-spoofing APIs, two popular mobile phones, and two automated access control systems.
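The key optimization idea, searching in a 3DMM's low-dimensional coefficient space rather than in vertex space, can be sketched as follows. The linear model, the gradient oracle, and the step sizes are assumptions for illustration; AT3D's actual losses, patch-topology constraints, and transfer-attack machinery are more involved.

    import numpy as np

    def attack_3dmm_coeffs(mean_shape, basis, loss_grad_wrt_vertices,
                           steps=100, lr=0.01):
        """Optimize an adversarial shape in 3DMM coefficient space.

        Instead of perturbing every vertex (high-dimensional, poor transfer),
        we perturb the low-dimensional coefficients `c` of a linear morphable
        model: vertices = mean_shape + basis @ c. `loss_grad_wrt_vertices` is
        an assumed surrogate gradient of the attack objective, defined so that
        lower values are more adversarial.
        """
        c = np.zeros(basis.shape[1])
        for _ in range(steps):
            v = mean_shape + basis @ c         # decode mesh from coefficients
            g_v = loss_grad_wrt_vertices(v)    # gradient in vertex space
            g_c = basis.T @ g_v                # chain rule into coefficient space
            c -= lr * g_c                      # descend the attack objective
        return mean_shape + basis @ c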
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Synthesizing high-fidelity head avatars is a central problem for computer
vision and graphics. While head avatar synthesis algorithms have advanced
rapidly, the best ones still face great obstacles in real-world scenarios. One
of the vital causes is inadequate datasets -- 1) current public datasets can
only support researchers to explore high-fidelity head avatars in one or two
task directions; 2) these datasets usually contain digital head assets with
limited data volume, and narrow distribution over different attributes. In this
paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research. It contains massive data assets, with 243+
million complete head frames, and over 800k video sequences from 500 different
identities captured by synchronized multi-view cameras at 30 FPS. It is a
large-scale digital library for head avatars with three key attributes: 1) High
Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K
cameras in 360 degrees. 2) High Diversity: the collected subjects vary in age, era, ethnicity, and culture, providing abundant material with distinctive styles in appearance and geometry. Moreover, each subject is
asked to perform various motions, such as expressions and head rotations, which
further extend the richness of the assets. 3) Rich Annotations: we provide annotations at different granularities: camera parameters, matting, scans, 2D/3D facial landmarks, FLAME fittings, and text descriptions.
Based on the dataset, we build a comprehensive benchmark for head avatar
research, with 16 state-of-the-art methods performed on five main tasks: novel
view synthesis, novel expression synthesis, hair rendering, hair editing, and
talking head generation. Our experiments uncover the strengths and weaknesses
of current methods. RenderMe-360 opens the door for future exploration in head
avatars.
Comment: Technical Report; Project Page: 36; Github Link: https://github.com/RenderMe-360/RenderMe-360
Enabling Neuromorphic Computing for Artificial Intelligence with Hardware-Software Co-Design
In the last decade, neuromorphic computing has been revived with the emergence of novel nano-devices and hardware-software co-design approaches. With the fast advancement of algorithms for today's artificial intelligence (AI) applications, deep neural networks (DNNs) have become the mainstream technology. Enabling neuromorphic designs for DNN computing with high efficiency in both speed and energy has become a new research trend. In this chapter, we will summarize the recent advances in neuromorphic computing hardware and system designs with non-volatile resistive random access memory (ReRAM) devices. More specifically, we will discuss ReRAM-based neuromorphic computing hardware and system implementations, hardware-software co-design approaches for quantized and sparse DNNs, and architecture designs.
Modeling Reliance on XAI Indicating Its Purpose and Attention
This study used an XAI system that presents its purpose and its attention as explanations of its process, and investigated how these explanations affect human trust in and use of AI. In this study, we generated heat maps indicating AI attention,
conducted Experiment 1 to confirm the validity of the interpretability of the
heat maps, and conducted Experiment 2 to investigate the effects of the purpose
and the heat maps in terms of reliance (depending on the AI) and compliance (accepting the AI's answers). The results of structural equation modeling (SEM) analyses showed that (1) displaying the purpose of the AI influenced trust positively or negatively depending on the type of AI usage (reliance or compliance) and task difficulty, (2) merely displaying the heat maps negatively influenced trust in the more difficult task, and (3) the heat maps positively influenced trust according to their interpretability in the more difficult task.
MaLP: Manipulation Localization Using a Proactive Scheme
Advancements in the generation quality of various Generative Models (GMs) have made it necessary not only to perform binary manipulation detection but also to localize the modified pixels in an image. However, prior works on manipulation localization, termed passive, exhibit poor generalization over unseen GMs and attribute modifications. To combat this issue, we propose a
proactive scheme for manipulation localization, termed MaLP. We encrypt the
real images by adding a learned template. If the image is manipulated by any
GM, this added protection from the template not only aids binary detection but
also helps in identifying the pixels modified by the GM. The template is
learned by leveraging local and global-level features estimated by a two-branch
architecture. We show that MaLP performs better than prior passive works. We
also show the generalizability of MaLP by testing on 22 different GMs,
providing a benchmark for future research on manipulation localization.
Finally, we show that MaLP can be used as a discriminator for improving the
generation quality of GMs. Our models/codes are available at
www.github.com/vishal3477/pro_loc.
Comment: Published at the Conference on Computer Vision and Pattern Recognition (CVPR), 2023
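The proactive idea, embed a learned template first and then localize manipulation wherever the template can no longer be recovered, can be sketched as below. `recover_template` stands in for MaLP's learned two-branch estimator, and the thresholded error map is a simplification of its localization head; all names and constants are illustrative assumptions.

    import numpy as np

    def protect(image, template, strength=0.03):
        """Proactive step: add a learned template to the real image before release."""
        return np.clip(image + strength * template, 0.0, 1.0)

    def localize(received, template, recover_template, threshold=0.5):
        """Estimate the embedded template from the received image alone and flag
        pixels where it no longer matches: a GM that rewrites a region destroys
        the template there, which aids binary detection and localizes the edit.
        """
        recovered = recover_template(received)  # learned estimator (assumed)
        error = np.abs(recovered - template)    # per-pixel template mismatch
        mask = error > threshold                # predicted manipulation map
        return mask, float(mask.mean())         # map plus a crude detection score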
PhoMoH: Implicit Photorealistic 3D Models of Human Heads
We present PhoMoH, a neural network methodology to construct generative
models of photo-realistic 3D geometry and appearance of human heads including
hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH
models the human head using neural fields, thus supporting complex topology.
Instead of learning a head model from scratch, we propose to augment an
existing expressive head model with new features. Concretely, we learn a highly
detailed geometry network layered on top of a mid-resolution head model
together with a detailed, local geometry-aware, and disentangled color field.
Our proposed architecture allows us to learn photo-realistic human head models
from relatively little data. The learned generative geometry and appearance
networks can be sampled individually and enable the creation of diverse and
realistic human heads. Extensive experiments validate our method qualitatively and across different metrics.
Comment: To be published at the International Conference on 3D Vision 2024
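The layered-field design, a detail network refining a mid-resolution base head model plus a separate color field, reduces to a simple composition, sketched below. `base_sdf`, `detail_mlp`, and `color_mlp` are assumed callables; the paper's networks, conditioning, and training losses are not reproduced here.

    import numpy as np

    def layered_head_field(x, latent, base_sdf, detail_mlp, color_mlp):
        """Query the layered head model at points `x` given a latent code.

        Geometry: a small learned offset refines the base model's signed
        distance, adding detail the mid-resolution model cannot express.
        Appearance: a disentangled color field is queried independently, so
        geometry and appearance can be sampled or edited separately.
        """
        sdf = base_sdf(x) + detail_mlp(x, latent)   # coarse-to-fine geometry
        rgb = color_mlp(x, latent)                  # local, geometry-aware color
        return sdf, rgb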
Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation
Generative Neural Radiance Fields (GNeRF) based 3D-aware GANs have
demonstrated remarkable capabilities in generating high-quality images while
maintaining strong 3D consistency. Notably, significant advancements have been
made in the domain of face generation. However, most existing models prioritize
view consistency over disentanglement, resulting in limited semantic/attribute
control during generation. To address this limitation, we propose a conditional
GNeRF model incorporating specific attribute labels as input to enhance the
controllability and disentanglement abilities of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and Optimizing for Tuning (TRIOT) method: we first train a conditional normalizing flow module to enable facial attribute editing, then optimize the latent vector to further improve attribute-editing precision. Our
extensive experiments demonstrate that our model produces high-quality edits
with superior view consistency while preserving non-target regions. Code is available at https://github.com/zhangqianhui/TT-GNeRF.
Comment: 13 pages
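The two-stage TRIOT idea, use a trained module as initialization and then tune the latent directly, can be sketched as follows. The flow module, renderer, loss, and the finite-difference gradient are all stand-ins for illustration; the paper optimizes with automatic differentiation through its actual networks.

    import numpy as np

    def num_grad(f, z, eps=1e-3):
        """Central-difference gradient (stand-in for autodiff)."""
        g = np.zeros_like(z)
        for i in range(z.size):
            e = np.zeros_like(z)
            e[i] = eps
            g[i] = (f(z + e) - f(z - e)) / (2 * eps)
        return g

    def triot_edit(z0, attr_target, flow_edit, render, edit_loss,
                   steps=50, lr=0.05):
        """Stage 1 (assumed already trained): `flow_edit` maps a latent and a
        target attribute label to an edited latent, serving as initialization.
        Stage 2: the latent itself is optimized to sharpen the edit.
        """
        z = flow_edit(z0, attr_target)          # initialization from the flow
        for _ in range(steps):
            g = num_grad(lambda v: edit_loss(render(v), attr_target), z)
            z -= lr * g                         # tune the latent directly
        return z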