ShadowNet: A Secure and Efficient System for On-device Model Inference
With the increased usage of AI accelerators on mobile and edge devices,
on-device machine learning (ML) is gaining popularity. Consequently, thousands
of proprietary ML models are being deployed on billions of untrusted devices.
This raises serious security concerns about model privacy. However, protecting
the model privacy without losing access to the AI accelerators is a challenging
problem. In this paper, we present a novel on-device model inference system,
ShadowNet. ShadowNet protects the model privacy with Trusted Execution
Environment (TEE) while securely outsourcing the heavy linear layers of the
model to the untrusted hardware accelerators. ShadowNet achieves this by
transforming the weights of the linear layers before outsourcing them and
restoring the results inside the TEE. The nonlinear layers are also kept secure
inside the TEE. The transformation of the weights and the restoration of the
results are designed in a way that can be implemented efficiently. We have
built a ShadowNet prototype based on TensorFlow Lite and applied it on four
popular CNNs, namely, MobileNets, ResNet-44, AlexNet and MiniVGG. Our
evaluation shows that ShadowNet achieves strong security guarantees with
reasonable performance, offering a practical solution for secure on-device
model inference.
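The core idea, transforming a linear layer's weights before outsourcing them and restoring the result inside the TEE, can be sketched as follows. The permutation-plus-scaling scheme below is an illustrative toy standing in for ShadowNet's actual construction, and all function names are assumptions:

```python
import numpy as np

def transform_weights(W, rng):
    """Obfuscate a linear layer's weights before outsourcing.

    Toy scheme: permute the output rows and scale each by a random
    nonzero factor. Only the TEE keeps the permutation and scales
    needed to undo the transform.
    """
    n_out = W.shape[0]
    perm = rng.permutation(n_out)
    scales = rng.uniform(0.5, 2.0, size=n_out)
    W_obf = scales[:, None] * W[perm]
    return W_obf, (perm, scales)

def restore(y_obf, secret):
    """Undo the transformation on the layer's output, inside the TEE."""
    perm, scales = secret
    y = np.empty_like(y_obf)
    y[perm] = y_obf / scales
    return y

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # plaintext weights, never leave the TEE
x = rng.standard_normal(8)

W_obf, secret = transform_weights(W, rng)
y_outsourced = W_obf @ x            # heavy linear op on untrusted accelerator
y = restore(y_outsourced, secret)   # cheap restoration inside the TEE
assert np.allclose(y, W @ x)
```

The untrusted accelerator only ever sees `W_obf`, while the restoration cost inside the TEE is linear in the output size rather than quadratic like the outsourced matrix multiply.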
DeS3: Attention-driven Self and Soft Shadow Removal using ViT Similarity and Color Convergence
Removing soft and self shadows that lack clear boundaries from a single image
is still challenging. Self shadows are shadows that are cast on the object
itself. Most existing methods rely on binary shadow masks, without considering
the ambiguous boundaries of soft and self shadows. In this paper, we present
DeS3, a method that removes hard, soft and self shadows based on the self-tuned
ViT feature similarity and color convergence. Our novel ViT similarity loss
utilizes features extracted from a pre-trained Vision Transformer. This loss
helps guide the reverse diffusion process towards recovering scene structures.
We also introduce a color convergence loss to constrain the surface colors in
the reverse inference process to avoid any color shifts. Our DeS3 is able to
differentiate shadow regions from the underlying objects, as well as shadow
regions from the object casting the shadow. This capability enables DeS3 to
better recover the structures of objects even when they are partially occluded
by shadows. Different from existing methods that rely on constraints during the
training phase, we incorporate the ViT similarity and color convergence loss
during the sampling stage. This enables our DeS3 model to effectively integrate
its strong modeling capabilities with input-specific knowledge in a self-tuned
manner. Our method outperforms state-of-the-art methods on the SRD, AISTD,
LRSS, USR and UIUC datasets, removing hard, soft, and self shadows robustly.
Specifically, our method outperforms the SOTA method on the SRD dataset,
reducing the whole-image RMSE by 20%.
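The sampling-stage guidance described above can be illustrated with a toy reverse step: a simple patch-averaging feature extractor stands in for the frozen ViT, and gradients of a feature-similarity loss and a global color-matching loss nudge each update. All names and weights below are illustrative assumptions, not DeS3's actual formulation:

```python
import numpy as np

def toy_features(img):
    """Stand-in for frozen ViT features: mean over 2x2 patches."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def guided_step(x_t, denoised, ref, step=0.5, guide=0.05):
    """One toy reverse step: move toward the denoised estimate, then
    apply two guidance terms evaluated at sampling time: a
    feature-similarity gradient and a global color-matching term."""
    # Gradient of ||F(x) - F(ref)||^2 for the linear patch-mean features:
    # upsample the feature residual back to pixel resolution.
    sim_grad = np.kron(toy_features(x_t) - toy_features(ref),
                       np.ones((2, 2))) / 4.0
    # Color-convergence term: pull the global mean toward the reference.
    color_grad = x_t.mean() - ref.mean()
    return x_t + step * (denoised - x_t) - guide * (sim_grad + color_grad)

rng = np.random.default_rng(0)
target = rng.random((8, 8))   # toy shadow-free reference
x = rng.random((8, 8))        # current sample
for _ in range(30):
    x = guided_step(x, denoised=target, ref=target)
# x converges toward the reference under the combined guidance
```

Because the guidance enters only at sampling time, the same pretrained denoiser can be steered per input without any retraining, which is the self-tuned aspect the abstract describes.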
ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for Document Shadow Removal
Document shadow is a common issue that arises when capturing documents with
mobile devices, significantly impacting readability. Current methods
encounter various challenges including inaccurate detection of shadow masks and
estimation of illumination. In this paper, we propose ShaDocFormer, a
Transformer-based architecture that integrates traditional methodologies and
deep learning techniques to tackle the problem of document shadow removal. The
ShaDocFormer architecture comprises two components: the Shadow-attentive
Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module
employs a traditional thresholding technique and leverages the attention
mechanism of the Transformer to gather global information, thereby enabling
precise detection of shadow masks. The cascaded and aggregative structure of
the CFR module facilitates a coarse-to-fine restoration process for the entire
image. As a result, ShaDocFormer excels in accurately detecting and capturing
variations in both shadow and illumination, thereby enabling effective removal
of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms
current state-of-the-art methods in both qualitative and quantitative
measurements.
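The traditional-thresholding side of the STD module can be illustrated with a classic Otsu threshold producing a coarse shadow mask; the Transformer attention refinement and the CFR restoration are omitted, so this is a sketch of the thresholding ingredient only, not the paper's implementation:

```python
import numpy as np

def otsu_threshold(gray):
    """Classic Otsu: pick the cut maximizing between-class variance
    over a 256-bin histogram of intensities in [0, 1]."""
    hist, _ = np.histogram(gray, bins=256, range=(0.0, 1.0))
    total = hist.sum()
    total_mu = float((hist * np.arange(256)).sum())
    best_t, best_var = 0, -1.0
    cum_w, cum_mu = 0, 0.0
    for t in range(256):
        cum_w += hist[t]
        cum_mu += t * hist[t]
        if cum_w == 0 or cum_w == total:
            continue  # one class is empty; variance undefined
        w0 = cum_w / total
        mu0 = cum_mu / cum_w                       # mean of dark class
        mu1 = (total_mu - cum_mu) / (total - cum_w)  # mean of bright class
        var = w0 * (1.0 - w0) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return (best_t + 1) / 256.0   # upper edge of the winning bin

def coarse_shadow_mask(gray):
    """Pixels darker than the Otsu cut form the coarse shadow mask."""
    return gray < otsu_threshold(gray)
```

On a document photo the intensity histogram is roughly bimodal (paper vs. shadowed paper), which is exactly the case where a global Otsu cut yields a usable coarse mask for a learned refiner to polish.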
Synthetic image generation and the use of virtual environments for image enhancement tasks
Deep learning networks are often difficult to train when image samples are insufficient, and gathering real-world images tailored to a specific task is labor-intensive. This dissertation explores techniques for synthetic image generation and virtual environments for various image enhancement/correction/restoration tasks, specifically distortion correction, dehazing, shadow removal, and intrinsic image decomposition.

First, given various image formation equations, such as those used in distortion correction and dehazing, synthetic image samples can be produced, provided that the equation is well-posed. Second, virtual environments can be used to train image models by simulating real-world effects that are otherwise difficult to gather or replicate, such as haze and shadows.

Given synthetic images, one cannot train a network on them directly, as there may be a gap between the synthetic and real domains. We have devised several techniques for generating synthetic images and formulated domain adaptation methods with which our trained deep-learning networks perform competitively in distortion correction, dehazing, and shadow removal. Additional studies and directions are provided for the intrinsic image decomposition problem and for the exploration of procedural content generation, where a virtual Philippine city was created as an initial prototype.
Keywords: image generation, image correction, image dehazing, shadow removal, intrinsic image decomposition, computer graphics, rendering, machine learning, neural networks, domain adaptation, procedural content generation
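The well-posed image formation equation the abstract mentions for dehazing is commonly the atmospheric scattering model, I = J·t + A·(1 - t) with transmission t = exp(-beta·d). A minimal synthetic-sample generator based on it might look like this; the parameter defaults are illustrative assumptions:

```python
import numpy as np

def synthesize_haze(clean, depth, beta=1.0, airlight=0.9):
    """Atmospheric scattering model: I = J*t + A*(1 - t), t = exp(-beta*d).

    Given a clean image J of shape (H, W, 3) and a depth map d of shape
    (H, W), this yields a hazy sample I, so each render gives a
    ground-truth training pair (I, J) for free.
    """
    t = np.exp(-beta * depth)[..., None]      # per-pixel transmission
    return clean * t + airlight * (1.0 - t)

rng = np.random.default_rng(0)
clean = rng.random((4, 4, 3))                 # toy clean image
depth = np.linspace(0.0, 3.0, 16).reshape(4, 4)
hazy = synthesize_haze(clean, depth)          # deeper pixels fade to airlight
```

A virtual environment supplies the depth map `d` exactly, which is what makes this kind of synthesis easy to do at scale there and hard to do with captured photographs.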