Real-time Streaming Video Denoising with Bidirectional Buffers
Video streams are delivered continuously to save the cost of storage and
device memory. Real-time denoising algorithms are typically adopted on the user
device to remove the noise involved during the shooting and transmission of
video streams. However, sliding-window-based methods feed multiple input frames
for a single output frame and thus lack computational efficiency. Recent
multi-output inference works propagate bidirectional temporal features with a
parallel or recurrent framework, which either suffers from performance drops at
the temporal edges of clips or cannot achieve online inference. In this paper, we
propose a Bidirectional Streaming Video Denoising (BSVD) framework, to achieve
high-fidelity real-time denoising for streaming videos with both past and
future temporal receptive fields. Bidirectional temporal fusion for online
inference was considered infeasible in MoViNet. However, we introduce a novel
Bidirectional Buffer Block as the core module of our BSVD, which makes such
fusion possible in our pipeline-style inference. In addition, our method is
concise and flexible enough to be applied to both non-blind and blind video
denoising. We compare our model with various state-of-the-art video denoising
models qualitatively and quantitatively on synthetic and real noise. Our method
outperforms previous methods in terms of restoration fidelity and runtime. Our
source code is publicly available at https://github.com/ChenyangQiQi/BSVD
Comment: Accepted to ACM MM 2022; Github link: https://github.com/ChenyangQiQi/BSVD
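To make the pipeline-style idea concrete, here is a minimal sketch of online inference with a one-frame lookahead buffer: each output waits until one future frame has arrived, so denoising can use past, current, and future context while the stream stays online. The averaging fusion is a hypothetical placeholder, not the paper's learned Bidirectional Buffer Block.

```python
from collections import deque

class StreamingDenoiser:
    """Toy pipeline-style denoiser: each output uses past, current,
    and future frames, so emission lags the input by one frame."""

    def __init__(self):
        self.buffer = deque(maxlen=3)  # [past, current, future]

    def push(self, frame):
        """Feed one frame; returns a denoised frame once a full
        past/current/future window is buffered, else None."""
        self.buffer.append(frame)
        if len(self.buffer) == 3:
            past, cur, fut = self.buffer
            # Placeholder fusion: average the temporal window.
            return (past + cur + fut) / 3.0
        return None

    def flush(self):
        """Emit the final frame using only past context."""
        if len(self.buffer) >= 2:
            return (self.buffer[-2] + self.buffer[-1]) / 2.0
        return self.buffer[-1] if self.buffer else None
```

Feeding frames 1, 2, 3, 4 yields outputs only from the third push onward, which is exactly the one-frame latency that buys the future receptive field.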
Deep Reinforcement Learning for Resource Management in Network Slicing
Network slicing has emerged as a new business model for operators, allowing
them to sell customized slices to various tenants at different prices. In
order to provide better-performing and cost-efficient services, network slicing
raises challenging technical issues and urgently calls for intelligent
innovations that keep resource management consistent with users' per-slice
activity. In that regard, deep reinforcement learning (DRL), which learns how
to interact with the environment by trying alternative actions and reinforcing
those producing more rewarding consequences, is assumed to be a promising
solution. In this paper, after briefly reviewing the fundamental concepts of
DRL, we investigate its application to some typical resource management
problems in network slicing scenarios, which include
radio resource slicing and priority-based core network slicing, and demonstrate
the advantage of DRL over several competing schemes through extensive
simulations. Finally, we also discuss the possible challenges to apply DRL in
network slicing from a general perspective.
Comment: The manuscript has been accepted by IEEE Access in Nov. 201
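As a flavor of how the trial-and-reinforce loop maps onto slicing, here is a tabular Q-learning toy (not the paper's deep agent) on a hypothetical demand-matching task: states are demand levels, actions are bandwidth shares, and the reward penalizes mismatch between the two.

```python
import random

def q_learning_slicing(n_states=4, n_actions=3, episodes=2000,
                       alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy slicing task: state = current
    demand level, action = bandwidth share granted to the slice."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(10):  # short episode
            # Epsilon-greedy: explore occasionally, else act greedily.
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            # Toy reward: penalize mismatch between demand and allocation.
            r = -abs(s % n_actions - a)
            s2 = rng.randrange(n_states)  # demand evolves randomly
            # Standard Q-learning update toward the bootstrapped target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy allocates the share matching each demand level, which is the behavior the reward was designed to reinforce.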
Progressive Scale-aware Network for Remote Sensing Image Change Captioning
Remote sensing (RS) images contain numerous objects of different scales,
which poses significant challenges for the RS image change captioning (RSICC)
task to identify visual changes of interest in complex scenes and describe them
via language. However, current methods still fall short in sufficiently
extracting and utilizing multi-scale information. In this paper,
we propose a progressive scale-aware network (PSNet) to address the problem.
PSNet is a pure Transformer-based model. To sufficiently extract multi-scale
visual features, multiple progressive difference perception (PDP) layers are
stacked to progressively exploit the difference features of the bitemporal
inputs. To sufficiently utilize the extracted multi-scale features for
captioning, we propose a scale-aware reinforcement (SR) module and combine it
with the Transformer decoding layer to progressively utilize the features from
different PDP layers. Experiments show that the PDP layer and SR module are
effective and our PSNet outperforms previous methods. Our code is publicly
available at https://github.com/Chen-Yang-Liu/PSNet
Comment: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium
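The multi-scale differencing idea can be sketched as repeatedly diffing the bitemporal feature maps and pooling to a coarser scale. This toy stand-in uses plain average pooling and absolute difference in place of the learned PDP layers:

```python
import numpy as np

def progressive_differences(feat_t1, feat_t2, n_layers=3):
    """Toy stand-in for stacked difference-perception layers:
    at each scale, take the bitemporal difference, then downsample
    both feature maps by 2x average pooling for the next layer."""
    diffs = []
    f1, f2 = feat_t1, feat_t2
    for _ in range(n_layers):
        diffs.append(np.abs(f1 - f2))  # difference at this scale
        if min(f1.shape) < 2:
            break
        # 2x2 average pooling (assumes even spatial dims).
        f1 = f1.reshape(f1.shape[0] // 2, 2, f1.shape[1] // 2, 2).mean(axis=(1, 3))
        f2 = f2.reshape(f2.shape[0] // 2, 2, f2.shape[1] // 2, 2).mean(axis=(1, 3))
    return diffs
```

The output is a pyramid of difference maps, one per scale, which is the kind of multi-scale evidence the decoder can then consume progressively.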
Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation
The mainstream CNN-based remote sensing (RS) image semantic segmentation
approaches typically rely on massive labeled training data. Such a paradigm
struggles with RS multi-view scene segmentation under limited labeled views
because it does not consider the 3D information within the scene.
In this paper, we propose ''Implicit Ray-Transformer (IRT)'' based on Implicit
Neural Representation (INR), for RS scene semantic segmentation with sparse
labels (such as 4-6 labels per 100 images). We explore a new way of introducing
multi-view 3D structure priors to the task for accurate and view-consistent
semantic segmentation. The proposed method includes a two-stage learning
process. In the first stage, we optimize a neural field to encode the color and
3D structure of the remote sensing scene based on multi-view images. In the
second stage, we design a Ray Transformer to leverage the relations between the
neural field 3D features and 2D texture features for learning better semantic
representations. Different from previous methods that only consider 3D prior or
2D features, we incorporate additional 2D texture information and 3D prior by
broadcasting CNN features to different point features along the sampled ray. To
verify the effectiveness of the proposed method, we construct a challenging
dataset containing six synthetic sub-datasets collected from the Carla platform
and three real sub-datasets from Google Maps. Experiments show that the
proposed method outperforms the CNN-based methods and the state-of-the-art
INR-based segmentation methods in both quantitative and qualitative metrics.
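The broadcasting step described above can be sketched directly: one pixel's 2D texture feature is tiled across every 3D point sampled along its ray and concatenated channel-wise. Shapes and names here are illustrative, not the paper's API.

```python
import numpy as np

def fuse_ray_features(point_feats, pixel_feat):
    """Broadcast one pixel's 2D CNN texture feature to every 3D point
    sampled along its ray, concatenating channel-wise.
    point_feats: (n_points, c3d), pixel_feat: (c2d,)
    returns: (n_points, c3d + c2d)"""
    n_points = point_feats.shape[0]
    tiled = np.broadcast_to(pixel_feat, (n_points, pixel_feat.shape[0]))
    return np.concatenate([point_feats, tiled], axis=1)
```

Every point on the ray thus carries both its own 3D-prior feature and the shared 2D texture evidence, which is what the Ray Transformer then relates.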
HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization
Contemporary image rescaling aims at embedding a high-resolution (HR) image
into a low-resolution (LR) thumbnail image that contains embedded information
for HR image reconstruction. Unlike traditional image super-resolution, this
enables high-fidelity HR image restoration faithful to the original one, given
the embedded information in the LR thumbnail. However, state-of-the-art image
rescaling methods do not optimize the LR image file size for efficient sharing
and fall short of real-time performance for ultra-high-resolution (e.g., 6K)
image reconstruction. To address these two challenges, we propose a novel
framework (HyperThumbnail) for real-time 6K rate-distortion-aware image
rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by
an encoder with our proposed quantization prediction module, which minimizes
the file size of the embedded LR JPEG thumbnail while maximizing HR
reconstruction quality. Then, an efficient frequency-aware decoder reconstructs
a high-fidelity HR image from the LR one in real time. Extensive experiments
demonstrate that our framework outperforms previous image rescaling baselines
in rate-distortion performance and can perform 6K image reconstruction in real
time.
Comment: Accepted by CVPR 2023; Github Repository: https://github.com/AbnerVictor/HyperThumbnail
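The rate-distortion trade-off being optimized can be written as a single objective: estimated bitrate of the thumbnail plus a weighted HR reconstruction error. This is a schematic scalar version, with `est_bits` standing in for whatever entropy/quantization-prediction estimate the encoder produces.

```python
import numpy as np

def rate_distortion_loss(hr, hr_recon, est_bits, lam=0.5):
    """Toy rate-distortion objective: estimated bitrate of the LR
    JPEG thumbnail plus a lambda-weighted HR reconstruction error.
    Returns (total loss, distortion) so both terms can be logged."""
    distortion = float(np.mean((hr - hr_recon) ** 2))  # MSE on HR pixels
    return est_bits + lam * distortion, distortion
```

Sweeping `lam` traces out the rate-distortion curve: larger values spend more bits to buy reconstruction fidelity, smaller values favor tiny thumbnails.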
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Large-scale text-to-video (T2V) diffusion models have made great progress in
recent years in terms of visual quality, motion, and temporal consistency.
However, the generation process is still a black box, where all attributes
(e.g., appearance, motion) are learned and generated jointly without precise
control ability other than rough text descriptions. Inspired by image animation,
which decouples a video into a specific appearance and its corresponding
motion, we propose AnimateZero to open up the pre-trained text-to-video
diffusion model, i.e., AnimateDiff, and provide more precise appearance and
motion control for it. For appearance control, we borrow intermediate
latents and their features from text-to-image (T2I) generation to ensure that
the generated first frame is identical to the given generated image. For temporal
control, we replace the global temporal attention of the original T2V model
with our proposed positional-corrected window attention to ensure other frames
align with the first frame well. Empowered by the proposed methods, AnimateZero
can successfully control the generation process without further training. As a
zero-shot image animator for given images, AnimateZero also enables multiple
new applications, including interactive video generation and real image
animation. The detailed experiments demonstrate the effectiveness of the
proposed method in both T2V and related applications.
Comment: Project Page: https://vvictoryuki.github.io/animatezero.github.io
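One simple reading of attention that keeps every frame aligned with the first is a mask letting each frame attend only to itself and to frame 0, the appearance anchor, instead of global all-to-all temporal attention. This is an illustrative simplification, not the paper's exact positional-corrected window attention.

```python
import numpy as np

def first_frame_window_mask(n_frames):
    """Boolean temporal attention mask: entry (i, j) is True if
    frame i may attend to frame j. Every frame attends to itself
    and to frame 0 (the appearance anchor)."""
    mask = np.eye(n_frames, dtype=bool)  # self-attention entries
    mask[:, 0] = True                    # all frames see frame 0
    return mask
```

Such a mask funnels the first frame's appearance into every later frame while blocking uncontrolled cross-frame mixing.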
VPA mediates bidirectional regulation of cell cycle progression through the PPP2R2A-Chk1 signaling axis in response to HU
Cell cycle checkpoint kinases play a pivotal role in protecting against
replicative stress. In this study, valproic acid (VPA), a histone deacetylase
inhibitor (HDACi), was found to promote breast cancer MCF-7 cells to traverse
into G2/M phase for catastrophic injury by promoting PPP2R2A (the B-regulatory
subunit of phosphatase PP2A) to facilitate the dephosphorylation of Chk1 at
Ser317 and Ser345. By contrast, VPA protected normal 16HBE cells from HU
toxicity by decreasing PPP2R2A expression and increasing Chk1 phosphorylation.
The effect of VPA on PPP2R2A occurred at the post-transcriptional level through
HDAC1/2. The in vitro results were confirmed in vivo. Patients with lower
PPP2R2A expression and higher pChk1 expression showed significantly worse
survival. PPP2R2A D197 and N181 are essential for PPP2R2A-Chk1 signaling and
for the VPA-mediated bidirectional effect of augmenting HU-induced tumor cell
death while protecting normal cells.
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
Real-time perception, or streaming perception, is a crucial aspect of
autonomous driving that has yet to be thoroughly explored in existing research.
To address this gap, we present DAMO-StreamNet, an optimized framework that
combines recent advances from the YOLO series with a comprehensive analysis of
spatial and temporal perception mechanisms, delivering a cutting-edge solution.
The key innovations of DAMO-StreamNet are: (1) A robust neck structure
incorporating deformable convolution, enhancing the receptive field and feature
alignment capabilities. (2) A dual-branch structure that integrates short-path
semantic features and long-path temporal features, improving motion state
prediction accuracy. (3) Logits-level distillation for efficient optimization,
aligning the logits of teacher and student networks in semantic space. (4) A
real-time forecasting mechanism that updates support frame features with the
current frame, ensuring seamless streaming perception during inference. Our
experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art
methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200,
1920)) sAP without using extra data. This work not only sets a new benchmark
for real-time perception but also provides valuable insights for future
research. Additionally, DAMO-StreamNet can be applied to various autonomous
systems, such as drones and robots, paving the way for broader real-time
perception applications.
The code is available at https://github.com/zhiqic/DAMO-StreamNet
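The logits-level distillation term in innovation (3) is essentially a KL divergence between temperature-softened teacher and student distributions. Here is a minimal per-sample, classification-style sketch, not the detector's actual head:

```python
import numpy as np

def softmax(x, t=1.0):
    """Numerically stable softmax with temperature t."""
    z = np.exp((x - x.max()) / t)
    return z / z.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Logits-level distillation: KL(teacher || student) computed on
    temperature-softened distributions, aligning the student's
    semantic-space predictions with the teacher's."""
    p = softmax(np.asarray(teacher_logits, dtype=float), temperature)
    q = softmax(np.asarray(student_logits, dtype=float), temperature)
    return float(np.sum(p * np.log(p / q)))
```

The loss is zero exactly when the student reproduces the teacher's softened distribution, and positive otherwise, so minimizing it pulls the two logit spaces into agreement.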
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
In this work, we propose an ID-preserving talking head generation framework,
which advances previous methods in two aspects. First, as opposed to
interpolating from sparse flow, we claim that dense landmarks are crucial to
achieving accurate geometry-aware flow fields. Second, inspired by
face-swapping methods, we adaptively fuse the source identity during synthesis,
so that the network better preserves the key characteristics of the image
portrait. Although the proposed model surpasses prior methods in generation
fidelity on established benchmarks, personalized fine-tuning is usually needed
to make talking head generation ready for real-world usage. However, this
process is so computationally demanding that it is unaffordable to standard
users. To solve this, we propose a fast adaptation model using a meta-learning
approach. The learned model can be adapted into a high-quality personalized
model in as little as 30 seconds. Last but not least, a spatial-temporal enhancement
module is proposed to improve the fine details while ensuring temporal
coherency. Extensive experiments prove the significant superiority of our
approach over the state of the art in both one-shot and personalized settings.
Comment: CVPR 2023, project page: https://meta-portrait.github.io