Super-reflection and Cloaking Based on Zero Index Metamaterial
A zero index metamaterial (ZIM) can be utilized to block waves
(super-reflection) or conceal objects completely (cloaking). The
"super-reflection" device is realized by a ZIM with a perfect electric
(magnetic) conductor inclusion of arbitrary shape and size for a transverse
electric (magnetic) incident wave. In contrast, a ZIM with a perfect magnetic
(electric) conductor inclusion for a transverse electric (magnetic) incident
wave can be used to conceal objects of arbitrary shape. The underlying physics
is determined by the intrinsic properties of the ZIM.
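A compact way to see the mechanism (our paraphrase of the abstract, not the authors' full derivation): for a transverse electric wave, the out-of-plane field inside a zero-index region satisfies a Helmholtz equation with a vanishing effective wave number, so the field is essentially uniform there; the inclusion's boundary condition then decides whether this uniform value is forced to zero (total reflection) or left unconstrained (the wave passes as if the inclusion were absent).

    \nabla^2 E_z + n^2 k_0^2 E_z = 0, \qquad n \to 0 \;\Rightarrow\; E_z \approx \mathrm{const}\ \text{inside the ZIM},

    \text{PEC inclusion: } E_z\big|_{\partial\Omega} = 0 \;\Rightarrow\; E_z \equiv 0 \ \text{(super-reflection)}, \qquad
    \text{PMC inclusion: } H_{\mathrm{tan}}\big|_{\partial\Omega} = 0, \ E_z \ \text{unconstrained (cloaking)}.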
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
RNN-T models are widely used in ASR; they rely on the RNN-T loss to achieve
length alignment between the input audio and the target sequence. However, the
implementation complexity and the alignment-based optimization target of the
RNN-T loss lead to computational redundancy and a reduced role for the
predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer
(CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism
with the RNN-T model to achieve efficient alignment. In this way, the RNN-T
loss is abandoned, thus bringing a computational reduction and allowing the
predictor network a more significant role. We also introduce Funnel-CIF,
Context Blocks, Unified Gating and Bilinear Pooling joint network, and
auxiliary training strategy to further improve performance. Experiments on the
178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that CIF-T achieves
state-of-the-art results with lower computational overhead compared to RNN-T
models.
Comment: Accepted by ICASSP 202
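CIF itself is a simple accumulate-and-fire rule. A minimal NumPy sketch of the generic mechanism (an illustration under common CIF conventions, not the CIF-T code, and omitting the training-time scaling of the weights) could look like this:

    import numpy as np

    def cif(encoder_out, alphas, threshold=1.0):
        """Integrate frame-level features into label-level features:
        accumulate the per-frame weights alphas and fire an integrated
        vector whenever the accumulated weight reaches the threshold."""
        fired, accum_w = [], 0.0
        accum_v = np.zeros(encoder_out.shape[1])
        for h_t, a_t in zip(encoder_out, alphas):
            if accum_w + a_t < threshold:
                accum_w += a_t
                accum_v = accum_v + a_t * h_t
            else:
                # split the weight: one part completes the current integration,
                # the remainder starts the next one
                a_fill = threshold - accum_w
                fired.append(accum_v + a_fill * h_t)
                accum_w = a_t - a_fill
                accum_v = accum_w * h_t
        return np.stack(fired) if fired else np.empty((0, encoder_out.shape[1]))

The fired vectors act as acoustic representations aligned one-to-one with output tokens, which is the property that lets CIF-T abandon the RNN-T loss, as the abstract describes.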
Learning Raw Image Denoising with Bayer Pattern Unification and Bayer Preserving Augmentation
In this paper, we present new data pre-processing and augmentation techniques
for DNN-based raw image denoising. Compared with traditional RGB image
denoising, performing this task on direct camera sensor readings presents new
challenges such as how to effectively handle various Bayer patterns from
different data sources, and subsequently how to perform valid data augmentation
with raw images. To address the first problem, we propose a Bayer pattern
unification (BayerUnify) method to unify different Bayer patterns. This allows
us to fully utilize a heterogeneous dataset to train a single denoising model
instead of training one model for each pattern. Furthermore, while it is
essential to augment the dataset to improve model generalization and
performance, we discovered that it is error-prone to modify raw images by
adapting augmentation methods designed for RGB images. Towards this end, we
present a Bayer preserving augmentation (BayerAug) method as an effective
approach for raw image augmentation. Combining these data processing techniques
with a modified U-Net, our method achieves a PSNR of 52.11 and an SSIM of 0.9969
in the NTIRE 2019 Real Image Denoising Challenge, demonstrating
state-of-the-art performance. Our code is available at
https://github.com/Jiaming-Liu/BayerUnifyAug.
Comment: Accepted by CVPRW 201
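The pattern-unification idea can be pictured with a crop-based sketch: because a Bayer mosaic is 2x2-periodic, shifting its starting row and column by at most one pixel turns any of the four common patterns into the canonical RGGB layout. The snippet below is our illustration of that idea only, not the released BayerUnify implementation:

    import numpy as np

    # Row/column offsets that shift a raw mosaic so its top-left 2x2 tile
    # reads R G / G B (the canonical RGGB pattern). Illustrative only.
    CROP_OFFSET = {"RGGB": (0, 0), "GRBG": (0, 1), "GBRG": (1, 0), "BGGR": (1, 1)}

    def unify_to_rggb(raw, pattern):
        """Crop a single-channel raw Bayer image so that it starts on an R pixel."""
        dr, dc = CROP_OFFSET[pattern]
        h, w = raw.shape
        # keep an even number of rows and columns so the 2x2 mosaic stays intact
        return raw[dr:h - ((h - dr) % 2), dc:w - ((w - dc) % 2)]

With all inputs unified this way, a single denoising model can be trained on data captured by different sensors, which is the heterogeneous-dataset benefit the abstract describes.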
Nowhere to Go: Benchmarking Multi-robot Collaboration in Target Trapping Environment
Collaboration is one of the most important factors in multi-robot systems.
Motivated by real-world applications and to further promote its development,
we propose a new benchmark for evaluating multi-robot collaboration in the
Target Trapping Environment (T2E). In T2E, two kinds of robots (called
captor robot and target robot) share the same space. The captors aim to catch
the target collaboratively, while the target will try to escape from the trap.
Both the trapping and the escaping process can exploit the environment layout
to achieve the corresponding objective, which requires close collaboration
between robots and effective use of the environment. For the benchmark, we present
and evaluate multiple learning-based baselines in T2E, and provide insights
into regimes of multi-robot collaboration. We also make our benchmark publicly
available and encourage researchers from related robotics disciplines to
propose, evaluate, and compare their solutions in this benchmark. Our project
is released at https://github.com/Dr-Xiaogaren/T2E
Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving
A semantic map of the road scene, covering fundamental road elements, is an
essential ingredient in autonomous driving systems. It provides important
perception foundations for positioning and planning when rendered in the
Bird's-Eye-View (BEV). Currently, prior knowledge of hypothetical depth can
directly guide the learning of the translation from front perspective views to
BEV with the help of calibration parameters. However, this approach suffers
from geometric distortions in the representation of distant objects. Another
stream of methods learns the transformation between front perspective views
and BEV implicitly, from a global view and without prior knowledge. Considering
that the fusion of these different learning schemes may bring complementary
benefits, we propose a Bi-Mapper framework for top-down road-scene semantic
understanding, which incorporates a global view and local prior knowledge. To
enhance reliable interaction between them, an asynchronous mutual learning
strategy is proposed. At the same time, an Across-Space Loss (ASL) is designed
to mitigate the negative impact of geometric distortions. Extensive results on
nuScenes and Cam2BEV datasets verify the consistent effectiveness of each
module in the proposed Bi-Mapper framework. Compared with existing road mapping
networks, the proposed Bi-Mapper achieves 2.1% higher IoU on the nuScenes
dataset. Moreover, we verify the generalization performance of Bi-Mapper in a
real-world driving scenario. The source code is publicly available at
https://github.com/lynn-yu/Bi-Mapper.
Comment: Accepted to IEEE Robotics and Automation Letters (RA-L).
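As a rough illustration of what a mutual-learning term between the two streams could look like, the PyTorch sketch below uses a generic symmetric KL distillation between the two BEV predictions; the paper's asynchronous schedule and Across-Space Loss are not reproduced here, and the function name is ours:

    import torch.nn.functional as F

    def mutual_learning_loss(logits_global, logits_local, t=1.0):
        """Symmetric KL between the BEV semantic predictions of the two
        streams, so each branch is softly supervised by the other."""
        log_p_g = F.log_softmax(logits_global / t, dim=1)
        log_p_l = F.log_softmax(logits_local / t, dim=1)
        return 0.5 * (
            F.kl_div(log_p_g, log_p_l.exp(), reduction="batchmean")
            + F.kl_div(log_p_l, log_p_g.exp(), reduction="batchmean")
        )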
Computational Optics Meet Domain Adaptation: Transferring Semantic Segmentation Beyond Aberrations
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile
and wearable applications remains a challenge due to the corrupted imaging
quality induced by optical aberrations. However, previous works focus only on
improving the subjective imaging quality through computational optics, i.e.,
Computational Imaging (CI) techniques, ignoring their feasibility for semantic
segmentation. In this paper, we are the first to investigate Semantic
Segmentation under Optical Aberrations (SSOA) of MOS. To benchmark SSOA, we construct
Virtual Prototype Lens (VPL) groups through optical simulation, generating
Cityscapes-ab and KITTI-360-ab datasets under different behaviors and levels of
aberrations. We look into SSOA via an unsupervised domain adaptation
perspective to address the scarcity of labeled aberration data in real-world
scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation
(CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based
on our benchmark, we conduct experiments on the robustness of state-of-the-art
segmenters against aberrations. In addition, extensive evaluations of possible
solutions to SSOA reveal that CIADA achieves superior performance under all
aberration distributions, paving the way for the applications of MOS in
semantic scene understanding. Code and dataset will be made publicly available
at https://github.com/zju-jiangqi/CIADA.
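To make the aberration setting concrete, a toy stand-in for the lens simulation is to blur a sharp image with a point spread function; the actual Cityscapes-ab and KITTI-360-ab data come from proper optical simulation of the Virtual Prototype Lens groups, so the snippet below only illustrates the degradation model in spirit:

    import numpy as np
    from scipy.signal import fftconvolve

    def simulate_aberration(img, psf):
        """Blur each channel of an HxWxC float image in [0, 1] with a PSF,
        mimicking (very roughly) how optical aberrations degrade the input
        that a segmenter later has to cope with."""
        blurred = np.stack(
            [fftconvolve(img[..., c], psf, mode="same") for c in range(img.shape[-1])],
            axis=-1,
        )
        return np.clip(blurred, 0.0, 1.0)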
kTrans: Knowledge-Aware Transformer for Binary Code Embedding
Binary Code Embedding (BCE) has important applications in various reverse
engineering tasks such as binary code similarity detection, type recovery,
control-flow recovery and data-flow analysis. Recent studies have shown that
the Transformer model can comprehend the semantics of binary code to support
downstream tasks. However, existing models overlook the prior knowledge of
assembly language. In this paper, we propose a novel Transformer-based
approach, namely kTrans, to generate knowledge-aware binary code embedding. By
feeding explicit knowledge as additional inputs to the Transformer, and fusing
implicit knowledge with a novel pre-training task, kTrans provides a new
perspective on incorporating domain knowledge into a Transformer framework. We
inspect the generated embeddings with outlier detection and visualization, and
also apply kTrans to 3 downstream tasks: Binary Code Similarity Detection
(BCSD), Function Type Recovery (FTR) and Indirect Call Recognition (ICR).
Evaluation results show that kTrans can generate high-quality binary code
embeddings, and outperforms state-of-the-art (SOTA) approaches on downstream
tasks by 5.2%, 6.8%, and 12.6% respectively. kTrans is publicly available at:
https://github.com/Learner0x5a/kTrans-releas
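One way to read "feeding explicit knowledge as additional inputs" is the usual embedding-sum pattern; the sketch below is our illustration with assumed field names and sizes, not kTrans's actual vocabulary, knowledge categories, or architecture:

    import torch.nn as nn

    class KnowledgeAwareEmbedding(nn.Module):
        """Sum per-token knowledge tags (e.g. opcode and operand categories)
        with the token embedding before the Transformer encoder."""
        def __init__(self, vocab=30000, n_opcode_types=64, n_operand_types=32, d=768):
            super().__init__()
            self.tok = nn.Embedding(vocab, d)
            self.opc = nn.Embedding(n_opcode_types, d)
            self.opd = nn.Embedding(n_operand_types, d)

        def forward(self, token_ids, opcode_type_ids, operand_type_ids):
            return self.tok(token_ids) + self.opc(opcode_type_ids) + self.opd(operand_type_ids)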
Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks
To reduce the reliance on large-scale datasets, recent works in 3D
segmentation resort to few-shot learning. Current 3D few-shot semantic
segmentation methods first pre-train the models on `seen' classes, and then
evaluate their generalization performance on `unseen' classes. However, the
prior pre-training stage not only introduces excessive time overhead, but also
incurs a significant domain gap on `unseen' classes. To tackle these issues, we
propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and
a further training-based variant, TFS3D-T. Without any learnable parameters,
TFS3D extracts dense representations by trigonometric positional encodings, and
achieves comparable performance to previous training-based methods. Due to the
elimination of pre-training, TFS3D can alleviate the domain gap issue and save
a substantial amount of time. Building upon TFS3D, TFS3D-T only requires
training a lightweight query-support transferring attention (QUEST), which
enhances the interaction between the few-shot query and support data.
Experiments demonstrate TFS3D-T improves previous state-of-the-art methods by
+6.93% and +17.96% mIoU respectively on S3DIS and ScanNet, while reducing the
training time by 90%, indicating superior effectiveness and efficiency.
Comment: Code is available at https://github.com/yangyangyang127/TFS3
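The training-free part of such a pipeline can be sketched as follows: encode point coordinates with trigonometric (sin/cos) functions at several frequencies, average the encoded support points of each class into prototypes, and label query points by cosine similarity. Frequencies, normalization, and the matching rule below are our assumptions for illustration, not the exact TFS3D design:

    import numpy as np

    def trig_encode(xyz, num_freqs=6):
        """Concatenate sin/cos of (normalized) 3D coordinates at several
        frequencies -- a parameter-free, training-free point descriptor."""
        freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (F,)
        ang = xyz[:, :, None] * freqs                        # (N, 3, F)
        feat = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
        return feat.reshape(xyz.shape[0], -1)                # (N, 6F)

    def few_shot_predict(query_xyz, support_xyz, support_labels, num_classes):
        """Assign each query point the class of its most similar prototype,
        where prototypes are mean encodings of the labeled support points."""
        q, s = trig_encode(query_xyz), trig_encode(support_xyz)
        protos = np.stack([s[support_labels == c].mean(axis=0) for c in range(num_classes)])
        q /= np.linalg.norm(q, axis=1, keepdims=True)
        protos /= np.linalg.norm(protos, axis=1, keepdims=True)
        return (q @ protos.T).argmax(axis=1)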