    Super-reflection and Cloaking Based on Zero Index Metamaterial

    A zero index metamaterial (ZIM) can be utilized to block waves (super-reflection) or to conceal objects completely (cloaking). The "super-reflection" device is realized by a ZIM with a perfect electric (magnetic) conductor inclusion of arbitrary shape and size for a transverse electric (magnetic) incident wave. In contrast, a ZIM with a perfect magnetic (electric) conductor inclusion for a transverse electric (magnetic) incident wave can be used to conceal objects of arbitrary shape. The underlying physics is determined by the intrinsic properties of the ZIM.
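
    The abstract does not spell out the mechanism, so one line of standard zero-index reasoning is worth recalling (a textbook sketch, not taken from the paper):

        k \;=\; \omega\sqrt{\mu\varepsilon} \;\to\; 0,
        \qquad
        \lambda_{\mathrm{eff}} \;=\; \frac{2\pi}{k} \;\to\; \infty .

    Because the phase is essentially uniform throughout a zero-index region, whether the incident wave tunnels through (cloaking) or is totally reflected (super-reflection) is fixed by the boundary condition of the embedded inclusion (PMC versus PEC for a transverse electric wave), not by its shape or size.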

    CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

    RNN-T models, which rely on the RNN-T loss to achieve length alignment between the input audio and the target sequence, are widely used in ASR. However, the implementation complexity and the alignment-based optimization target of the RNN-T loss lead to computational redundancy and a reduced role for the predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T), which incorporates the Continuous Integrate-and-Fire (CIF) mechanism into the RNN-T model to achieve efficient alignment. In this way, the RNN-T loss is abandoned, which reduces computation and allows the predictor network to play a more significant role. We also introduce Funnel-CIF, Context Blocks, a Unified Gating and Bilinear Pooling joint network, and an auxiliary training strategy to further improve performance. Experiments on the 178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models. Comment: Accepted by ICASSP 2024.
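
    The abstract relies on the Continuous Integrate-and-Fire (CIF) mechanism without describing it. A minimal sketch of a CIF step, which accumulates per-frame weights and fires one integrated acoustic embedding per output token, is given below (NumPy formulation and variable names are illustrative, not the authors' implementation):

        # Illustrative sketch of a CIF step, not the paper's code.
        import numpy as np

        def cif(encoder_out, alphas, threshold=1.0):
            """Integrate encoder frames until the accumulated weight fires a token.

            encoder_out: (T, D) acoustic frames
            alphas:      (T,)   non-negative firing weights, e.g. sigmoid outputs
            Returns a (U, D) array with one integrated embedding per emitted token.
            """
            tokens, acc_w = [], 0.0
            acc_v = np.zeros(encoder_out.shape[1])
            for h, a in zip(encoder_out, alphas):
                if acc_w + a < threshold:          # keep integrating
                    acc_w += a
                    acc_v += a * h
                else:                              # fire: split the weight at the boundary
                    fire_w = threshold - acc_w
                    tokens.append(acc_v + fire_w * h)
                    acc_w = a - fire_w             # remainder starts the next token
                    acc_v = acc_w * h
            return np.stack(tokens) if tokens else np.empty((0, encoder_out.shape[1]))

    Because token boundaries come directly from the accumulated weights, frames and output tokens are paired without an alignment lattice, which is why the RNN-T loss can be dropped.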

    Learning Raw Image Denoising with Bayer Pattern Unification and Bayer Preserving Augmentation

    In this paper, we present new data pre-processing and augmentation techniques for DNN-based raw image denoising. Compared with traditional RGB image denoising, performing this task on direct camera sensor readings presents new challenges, such as how to effectively handle the various Bayer patterns from different data sources and how to perform valid data augmentation on raw images. To address the first problem, we propose a Bayer pattern unification (BayerUnify) method to unify different Bayer patterns. This allows us to fully utilize a heterogeneous dataset to train a single denoising model instead of training one model for each pattern. Furthermore, while it is essential to augment the dataset to improve model generalization and performance, we found that naively adapting augmentation methods designed for RGB images to raw images is error-prone. To this end, we present a Bayer preserving augmentation (BayerAug) method as an effective approach for raw image augmentation. Combining these data processing techniques with a modified U-Net, our method achieves a PSNR of 52.11 and an SSIM of 0.9969 in the NTIRE 2019 Real Image Denoising Challenge, demonstrating state-of-the-art performance. Our code is available at https://github.com/Jiaming-Liu/BayerUnifyAug. Comment: Accepted by CVPRW 2019.
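
    As a concrete illustration of the two ideas, the sketch below unifies any 2x2 Bayer pattern to RGGB by cropping and performs a pattern-preserving horizontal flip (the offsets follow the standard Bayer layouts; helper names are hypothetical and this is not the authors' implementation):

        # Illustrative sketch of Bayer pattern unification / preservation, not the paper's code.
        import numpy as np

        # Row/column offsets that place an R sample at the top-left of each pattern.
        OFFSETS = {"RGGB": (0, 0), "GRBG": (0, 1), "GBRG": (1, 0), "BGGR": (1, 1)}

        def unify_to_rggb(raw, pattern):
            """Crop a single-channel raw mosaic so that it reads as RGGB."""
            dy, dx = OFFSETS[pattern]
            h, w = raw.shape
            # Keep an even number of rows/cols so the 2x2 mosaic stays aligned.
            return raw[dy : h - (h - dy) % 2, dx : w - (w - dx) % 2]

        def hflip_preserving_bayer(raw_rggb):
            """A horizontal flip turns RGGB into GRBG; crop again to restore RGGB."""
            flipped = raw_rggb[:, ::-1]
            return unify_to_rggb(flipped, "GRBG")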

    Nowhere to Go: Benchmarking Multi-robot Collaboration in Target Trapping Environment

    Collaboration is one of the most important factors in multi-robot systems. Motivated by real-world applications and to further promote research in this direction, we propose a new benchmark to evaluate multi-robot collaboration in the Target Trapping Environment (T2E). In T2E, two kinds of robots (called captor robots and target robots) share the same space. The captors aim to catch the target collaboratively, while the target tries to escape from the trap. Both the trapping and the escaping process can exploit the environment layout to achieve the corresponding objective, which requires high collaboration between robots and effective use of the environment. For the benchmark, we present and evaluate multiple learning-based baselines in T2E and provide insights into regimes of multi-robot collaboration. We also make our benchmark publicly available and encourage researchers from related robotics disciplines to propose, evaluate, and compare their solutions in this benchmark. Our project is released at https://github.com/Dr-Xiaogaren/T2E.

    Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving

    A semantic map of the road scene, covering fundamental road elements, is an essential ingredient in autonomous driving systems. When rendered in the Bird's-Eye-View (BEV), it provides an important perception foundation for positioning and planning. Currently, prior knowledge of hypothetical depth can guide the learning of translating front perspective views directly into BEV with the help of calibration parameters; however, this suffers from geometric distortions in the representation of distant objects. Another stream of methods, without such prior knowledge, learns the transformation between front perspective views and BEV implicitly with a global view. Considering that fusing the two learning schemes may bring complementary benefits, we propose a Bi-Mapper framework for top-down road-scene semantic understanding, which incorporates a global view and local prior knowledge. To enhance reliable interaction between them, an asynchronous mutual learning strategy is proposed. At the same time, an Across-Space Loss (ASL) is designed to mitigate the negative impact of geometric distortions. Extensive results on the nuScenes and Cam2BEV datasets verify the consistent effectiveness of each module in the proposed Bi-Mapper framework. Compared with existing road mapping networks, the proposed Bi-Mapper achieves 2.1% higher IoU on the nuScenes dataset. Moreover, we verify the generalization performance of Bi-Mapper in a real-world driving scenario. The source code is publicly available at https://github.com/lynn-yu/Bi-Mapper. Comment: Accepted to IEEE Robotics and Automation Letters (RA-L).
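
    The asynchronous mutual learning strategy is only named in the abstract; the following is a hypothetical sketch of mutual learning between the global-view and prior-knowledge BEV streams (the KL formulation and the warm-up schedule are assumptions, not the paper's design):

        # Hypothetical sketch of asynchronous mutual learning, not the paper's implementation.
        import torch.nn.functional as F

        def mutual_learning_loss(logits_global, logits_prior, step, warmup_steps=1000):
            """KL-based mutual learning; letting one branch start teaching only after a
            warm-up is one simple way to make the exchange asynchronous."""
            log_p_global = F.log_softmax(logits_global, dim=1)
            log_p_prior = F.log_softmax(logits_prior, dim=1)
            # The global branch always learns from the prior branch's (detached) prediction.
            loss = F.kl_div(log_p_global, log_p_prior.detach().exp(), reduction="batchmean")
            if step > warmup_steps:
                # After warm-up the prior branch also learns from the global branch.
                loss = loss + F.kl_div(log_p_prior, log_p_global.detach().exp(), reduction="batchmean")
            return loss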

    Computational Optics Meet Domain Adaptation: Transferring Semantic Segmentation Beyond Aberrations

    Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works focus only on improving subjective imaging quality through computational optics, i.e., the Computational Imaging (CI) technique, while ignoring its feasibility for semantic segmentation. In this paper, we pioneer the investigation of Semantic Segmentation under Optical Aberrations (SSOA) for MOS. To benchmark SSOA, we construct Virtual Prototype Lens (VPL) groups through optical simulation, generating the Cityscapes-ab and KITTI-360-ab datasets under different behaviors and levels of aberrations. We approach SSOA from an unsupervised domain adaptation perspective to address the scarcity of labeled aberration data in real-world scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation (CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based on our benchmark, we conduct experiments on the robustness of state-of-the-art segmenters against aberrations. In addition, extensive evaluations of possible solutions to SSOA reveal that CIADA achieves superior performance under all aberration distributions, paving the way for applications of MOS in semantic scene understanding. Code and dataset will be made publicly available at https://github.com/zju-jiangqi/CIADA.
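
    The aberrated benchmarks come from full optical simulation of Virtual Prototype Lenses; as a deliberately simplified stand-in for that pipeline, one aberration level can be mimicked by convolving a clean image with a defocus-like point-spread function (kernel shape and size here are assumptions made only for illustration):

        # Simplified illustrative stand-in for optical-aberration simulation, not the paper's pipeline.
        import numpy as np
        from scipy.ndimage import convolve

        def disk_psf(radius):
            """Uniform disk kernel, a crude model of a defocused point-spread function."""
            y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
            psf = (x**2 + y**2 <= radius**2).astype(np.float64)
            return psf / psf.sum()

        def aberrate(image, radius=3):
            """Apply the PSF channel-wise to an HxWxC float image."""
            psf = disk_psf(radius)
            return np.stack([convolve(image[..., c], psf, mode="reflect")
                             for c in range(image.shape[-1])], axis=-1)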

    kTrans: Knowledge-Aware Transformer for Binary Code Embedding

    Binary Code Embedding (BCE) has important applications in various reverse engineering tasks such as binary code similarity detection, type recovery, control-flow recovery, and data-flow analysis. Recent studies have shown that the Transformer model can comprehend the semantics of binary code to support downstream tasks. However, existing models overlook the prior knowledge of assembly language. In this paper, we propose a novel Transformer-based approach, namely kTrans, to generate knowledge-aware binary code embeddings. By feeding explicit knowledge as additional inputs to the Transformer and fusing implicit knowledge with a novel pre-training task, kTrans provides a new perspective on incorporating domain knowledge into a Transformer framework. We inspect the generated embeddings with outlier detection and visualization, and also apply kTrans to three downstream tasks: Binary Code Similarity Detection (BCSD), Function Type Recovery (FTR), and Indirect Call Recognition (ICR). Evaluation results show that kTrans generates high-quality binary code embeddings and outperforms state-of-the-art (SOTA) approaches on the downstream tasks by 5.2%, 6.8%, and 12.6%, respectively. kTrans is publicly available at: https://github.com/Learner0x5a/kTrans-releas
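
    "Explicit knowledge as additional inputs" is not specified further in the abstract; one common way to realize it, sketched here with purely illustrative names, is to add per-token embeddings of assembly-level tags (for example opcode and operand types) to the token embedding, analogous to segment embeddings in BERT:

        # Hypothetical sketch of knowledge-aware input embeddings, not the kTrans implementation.
        import torch.nn as nn

        class KnowledgeAwareEmbedding(nn.Module):
            def __init__(self, vocab_size, n_opcode_types, n_operand_types, d_model=512):
                super().__init__()
                self.tok = nn.Embedding(vocab_size, d_model)
                self.opc = nn.Embedding(n_opcode_types, d_model)
                self.opd = nn.Embedding(n_operand_types, d_model)

            def forward(self, token_ids, opcode_ids, operand_ids):
                # Explicit knowledge enters simply as additional additive embeddings.
                return self.tok(token_ids) + self.opc(opcode_ids) + self.opd(operand_ids)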

    Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

    To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot semantic segmentation methods first pre-train models on `seen' classes and then evaluate their generalization performance on `unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on `unseen' classes. To tackle these issues, we propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and a further training-based variant, TFS3D-T. Without any learnable parameters, TFS3D extracts dense representations by trigonometric positional encodings and achieves performance comparable to previous training-based methods. Because it eliminates pre-training, TFS3D alleviates the domain gap issue and saves a substantial amount of time. Building upon TFS3D, TFS3D-T only requires training a lightweight query-support transferring attention module (QUEST), which enhances the interaction between the few-shot query and support data. Experiments demonstrate that TFS3D-T improves over previous state-of-the-art methods by +6.93% and +17.96% mIoU on S3DIS and ScanNet, respectively, while reducing the training time by 90%, indicating superior effectiveness and efficiency. Comment: Code is available at https://github.com/yangyangyang127/TFS3
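
    The training-free backbone rests on trigonometric positional encodings of point coordinates; a minimal sketch of such a parameter-free encoding (the frequency count and scaling are assumptions, not the paper's settings) is:

        # Illustrative sketch of a trigonometric (sin/cos) point encoding, not the paper's code.
        import numpy as np

        def trig_encode(xyz, num_freqs=6):
            """xyz: (N, 3) points -> (N, 3 * 2 * num_freqs) encoding, no learnable weights."""
            freqs = 2.0 ** np.arange(num_freqs)               # 1, 2, 4, ...
            angles = xyz[:, :, None] * freqs[None, None, :]   # (N, 3, F)
            enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 3, 2F)
            return enc.reshape(xyz.shape[0], -1)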