    A THz Video SAR Imaging Algorithm Based on Chirp Scaling

    Full text link
    In video synthetic aperture radar (SAR) imaging mode, the polar format algorithm (PFA) is more computationally efficient than the backprojection algorithm (BPA). However, the two-dimensional (2-D) interpolation in the PFA greatly limits its speed, which is detrimental to real-time video SAR imaging. In this paper, a terahertz (THz) video SAR imaging algorithm based on chirp scaling is proposed, which exploits the small synthetic aperture angle of THz SAR and the inherent properties of linear frequency modulation. A two-step chirp scaling then replaces the 2-D interpolation in the PFA, achieving a comparable focusing effect at a lower computational cost. Point-target simulation is used to verify the effectiveness of the proposed method. Comment: 5 pages, 7 figures
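
    The core trick, replacing interpolation with chirp (quadratic-phase) multiplications and FFTs, can be illustrated with Bluestein's chirp-z identity, which evaluates a DFT on a rescaled frequency grid using only chirp products and one FFT-based convolution. The sketch below is a minimal NumPy illustration of that interpolation-free rescaling principle, not the paper's actual two-step algorithm; the function name czt_scale and the test setup are invented for the example.

    import numpy as np

    def czt_scale(x, a):
        # Evaluate the DFT of x on a frequency grid rescaled by the factor a,
        # via Bluestein's chirp-z identity: chirp multiplications plus one
        # FFT-based convolution, with no interpolation anywhere.
        n = len(x)
        k = np.arange(n)
        w = np.exp(-2j * np.pi * a / n)             # rescaled frequency step
        chirp = w ** (k ** 2 / 2.0)
        L = 1 << int(np.ceil(np.log2(2 * n - 1)))   # FFT length for linear convolution
        xa = np.zeros(L, complex)
        xa[:n] = x * chirp                          # first chirp multiplication
        h = np.zeros(L, complex)
        h[:n] = 1.0 / chirp                         # kernel w^(-m^2/2) for m >= 0
        h[L - n + 1:] = (1.0 / chirp[1:])[::-1]     # mirrored kernel for m < 0
        y = np.fft.ifft(np.fft.fft(xa) * np.fft.fft(h))[:n]
        return chirp * y                            # final chirp multiplication

    # sanity check against the directly evaluated rescaled DFT
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    n, k, a = 64, np.arange(64), 0.37
    direct = np.array([np.sum(x * np.exp(-2j * np.pi * a * m * k / n)) for m in k])
    assert np.allclose(czt_scale(x, a), direct)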

    Referring Camouflaged Object Detection

    Full text link
    In this paper, we consider the problem of referring camouflaged object detection (Ref-COD), a new task that aims to segment specified camouflaged objects based on some form of reference, e.g., an image or text. We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios. Then, we develop a simple but strong dual-branch framework, dubbed R2CNet, with a reference branch learning common representations from the referring information and a segmentation branch identifying and segmenting camouflaged objects under the guidance of those common representations. In particular, we design a Referring Mask Generation module to generate a pixel-level prior mask and a Referring Feature Enrichment module to enhance the capability of identifying camouflaged objects. Extensive experiments show the superiority of our Ref-COD methods over their COD counterparts in segmenting specified camouflaged objects and identifying the main body of target objects. Our code and dataset are publicly available at https://github.com/zhangxuying1004/RefCOD
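
    To make the dual-branch idea concrete, here is a hedged PyTorch sketch of a reference branch guiding a segmentation branch through a pixel-level prior mask. All module names, layer sizes, and shapes below are illustrative stand-ins; R2CNet's actual Referring Mask Generation module is more elaborate.

    import torch
    import torch.nn as nn

    class ReferringMaskGenerator(nn.Module):
        # Stand-in for a referring-mask module: correlate a reference
        # embedding with image features to obtain a pixel-level prior mask.
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(dim, dim)

        def forward(self, feat, ref):                # feat: (B, C, H, W), ref: (B, C)
            sim = torch.einsum('bchw,bc->bhw', feat, self.proj(ref))
            return torch.sigmoid(sim).unsqueeze(1)   # (B, 1, H, W) prior mask

    class DualBranchNet(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.backbone = nn.Conv2d(3, dim, 3, padding=1)  # toy image encoder
            self.ref_enc = nn.Conv2d(3, dim, 3, padding=1)   # toy reference encoder
            self.rmg = ReferringMaskGenerator(dim)
            self.head = nn.Conv2d(dim + 1, 1, 1)             # segmentation head

        def forward(self, img, ref_img):
            feat = self.backbone(img)
            ref = self.ref_enc(ref_img).mean(dim=(2, 3))     # global reference embedding
            prior = self.rmg(feat, ref)                      # pixel-level prior mask
            return self.head(torch.cat([feat, prior], 1))    # mask-guided prediction

    logits = DualBranchNet()(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))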

    TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

    Full text link
    Recent progress in the text-driven 3D stylization of a single object has been considerably promoted by CLIP-based methods. However, the stylization of multi-object 3D scenes remains impeded because the image-text pairs used to pre-train CLIP mostly contain only a single object. Meanwhile, the local details of individual objects are easily omitted because the existing supervision relies primarily on coarse-grained contrast of image-text pairs. To overcome these challenges, we present a novel framework, dubbed TeMO, to parse multi-object 3D scenes and edit their styles under contrastive supervision at multiple levels. We first propose a Decoupled Graph Attention (DGA) module to distinguishably reinforce the features of 3D surface points. In particular, a cross-modal graph is constructed to accurately align the object points with the noun phrases decoupled from the 3D mesh and the textual description. Then, we develop a Cross-Grained Contrast (CGC) supervision system, in which a fine-grained loss between the words of the textual description and randomly rendered images is constructed to complement the coarse-grained loss. Extensive experiments show that our method can synthesize high-quality stylized content and outperforms existing methods over a wide range of multi-object 3D meshes. Our code and results will be made publicly available.
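
    The cross-grained supervision can be pictured as the sum of a coarse image-sentence term and a fine word-patch term. The PyTorch snippet below is only a plausible reading of that idea; the exact form of TeMO's fine-grained loss, the 0.5 weight, and all tensor shapes are assumptions made for this sketch.

    import torch
    import torch.nn.functional as F

    def coarse_loss(img_emb, txt_emb):
        # Coarse-grained term: pull the embedding of a rendered view toward
        # the embedding of the whole sentence (CLIP-style similarity).
        return 1 - F.cosine_similarity(img_emb, txt_emb, dim=-1).mean()

    def fine_loss(patch_emb, word_emb):
        # Fine-grained term (assumed form): match each word to its most
        # similar image patch so local object details also get supervision.
        sim = torch.einsum('bpd,bwd->bwp', patch_emb, word_emb)  # word-patch similarities
        return (1 - sim.max(dim=-1).values).mean()               # best patch per word

    # assumed shapes: (B, D) global embeddings, (B, P, D) patches, (B, W, D) words
    B, P, W, D = 2, 49, 8, 512
    total = coarse_loss(torch.randn(B, D), torch.randn(B, D)) \
          + 0.5 * fine_loss(F.normalize(torch.randn(B, P, D), dim=-1),
                            F.normalize(torch.randn(B, W, D), dim=-1))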

    Survey on Video Object Tracking Algorithms

    Get PDF
    Video object tracking is an important research topic in the field of computer vision, mainly studying how to track objects of interest in video streams or image sequences. It has been widely used in surveillance, autonomous driving, precision guidance, and other fields, so a comprehensive review of video object tracking algorithms is of great significance. Firstly, according to their different sources, the challenges faced by video object tracking are classified into two aspects, object factors and background factors, and each is summarized in turn. Secondly, the typical video object tracking algorithms of recent years are classified into correlation filtering algorithms and deep learning algorithms. The correlation filtering algorithms are further divided into three categories: kernel correlation filtering algorithms, scale-adaptive correlation filtering algorithms, and multi-feature-fusion correlation filtering algorithms. The deep learning algorithms are divided into two categories: those based on Siamese networks and those based on convolutional neural networks. This paper analyzes the various algorithms in terms of research motivation, algorithmic ideas, advantages, and disadvantages. Then, the widely used datasets and evaluation metrics are introduced. Finally, the paper summarizes the research and looks forward to future development trends of video object tracking.
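
    As background for the correlation filtering family the survey covers, the following NumPy sketch implements a minimal MOSSE-style correlation filter, one of the simplest members of that family rather than any specific algorithm from the survey. The filter is solved in the Fourier domain so that correlating it with a target-centered patch yields a Gaussian peak; the response peak on a new patch then localizes the target. The helper names and the toy demo are invented for illustration.

    import numpy as np

    def train_filter(patch, sigma=2.0, lam=1e-3):
        # MOSSE-style filter: solve in the Fourier domain for a filter whose
        # correlation with a target-centered patch is a Gaussian peak.
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
        Fp, G = np.fft.fft2(patch), np.fft.fft2(g)
        return G * np.conj(Fp) / (Fp * np.conj(Fp) + lam)  # ridge-regularized solve

    def locate(H, patch):
        # Correlate a new patch with the filter; the response peak gives the
        # estimated target position within the patch.
        r = np.real(np.fft.ifft2(np.fft.fft2(patch) * H))
        return np.unravel_index(np.argmax(r), r.shape)

    rng = np.random.default_rng(1)
    patch = rng.random((64, 64)) * 0.1
    patch[30:34, 30:34] += 1.0                      # target centered in training patch
    H = train_filter(patch)
    shifted = np.roll(patch, (5, -3), axis=(0, 1))  # target moves between frames
    print(locate(H, shifted))                       # peak shifts to about (37, 29)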

    Revisiting Single Image Reflection Removal In the Wild

    Full text link
    This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions, examining it from two angles: the collection pipeline of real reflection pairs and the perception of real reflection locations. We devise an advanced reflection collection pipeline that is highly adaptable to a wide range of real-world reflection scenarios and incurs reduced costs in collecting large-scale aligned reflection pairs. In the process, we develop a large-scale, high-quality reflection dataset named Reflection Removal in the Wild (RRW). RRW contains over 14,950 high-resolution real-world reflection pairs, forty-five times larger than its predecessors. Regarding the perception of reflection locations, we observe that numerous virtual reflection objects visible in reflection images are not present in the corresponding ground-truth images. This observation, drawn from the aligned pairs, leads us to conceive the Maximum Reflection Filter (MaxRF), which can accurately and explicitly characterize reflection locations from pairs of images. Building upon this, we design a reflection location-aware cascaded framework specifically tailored for SIRR. Powered by these techniques, our solution achieves performance superior to current leading methods across multiple real-world benchmarks. Codes and datasets will be publicly available.
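
    The abstract does not spell out the MaxRF's definition, but one plausible reading is a channel-wise maximum over the positive residual between an aligned pair, since reflections only add light to the scene. The NumPy sketch below encodes that hypothetical reading; it is not the paper's operator, and the function name and demo are invented.

    import numpy as np

    def max_reflection_filter(mixed, clean, eps=1e-6):
        # Hypothetical MaxRF-like operator: reflections add light, so the
        # channel-wise maximum of the positive residual between the mixed
        # image and its aligned reflection-free ground truth highlights
        # where the reflections live.
        residual = np.clip(mixed.astype(float) - clean.astype(float), 0, None)
        return residual.max(axis=-1) / (residual.max() + eps)  # (H, W) map in [0, 1]

    rng = np.random.default_rng(0)
    clean = rng.random((32, 32, 3)) * 0.7
    reflections = 0.3 * (rng.random((32, 32, 1)) > 0.9)       # sparse added light
    mask = max_reflection_filter(clean + reflections, clean)  # bright at reflections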

    SVDinsTN: An Integrated Method for Tensor Network Representation with Efficient Structure Search

    Full text link
    Tensor network (TN) representation is a powerful technique for data analysis and machine learning. It practically involves a challenging TN structure search (TN-SS) problem, which aims to search for the optimal structure to achieve a compact representation. Existing TN-SS methods mainly adopt a bi-level optimization method that leads to excessive computational costs due to repeated structure evaluations. To address this issue, we propose an efficient integrated (single-level) method named SVD-inspired TN decomposition (SVDinsTN), eliminating the need for repeated tedious structure evaluation. By inserting a diagonal factor for each edge of the fully-connected TN, we calculate TN cores and diagonal factors simultaneously, with factor sparsity revealing the most compact TN structure. Experimental results on real-world data demonstrate that SVDinsTN achieves approximately $10^2 \sim 10^3$ times acceleration in runtime compared to existing TN-SS methods while maintaining a comparable level of representation ability.
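
    The role of the diagonal factors is analogous to singular values: entries driven to (near) zero expose the compact structure, and pruning them shrinks the adjacent cores. Below is a toy NumPy illustration on a single edge; the function prune_edge and the demo ranks are invented for this example, and SVDinsTN itself optimizes all edges of a fully-connected TN jointly.

    import numpy as np

    def prune_edge(A, s, B, tol=1e-8):
        # One TN edge carries a learned diagonal factor: A -- diag(s) -- B.
        # Entries of s at (near) zero reveal the compact rank; dropping them
        # shrinks both cores without changing the represented tensor.
        keep = np.abs(s) > tol                  # sparsity pattern = structure
        return A[:, keep], s[keep], B[keep, :]

    # demo: a rank-3 matrix disguised with rank-8 factors and a sparse diagonal
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((20, 8)), rng.standard_normal((8, 20))
    s = np.array([1.0, 0.5, 0.2, 0, 0, 0, 0, 0])
    Ak, sk, Bk = prune_edge(A, s, B)
    assert np.allclose(A @ np.diag(s) @ B, Ak @ np.diag(sk) @ Bk)
    print(Ak.shape, sk.shape, Bk.shape)         # (20, 3) (3,) (3, 20)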

    Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks

    Full text link
    The use of RGB-D information for salient object detection has been extensively explored in recent years. However, relatively few efforts have been devoted to modeling salient object detection in real-world human activity scenes with RGB-D. In this work, we fill the gap by making the following contributions to RGB-D salient object detection. (1) We carefully collect a new SIP (salient person) dataset, which consists of ~1K high-resolution images that cover diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds. (2) We conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research. We systematically summarize 32 popular models and evaluate 18 of them on seven datasets containing a total of about 97K images. (3) We propose a simple general architecture, called the Deep Depth-Depurator Network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can efficiently extract salient object masks from real scenes, enabling an effective background-changing application at 65 fps on a single GPU. All the saliency maps, our new SIP dataset, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark. Comment: Accepted in TNNLS20. 15 pages, 12 figures. Code: https://github.com/DengPingFan/D3NetBenchmark
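
    A hedged PyTorch sketch of the gating idea behind depth depuration follows: predict a quality score for the incoming depth map and suppress unreliable depth before cross-modal fusion. The scalar-gate design and layer sizes here are assumptions made for illustration; D3Net's actual DDU differs.

    import torch
    import torch.nn as nn

    class DepthDepurator(nn.Module):
        # Hypothetical depurator-style gate: score the depth map's quality
        # and down-weight unreliable depth before it reaches the fusion stage.
        def __init__(self):
            super().__init__()
            self.score = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

        def forward(self, depth):                    # depth: (B, 1, H, W)
            q = self.score(depth).view(-1, 1, 1, 1)  # per-image quality in (0, 1)
            return depth * q                         # gate out low-quality depth

    gated = DepthDepurator()(torch.rand(2, 1, 64, 64))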