74 research outputs found

    Aggregated Text Transformer for Scene Text Detection

    Full text link
    This paper explores the multi-scale aggregation strategy for scene text detection in natural images. We present the Aggregated Text TRansformer(ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism. Starting from the image pyramid with multiple resolutions, the features are first extracted at different scales with shared weight and then fed into an encoder-decoder architecture of Transformer. The multi-scale image representations are robust and contain rich information on text contents of various sizes. The text Transformer aggregates these features to learn the interaction across different scales and improve text representation. The proposed method detects scene texts by representing each text instance as an individual binary mask, which is tolerant of curve texts and regions with dense instances. Extensive experiments on public scene text detection datasets demonstrate the effectiveness of the proposed framework

    DDT: Dual-branch Deformable Transformer for Image Denoising

    Full text link
    Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.Comment: The code is avaliable at: https://github.com/Merenguelkl/DD

    Progressive Scene Text Erasing with Self-Supervision

    Full text link
    Scene text erasing seeks to erase text contents from scene images and current state-of-the-art text erasing models are trained on large-scale synthetic data. Although data synthetic engines can provide vast amounts of annotated training samples, there are differences between synthetic and real-world data. In this paper, we employ self-supervision for feature representation on unlabeled real-world scene text images. A novel pretext task is designed to keep consistent among text stroke masks of image variants. We design the Progressive Erasing Network in order to remove residual texts. The scene text is erased progressively by leveraging the intermediate generated results which provide the foundation for subsequent higher quality results. Experiments show that our method significantly improves the generalization of the text erasing task and achieves state-of-the-art performance on public benchmarks
    • …
    corecore