
    Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links

    Full text link
    Detection and recognition of scene texts of arbitrary shapes remain a grand challenge due to the rich variation of text shapes in orientation, length, curvature, etc. This paper presents a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. Three types of keypoints are detected that accurately specify the centre line, and hence the shape, of each text instance. In addition, four types of keypoint links are detected: the horizontal links associate the detected keypoints of each text instance, while the vertical links predict a pair of landmark points (for each keypoint) along the upper and lower text boundaries, respectively. Scene texts are then located by linking up the associated landmark points into localization polygons and rectified by transforming those polygons via thin plate spline. Extensive experiments over several public datasets show that the use of text keypoints is tolerant to variations in text orientation, length, and curvature, and achieves superior scene text detection and rectification performance compared with state-of-the-art methods.
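    As a concrete illustration of the rectification step, the sketch below warps a curved text region to an axis-aligned rectangle with a thin-plate-spline mapping fitted on paired upper/lower boundary landmarks. It is a minimal sketch, not the paper's implementation: the function name, the grayscale-image assumption, and the use of SciPy's RBFInterpolator (with the thin_plate_spline kernel) in place of the paper's TPS module are all illustrative choices.

    ```python
    # Minimal thin-plate-spline text rectification sketch, assuming paired
    # landmarks along the upper and lower text boundaries are already
    # available (e.g. from keypoint/link predictions as described above).
    import numpy as np
    from scipy.interpolate import RBFInterpolator
    from scipy.ndimage import map_coordinates

    def rectify_text(image, upper_pts, lower_pts, out_h=32, out_w=128):
        """Warp a curved text region to an out_h x out_w rectangle.

        image: 2D (grayscale) array; upper_pts, lower_pts: (N, 2) arrays of
        (x, y) landmarks ordered left to right along the two boundaries.
        """
        n = len(upper_pts)
        src = np.concatenate([upper_pts, lower_pts]).astype(float)  # (2N, 2)
        # Destination landmarks: evenly spaced along the top and bottom
        # edges of the rectified rectangle.
        xs = np.linspace(0, out_w - 1, n)
        dst = np.concatenate([
            np.stack([xs, np.zeros(n)], axis=1),           # top edge
            np.stack([xs, np.full(n, out_h - 1)], axis=1)  # bottom edge
        ])
        # Fit the inverse mapping (rectified -> source) with a
        # thin-plate-spline kernel, then sample the source image on the
        # rectified grid.
        tps = RBFInterpolator(dst, src, kernel='thin_plate_spline')
        gy, gx = np.mgrid[0:out_h, 0:out_w]
        grid = np.stack([gx.ravel(), gy.ravel()], axis=1)   # (H*W, 2) as (x, y)
        src_xy = tps(grid)                                  # (H*W, 2)
        coords = [src_xy[:, 1].reshape(out_h, out_w),       # rows (y)
                  src_xy[:, 0].reshape(out_h, out_w)]       # cols (x)
        return map_coordinates(image, coords, order=1)
    ```

    For colour images, the same coordinate grid would simply be sampled once per channel.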

    Dataset Condensation via Generative Model

    Full text link
    Dataset condensation aims to condense a large dataset with many training samples into a small set. Previous methods usually condense the dataset into pixel format, which suffers from slow optimization and a large number of parameters to optimize. As image resolution and the number of classes increase, the number of learnable parameters grows accordingly, preventing condensation methods from scaling up to large datasets with diverse classes. Moreover, the relations among condensed samples have been neglected, so the feature distribution of condensed samples is often not diverse. To solve these problems, we propose to condense the dataset into another format: a generative model. This novel format allows for the condensation of large datasets because the size of the generative model remains relatively stable as the number of classes or the image resolution increases. Furthermore, an intra-class loss and an inter-class loss are proposed to model the relations among condensed samples. The intra-class loss creates more diverse samples for each class by pushing each sample away from the others of the same class, while the inter-class loss increases the discriminability of samples by widening the gap between the centers of different classes. Extensive comparisons with state-of-the-art methods and our ablation studies confirm the effectiveness of our method and its individual components. To the best of our knowledge, we are the first to successfully conduct condensation on ImageNet-1k.
    Comment: old work, done in 202
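    The two relation losses are described concretely enough to sketch. Below is a minimal PyTorch rendering of the idea under stated assumptions: cosine similarity as the distance measure and mean class features as centers are our choices, not necessarily the paper's exact formulation.

    ```python
    # Sketch of the intra-class/inter-class idea: push same-class features
    # apart (diversity) and pull different-class centers apart
    # (discriminability). Illustrative, not the authors' code.
    import torch
    import torch.nn.functional as F

    def intra_class_loss(features, labels):
        """Encourage diversity: penalize high cosine similarity between
        pairs of condensed samples that share a class."""
        f = F.normalize(features, dim=1)
        sim = f @ f.t()                                   # (B, B) cosine similarities
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        same.fill_diagonal_(False)                        # ignore self-similarity
        if same.sum() == 0:
            return features.new_zeros(())
        return sim[same].mean()

    def inter_class_loss(features, labels):
        """Encourage discriminability: penalize closeness between the mean
        feature (center) of different classes."""
        classes = labels.unique()
        centers = torch.stack([features[labels == c].mean(0) for c in classes])
        c = F.normalize(centers, dim=1)
        sim = c @ c.t()
        off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=sim.device)
        return sim[off_diag].mean()

    # Usage: features would come from a feature extractor applied to the
    # generator's outputs; both losses join the main condensation objective.
    feats = torch.randn(16, 128)
    labels = torch.randint(0, 4, (16,))
    loss = intra_class_loss(feats, labels) + inter_class_loss(feats, labels)
    ```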

    Is synthetic data from generative models ready for image recognition?

    Full text link
    Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated by state-of-the-art text-to-image generation models can be used for image recognition tasks, focusing on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks. Code: https://github.com/CVMI-Lab/SyntheticData
    Comment: ICLR 2023, spotlight
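    The basic recipe being studied can be sketched in a few lines with an off-the-shelf text-to-image model. The model id, prompt template, and sample counts below are placeholder assumptions for illustration; the paper additionally explores strategies (e.g., better prompting and filtering) for making such data more useful.

    ```python
    # Sketch: synthesize class-labeled training images from an off-the-shelf
    # text-to-image diffusion model. Model id and prompts are placeholders.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    class_names = ["golden retriever", "tabby cat"]   # target label set
    synthetic_set = []
    for label, name in enumerate(class_names):
        # A plain class-name prompt; richer "language enhancement" prompts
        # are one of the strategies such studies compare.
        out = pipe(f"a photo of a {name}", num_images_per_prompt=4)
        synthetic_set += [(img, label) for img in out.images]
    # synthetic_set now holds (PIL.Image, label) pairs usable for zero-shot /
    # few-shot classifier training or large-scale pre-training.
    ```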

    Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

    Full text link
    Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection and pose additional challenges due to data privacy concerns. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for benefiting image recognition. Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images. To address this, we start by uncovering that diffusion models' cross-attention layers inherently provide annotation-free attention masks aligned with corresponding text inputs on generated images. We then investigate the problems of three prevalent unsupervised learning techniques (i.e., contrastive learning, masked modeling, and vision-language pretraining) and introduce customized solutions that fully exploit these free attention masks. Our approach is validated through extensive experiments that show consistent improvements over baseline models across various downstream tasks, including image classification, detection, segmentation, and image-text retrieval. Our method helps close the performance gap between unsupervised pretraining on synthetic data and on real-world data.
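    The core observation can be illustrated with a small sketch: averaging a cross-attention layer's weights for one prompt token over heads gives a spatial saliency map that can be thresholded into a mask. Shapes, the normalization, and the threshold below are illustrative assumptions; in practice the maps are gathered via hooks on the diffusion model's cross-attention layers and typically aggregated over layers and denoising steps as well.

    ```python
    # Sketch: turn one cross-attention layer's weights into a binary
    # "free" mask for a chosen prompt token.
    import torch

    def token_mask(cross_attn, token_idx, size, thresh=0.5):
        """cross_attn: (heads, H*W, num_tokens) attention weights from one
        cross-attention layer; token_idx: prompt token whose mask is wanted;
        size: (H, W) spatial resolution of that layer."""
        h, w = size
        attn = cross_attn.mean(dim=0)[:, token_idx]   # (H*W,) averaged over heads
        attn = attn.reshape(h, w)
        attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)  # to [0, 1]
        return attn > thresh                          # boolean foreground mask

    # Dummy usage: one layer's attention over a 16x16 latent grid, 77 tokens.
    attn = torch.rand(8, 16 * 16, 77).softmax(dim=1)
    mask = token_mask(attn, token_idx=5, size=(16, 16))
    ```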

    Analysis of the driving path of e-commerce to high-quality agricultural development in China: empirical evidence from mediating effect models

    Get PDF
    Purpose: This study investigates the impact of e-commerce on high-quality agricultural development (HQAD) in China. As the agricultural sector transitions towards higher-quality production in the digital era, understanding the influence pathways and mechanisms of e-commerce becomes crucial. We aim to quantify this influence through a hierarchical approach.
    Methods: Utilizing provincial panel data from 2000 to 2021, we construct a comprehensive HQAD evaluation system using the entropy method. Parallel mediating effect models are employed to empirically assess the multi-level effects of e-commerce on HQAD.
    Results: Benchmark regression analyses reveal a significant positive effect of e-commerce on HQAD, indicating its role as a key driver of China's agricultural advancement. Mechanism tests identify several intermediary pathways through which e-commerce indirectly promotes HQAD, including market expansion, agricultural value chain optimization, enhanced social services, and improved infrastructure. Notably, market expansion and value chain optimization demonstrate the most substantial mediation effects, accounting for 43.27% and 14.18% of the total effect, respectively.
    Discussion: This research contributes to the literature by establishing a comprehensive HQAD evaluation framework, providing a theoretical foundation for future studies. By incorporating circulation factors into the production system, we elucidate the complex influence mechanisms of e-commerce on agricultural production, addressing a significant research gap. Furthermore, we propose a novel "demand-driven supply optimization" paradigm, offering valuable insights for policy formulation aimed at fostering HQAD in China.
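    The reported mediation shares follow the standard product-of-coefficients logic: with total effect c (HQAD regressed on e-commerce), path a (mediator on e-commerce), and path b (HQAD on the mediator controlling for e-commerce), the proportion mediated is a*b/c. The sketch below demonstrates this on synthetic data with statsmodels; the variable names and single-mediator setup are illustrative, whereas the paper fits parallel mediators on provincial panel data with controls.

    ```python
    # Sketch of one mediating-effect model via the three-equation
    # (Baron-Kenny style) decomposition, on synthetic data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    ecom = rng.normal(size=300)                        # X: e-commerce development
    market = 0.8 * ecom + rng.normal(size=300)         # M: market expansion (mediator)
    hqad = 0.5 * ecom + 0.6 * market + rng.normal(size=300)  # Y: HQAD index

    X = sm.add_constant(ecom)
    c = sm.OLS(hqad, X).fit().params[1]                # total effect of X on Y
    a = sm.OLS(market, X).fit().params[1]              # path a: X -> M
    XM = sm.add_constant(np.column_stack([ecom, market]))
    b = sm.OLS(hqad, XM).fit().params[2]               # path b: M -> Y given X
    print(f"proportion mediated: {a * b / c:.2%}")
    ```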

    Image aesthetic style classification and region detection using Convolutional Neural Network

    No full text
    Convolutional Neural Networks (CNNs) have become popular in recent years, especially in the field of image processing. The algorithm has been successfully applied to object image classification, object detection, video analysis, and so on, with good results. Owing to the strong feature extraction performance of CNNs, research on the automatic aesthetic analysis of images by deep learning has started. However, previous work on image aesthetic analysis, such as [5], mainly addresses image aesthetic rating or binary aesthetic classification. Our project therefore aims at learning image aesthetic styles using a CNN as well as generating bounding boxes of regions for the corresponding styles. The project comprises two main parts: image aesthetic style classification and image aesthetic style region detection. We first build a network based on [5] and train an image aesthetic style classification model on the AVA Dataset [4] with selected style classes after data cleaning. Using this pre-trained model, we then apply the Faster R-CNN [1] algorithm to image aesthetic style region detection. This is implemented by manually labeling image aesthetic style regions in selected images in the AVA Dataset, building the corresponding Region Proposal Network and Fast R-CNN network [1] based on the RAPID network [5], and training on the labeled images with the pre-trained image aesthetic style classification model.
    Bachelor of Engineering (Computer Engineering)
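    As an illustration of the region-detection setup, the sketch below adapts a stock detector to a small set of style classes. It uses torchvision's ResNet-50 FPN Faster R-CNN as a stand-in, so it shows the general recipe rather than the project's RAPID-based network.

    ```python
    # Sketch: adapt an off-the-shelf Faster R-CNN to a custom
    # region-detection task such as aesthetic style regions.
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    num_style_classes = 5                     # e.g. selected AVA style classes
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box classification head: +1 for the background class.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(
        in_features, num_style_classes + 1
    )
    # Training then follows the usual torchvision detection loop, with
    # targets given as {"boxes": FloatTensor[N, 4], "labels": Int64Tensor[N]}
    # built from the manually labeled style regions.
    ```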

    Accurate and robust detection and recognition of texts in scene

    No full text
    Scene text detection and recognition aim to localize texts in natural scene images and output the corresponding character sequences. Automated scene text detection and recognition have attracted increasing interest in the computer vision and deep learning communities due to their wide range of applications in neural machine translation, autonomous driving, etc. Compared with preliminary research that focused on the design of hand-crafted features, modern deep-learning-based techniques have achieved significant improvements on scene text detection and recognition tasks. Such frameworks usually deploy convolutional neural networks (CNNs), recurrent neural networks (RNNs), or Transformers to extract image features for accurate text detection and recognition.

    However, automatically detecting and recognizing texts in scenes remains challenging due to the complexity of scene text images. First, texts in scenes exhibit high variability and diversity in appearance due to the complex patterns of texts (e.g., colors, fonts, etc.) and various environments (e.g., lighting, occlusion, etc.). Second, scene texts usually have different lengths, orientations, and shapes that may suffer from both perspective and curvature distortions. Third, scene images usually have complex backgrounds that may contain patterns similar to texts (e.g., trees, traffic signs, etc.). Any of these factors can lead to incorrect predictions in scene text detection and recognition tasks.

    In this thesis, we propose several novel techniques for scene text detection and recognition that aim to produce more accurate detection and recognition of scene texts in different orientations, lengths, sizes, and shapes. First, we design a novel scene text detection approach that detects texts through border semantics awareness and bootstrapping. We introduce a bootstrapping technique that samples multiple 'subsections' of a word or text line and accordingly relieves the constraint of limited training data effectively. In addition, a semantics-aware text border detection technique is designed that produces four types of text border segments for text detection. Second, we develop a novel multi-scale shape regression network (MSR) for accurate scene text detection. It detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices, which often suffer from regression errors when dealing with long text lines. Additionally, the multi-scale network extracts and fuses features at different scales concurrently and seamlessly, which demonstrates superb tolerance to text scale variation. Third, we design a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. The proposed network detects text keypoints and landmark points for accurate text detection and rectification. Fourth, we propose a novel scene text recognition method, I2C2W, that is tolerant to geometric and photometric degradation by decomposing scene text recognition into two inter-connected tasks and leveraging the advances of the Transformer architecture. Extensive experiments show that the proposed techniques can accurately detect and recognize texts with various lengths, orientations, and shapes in natural scene images.
    Doctor of Philosophy
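    The bootstrapping idea from the first contribution is simple enough to sketch. The snippet below samples random horizontal subsections of an annotated word box as extra positive training samples; the axis-aligned-box assumption and all names are illustrative, since the thesis operates on general text-line annotations.

    ```python
    # Sketch: bootstrap extra training samples by cropping random
    # subsections of an annotated word / text line.
    import random

    def sample_subsections(box, num_samples=3, min_frac=0.5):
        """box: (x1, y1, x2, y2) of a horizontal word/text line.
        Returns num_samples boxes covering random horizontal subsections
        that keep at least min_frac of the original width."""
        x1, y1, x2, y2 = box
        width = x2 - x1
        subsections = []
        for _ in range(num_samples):
            w = random.uniform(min_frac, 1.0) * width    # subsection width
            start = x1 + random.uniform(0.0, width - w)  # random left edge
            subsections.append((start, y1, start + w, y2))
        return subsections

    # Each subsection is treated as an extra positive text sample during
    # training, relieving the limited-training-data constraint.
    print(sample_subsections((10, 20, 210, 60)))
    ```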