    Learning scale-variant and scale-invariant features for deep image classification

    Convolutional Neural Networks (CNNs) require large image corpora to be trained on classification tasks. Variation in image resolution, in the sizes of the objects and patterns depicted, and in image scale hampers CNN training and performance, because the task-relevant information varies over spatial scales. Previous work attempting to deal with such scale variations focused on encouraging scale-invariant CNN representations. However, scale-invariant representations are incomplete representations of images, because images contain scale-variant information as well. This paper addresses the combined development of scale-invariant and scale-variant representations. We propose a multi-scale CNN method to encourage the recognition of both types of features and evaluate it on a challenging image classification task involving task-relevant characteristics at multiple scales. The results show that our multi-scale CNN outperforms a single-scale CNN. This leads to the conclusion that encouraging the combined development of scale-invariant and scale-variant representations in CNNs is beneficial to image recognition performance.

    Deep Learning Framework For Intelligent Pavement Condition Rating: A direct classification approach for regional and local roads

    Transport authorities rely on pavement characteristics to determine a pavement condition rating index. However, manually computing ratings can be a tedious, subjective, time-consuming, and training-intensive process. This paper presents a deep-learning framework for automatically rating the condition of rural road pavements using digital images captured from a dashboard-mounted camera. The framework includes pavement segmentation, data cleaning, image cropping and resizing, and pavement condition rating classification. A dataset of images, captured from diverse roads in Ireland and rated by two expert raters using the pavement surface condition index (PSCI) scale, was created. Deep-learning models were developed to perform pavement segmentation and condition rating classification. The automated PSCI rating achieved an average Cohen Kappa score of 0.9 and an average F1-score of 0.85 across the 1–10 rating classes on an independent test set. The incorporation of unique image augmentation during training enabled the models to exhibit increased robustness against variations in background and clutter.
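
    The agreement metric reported above can be illustrated with a minimal sketch. It assumes the unweighted form of Cohen's kappa (the abstract does not say whether a weighted variant was used), and the function name and the example ratings are hypothetical, not taken from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating sequences must be non-empty and equal in length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical PSCI-style ratings (1-10 scale) from a model and an expert.
model_ratings = [3, 5, 5, 8, 9, 2, 7, 7]
expert_ratings = [3, 5, 6, 8, 9, 2, 7, 6]
kappa = cohen_kappa(model_ratings, expert_ratings)
```

    A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is often preferred over raw accuracy when rating classes are imbalanced.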

    Orientation-Independent Chinese Text Recognition in Scene Images

    Scene text recognition (STR) has attracted much attention due to its broad applications. Previous work has focused mainly on recognizing Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Unlike Latin text, Chinese text often appears vertically in natural scenes, which poses difficulties for current state-of-the-art STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images, thus recognizing both horizontal and vertical text robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover the corresponding printed character images from the disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves a 45.63% improvement on VCTR when introducing CIRN to the baseline model. Comment: IJCAI 202

    Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective

    Visual representation learning is key to solving various vision problems. Relying on the seminal grid structure priors, convolutional neural networks (CNNs) have been the de facto standard architectures of most deep vision models. For instance, classical semantic segmentation methods often adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field, through either dilated (i.e., atrous) convolutions or inserted attention modules. However, the FCN-based architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating visual representation learning generally as a sequence-to-sequence prediction task. Specifically, we deploy a pure Transformer to encode an image as a sequence of patches, without local convolution and resolution reduction. With the global context modeled in every layer of the Transformer, stronger visual representations can be learned for better tackling vision tasks. In particular, our segmentation model, termed SEgmentation TRansformer (SETR), excels on ADE20K (50.28% mIoU, the first position in the test leaderboard on the day of submission) and Pascal Context (55.83% mIoU), and reaches competitive results on Cityscapes. Further, we formulate a family of Hierarchical Local-Global (HLG) Transformers characterized by local attention within windows and global attention across windows in a hierarchical and pyramidal architecture. Extensive experiments show that our method achieves appealing performance on a variety of visual recognition tasks (e.g., image classification, object detection, instance segmentation, and semantic segmentation). Comment: Extended version of CVPR 2021 paper arXiv:2012.1584
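
    The idea of encoding an image as a sequence of patches can be sketched as follows. This is only an illustration of the patch-splitting step under simple assumptions (a single-channel image stored as nested lists, non-overlapping square patches, row-major order); the function name is hypothetical, and the learned linear projection, positional embeddings, and Transformer layers that a model such as SETR would apply afterwards are not shown.

```python
def image_to_patch_sequence(image, patch_size):
    """Split a 2D image (list of rows) into a row-major sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    sequence = []
    # Walk over patch origins top-to-bottom, left-to-right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size patch into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            sequence.append(patch)
    return sequence


# A hypothetical 4x4 image with pixel values 0..15, split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patch_sequence(toy_image, patch_size=2)
```

    Each element of `tokens` plays the role of one input token, so the sequence length is (H / P) × (W / P) for an H × W image and patch size P.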

    Efficient Semantic Segmentation for Resource-Constrained Applications with Lightweight Neural Networks

    This thesis focuses on developing lightweight semantic segmentation models tailored for resource-constrained applications, effectively balancing accuracy and computational efficiency. It introduces several novel concepts, including knowledge sharing, dense bottleneck, and feature re-usability, which enhance the feature hierarchy by capturing fine-grained details, long-range dependencies, and diverse geometrical objects within the scene. To achieve precise object localization and improved semantic representations in real-time environments, the thesis introduces multi-stage feature aggregation, feature scaling, and hybrid-path attention methods.

    Research on rainy day traffic sign recognition algorithm based on PMRNet

    The recognition of traffic signs is of great significance to intelligent driving and traffic systems. Most current traffic sign recognition algorithms do not consider the impact of rainy weather. Rain marks obscure the recognition target in the image and degrade the performance of the algorithm, a problem that has yet to be solved. To improve the accuracy of traffic sign recognition in rainy weather, we propose a rainy-weather traffic sign recognition algorithm consisting of two modules. First, we propose an image deraining algorithm based on the Progressive multi-scale residual network (PMRNet), which uses a multi-scale residual structure to extract features at different scales and so make better use of the available information, combined with a Convolutional long-short term memory (ConvLSTM) network to enhance the algorithm's ability to extract rain mark features. Second, we use the CoT-YOLOv5 algorithm to recognize traffic signs in the recovered images. To improve the performance of YOLOv5 (You-Only-Look-Once, YOLO), the 3 × 3 convolution in the feature extraction module is replaced by the Contextual Transformer (CoT) module to make up for the lack of global modeling capability of the Convolutional Neural Network (CNN), thus improving the recognition accuracy. The experimental results show that the deraining algorithm based on PMRNet can effectively remove rain marks, and that its Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) scores are better than those of the other representative algorithms. The mean Average Precision (mAP) of the CoT-YOLOv5 algorithm on the TT100k dataset reaches 92.1%, which is 5% higher than the original YOLOv5.
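
    As a point of reference for the PSNR metric mentioned above, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). It assumes 8-bit single-channel images stored as nested lists with a peak value of 255; the function name and the toy images are hypothetical and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized 2D images."""
    if len(reference) != len(restored) or len(reference[0]) != len(restored[0]):
        raise ValueError("images must have the same dimensions")
    pixel_count = len(reference) * len(reference[0])
    # Mean squared error over all pixel positions.
    mse = sum(
        (ref_px - out_px) ** 2
        for ref_row, out_row in zip(reference, restored)
        for ref_px, out_px in zip(ref_row, out_row)
    ) / pixel_count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical clean image and a derained result that is off by one gray level.
clean = [[100, 120], [140, 160]]
derained = [[101, 121], [141, 161]]
score = psnr(clean, derained)
```

    Higher values indicate a restored image closer to the reference; SSIM, the other metric named in the abstract, compares local structure rather than per-pixel error and is not shown here.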