1,103 research outputs found

    N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

    While some studies have shown that the Swin Transformer (SwinT) with window self-attention (WSA) is suitable for single image super-resolution (SR), SwinT ignores broad regions when reconstructing high-resolution images because of its window and shift sizes. In addition, many deep-learning SR methods suffer from intensive computation. To address these problems, we introduce the N-Gram context to the image domain for the first time. We define an N-Gram as a group of neighboring local windows in SwinT, in contrast to text analysis, where an N-Gram is a sequence of consecutive characters or words. N-Grams interact with each other through sliding-WSA, expanding the regions seen when restoring degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with an SCDP bottleneck that takes all outputs of the hierarchical encoder. Experimental results show that NGswin achieves competitive performance while keeping an efficient structure compared with previous leading methods. Moreover, we also improve other SwinT-based SR methods with the N-Gram context, building an enhanced model, SwinIR-NG. Our improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Code will be available soon.
    Comment: 8 pages (main content) + 14 pages (supplementary content)
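The core idea, reduced to a toy sketch: rather than each local window attending only within itself, neighbouring windows are grouped into sliding N-grams so a wider region contributes to restoring each pixel. The function below is purely illustrative (the name and grouping scheme are assumptions, not the paper's code):

```python
# Hypothetical sketch of the N-Gram window grouping described above.
# Each local window index is paired with its neighbours in a sliding
# n-gram, widening the receptive field beyond one window.

def ngram_windows(num_windows: int, n: int) -> list[list[int]]:
    """For each local window, return the indices of the neighbouring
    windows (a sliding n-gram) it interacts with. Groups are clamped
    at the image border so every group has exactly n windows."""
    groups = []
    for i in range(num_windows):
        start = max(0, min(i, num_windows - n))
        groups.append(list(range(start, start + n)))
    return groups

# With 6 windows and 2-grams, window 0 now also sees window 1, etc.
print(ngram_windows(6, 2))  # [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [4, 5]]
```

In the real method the interaction is computed by window self-attention over these groups; here the sketch only shows which windows would be grouped.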

    Revealing More Details: Image Super-Resolution for Real-World Applications


    Blind Image Super-resolution with Rich Texture-Aware Codebooks

    Blind super-resolution (BSR) methods based on high-resolution (HR) reconstruction codebooks have achieved promising results in recent years. However, we find that a codebook based on HR reconstruction alone may not effectively capture the complex correlations between low-resolution (LR) and HR images. Specifically, multiple HR images may produce similar LR versions under complex blind degradations, so codebooks that depend only on HR reconstruction offer limited texture diversity when faced with confusing LR inputs. To alleviate this problem, we propose the Rich Texture-aware Codebook-based Network (RTCNet), which consists of the Degradation-robust Texture Prior Module (DTPM) and the Patch-aware Texture Prior Module (PTPM). DTPM mines the cross-resolution correlation of textures by exploiting the correspondence of textures between LR and HR images. PTPM uses patch-wise semantic pre-training to correct the misperception of texture similarity in the high-level semantic regularization. As a result, RTCNet avoids the misalignment of confusing textures between HR and LR images in BSR scenarios. Experiments show that RTCNet outperforms state-of-the-art methods on various benchmarks by 0.16-0.46 dB.
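The codebook mechanism this line of work builds on is, at its core, a nearest-neighbour lookup: a degraded feature is matched to the closest entry of a learned texture codebook. A minimal illustrative sketch (toy data; the names and vectors are assumptions, not RTCNet's implementation):

```python
# Toy vector-quantization step underlying codebook-based BSR: replace a
# feature with the index of its nearest codebook entry. A real codebook
# is learned and high-dimensional; this one is hand-written for clarity.

def nearest_code(feature, codebook):
    """Return the index of the codebook entry closest to `feature`
    in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(feature, codebook[i]))

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(nearest_code([0.9, 0.1], codebook))  # 1: closest to [1.0, 0.0]
```

The abstract's point is that when blind degradations make distinct HR textures map to similar LR features, this lookup becomes ambiguous unless the codebook also encodes LR-to-HR correspondence.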

    Super-resolution assessment and detection

    Super Resolution (SR) techniques are powerful digital manipulation tools that have significantly impacted various industries through their ability to enhance the resolution of lower-quality images and videos. Yet the real-world adaptation of SR models poses numerous challenges, which blind SR models aim to overcome by emulating complex real-world degradations. In this thesis, we investigate these SR techniques, with a particular focus on comparing the performance of blind models to their non-blind counterparts under various conditions. Despite recent progress, the proliferation of SR techniques raises concerns about their potential misuse. These methods can easily manipulate real digital content and create misrepresentations, which highlights the need for robust SR detection mechanisms. In our study, we analyze the limitations of current SR detection techniques and propose a new detection system that exhibits higher performance in discerning real from upscaled videos. Moreover, we conduct several experiments to gain insights into the strengths and weaknesses of the detection models, providing a better understanding of their behavior and limitations. In particular, we target 4K videos, which are rapidly becoming the standard resolution in fields such as streaming services, gaming, and content creation. As part of our research, we have created and utilized a unique dataset in 4K resolution, specifically designed to facilitate the investigation of SR techniques and their detection.

    Revisiting the Encoding of Satellite Image Time Series

    Satellite Image Time Series (SITS) representation learning is complex due to high spatiotemporal resolution, irregular acquisition times, and intricate spatiotemporal interactions. These challenges have led to specialized neural network architectures tailored for SITS analysis. Pioneering researchers have achieved promising results, but transferring the latest advances or established paradigms from Computer Vision (CV) to SITS remains highly challenging because of the existing suboptimal representation learning framework. In this paper, we develop a novel perspective on SITS processing as a direct set prediction problem, inspired by the recent trend of adopting query-based transformer decoders to streamline object detection and image segmentation pipelines. We further propose to decompose the representation learning process of SITS into three explicit steps: collect, update, and distribute, which is computationally efficient and suits irregularly sampled and asynchronous temporal satellite observations. Facilitated by this unique reformulation, our proposed temporal learning backbone for SITS, first pre-trained on the resource-efficient pixel-set format and then fine-tuned on downstream dense prediction tasks, attains new state-of-the-art (SOTA) results on the PASTIS benchmark dataset. Specifically, the clear separation between temporal and spatial components in the semantic/panoptic segmentation pipeline of SITS allows us to leverage the latest advances in CV, such as universal image segmentation architectures, yielding noticeable increases of 2.5 points in mIoU and 8.8 points in PQ over the best scores reported so far.
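Why a set formulation suits irregular acquisition times can be seen in a toy version of the "collect" step: a query attention-pools a variable-length set of observations, so no fixed temporal grid is assumed. Everything below is an illustrative assumption, not the paper's code:

```python
import math

# Hypothetical sketch of the "collect" step of collect-update-distribute:
# a (normally learnable) query vector pools an irregular set of per-date
# feature vectors with softmax attention weights.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def collect(query, observations):
    """Attention-pool a variable-length set of feature vectors."""
    scores = [sum(q * o for q, o in zip(query, obs)) for obs in observations]
    w = softmax(scores)
    dim = len(query)
    return [sum(w[i] * observations[i][d] for i in range(len(observations)))
            for d in range(dim)]

# Two acquisitions at arbitrary dates; the query weights them softly.
pooled = collect([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(pooled)
```

Because the pooling is over a set, adding or dropping acquisition dates changes only the number of scores, not the mechanism, which is the property the abstract exploits for asynchronous observations.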

    Neural architecture search: A contemporary literature review for computer vision applications

    Deep Neural Networks have received considerable attention in recent years. As the complexity of network architecture increases in relation to task complexity, it becomes harder to manually craft an optimal neural network architecture and train it to convergence. As such, Neural Architecture Search (NAS) is becoming far more prevalent within computer vision research, especially as the construction of efficient, smaller network architectures becomes an increasingly important area of research, one for which NAS is well suited. However, despite their promise, contemporary end-to-end NAS pipelines require vast computational training resources. In this paper, we present a comprehensive overview of contemporary NAS approaches with respect to image classification, object detection, and image segmentation. We adopt consistent terminology to overcome contradictions common within existing NAS literature. Furthermore, we identify and compare current performance limitations, in addition to highlighting directions for future NAS research.

    A Review on Skin Disease Classification and Detection Using Deep Learning Techniques

    Skin cancer ranks among the most dangerous cancers, and its most serious form is melanoma. Melanoma is brought on by genetic faults or mutations in the skin, caused by unrepaired Deoxyribonucleic Acid (DNA) damage in skin cells. It is essential to detect skin cancer early, since it is far more curable in its initial stages; left untreated, it typically spreads to other regions of the body. Owing to the disease's increasing frequency, high mortality rate, and the prohibitively high cost of medical treatment, early diagnosis of skin cancer signs is crucial. Given how hazardous these disorders are, researchers have developed a number of early-detection techniques for melanoma. Lesion characteristics such as symmetry, colour, size, and shape are often utilised to detect skin cancer and distinguish benign skin cancers from melanoma. An in-depth investigation of deep learning techniques for the early detection of melanoma is provided in this study. This study discusses the traditional feature extraction-based machine learning approaches for the segmentation and classification of skin lesions. Comparison-oriented research has been conducted to demonstrate the significance of various deep learning-based segmentation and classification approaches.

    Generic Object Detection and Segmentation for Real-World Environments

    Get PDF

    Generalized Differentiable Neural Architecture Search with Performance and Stability Improvements

    This work introduces improvements to the stability and generalizability of Cyclic DARTS (CDARTS). CDARTS is a Differentiable Architecture Search (DARTS)-based approach to neural architecture search (NAS) that uses a cyclic feedback mechanism to train search and evaluation networks concurrently, optimizing the search process by enforcing that the networks produce similar outputs. However, the dissimilarity between the loss functions used by the evaluation networks during the search and retraining phases makes the search-phase evaluation network a sub-optimal proxy for the final evaluation network used during retraining. ICDARTS, a revised algorithm that reformulates the search-phase loss functions so that the criteria for training the networks are consistent across both phases, is presented, along with a modified process for discretizing the search network's zero operations that allows these operations to be retained in the final evaluation networks. We pair these changes with ablation studies of the ICDARTS algorithm and network template. Multiple methods were then explored for expanding the search space of ICDARTS, including extending its operation set and implementing methods for discretizing its continuous search cells, further improving the performance of its discovered networks. To balance the flexibility of expanded search spaces against compute costs, we introduce both a novel algorithm for incorporating efficient dynamic search spaces into ICDARTS and a multi-objective version of ICDARTS that adds an expected-latency penalty term to its loss function. All enhancements to the original search algorithm are verified on two challenging scientific datasets. This work concludes by proposing a hierarchical version of ICDARTS that optimizes cell structures and network templates, and examining its preliminary results.
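The continuous relaxation that DARTS-family methods (including ICDARTS) build on can be sketched in a few lines: each edge applies every candidate operation and sums the results, weighted by a softmax over architecture parameters. The toy operations below are stand-ins for real conv/pool layers, so this is an illustration of the general DARTS mechanism, not ICDARTS itself:

```python
import math

# Minimal DARTS-style mixed operation: output = sum_i softmax(alpha)_i * op_i(x).
# Discretization later keeps only the op with the largest alpha.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mixed_op(x, ops, alphas):
    """Continuous relaxation of a discrete operation choice on one edge."""
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, ops))

ops = [lambda x: x,        # identity / skip connection
       lambda x: 0.0,      # the "zero" operation discussed in the abstract
       lambda x: 2 * x]    # stand-in for a conv-like transform

# Equal architecture weights average the candidates: (1 + 0 + 2) / 3 = 1.0.
print(mixed_op(1.0, ops, [0.0, 0.0, 0.0]))
```

As one alpha grows during training, the mixture approaches that single operation, which is what makes the eventual discretization of the search cells meaningful.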

    Image fusion for the novelty rotating synthetic aperture system based on vision transformer

    Rotating synthetic aperture (RSA) technology offers a promising solution for achieving large-aperture, lightweight designs in optical remote-sensing systems. It employs a rectangular primary mirror, resulting in a noncircularly symmetric point-spread function that changes over time as the mirror rotates. Consequently, it is crucial to employ an appropriate image-fusion method to merge the high-resolution information intermittently captured from different directions across the image sequence as the mirror rotates. However, existing image-fusion methods struggle to address the unique imaging mechanism of this system and the characteristics of the geostationary orbit in which it operates. To address this challenge, we model the imaging process of a noncircular rotating pupil and analyse its on-orbit imaging characteristics. Based on this analysis, we propose an image-fusion network based on a vision transformer. This network incorporates inter-frame mutual-attention and intra-frame self-attention mechanisms, facilitating more effective extraction of temporal and spatial information from the image sequence. Specifically, mutual attention models the correlation between pixels that are close to each other in the spatial and temporal dimensions, whereas long-range spatial dependencies are captured using intra-frame self-attention in the rotated variable-size attention block. We subsequently enhance the fusion of spatiotemporal information using video swin transformer blocks. Extensive digital simulations and semi-physical imaging experiments on remote-sensing images obtained from the WorldView-3 satellite demonstrate that our method outperforms both image-fusion methods designed for the RSA system and state-of-the-art general deep learning-based methods.
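Why fusion is essential here can be illustrated geometrically: a rectangular mirror resolves fine detail best along its long axis, and that axis rotates with the mirror, so each frame carries high-resolution information in a different direction. The sketch below is an assumed simplification of this geometry, not the paper's pupil model:

```python
import math

# Illustrative geometry of the RSA imaging mechanism: the direction of
# best resolution in the image plane follows the rotation angle of the
# rectangular primary mirror.

def high_res_direction(angle_deg: float) -> tuple[float, float]:
    """Unit vector of the image direction best resolved when the
    rectangular mirror's long axis sits at `angle_deg`."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

# Over half a rotation the well-resolved direction sweeps the whole
# plane, which is why fusing the sequence can recover detail in every
# direction that no single frame contains.
for angle in (0, 45, 90):
    print(angle, high_res_direction(angle))
```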