    MaxSR: Image Super-Resolution Using Improved MaxViT

    While transformer models have proven effective for natural language processing and high-level vision tasks, only a few attempts have been made to apply them to single image super-resolution. Transformer models have powerful representation capacity, and their built-in self-attention mechanisms can leverage the self-similarity prior in the input low-resolution image to improve super-resolution performance. We therefore present a single image super-resolution model based on the recent hybrid vision transformer MaxViT, named MaxSR. MaxSR consists of four parts: a shallow feature extraction block, multiple cascaded adaptive MaxViT blocks that extract deep hierarchical features and efficiently model global self-similarity from low-level features, a hierarchical feature fusion block, and finally a reconstruction block. The key component of MaxSR, the adaptive MaxViT block, is based on the MaxViT block, which combines MBConv with squeeze-and-excitation, block attention, and grid attention. To achieve better global modelling of self-similarity in the input low-resolution image, we improve the block attention and grid attention of the MaxViT block into adaptive block attention and adaptive grid attention, which perform self-attention inside each window across all grids, and inside each grid across all windows, respectively, in an efficient way. We instantiate the proposed model for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light). Experiments show that MaxSR and MaxSR-light efficiently establish new state-of-the-art performance.
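The window/grid decomposition behind block and grid attention can be made concrete with a small sketch (plain Python, illustrative only; it shows how spatial positions are grouped, not the actual MaxSR attention code). Block attention groups positions into non-overlapping p × p windows, so attention is local; grid attention groups positions that share the same offset under a fixed stride, so each group is a dilated grid spanning the whole feature map.

```python
def block_partition(h, w, p):
    """Block attention grouping: non-overlapping p x p windows.

    Self-attention then runs locally, inside each window."""
    groups = {}
    for r in range(h):
        for c in range(w):
            groups.setdefault((r // p, c // p), []).append((r, c))
    return groups

def grid_partition(h, w, s):
    """Grid attention grouping: positions sharing an offset under stride s.

    Each group is a dilated grid covering the whole map, giving
    global (sparse) attention at the same cost as a local window."""
    groups = {}
    for r in range(h):
        for c in range(w):
            groups.setdefault((r % s, c % s), []).append((r, c))
    return groups

blocks = block_partition(8, 8, 4)  # 4 windows of 16 neighbouring positions
grids = grid_partition(8, 8, 2)    # 4 groups of 16 far-apart positions
```

The two partitions have the same group sizes, so attention within a group costs the same either way; only the receptive pattern (local window vs. global dilated grid) differs.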

    Region-Based Approach for Single Image Super-Resolution

    Single image super-resolution (SR) is a technique that generates a high-resolution image from a single low-resolution image [1,2,10,11]. Single image super-resolution algorithms can generally be classified into two groups: example-based and self-similarity-based SR algorithms. The performance of an example-based SR algorithm depends on the similarity between the testing data and the database; a large database is usually needed for good performance, which results in heavy computational cost. A self-similarity-based SR algorithm can generate a high-resolution (HR) image with sharper edges and fewer ringing artifacts if there is sufficient recurrence within or across scales of the same image [10,11], but it struggles to generate HR details for image regions with fine texture. Given the limitations of each type of SR algorithm, we propose to combine the two. We segment each image into regions based on image content and choose the appropriate SR algorithm to recover the HR image for each region based on its texture features. Our experimental results show that the proposed method takes advantage of each SR algorithm and produces natural-looking results with sharp edges while suppressing ringing artifacts. We compute PSNR to quantitatively evaluate the SR results, and our proposed method outperforms the self-similarity-based and example-based SR algorithms with higher PSNR (+0.1 dB).
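As a reference for the reported +0.1 dB gain, PSNR between a ground-truth HR image and an SR result can be computed as follows (a minimal sketch assuming 8-bit images flattened to lists of pixel values; not code from the paper):

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images
    given as flat lists of pixel values.  Higher is better."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

ref = [100, 120, 130, 140]
sr  = [101, 119, 131, 139]           # off by one everywhere -> MSE = 1
print(round(psnr(ref, sr), 2))       # 48.13 dB
```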

    Learning to super-resolve images using self-similarities

    The single image super-resolution problem entails estimating a high-resolution version of a low-resolution image. Recent studies have shown that high-resolution versions of the patches of a given low-resolution image are likely to be found within the given image itself. This recurrence of patches across scales in an image forms the basis of self-similarity driven algorithms for image super-resolution. Self-similarity driven approaches have the appeal that they do not require any external training set; the mapping from low-resolution to high-resolution is obtained using the cross-scale patch recurrence. In this dissertation, we address three important problems in super-resolution and present novel self-similarity based solutions to them. First, we push the state-of-the-art in terms of super-resolution of fine textural details in the scene. We propose two algorithms that use self-similarity in conjunction with the fact that textures are better characterized by their responses to a set of spatially localized bandpass filters than by intensity values directly. Our proposed algorithms seek self-similarities in the sub-bands of the image to better synthesize fine textural details. Second, we address the problem of super-resolving an image in the presence of noise. To this end, we propose the first self-similarity based super-resolution algorithm that effectively exploits the high-frequency content present in noise (which is ordinarily discarded by denoising algorithms) for synthesizing useful textures in high resolution. Third, we present an algorithm that better super-resolves images containing geometric regularities, such as urban scenes and cityscapes. We do so by extracting planar surfaces and their parameters (mid-level cues) from the scene and exploiting the detected scene geometry to better guide the self-similarity search process.
Apart from the above self-similarity algorithms, this dissertation also presents a novel edge-based super-resolution algorithm that super-resolves an image by learning from training data how edge profiles transform across resolutions. We obtain edge profiles via a detailed and explicit examination of local image structure, which we show to be more robust and accurate than conventional gradient profiles.
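The cross-scale patch recurrence that drives these algorithms can be sketched in a few lines (a toy 1-D version with box downsampling and SSD matching; the patch size, scale factor, and search strategy are illustrative assumptions, not the dissertation's actual algorithm):

```python
def downscale(signal, factor=2):
    """Simple box downsampling: average non-overlapping groups."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def best_match(patch, signal, size):
    """Index in `signal` of the patch with the smallest SSD to `patch`."""
    best_i, best_d = 0, float("inf")
    for i in range(len(signal) - size + 1):
        d = sum((patch[j] - signal[i + j]) ** 2 for j in range(size))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def self_example(signal, start, size=3, factor=2):
    """For the patch at `start`, search the downscaled signal; the
    co-located patch in the original then serves as its HR example."""
    small = downscale(signal, factor)
    patch = signal[start:start + size]
    i = best_match(patch, small, size)
    # position i in the downscaled signal maps back to i * factor here
    return signal[i * factor:i * factor + size * factor]
```

Because the match is found at the coarser scale, the returned original-scale patch is `factor` times larger than the query, which is exactly the LR-to-HR mapping the self-similarity prior supplies.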

    Combined self-learning based single-image super-resolution and dual-tree complex wavelet transform denoising for medical images

    In this paper, we propose a novel self-learning based single-image super-resolution (SR) method, coupled with dual-tree complex wavelet transform (DTCWT) based denoising, to better recover high-resolution (HR) medical images. Unlike previous methods, this self-learning based SR approach enables us to reconstruct HR medical images from a single low-resolution (LR) image without extra training on HR image datasets in advance. The relationships between the given image and its scaled-down versions are modeled using support vector regression with sparse coding and dictionary learning, without explicitly assuming recurrence or self-similarity across image scales. In addition, we perform DTCWT-based denoising to initialize the HR images at each scale instead of simple bicubic interpolation. We evaluate our method on a variety of medical images. Both quantitative and qualitative results show that the proposed approach outperforms bicubic interpolation and state-of-the-art single-image SR methods while effectively removing noise.
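The self-learning idea, training on pairs drawn from the input image's own scale pyramid rather than an external dataset, can be sketched as follows (simplified: plain box downsampling and raw (LR, HR) image pairs; the paper itself fits support vector regression with sparse coding on such pairs):

```python
def box_down(img, f=2):
    """Average f x f blocks of a 2-D image (a list of rows)."""
    h, w = len(img) // f * f, len(img[0]) // f * f
    return [[sum(img[r + dr][c + dc] for dr in range(f) for dc in range(f)) / (f * f)
             for c in range(0, w, f)]
            for r in range(0, h, f)]

def pyramid_pairs(img, levels=2, f=2):
    """(LR, HR) training pairs drawn from the image's own scale pyramid."""
    pairs, hr = [], img
    for _ in range(levels):
        lr = box_down(hr, f)
        pairs.append((lr, hr))   # learn the lr -> hr mapping, reuse at test time
        hr = lr
    return pairs
```

A regressor trained on these pairs can then be applied to the original image to hallucinate the next scale up, with no external HR dataset involved.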

    Learning-Based Single Image Super Resolution

    Recent advancements in signal processing techniques have made it possible to obtain higher-resolution images. A high-resolution image refers to an image with a high density of pixels. The importance of and desire for high-resolution images are obvious in electronic and digital imaging applications. The quality of an image can be improved by either hardware or software approaches. Hardware approaches are straightforward solutions to enhance the quality of a given image, but constraints such as increased chip size make them expensive to some extent. Therefore, most research focuses on software methods. Super-resolution is one of the software image processing approaches, in which a high-resolution image is recovered from one or more low-resolution images. The main goal of super-resolution is resolution enhancement. This topic has received wide attention in the image processing community due to current and future application demands, especially in the field of medical applications. A high-resolution image can be super-resolved from either a single low-resolution image or many low-resolution images. This thesis concentrates entirely on Single Image Super Resolution (SISR), where a single low-resolution image is the candidate to be exploited as the input image. There are several classes of SISR methods, of which three important ones, i.e., the example-based, regression-based, and self-similarity-based methods, are investigated in this thesis. This thesis evaluates the performance of the above-mentioned methods. Based on the achieved results, the regression-based method shows better performance than the other approaches. Furthermore, we tune parameters such as the patch size to improve the numerical and visual results in terms of PSNR and perceived resolution, respectively. These modifications are applied to the regression-based and self-similarity-based methods, and the modified algorithms in both methods lead to improved results.

    Inverse Problems and Self-similarity in Imaging

    This thesis examines the concept of image self-similarity and provides solutions to various associated inverse problems such as resolution enhancement and missing fractal codes. In general, many real-world inverse problems are ill-posed, mainly because of the lack of existence of a unique solution. The procedure of providing acceptable unique solutions to such problems is known as regularization. The concept of image prior, which has been of crucial importance in image modelling and processing, has also been important in solving inverse problems since it algebraically translates to the regularization procedure. Indeed, much recent progress in imaging has been due to advances in the formulation and practice of regularization. This, coupled with progress in optimization and numerical analysis, has yielded much improvement in computational methods of solving inverse imaging problems. Historically, the idea of self-similarity was important in the development of fractal image coding. Here we show that the self-similarity properties of natural images may be used to construct image priors for the purpose of addressing certain inverse problems. Indeed, new trends in the area of non-local image processing have provided a rejuvenated appreciation of image self-similarity and opportunities to explore novel self-similarity-based priors. We first revisit the concept of fractal-based methods and address some open theoretical problems in the area. This includes formulating a necessary and sufficient condition for the contractivity of the block fractal transform operator. We shall also provide some more generalized formulations of fractal-based self-similarity constraints of an image. These formulations can be developed algebraically and also in terms of the set-based method of Projection Onto Convex Sets (POCS). We then revisit the traditional inverse problems of single frame image zooming and multi-frame resolution enhancement, also known as super-resolution. 
Some ideas will be borrowed from newly developed non-local denoising algorithms in order to formulate self-similarity priors. Understanding the role of scale and the choice of examples/samples is also important in these proposed models. For this purpose, we perform an extensive series of numerical experiments and analyze the results. These ideas naturally lead to the method of self-examples, which relies on the regularity properties of natural images at different scales, as a means of solving the single-frame image zooming problem. Furthermore, we propose and investigate a multi-frame super-resolution counterpart which does not require explicit motion estimation among video sequences.
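The role of contractivity can be illustrated with a toy fixed-point iteration (a 1-D affine map standing in for the block fractal transform; by the Banach fixed-point theorem, a contraction converges to its unique fixed point from any starting value, which is why a contractivity condition on the transform operator matters):

```python
def iterate_affine(s, o, x0=0.0, n=50):
    """Iterate x <- s*x + o.  For |s| < 1 this map is a contraction and
    converges to the unique fixed point o / (1 - s) from any x0."""
    x = x0
    for _ in range(n):
        x = s * x + o
    return x

fixed = iterate_affine(0.5, 3.0)   # converges to 3 / (1 - 0.5) = 6
```

In fractal image coding the same principle applies block-wise: decoding iterates the fractal transform from an arbitrary image, and the attractor is the encoded image.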

    Exploring the Internal Statistics: Single Image Super-Resolution, Completion and Captioning

    Image enhancement has drawn increasing attention for improving image quality and interpretability. It aims to modify images to achieve better perception for the human visual system, or a representation more suitable for further analysis, in a variety of applications such as medical imaging, remote sensing, and video surveillance. Depending on the attributes of the given input images, enhancement tasks vary: noise removal, deblurring, resolution enhancement, prediction of missing pixels, etc. The latter two are usually referred to as image super-resolution and image inpainting (or completion). Image super-resolution and completion are numerically ill-posed problems. Multi-frame approaches make use of the presence of aliasing in multiple frames of the same scene; when only one input image is available, estimating the unknown pixel values is extremely challenging. In this dissertation, we target single image super-resolution and completion by exploring the internal statistics within the input image and across its scales. An internal gradient-similarity-based single image super-resolution algorithm is first presented. We then demonstrate that the proposed framework can be naturally extended to accomplish super-resolution and completion simultaneously. Afterwards, a hybrid learning-based single image super-resolution approach is proposed to benefit from both external and internal statistics. This framework hinges on image-level hallucination from externally learned regression models as well as gradient-level pyramid self-awareness for the refinement of edges and textures. The framework is then employed to break the resolution limitation of passive microwave imagery and to boost the tracking accuracy of sea ice movements. To extend our research to the quality enhancement of depth maps, a novel system is presented to handle circumstances where only one pair of registered low-resolution intensity and depth images is available.
High-quality RGB and depth images are generated by the system. Extensive experimental results demonstrate the effectiveness of all the proposed frameworks both quantitatively and qualitatively. Unlike image super-resolution and completion, which belong to low-level vision research, image captioning is a high-level vision task related to the semantic understanding of an input image. It is a natural task for human beings, but it remains challenging from a computer vision point of view, especially because the task itself is ambiguous. In principle, descriptions of an image can address any of its visual aspects, from object attributes to scene features, or even refer to objects that are not depicted and to hidden interactions or connections that require common-sense knowledge to analyze. Learning-based image captioning is therefore in general a data-driven task that relies on the training dataset. Descriptions in the majority of existing image-sentence datasets are generated by humans under specific instructions. Real-world sentence data is rarely used directly for training since it is often noisy and unbalanced, which makes it ‘imperfect’ for training the image captioning task. In this dissertation, we present a novel image captioning framework that deals with uncontrolled image-sentence datasets in which descriptions may be strongly or weakly correlated with the image content and may have arbitrary lengths. A self-guiding learning process is proposed to fully exploit the internal statistics of the training dataset, to examine the learning process in a global way, and to generate descriptions that are syntactically correct and semantically sound.

    Single Image Super-Resolution Using Convolutional Neural Networks

    Enlargement of images is a common need in many applications. Although increasing the pixel count of an image is easy with simple interpolation methods, these fail to increase the amount of detail in the image. Single image super-resolution (SISR) aims to solve this ill-posed problem of producing a high-resolution (HR) image from a given low-resolution (LR) image. A single LR image always has an infinite number of corresponding HR images, but some of them are more probable than others. This probability density can be estimated with machine learning techniques, and the most probable HR image can be constructed based on that estimate. In recent years, artificial neural networks have become the most popular machine learning methods. Convolutional neural networks (CNNs), a subtype inspired by the human visual system, are used extensively in all fields of image processing, including single image super-resolution. In this thesis, different CNN-based methods for SISR are compared, and their performance is analyzed using both quantitative and qualitative methods. In total, four CNN methods were chosen and compared to three other methods. One of the reference methods was based on more traditional machine learning, and the other two were based on self-similarity of the input images. In contrast to the machine learning approach, self-similarity-based methods use only the information in the input image and do not require any training on external images. The results show that CNN-based methods outperform the alternative approaches in both quantitative metrics and qualitative analysis. The methods perform especially well with images that have clear structures and sharp edges, but highly textured images tend to be problematic. Six of the methods aim to minimize pixel-wise reconstruction error, which leads to overly smooth output in textured areas.
One method was instead designed to maximize the perceptual quality of the images at the cost of increased reconstruction error. It was able to generate very realistic textures in some cases but had a tendency to hallucinate implausible textures in flat areas. Other CNN-based methods also tended to create erroneous but plausible-looking details, which might be misleading in critical applications such as medical imaging. CNN-based SISR is therefore more suitable for entertainment and other consumer applications, especially as the perceptually optimized methods are developed further.
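Why pixel-wise error minimization over-smooths textures can be seen in a toy example (illustrative numbers only): when several HR textures are equally plausible for one LR input, the MSE-optimal prediction is their pixel-wise mean, which is flat.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Two equally plausible HR textures for the same LR input:
candidates = [[0, 8, 0, 8], [8, 0, 8, 0]]

def expected_err(pred):
    """Average MSE of a prediction against the plausible candidates."""
    return sum(mse(pred, c) for c in candidates) / len(candidates)

# The pixel-wise mean minimizes the expected MSE over the candidates...
mean = [sum(col) / len(candidates) for col in zip(*candidates)]  # [4, 4, 4, 4]

# ...but it is flat: the texture contrast is gone.
assert expected_err(mean) <= min(expected_err(c) for c in candidates)
```

Perceptually optimized methods instead commit to one plausible texture (e.g. via an adversarial loss), trading a higher pixel-wise error for sharper output.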