    Transform recipes for efficient cloud photo enhancement

    Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with a virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive, and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images, which reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs, which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain on the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency of both local and cloud computation: a transform recipe-based pipeline runs 2-4x faster and uses 2-7x less energy than local or naive cloud computation.
    Sponsors: Qatar Computing Research Institute; United States Defense Advanced Research Projects Agency (Agreement FA8750-14-2-0009); Stanford University, Stanford Pervasive Parallelism Laboratory; Adobe System
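
The recipe idea can be illustrated with a deliberately simplified sketch. It is not the authors' recipe format: it assumes one affine color transform per 8x8 block, with no multiscale decomposition, quantization, or entropy coding, and the names `fit_recipe`, `apply_recipe`, and the `enhance` filter are hypothetical. The server fits per-block coefficients from its degraded copy of the input and the filtered result; the client applies them to its own high-quality input.

```python
import numpy as np

def _blocks(h, w, block):
    """Yield block indices and slices for a grid of non-overlapping blocks."""
    for i in range(h // block):
        for j in range(w // block):
            yield i, j, (slice(i * block, (i + 1) * block),
                         slice(j * block, (j + 1) * block))

def _design(pixels):
    """Append a constant column so each block map is affine, not just linear."""
    return np.hstack([pixels, np.ones((pixels.shape[0], 1))])

def fit_recipe(src, dst, block=8, ridge=1e-6):
    """Server side (hypothetical): per-block least-squares fit of dst ~ [src, 1] @ coef."""
    h, w, c = src.shape
    assert h % block == 0 and w % block == 0, "sketch assumes sizes divisible by block"
    recipe = np.zeros((h // block, w // block, c + 1, c))
    for i, j, sl in _blocks(h, w, block):
        X = _design(src[sl].reshape(-1, c))
        y = dst[sl].reshape(-1, c)
        # Small ridge term keeps nearly flat blocks well conditioned.
        recipe[i, j] = np.linalg.solve(X.T @ X + ridge * np.eye(c + 1), X.T @ y)
    return recipe

def apply_recipe(src, recipe, block=8):
    """Client side (hypothetical): apply each block's affine map to the local input."""
    h, w, c = src.shape
    out = np.empty((h, w, c))
    for i, j, sl in _blocks(h, w, block):
        out[sl] = (_design(src[sl].reshape(-1, c)) @ recipe[i, j]).reshape(block, block, c)
    return out

# Toy round trip with a hypothetical global color-matrix "enhancement".
rng = np.random.default_rng(0)
M = np.array([[1.2, 0.1, 0.0], [0.0, 0.9, 0.1], [0.05, 0.0, 1.1]])

def enhance(img):
    return img @ M.T + 0.02

hq = rng.random((32, 32, 3))                   # client's high-quality input
lq = hq + rng.normal(0.0, 0.01, hq.shape)      # degraded copy uploaded to the server
recipe = fit_recipe(lq, enhance(lq))           # computed and sent back by the server
result = apply_recipe(hq, recipe)              # reconstructed on the client
```

Because this stand-in filter is a global color matrix, the per-block fit recovers it exactly and the client's reconstruction matches the filter applied to the clean input; real enhancements are only locally approximated by such a model. Here the recipe holds 12 coefficients per 64-pixel block, 1/16 of the raw sample count.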

    Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

    Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average non-parametric kNN classification improvement of 2.4-5.6% across eight remote sensing datasets compared to the current state of the art, and obtains a 0.9 mIoU to 1.7 mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
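
A sketch of a ground-distance-aware positional encoding follows. As a simplification it assumes a 1-D sine/cosine encoding whose positions are multiplied by the ratio of the patch ground sample distance (GSD) to a reference GSD; the function name and the `reference_gsd` parameter are hypothetical, and Scale-MAE's own encoding is two-dimensional.

```python
import numpy as np

def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """1-D sin/cos encoding with positions measured in ground distance.

    gsd: ground distance covered by one patch (any consistent unit).
    reference_gsd: hypothetical normalizing constant.
    """
    assert dim % 2 == 0, "dim must be even to interleave sin and cos"
    pos = np.arange(num_positions)[:, None] * (gsd / reference_gsd)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = pos * freqs[None, :]
    enc = np.empty((num_positions, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Two images covering the same 32 units of ground at different resolutions.
coarse = gsd_positional_encoding(16, 64, gsd=2.0)
fine = gsd_positional_encoding(32, 64, gsd=1.0)
```

With this scaling, patches at the same ground position receive identical encodings (`coarse[k]` equals `fine[2 * k]`), which a plain patch-index encoding would not provide.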

    State-of-the-Art and Trends in Scalable Video Compression with Wavelet Based Approaches

    Scalable Video Coding (SVC) differs from traditional single-point approaches mainly because it can encode, in a single bit stream, several operating points corresponding to different qualities, picture sizes, and frame rates. This work describes the current state of the art in SVC, focusing on wavelet-based motion-compensated approaches (WSVC). It reviews the individual components that have been designed to address the problem over the years and how such components are typically combined into meaningful WSVC architectures. Coding schemes that differ mainly in the space-time order in which the wavelet transforms operate are compared, and the strengths and weaknesses of the resulting implementations are discussed. An evaluation of the achievable coding performance is provided for the reference architectures studied and developed by ISO/MPEG in its exploration of WSVC. The paper also attempts to list the major differences between wavelet-based solutions and the SVC standard jointly targeted by ITU and ISO/MPEG. Particular emphasis is given to a promising WSVC solution, named STP-tool, which shares architectural similarities with the SVC standard. The paper ends by outlining evolution trends for WSVC systems and giving insights into video coding applications that could benefit from a wavelet-based approach.
    Authors: Adami, Nicola; Signoroni, Alberto; Leonardi, Riccard
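
Temporal scalability in wavelet-based schemes rests on wavelet filtering of the frame sequence along time. The sketch below shows one level of Haar lifting along time with motion compensation omitted (motion-compensated schemes filter along motion trajectories instead); the function names are hypothetical. Keeping only the low-pass frames gives the half-frame-rate operating point, and repeating the analysis on them gives further dyadic frame rates.

```python
import numpy as np

def haar_temporal_analysis(frames):
    """One level of Haar lifting along time (motion compensation omitted).

    frames: array of shape (T, H, W) with T even.
    Returns (low, high): T/2 temporal averages and T/2 temporal details.
    """
    assert len(frames) % 2 == 0, "sketch assumes an even number of frames"
    even, odd = frames[0::2], frames[1::2]
    high = odd - even            # predict step: temporal detail
    low = even + high / 2        # update step: equals (even + odd) / 2
    return low, high

def haar_temporal_synthesis(low, high):
    """Invert the lifting steps in reverse order."""
    even = low - high / 2
    odd = high + even
    frames = np.empty((2 * len(low),) + low.shape[1:])
    frames[0::2], frames[1::2] = even, odd
    return frames

rng = np.random.default_rng(0)
video = rng.random((8, 4, 4))                  # 8 tiny frames
low, high = haar_temporal_analysis(video)      # low alone: half frame rate
low2, high2 = haar_temporal_analysis(low)      # low2 alone: quarter frame rate
```

Because each lifting step is inverted exactly, decoding all subbands restores the full-rate sequence, while a decoder that drops the `high` frames still gets a valid lower-rate video from the same bit stream.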

    Development and implementation of image fusion algorithms based on wavelets

    Image fusion is the process of blending the complementary and common features of a set of images to generate a resultant image with superior information content, from both a subjective and an objective point of view. The objective of this research work is to develop novel image fusion algorithms and apply them in fields such as crack detection, multispectral sensor image fusion, medical image fusion, and edge detection in multi-focus images. The first part of this work deals with a novel crack detection technique, based on Non-Destructive Testing (NDT), for cracks in walls that suppresses the diversity and complexity of wall images. It uses edge tracking algorithms such as Hyperbolic Tangent (HBT) filtering and the Canny edge detector. The second part deals with a novel edge detection approach for multi-focus images based on complex-wavelet image fusion. An illumination-invariant hyperbolic tangent (HBT) filter is applied, followed by adaptive thresholding, to extract the real edges. The shift invariance, directionally selective diagonal filtering, and ease of implementation of the Dual-Tree Complex Wavelet Transform (DT-CWT) ensure robust subband fusion. This helps avoid the ringing artefacts that are more pronounced with the Discrete Wavelet Transform (DWT). Fusion with the DT-CWT also solves the problems of low contrast and blocking effects. In the third part, an improved DT-CWT based image fusion technique has been developed to compose a resultant image with better perceptual and quantitative image quality indices. A bilateral sharpness based weighting scheme has been implemented for the high-frequency coefficients, taking both the gradient and its phase coherence into account.
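
A simplified illustration of transform-domain fusion follows. It swaps in a one-level orthonormal 2-D Haar DWT for the DT-CWT used in the thesis, and a common textbook rule (average the approximation band, keep the larger-magnitude detail coefficient) for the bilateral-sharpness weighting; all names are hypothetical. The plain DWT used here is the shift-variant transform whose ringing the DT-CWT is meant to reduce.

```python
import numpy as np

def haar2d(x):
    """One level of the orthonormal 2-D Haar DWT (x must have even height and width)."""
    p, q = x[0::2, 0::2], x[0::2, 1::2]
    r, s = x[1::2, 0::2], x[1::2, 1::2]
    approx = (p + q + r + s) / 2
    details = ((p - q + r - s) / 2,   # column differences
               (p + q - r - s) / 2,   # row differences
               (p - q - r + s) / 2)   # diagonal differences
    return approx, details

def ihaar2d(approx, details):
    """Exact inverse of haar2d."""
    h, v, d = details
    x = np.empty((2 * approx.shape[0], 2 * approx.shape[1]))
    x[0::2, 0::2] = (approx + h + v + d) / 2
    x[0::2, 1::2] = (approx - h + v - d) / 2
    x[1::2, 0::2] = (approx + h - v - d) / 2
    x[1::2, 1::2] = (approx - h - v + d) / 2
    return x

def fuse(img1, img2):
    """Average the low-pass bands; keep the sharper (larger) detail coefficient."""
    a1, d1 = haar2d(img1)
    a2, d2 = haar2d(img2)
    approx = (a1 + a2) / 2
    details = tuple(np.where(np.abs(c1) >= np.abs(c2), c1, c2)
                    for c1, c2 in zip(d1, d2))
    return ihaar2d(approx, details)

rng = np.random.default_rng(0)
sharp = rng.random((16, 16))
flat = np.full_like(sharp, 0.5)      # stand-in for a defocused source
fused = fuse(sharp, flat)
```

In a multi-focus setting the max-magnitude rule lets each region take its detail from whichever source is in focus there; in this toy case all detail comes from `sharp`.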

    Color monogenic wavelet representation based on a tensor-like use of the Riesz transform: application to image coding

    11 pages. International audience.
    We propose a new extension of monogenic analysis to multi-valued signals such as color images. This generalization is based on an analogy between the Riesz transform and structure tensors and takes advantage of well-defined vector differential geometry. Our color wavelet transform is non-marginal, and its coefficients, separated into amplitude, phase, orientation, and local color axis, have interesting physical interpretations in terms of local energy, contour model, and colorimetric features. An image coding application is proposed as a practical study.
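
The monogenic quantities mentioned above can be illustrated for a single grayscale channel. This sketch computes the Riesz transform in the Fourier domain and derives amplitude, phase, and orientation; it does not reproduce the paper's color, tensor-like extension or its wavelet frame, the function names are hypothetical, and the sign convention of the Riesz kernel differs between references.

```python
import numpy as np

def riesz_transform(f):
    """Riesz transform of a 2-D array via the FFT (kernel -i * k / |k|)."""
    h, w = f.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    norm = np.sqrt(kx ** 2 + ky ** 2)
    norm[0, 0] = 1.0                   # avoid 0/0 at DC (numerator is 0 there)
    F = np.fft.fft2(f)
    r1 = np.real(np.fft.ifft2(-1j * kx / norm * F))   # along columns (x)
    r2 = np.real(np.fft.ifft2(-1j * ky / norm * F))   # along rows (y)
    return r1, r2

def monogenic(f):
    """Local amplitude, phase, and orientation of the monogenic signal."""
    r1, r2 = riesz_transform(f)
    odd = np.sqrt(r1 ** 2 + r2 ** 2)
    amplitude = np.sqrt(f ** 2 + odd ** 2)
    phase = np.arctan2(odd, f)
    orientation = np.arctan2(r2, r1)
    return amplitude, phase, orientation

n = 32
x = np.arange(n)
grating = np.tile(np.cos(2 * np.pi * 3 * x / n), (n, 1))   # cosine varying along x
amplitude, phase, orientation = monogenic(grating)
```

For a pure cosine grating the Riesz component along the wave is the matching sine, so the amplitude (square root of local energy) is constant while the phase varies along the wave: energy is separated from local structure.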

    Visual motion: algorithms for analysis and application

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Architecture, 1990. Includes bibliographical references (leaves 71-73). By Michael Adam Sokolov.

    Comparison of Sparse Coding and JPEG Coding Schemes for Blurred Retinal Images

    Overcomplete representations are currently a heavily researched area, especially in signal processing, because of their strong potential to generate sparse representations of signals. A sparse representation means that a given signal can be represented with components that are only rarely significantly active. It has been strongly argued that the mammalian visual system is closely tied to sparse and overcomplete representations: the primary visual cortex responds to an input signal with overcomplete responses, which leads to sparse neuronal activity for further processing. This work investigates sparse coding with an overcomplete basis set, which is believed to be the strategy the mammalian visual system employs to code natural images efficiently. It first analyzes the Sparse Code Learning algorithm, in which a given image is represented as a linear superposition of sparse, statistically independent events on a set of overcomplete basis functions; the algorithm trains and adapts these basis functions so as to represent any given image in terms of sparse structures. The second part analyzes an inhibition-based sparse coding model in which Gabor-based overcomplete representations are used to represent the image. It then applies an iterative inhibition algorithm, based on competition between neighboring transform coefficients, to select a subset of Gabor functions that represents the image with a sparse set of coefficients. The developed models are applied to image compression, and the achievable compression levels are measured. Research in these areas so far shows that sparse coding algorithms are inefficient at representing sharp, high-frequency image features. Therefore, this work evaluates these algorithms only on natural images that lack sharp features and compares the compression results with industry-standard coding schemes such as JPEG and JPEG 2000. It also models the characteristics of an image falling on the retina after distortion by the eye, then applies the developed algorithms to these images and tests the compression results.
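
The inhibition-based selection described above resembles the locally competitive algorithm (LCA), which is substituted here as a named stand-in, not the thesis's own algorithm. Each coefficient is driven by its correlation with the signal and inhibited by active coefficients in proportion to the overlap of their dictionary atoms; the dictionary, the threshold `lam`, and the time constant `tau` are assumptions. Unlike the thesis's scheme, this LCA lets every overlapping atom compete, not only spatial neighbors.

```python
import numpy as np

def soft_threshold(u, lam):
    """Shrink internal states toward zero; states below lam produce no output."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca(signal, dictionary, lam=0.1, tau=10.0, steps=2000):
    """Locally competitive algorithm (Euler steps) for sparse coding.

    dictionary: (n, k) array with unit-norm columns, k > n (overcomplete).
    """
    drive = dictionary.T @ signal                          # feed-forward excitation
    inhibition = dictionary.T @ dictionary - np.eye(dictionary.shape[1])  # atom overlaps
    u = np.zeros(dictionary.shape[1])
    for _ in range(steps):
        a = soft_threshold(u, lam)
        u += (drive - u - inhibition @ a) / tau            # leaky, mutually inhibited update
    return soft_threshold(u, lam)

# Overcomplete dictionary: identity atoms plus random unit-norm atoms.
rng = np.random.default_rng(0)
random_atoms = rng.standard_normal((16, 16))
random_atoms /= np.linalg.norm(random_atoms, axis=0)
D = np.hstack([np.eye(16), random_atoms])

s = np.zeros(16)
s[3] = 2.0                     # signal built from a single identity atom
coeffs = lca(s, D)
```

The competition is what removes redundant atoms: random atoms initially excited by the signal are silenced once the matching identity atom becomes active, leaving a single coefficient shrunk by the threshold (2.0 - 0.1 = 1.9).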