199 research outputs found

    Transform recipes for efficient cloud photo enhancement

    Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images: this reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs, which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain on the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency for both local and cloud computation: a transform recipe-based pipeline runs 2--4x faster and uses 2--7x less energy than local or naive cloud computation.

    Funding: Qatar Computing Research Institute; U.S. Defense Advanced Research Projects Agency (Agreement FA8750-14-2-0009); Stanford Pervasive Parallelism Laboratory, Stanford University; Adobe Systems.
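    The recipe idea can be illustrated with a toy version: fit a cheap per-tile model on the server and ship only its coefficients instead of the output image. The sketch below is a minimal stand-in, assuming per-tile, per-channel affine fits, a hypothetical 32-pixel tile, and image dimensions divisible by the tile size; the paper's actual recipes use a richer multiscale representation.

```python
# A toy version of a transform recipe: per-tile, per-channel affine
# coefficients (a, b) with out ~= a * in + b, fitted on the server and
# applied by the client to its local high-quality input. TILE and the
# affine model are illustrative assumptions (the paper's recipes are
# richer); dimensions are assumed divisible by TILE for brevity.
import numpy as np

TILE = 32  # hypothetical tile size

def fit_recipe(inp, out):
    """Server side: least-squares affine fit per tile and channel."""
    h, w, c = inp.shape
    recipe = np.zeros((h // TILE, w // TILE, c, 2), dtype=np.float32)
    for ty in range(h // TILE):
        for tx in range(w // TILE):
            for ch in range(c):
                sl = (slice(ty * TILE, (ty + 1) * TILE),
                      slice(tx * TILE, (tx + 1) * TILE), ch)
                x = inp[sl].astype(np.float64).ravel()
                y = out[sl].astype(np.float64).ravel()
                if np.ptp(x) == 0:          # flat tile: offset only
                    a, b = 0.0, y.mean()
                else:
                    a, b = np.polyfit(x, y, 1)
                recipe[ty, tx, ch] = (a, b)
    return recipe  # a few numbers per tile, far smaller than a JPEG

def apply_recipe(inp, recipe):
    """Client side: reconstruct the output from the local input."""
    rec = np.empty(inp.shape, dtype=np.float32)
    for ty in range(recipe.shape[0]):
        for tx in range(recipe.shape[1]):
            sl = (slice(ty * TILE, (ty + 1) * TILE),
                  slice(tx * TILE, (tx + 1) * TILE))
            rec[sl] = inp[sl] * recipe[ty, tx, :, 0] + recipe[ty, tx, :, 1]
    return rec
```

    Even this crude model makes the transfer asymmetry visible: the client uploads a heavily compressed input, and the downloaded recipe is two floats per tile and channel rather than a full-resolution image.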

    Towards a High Quality Real-Time Graphics Pipeline

    Modern graphics hardware pipelines create photorealistic images with high geometric complexity in real time. The quality is constantly improving, and advanced techniques from feature film visual effects, such as high dynamic range images and support for higher-order surface primitives, have recently been adopted. Visual effect techniques have large computational costs and significant memory bandwidth usage. In this thesis, we identify three problem areas and propose new algorithms that increase the performance of a set of computer graphics techniques. Our main focus is on efficient algorithms for the real-time graphics pipeline, but parts of our research are equally applicable to offline rendering.

    Our first focus is texture compression, a technique to reduce memory bandwidth usage. The core idea is to store images in small compressed blocks which are sent over the memory bus and are decompressed on-the-fly when accessed. We present compression algorithms for two types of texture formats. High dynamic range images capture environment lighting with luminance differences over a wide intensity range. Normal maps store perturbation vectors for local surface normals and give the illusion of high geometric surface detail. Our compression formats are tailored to these texture types and have compression ratios of 6:1, high visual fidelity, and low-cost decompression logic.

    Our second focus is tessellation culling. Culling is a commonly used technique in computer graphics for removing work that does not contribute to the final image, such as completely hidden geometry. By discarding rendering primitives from further processing, substantial arithmetic computations and memory bandwidth can be saved. Modern graphics processing units include flexible tessellation stages, where rendering primitives are subdivided for increased geometric detail. Images with highly detailed models can be synthesized, but the incurred cost is significant. We have devised a simple remapping technique that allows for better tessellation distribution in screen space. Furthermore, we present programmable tessellation culling, where bounding volumes for displaced geometry are computed and used to conservatively test whether a primitive can be discarded before tessellation. We introduce a general tessellation culling framework, and an optimized algorithm for rendering displaced Bézier patches, which is expected to be a common use case for graphics hardware tessellation.

    Our third and final focus is forward-looking and relates to efficient algorithms for stochastic rasterization, a rendering technique in which camera effects such as depth of field and motion blur can be faithfully simulated. We extend a graphics pipeline with stochastic rasterization in spatio-temporal space and show that stochastic motion blur can be rendered with rather modest pipeline modifications. Furthermore, backface culling algorithms for motion blur and depth of field rendering are presented, which are directly applicable to stochastic rasterization. Hopefully, our work in this field brings us closer to high-quality real-time stochastic rendering.
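    As a rough illustration of the tessellation-culling idea, the sketch below conservatively bounds a displaced patch with a sphere (control-point bounds grown by the maximum displacement) and rejects it against the view frustum before tessellation. The plane representation and the sphere bound are illustrative assumptions; the thesis derives tighter bounds for displaced Bézier patches.

```python
# Hedged sketch of programmable tessellation culling: before a patch
# is tessellated, build a conservative bounding sphere from its
# control points, enlarge it by the maximum displacement the shader
# can apply, and test it against the view-frustum planes.
import numpy as np

def bounding_sphere(control_points, max_displacement):
    """control_points: (N, 3) array, e.g. 16 points of a bicubic patch."""
    center = control_points.mean(axis=0)
    radius = np.linalg.norm(control_points - center, axis=1).max()
    return center, radius + max_displacement  # conservative bound

def outside_frustum(center, radius, planes):
    """planes: (6, 4) array of (nx, ny, nz, d) with inward normals."""
    for nx, ny, nz, d in planes:
        if np.dot(center, (nx, ny, nz)) + d < -radius:
            return True  # entirely behind one plane -> safe to cull
    return False

def cull_patches(patches, max_displacement, planes):
    """Discard patches before tessellation; keep the rest."""
    kept = []
    for cp in patches:
        c, r = bounding_sphere(cp, max_displacement)
        if not outside_frustum(c, r, planes):
            kept.append(cp)
    return kept
```

    Because the test is conservative, a kept patch may still end up invisible, but a culled patch can never have contributed to the image, so tessellation work is only ever saved, never lost.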

    Development of Demosaicking Techniques for Multi-Spectral Imaging Using Mosaic Focal Plane Arrays

    The use of mosaicked array technology in commercial digital cameras has made them smaller, cheaper, and mechanically more robust. In a mosaicked sensor, each pixel detector is covered with a wavelength-specific optical filter. Since only one spectral band is sensed per pixel location, information from the remaining spectral bands is absent. These unmeasured spectral bands are estimated using information obtained from neighboring pixels. This process of estimating the unmeasured spectral band information is called demosaicking. The demosaicking process uses interpolation strategies to estimate the missing pixels. Sophisticated interpolation methods have been developed for performing this task in digital color cameras.

    In this thesis we propose to evaluate the adaptation of mosaicked technology for multi-spectral cameras. Existing multi-spectral cameras use traditional methods, such as imaging spectrometers, to capture a multi-spectral image. These methods are very expensive and delicate in nature. The objective of using mosaicked technology for multi-spectral cameras is to reap the same benefits it offers in commercial digital color cameras. However, the problem in using mosaicked technology for multi-spectral images is the huge number of missing pixels that must be estimated in order to form the multi-spectral image. The estimation process becomes even more complicated as the number of bands in the multi-spectral image increases. Traditional demosaicking algorithms cannot be used because they have been specifically designed to suit three-band color images.

    This thesis focuses on developing new demosaicking algorithms for multi-spectral images. The existing demosaicking algorithms for color images have been extended to multi-spectral images. A new variation of the bilinear-interpolation-based strategy has been developed to perform demosaicking. This method uses variable neighborhood definitions to interpolate the missing spectral band values at each pixel location in a multi-spectral image. A novel Maximum a-Posteriori (MAP) based demosaicking method has also been developed. This method treats demosaicking as an image restoration problem. It can derive an optimal estimate that best resembles the original image. In addition, it can simultaneously interpolate missing spectral bands at pixel locations and remove noise and degradations in the image.

    Extensive experimentation and comparisons have shown that the new demosaicking methods for multi-spectral images developed in this thesis perform better than traditional interpolation strategies. The outputs from the demosaicking methods have been shown to be better reconstructed estimates of the original images and also produce good classification results in applications like target recognition and discrimination.
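    A minimal sketch of multi-spectral demosaicking in the spirit of the interpolation-based approach: each band is sensed on a sparse sub-lattice defined by a repeating n x n mosaic tile, and missing values are filled by mask-normalized averaging over a neighborhood sized to the tile. The uniform kernel stands in for the thesis's variable-neighborhood bilinear weights, and the simple periodic tiling (one band per cell) is an assumption.

```python
# Mosaic demosaicking generalized to N spectral bands via normalized
# convolution: only pixels that actually sensed a band contribute to
# the interpolated estimate for that band.
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw, pattern):
    """raw: (H, W) sensor image; pattern: (n, n) int array giving the
    band index sensed at each position of the repeating mosaic tile."""
    n = pattern.shape[0]
    n_bands = int(pattern.max()) + 1
    h, w = raw.shape
    # neighborhood grows with tile size so every pixel sees samples
    k = 2 * n - 1
    kernel = np.ones((k, k), dtype=np.float64)
    out = np.zeros((h, w, n_bands), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    for band in range(n_bands):
        mask = (pattern[ys % n, xs % n] == band).astype(np.float64)
        num = convolve(raw * mask, kernel, mode='mirror')
        den = convolve(mask, kernel, mode='mirror')
        out[..., band] = num / np.maximum(den, 1e-9)
    return out
```

    The normalization by the sample-count image is what keeps the scheme valid as the number of bands grows: each band's lattice becomes sparser, but every output pixel is still an average of real measurements of that band.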

    Advancements in multi-view processing for reconstruction, registration and visualization.

    The ever-increasing diffusion of digital cameras and the advancements in computer vision, image processing, and storage capabilities have led, in recent years, to the wide diffusion of digital image collections. A set of digital images is usually referred to as a multi-view image set when the pictures cover different views of the same physical object or location. In multi-view datasets, correlations between images are exploited in many different ways to increase our capability to gather enhanced understanding and information on a scene. For example, a collection can be enhanced by leveraging the camera position and orientation, or with information about the 3D structure of the scene. The range of applications of multi-view data is very wide, encompassing diverse fields such as image-based reconstruction, image-based localization, navigation of virtual environments, collective photographic retouching, computational photography, object recognition, etc. For all these reasons, the development of new algorithms to effectively create, process, and visualize this type of data is an active research trend. This thesis presents four advancements related to different aspects of multi-view data processing:

    - Image-based 3D reconstruction: we present a pre-processing algorithm, a special color-to-gray conversion, developed to improve the accuracy of image-based reconstruction algorithms. In particular, we show how different dense stereo matching results can be enhanced by applying a domain-separation approach that pre-computes a single optimized numerical value for each image location (a minimal sketch of such a conversion follows this list).
    - Image-based appearance reconstruction: we present a multi-view processing algorithm that enhances the quality of color transfer from multi-view images to a geo-referenced 3D model of a location of interest. The proposed approach computes virtual shadows and automatically segments shadowed regions in the input images, preventing those pixels from being used in subsequent texture synthesis.
    - 2D-to-3D registration: we present an unsupervised localization and registration system. This system can recognize a site framed in multi-view data and calibrate it against a pre-existing 3D representation. The system has very high accuracy and can validate the result in a completely unsupervised manner. Its accuracy is sufficient to seamlessly view input images correctly superimposed on the 3D location of interest.
    - Visualization: we present PhotoCloud, a real-time client-server system for interactive exploration of high-resolution 3D models and up to several thousand photographs aligned over this 3D data. PhotoCloud supports any 3D model that can be rendered in a depth-coherent way and arbitrary multi-view image collections. Moreover, it tolerates 2D-to-2D and 2D-to-3D misalignments, and it provides scalable visualization of generic integrated 2D and 3D datasets by exploiting data duality. A set of effective 3D navigation controls, tightly integrated with innovative thumbnail bars, enhances user navigation.

    These advancements have been developed in tourism and cultural heritage application contexts, but they are not limited to these.
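    A hedged sketch of what an optimized color-to-gray conversion for stereo matching might look like: search global linear combinations of R, G, and B and keep the one whose gray image preserves the most gradient energy. The grid search and the gradient-energy objective are illustrative assumptions, not the thesis's algorithm, which pre-computes an optimized value per image location.

```python
# Choose RGB-to-gray weights that keep the gray image as textured as
# possible, since dense stereo matchers rely on local contrast.
import numpy as np

def gradient_energy(gray):
    gy, gx = np.gradient(gray)
    return float((gx * gx + gy * gy).sum())

def best_gray(rgb, steps=10):
    """rgb: (H, W, 3) float image in [0, 1]. Returns the gray image
    from the best convex combination of channels found by grid search."""
    best, best_score = None, -1.0
    for wr in np.linspace(0.0, 1.0, steps):
        for wg in np.linspace(0.0, 1.0 - wr, steps):
            wb = 1.0 - wr - wg
            gray = rgb @ np.array([wr, wg, wb])
            score = gradient_energy(gray)
            if score > best_score:
                best, best_score = gray, score
    return best
```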

    Accurate and discernible photocollages

    There currently exist several techniques for selecting and combining images from a digital image library into a single image so that the result meets certain prespecified visual criteria. Image mosaic methods, first explored by Connors and Trivedi [18], arrange library images according to some tiling arrangement, often a regular grid, so that the combination of images, when viewed as a whole, resembles some input target image. Other techniques, such as Autocollage of Rother et al. [78], seek only to combine images in an interesting and visually pleasing manner, according to certain composition principles, without attempting to approximate any target image. Each of these techniques provides a myriad of creative options for artists who wish to combine several levels of meaning into a single image, or who wish to exploit the meaning and symbolism contained in each of a large set of images through an efficient and easy process. We first examine the most notable and successful of these methods and summarize the advantages and limitations of each. We then formulate a set of goals for an image collage system that combines the advantages of these methods while addressing and mitigating the drawbacks. In particular, we propose a system for creating photocollages that approximate a target image as an aggregation of smaller images, chosen from a large library, so that interesting visual correspondences between images are exploited. In this way, we allow users to create collages in which multiple layers of meaning are encoded, with meaningful visual links between each layer. In service of this goal, we ensure that the images used are as large as possible and are combined in such a way that boundaries between images are not immediately apparent, as in Autocollage. This has required us to apply a multiscale approach to searching and comparing images from a large database, which achieves both speed and accuracy. We also propose a new framework for color post-processing, and propose novel techniques for decomposing images according to object and texture information.
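    The multiscale search-and-compare step can be sketched as coarse-to-fine matching: rank the whole library on cheap thumbnails, then re-rank only the best candidates at a finer resolution. Thumbnail sizes, the L2 metric, the box downsampling, and the top-k cutoff below are simplifying assumptions.

```python
# Coarse-to-fine library search: most candidates are rejected using
# tiny thumbnails; only survivors pay for the finer comparison.
import numpy as np

def thumb(img, size):
    """Crude subsampling to (size, size); img: (H, W, 3) float array
    with H, W >= size. A real system would use proper filtering."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]

def best_match(target_tile, library, coarse=8, fine=32, top_k=16):
    """Return the index of the library image closest to target_tile."""
    t_coarse = thumb(target_tile, coarse)
    # pass 1: rank the whole library at thumbnail resolution
    scores = [np.sum((thumb(im, coarse) - t_coarse) ** 2) for im in library]
    candidates = np.argsort(scores)[:top_k]
    # pass 2: re-rank only the survivors at finer resolution
    t_fine = thumb(target_tile, fine)
    refined = [(np.sum((thumb(library[i], fine) - t_fine) ** 2), i)
               for i in candidates]
    return min(refined)[1]
```

    The speed/accuracy trade-off lives in the two resolutions and the cutoff: a smaller top_k makes the search faster but risks discarding the true best match at the coarse stage.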

    Merging chrominance and luminance in early, medium, and late fusion using Convolutional Neural Networks

    The field of Machine Learning has received extensive attention in recent years. More particularly, computer vision problems have received abundant consideration as the use of images and pictures in our daily routines grows. The classification of images is one of the most important tasks that can be used to organize, store, retrieve, and explain pictures. In order to do that, researchers have been designing algorithms that automatically detect objects in images. During the last decades, the common approach has been to create sets of features -- manually designed -- that could be exploited by image classification algorithms. More recently, researchers designed algorithms that automatically learn these sets of features, surpassing state-of-the-art performance. However, learning optimal sets of features is computationally expensive, and the cost can be relaxed by adding prior knowledge about the task, improving and accelerating the learning phase. Furthermore, in problems with a large feature space, the complexity of the models needs to be reduced to remain computationally tractable (e.g., the recognition of human actions in videos). Consequently, we propose to use multimodal learning techniques to reduce the complexity of the learning phase in Artificial Neural Networks by incorporating prior knowledge about the connectivity of the network. Furthermore, we analyze state-of-the-art models for image classification and propose new architectures that can learn a locally optimal set of features in an easier and faster manner. In this thesis, we demonstrate that merging the luminance and the chrominance parts of images using multimodal learning techniques can improve the learning of a good set of visual features. We compare the validation accuracy of several models and demonstrate that our approach outperforms the basic model with statistically significant results.
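    A sketch, in PyTorch, of the fusion idea under discussion: split the image into luminance (Y) and chrominance (CbCr), give each its own convolutional branch, and merge the branches at a chosen depth. Merging at the input is early fusion, at the classifier is late fusion; the sketch shows a medium fusion. All layer widths and depths are illustrative assumptions, not the thesis's architectures.

```python
# Medium fusion of luminance and chrominance: separate branches
# restrict connectivity (the prior knowledge), so luma and chroma
# features are learned independently before being concatenated.
import torch
import torch.nn as nn

class MediumFusionNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.luma = nn.Sequential(       # Y channel branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.chroma = nn.Sequential(     # Cb, Cr channel branch
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.fused = nn.Sequential(      # joint processing after fusion
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, y, cbcr):
        # y: (B, 1, H, W) luminance; cbcr: (B, 2, H, W) chrominance
        h = torch.cat([self.luma(y), self.chroma(cbcr)], dim=1)
        return self.classifier(self.fused(h).flatten(1))
```

    Sliding the fusion point earlier or later trades parameter count against how much cross-channel interaction the network can model, which is exactly the axis the early/medium/late comparison explores.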

    Skin detection by dual maximization of detectors agreement for video monitoring

    Author's accepted version of a work published in Pattern Recognition Letters, 34(16), 2013. DOI: 10.1016/j.patrec.2013.07.016

    This paper presents an approach for skin detection which is able to adapt its parameters to image data captured in video monitoring tasks with a medium field of view. It is composed of two detectors designed to extract high- and low-probability skin pixels (regions and isolated pixels, respectively). Each one is based on thresholding two color channels, which are dynamically selected. Adaptation is based on the agreement-maximization framework, whose aim is to find the configuration with the highest similarity between the channel results. Moreover, we improve this framework by learning how the detector parameters are related and by proposing an agreement function that considers expected skin properties. Finally, both detectors are combined by morphological reconstruction filtering to keep the skin regions whilst removing wrongly detected regions. The proposed approach is evaluated on heterogeneous human activity recognition datasets, outperforming the most relevant state-of-the-art approaches.

    This work has been partially supported by the Spanish Government (TEC2011-25995 EventVideo).
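    The agreement-maximization idea can be sketched as follows: two detectors threshold different color channels, the thresholds are searched to maximize the agreement between the resulting masks (Jaccard similarity here), and morphological reconstruction keeps permissive regions that touch a strict seed. The channel choice, threshold grid, and Jaccard score are assumptions; the paper's agreement function additionally encodes expected skin properties.

```python
# Dual detectors adapted by agreement maximization, combined by
# morphological reconstruction filtering.
import numpy as np
from skimage.morphology import reconstruction

def jaccard(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def adapt_thresholds(ch1, ch2, grid=np.linspace(0.2, 0.8, 13)):
    """Pick the (t1, t2) pair on which the two channels agree most."""
    best = max((jaccard(ch1 > t1, ch2 > t2), t1, t2)
               for t1 in grid for t2 in grid)
    return best[1], best[2]

def detect_skin(ch1, ch2):
    """ch1, ch2: (H, W) float color channels in [0, 1]."""
    t1, t2 = adapt_thresholds(ch1, ch2)
    strict = np.logical_and(ch1 > t1, ch2 > t2)     # high-probability seeds
    permissive = np.logical_or(ch1 > t1, ch2 > t2)  # candidate regions
    # keep only permissive regions connected to a strict seed
    mask = reconstruction(strict.astype(float), permissive.astype(float))
    return mask > 0
```

    The reconstruction step plays the role of the combination filter in the paper: isolated permissive detections with no strict support are erased, while connected skin regions survive intact.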

    Digital multimedia development processes and optimizing techniques
