Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
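As an illustration of what "representing data with linear combinations of a few dictionary elements" can look like in practice, here is a minimal sketch of sparse coding with a fixed dictionary. It uses ISTA (iterative soft-thresholding) on the usual l1-penalized least-squares objective. This is a standard textbook choice and not necessarily the formulation used in the monograph. The function names, the penalty `lam`, and the iteration count are assumptions made for the example.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, atoms, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||x - sum_k a[k] * atoms[k]||^2 + lam * ||a||_1.

    x     : signal, a list of floats
    atoms : dictionary, a list of atoms, each a list with len(x) entries
    lam   : sparsity penalty (hypothetical default)
    """
    n_atoms = len(atoms)
    a = [0.0] * n_atoms
    # The squared Frobenius norm upper-bounds the largest eigenvalue of D^T D,
    # so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(v * v for atom in atoms for v in atom)
    if lipschitz == 0.0:
        return a
    for _ in range(n_iter):
        # Residual of the current reconstruction.
        r = [
            x[i] - sum(a[k] * atoms[k][i] for k in range(n_atoms))
            for i in range(len(x))
        ]
        # Gradient step on the quadratic term, then shrinkage.
        a = [
            soft_threshold(
                a[k] + sum(atoms[k][i] * r[i] for i in range(len(x))) / lipschitz,
                lam / lipschitz,
            )
            for k in range(n_atoms)
        ]
    return a
```

Calling `sparse_code(x, atoms)` returns one coefficient per atom. Coefficients whose correlation with the residual stays below `lam` are set exactly to zero, which is what makes the code sparse. Learning the dictionary itself, the focus of the monograph, would alternate a step like this with an update of `atoms`.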
Deep Structured Layers for Instance-Level Optimization in 2D and 3D Vision
The approach we present in this thesis is that of integrating optimization problems
as layers in deep neural networks. Optimization-based modeling provides an additional set of tools enabling the design of powerful neural networks for a wide
battery of computer vision tasks. This thesis shows formulations and experiments
for vision tasks ranging from image reconstruction to 3D reconstruction.
We first propose an unrolled optimization method with implicit regularization
properties for reconstructing images from noisy camera readings. The method resembles an unrolled majorization minimization framework with convolutional neural networks acting as regularizers. We report state-of-the-art performance in image
reconstruction on both noisy and noise-free evaluation setups across many datasets.
We further focus on the task of monocular 3D reconstruction of articulated objects using video self-supervision. The proposed method uses a structured layer for
accurate object deformation that controls a 3D surface by displacing a small number
of learnable handles. While relying on a small set of training data per category for
self-supervision, the method obtains state-of-the-art reconstruction accuracy with
diverse shapes and viewpoints for multiple articulated objects.
We finally address the shortcomings of the previous method that revolve
around regressing the camera pose using multiple hypotheses. We propose a method
that recovers a 3D shape from a 2D image by relying solely on 3D-2D correspondences regressed from a convolutional neural network. These correspondences are
used in conjunction with an optimization problem to estimate per sample the camera pose and deformation. We quantitatively show the effectiveness of the proposed
method on self-supervised 3D reconstruction on multiple categories without the need for multiple hypotheses.
GECOM: GREEN COMMUNICATION CONCEPTS FOR ENERGY EFFICIENCY IN WIRELESS MULTIMEDIA SENSOR NETWORK
A wireless multimedia sensor network (WMSN) is one of the broad applications used in developing a smart city. Each node in a WMSN has some primary components: a sensor, a microcontroller, a wireless radio, and a battery. These components are used for sensing, computing, communicating between nodes, and flexibility of placement. However, WMSN technology has a weakness: enormous power consumption when sending large media such as image, audio, and video files. Research has been conducted to reduce power consumption in the process of sending data, for example through file compression or power consumption management. We propose Green Communication (GeCom), which combines power control management and file compression methods to reduce energy consumption. The power control management method controls data transmission: if the current data has high similarity with the previous data, it is not sent. The compression method compresses massive data such as images before sending. We used the low energy image compression algorithm to compress the data because of its ability to maintain image quality while producing a significant compression ratio. This method reduced energy usage by 2% to 17% for each data item.
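The two GeCom ingredients described above, skipping near-duplicate data and compressing what is sent, can be sketched roughly as follows. The abstract does not specify the similarity measure, the threshold, or the details of the low energy image compression algorithm, so everything here is an assumption: `similarity` is a hypothetical mean-absolute-difference metric, `threshold=0.98` is arbitrary, and `zlib` is only a stand-in for the paper's compressor.

```python
import zlib


def similarity(a: bytes, b: bytes) -> float:
    """Hypothetical metric: 1.0 for identical frames, 0.0 for maximally different."""
    if len(a) != len(b) or not a:
        return 0.0
    total_diff = sum(abs(p - q) for p, q in zip(a, b))
    return 1.0 - total_diff / (255 * len(a))


class GatedSender:
    """Skip frames that are close to the last transmitted one; compress the rest."""

    def __init__(self, threshold: float = 0.98):
        self.threshold = threshold  # assumed value, not taken from the paper
        self.last_sent = None

    def process(self, frame: bytes):
        """Return the compressed payload to transmit, or None if the frame is skipped."""
        if (
            self.last_sent is not None
            and similarity(frame, self.last_sent) >= self.threshold
        ):
            return None  # radio stays idle, which is where the energy is saved
        self.last_sent = frame
        return zlib.compress(frame)  # stand-in for the paper's image compressor
```

This sketch compares each frame with the last frame that was actually sent rather than the last frame captured, so that a slow drift in the scene eventually triggers a transmission. That is a design choice of the example, and the paper may handle it differently.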
Efficient training procedures for multi-spectral demosaicing
The simultaneous acquisition of multi-spectral images on a single sensor can be efficiently performed by single-shot capture using a multi-spectral filter array. This paper focuses on the demosaicing of color and near-infrared bands and relies on a convolutional neural network (CNN). To train the deep learning model robustly and accurately, it is necessary to provide enough training data, with sufficient variability. We focus on the design of an efficient training procedure by discovering an optimal training dataset. We propose two data selection strategies, motivated by slightly different concepts. We refer to the proposed models trained using data selection as data selection-based multi-spectral demosaicing (DSMD). The first strategy is clustering-based data selection (DSMD-C), which aims to discover a representative subset with a high variance so as to train a robust model. The second is adaptive data selection (DSMD-A), a self-guided approach that selects new data based on the current model accuracy. We performed a controlled experimental evaluation of the proposed training strategies, and the results show that a careful selection of data benefits both the speed and accuracy of training. We are still able to achieve high reconstruction accuracy with a lightweight model.
Graph Spectral Image Processing
The recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
the graph spectral domain. In this article, we give an overview of recent graph
spectral techniques in GSP specifically for image / video processing. The topics
covered include image compression, image restoration, image filtering and image
segmentation.
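To make the idea of "a graph connecting pixels with weights that reflect the image structure" concrete, here is a minimal sketch. It builds a 4-connected grid graph over a small grayscale patch and applies a few steps of Laplacian-based low-pass filtering. The Gaussian weight function, `sigma`, `alpha`, and the function names are assumptions for the example. The article surveys many graph constructions and filters, and this is only one simple instance.

```python
import math


def grid_edges(height, width):
    """Yield index pairs (i, j) for horizontally and vertically adjacent pixels."""
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:
                yield i, i + 1
            if r + 1 < height:
                yield i, i + width


def apply_laplacian(weights, x):
    """Return L x for the combinatorial Laplacian L = D - W of the weighted graph."""
    out = [0.0] * len(x)
    for (i, j), w in weights.items():
        d = w * (x[i] - x[j])
        out[i] += d
        out[j] -= d
    return out


def graph_smooth(patch, height, width, sigma=10.0, alpha=0.2, n_iter=10):
    """Edge-aware smoothing: x <- x - alpha * L x, repeated n_iter times.

    patch is a flat, row-major list of pixel intensities. Weights are
    exp(-(difference^2) / sigma^2), so edges across strong intensity jumps
    get near-zero weight. alpha must stay below 0.25 here, because the
    Laplacian eigenvalues of a 4-connected grid with weights <= 1 are at most 8.
    """
    weights = {
        (i, j): math.exp(-((patch[i] - patch[j]) ** 2) / (sigma ** 2))
        for i, j in grid_edges(height, width)
    }
    x = [float(v) for v in patch]
    for _ in range(n_iter):
        lx = apply_laplacian(weights, x)
        x = [xi - alpha * li for xi, li in zip(x, lx)]
    return x
```

In graph spectral terms, each iteration multiplies the component of the patch at graph frequency lambda by (1 - alpha * lambda), so high graph frequencies are attenuated. Because the weights follow the image content, a strong edge is a low-frequency feature of this graph and largely survives the filtering.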
Deep Residual Network for Joint Demosaicing and Super-Resolution
In digital photography, two image restoration tasks have been studied
extensively and resolved independently: demosaicing and super-resolution. Both
these tasks are related to resolution limitations of the camera. Performing
super-resolution on demosaiced images simply exacerbates the artifacts
introduced by demosaicing. In this paper, we show that such accumulation of
errors can be easily averted by jointly performing demosaicing and
super-resolution. To this end, we propose a deep residual network for learning
an end-to-end mapping between Bayer images and high-resolution images. By
training on high-quality samples, our deep residual demosaicing and
super-resolution network is able to recover high-quality super-resolved images
from low-resolution Bayer mosaics in a single step without producing the
artifacts common to such processing when the two operations are done
separately. We perform extensive experiments to show that our deep residual
network achieves demosaiced and super-resolved images that are superior to the
state-of-the-art both qualitatively and in terms of PSNR and SSIM metrics.