12,389 research outputs found
Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution
In many computer vision applications, obtaining images of high resolution in
both the spatial and spectral domains are equally important. However, due to
hardware limitations, one can only expect to acquire images of high resolution
in either the spatial or spectral domains. This paper focuses on hyperspectral
image super-resolution (HSI-SR), where a hyperspectral image (HSI) with low
spatial resolution (LR) but high spectral resolution is fused with a
multispectral image (MSI) with high spatial resolution (HR) but low spectral
resolution to obtain HR HSI. Existing deep learning-based solutions are all
supervised that would need a large training set and the availability of HR HSI,
which is unrealistic. Here, we make the first attempt to solving the HSI-SR
problem using an unsupervised encoder-decoder architecture that carries the
following uniquenesses. First, it is composed of two encoder-decoder networks,
coupled through a shared decoder, in order to preserve the rich spectral
information from the HSI network. Second, the network encourages the
representations from both modalities to follow a sparse Dirichlet distribution
which naturally incorporates the two physical constraints of HSI and MSI.
Third, the angular difference between representations are minimized in order to
reduce the spectral distortion. We refer to the proposed architecture as
unsupervised Sparse Dirichlet-Net, or uSDN. Extensive experimental results
demonstrate the superior performance of uSDN as compared to the
state-of-the-art.Comment: Accepted by The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2018, Spotlight
FRESH – FRI-based single-image super-resolution algorithm
In this paper, we consider the problem of single image super-resolution and propose a novel algorithm that outperforms state-of-the-art methods without the need of learning patches pairs from external data sets. We achieve this by modeling images and, more precisely, lines of images as piecewise smooth functions and propose a resolution enhancement method for this type of functions. The method makes use of the theory of sampling signals with finite rate of innovation (FRI) and combines it with traditional linear reconstruction methods. We combine the two reconstructions by leveraging from the multi-resolution analysis in wavelet theory and show how an FRI reconstruction and a linear reconstruction can be fused using filter banks. We then apply this method along vertical, horizontal, and diagonal directions in an image to obtain a single-image super-resolution algorithm. We also propose a further improvement of the method based on learning from the errors of our super-resolution result at lower resolution levels. Simulation results show that our method outperforms state-of-the-art algorithms under different blurring kernels
Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling
We present a method for simultaneously estimating 3D human pose and body
shape from a sparse set of wide-baseline camera views. We train a symmetric
convolutional autoencoder with a dual loss that enforces learning of a latent
representation that encodes skeletal joint positions, and at the same time
learns a deep representation of volumetric body shape. We harness the latter to
up-scale input volumetric data by a factor of , whilst recovering a
3D estimate of joint positions with equal or greater accuracy than the state of
the art. Inference runs in real-time (25 fps) and has the potential for passive
human behaviour monitoring where there is a requirement for high fidelity
estimation of human body shape and pose
Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has heavily increased over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as it is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model consists thus of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparable to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In case of 5-D LF video, we observe superior decorrelation and coding performance with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution
- …