Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has increased substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
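The abstract above gives no formulas, so the following is only a minimal 1-D sketch of the general idea of a kernel-based mixture-of-experts reconstruction, not the authors' model. It assumes Gaussian gating and one linear expert per kernel; the function names and the dictionary keys (`pi`, `mu`, `sigma`, `offset`, `slope`) are hypothetical.

```python
import math

def gaussian(x, mu, sigma):
    """Value of a 1-D normal density with mean mu and std sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def smoe_reconstruct(x, kernels):
    """Evaluate a gated mixture of linear experts at coordinate x.

    Assumed (hypothetical) kernel layout: each kernel is a dict with a mixing
    weight 'pi', a centre 'mu', a width 'sigma', and a linear expert defined
    by 'offset' and 'slope'.
    """
    weights = [k["pi"] * gaussian(x, k["mu"], k["sigma"]) for k in kernels]
    total = sum(weights)
    if total == 0.0:
        # x is numerically far from every kernel: use the nearest expert alone.
        nearest = min(kernels, key=lambda k: abs(x - k["mu"]))
        return nearest["offset"] + nearest["slope"] * (x - nearest["mu"])
    # Soft gating: each expert contributes in proportion to its normalized weight.
    return sum(
        (w / total) * (k["offset"] + k["slope"] * (x - k["mu"]))
        for w, k in zip(weights, kernels)
    )

# Two made-up kernels describing a dark region and a bright region.
kernels = [
    {"pi": 0.5, "mu": 0.0, "sigma": 1.0, "offset": 0.2, "slope": 0.0},
    {"pi": 0.5, "mu": 10.0, "sigma": 1.0, "offset": 0.8, "slope": 0.0},
]
# Any coordinate can be queried independently of the others, including
# positions between the original sample points.
samples = [smoe_reconstruct(0.5 * i, kernels) for i in range(21)]
```

Because each output value depends only on the query coordinate and the kernel parameters, a sketch like this can be evaluated per pixel in parallel and at arbitrary resolution, which is the kind of property the abstract lists under random access and view interpolation.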
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
Perceptual Image Similarity Metrics and Applications.
This dissertation presents research in perceptual image similarity metrics and applications, e.g., content-based image retrieval, perceptual image compression, image similarity assessment and texture analysis.
The first part aims to design texture similarity metrics consistent with human perception. A new family of statistical texture similarity features, called Local Radius Index (LRI), and corresponding similarity metrics are proposed. Compared to state-of-the-art metrics in the STSIM family, LRI-based metrics achieve better texture retrieval performance with much less computation. When applied to the recently developed perceptual image coder, Matched Texture Coding (MTC), they enable similar performance while significantly accelerating encoding. Additionally, in photographic paper classification, LRI-based metrics also outperform pre-existing metrics. To fulfill the needs of texture classification and other applications, a rotation-invariant version of LRI, called Rotation-Invariant Local Radius Index (RI-LRI), is proposed. RI-LRI is also grayscale and illuminance insensitive. The corresponding similarity metric achieves texture classification accuracy comparable to state-of-the-art metrics. Moreover, its much lower dimensional feature vector requires substantially less computation and storage than other state-of-the-art texture features.
The second part of the dissertation focuses on bilevel images, which are images whose pixels are either black or white. The contributions include new objective similarity metrics intended to quantify similarity consistent with human perception, and a subjective experiment to obtain ground truth for judging the performance of objective metrics. Several similarity metrics are proposed that outperform existing ones in the sense of attaining significantly higher Pearson and Spearman-rank correlations with the ground truth. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram, Connected Components Comparison and combinations of such.
Another portion of the dissertation focuses on the aforementioned MTC, which is a block-based image coder that uses texture similarity metrics to decide if blocks of the image can be encoded by pointing to perceptually similar ones in the already coded region. The key to its success is an effective texture similarity metric, such as an LRI-based metric, and an effective search strategy. Compared to traditional image compression algorithms, e.g., JPEG, MTC achieves a similar coding rate with higher reconstruction quality. The advantage of MTC grows as the coding rate decreases.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113586/1/yhzhai_1.pd
Extended object reconstruction in adaptive-optics imaging: the multiresolution approach
We propose the application of multiresolution transforms, such as wavelets
(WT) and curvelets (CT), to the reconstruction of images of extended objects
that have been acquired with adaptive optics (AO) systems. Such multichannel
approaches normally make use of probabilistic tools in order to distinguish
significant structures from noise and reconstruction residuals. Furthermore, we
aim to check the historical assumption that image-reconstruction algorithms
using static PSFs are not suitable for AO imaging. We convolve an image of
Saturn taken with the Hubble Space Telescope (HST) with AO PSFs from the 5-m
Hale telescope at the Palomar Observatory and add both shot and readout noise.
Subsequently, we apply different approaches to the blurred and noisy data in
order to recover the original object. The approaches include multi-frame blind
deconvolution (with the algorithm IDAC), myopic deconvolution with
regularization (with MISTRAL) and wavelets- or curvelets-based static PSF
deconvolution (AWMLE and ACMLE algorithms). We used the mean squared error
(MSE) and the structural similarity index (SSIM) to compare the results. We
discuss the strengths and weaknesses of the two metrics. We found that CT
produces better results than WT, as measured in terms of MSE and SSIM.
Multichannel deconvolution with a static PSF produces results which are
generally better than the results obtained with the myopic/blind approaches
(for the images we tested) thus showing that the ability of a method to
suppress the noise and to track the underlying iterative process is just as
critical as the capability of the myopic/blind approaches to update the PSF.
Comment: In revision in Astronomy & Astrophysics. 19 pages, 13 figures
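For readers unfamiliar with the two metrics named in the abstract above, here is a minimal sketch of MSE and a simplified SSIM. It is an illustration under stated assumptions, not the evaluation code of the paper: images are assumed to be flat lists of pixel values, and the SSIM here is computed once over the whole image, whereas the usual formulation averages the index over local windows. The function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat images."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM over the whole image (simplifying assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

# Made-up 2x3 images stored row by row.
original = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
restored = [12.0, 47.0, 95.0, 128.0, 171.0, 205.0]
error = mse(original, restored)
similarity = global_ssim(original, restored)
```

MSE penalizes every pixel difference equally, while SSIM compares means, variances, and covariance, so the two can rank reconstructions differently.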
Full Reference Objective Quality Assessment for Reconstructed Background Images
With an increased interest in applications that require a clean background
image, such as video surveillance, object tracking, street view imaging and
location-based services on web-based maps, multiple algorithms have been
developed to reconstruct a background image from cluttered scenes.
Traditionally, statistical measures and existing image quality techniques have
been applied for evaluating the quality of the reconstructed background images.
Though these quality assessment methods have been widely used in the past,
their performance in evaluating the perceived quality of the reconstructed
background image has not been verified. In this work, we discuss the
shortcomings in existing metrics and propose a full reference Reconstructed
Background image Quality Index (RBQI) that combines color and structural
information at multiple scales using a probability summation model to predict
the perceived quality in the reconstructed background image given a reference
image. To compare the performance of the proposed quality index with existing
image quality assessment measures, we construct two different datasets
consisting of reconstructed background images and corresponding subjective
scores. The quality assessment measures are evaluated by correlating their
objective scores with human subjective ratings. The correlation results show
that the proposed RBQI outperforms all the existing approaches. Additionally,
the constructed datasets and the corresponding subjective scores provide a
benchmark to evaluate the performance of future metrics that are developed to
evaluate the perceived quality of reconstructed background images.
Comment: Associated source code: https://github.com/ashrotre/RBQI
Associated database:
https://drive.google.com/drive/folders/1bg8YRPIBcxpKIF9BIPisULPBPcA5x-Bk?usp=sharing
(Email for permissions at: ashrotreasuedu)
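The evaluation described above correlates objective scores with subjective ratings. A minimal sketch of the two correlation coefficients commonly used for this purpose (Pearson for linear agreement, Spearman for rank agreement) is given below; the abstract does not say which coefficients were used, so this is an assumption, and the scores in the example are made up.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: metric outputs versus mean opinion scores for five images.
objective = [0.31, 0.45, 0.52, 0.70, 0.93]
subjective = [1.2, 2.0, 2.1, 3.9, 4.8]
linear_agreement = pearson(objective, subjective)
rank_agreement = spearman(objective, subjective)
```

Spearman only rewards getting the ordering right, so a metric that is monotonic but nonlinear in perceived quality can score perfectly on it while scoring lower on Pearson.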
DARTS: Double Attention Reference-based Transformer for Super-resolution
We present DARTS, a transformer model for reference-based image
super-resolution. DARTS learns joint representations of two image distributions
to enhance the content of low-resolution input images through matching
correspondences learned from high-resolution reference images. Current
state-of-the-art techniques in reference-based image super-resolution are based
on a multi-network, multi-stage architecture. In this work, we adapt the double
attention block from the GAN literature, processing the two visual streams
separately and combining self-attention and cross-attention blocks through a
gating attention strategy. Our work demonstrates how the attention mechanism
can be adapted for the particular requirements of reference-based image
super-resolution, significantly simplifying the architecture and training
pipeline. We show that our transformer-based model performs competitively with
state-of-the-art models, while maintaining a simpler overall architecture and
training process. In particular, we obtain state-of-the-art results on the SUN80
dataset, with a PSNR/SSIM of 29.83 / 0.809. These results show that attention
alone is sufficient for the RSR task, without multiple purpose-built
subnetworks, knowledge distillation, or multi-stage training.
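The PSNR figure quoted above is reported in decibels. As a reminder of how such a value is obtained, here is a minimal sketch of the standard PSNR definition; it assumes flat lists of pixel values and an 8-bit peak of 255, and it says nothing about the exact evaluation protocol (colour space, cropping) used for the reported number.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    err = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if err == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / err)

# Made-up example: every pixel is off by exactly one grey level.
reference = [100.0, 120.0, 140.0, 160.0]
upscaled = [101.0, 121.0, 141.0, 161.0]
quality_db = psnr(reference, upscaled)
```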