A temporal phase coherence estimation algorithm and its application on DInSAR pixel selection
Pixel selection is a crucial step in all advanced Differential Interferometric Synthetic Aperture Radar (DInSAR) techniques and has a direct impact on the quality of the final DInSAR products. In this paper, a full-resolution phase quality estimator, the temporal phase coherence (TPC), is proposed for DInSAR pixel selection. The method works with both distributed scatterers (DSs) and permanent scatterers (PSs). The influence on TPC of different neighboring window sizes and types of interferogram combinations [both single-master (SM) and multi-master (MM)] has been studied. The relationship between TPC and the phase standard deviation (STD) of the selected pixels has also been derived. Together with the classical coherence and amplitude dispersion methods, the TPC pixel selection algorithm has been tested on 37 VV-polarization Radarsat-2 images of Barcelona Airport. Results show the feasibility and effectiveness of the TPC pixel selection algorithm. Besides a clear increase in the number of selected pixels, the new method shows further advantages over the two classical approaches. The proposed algorithm has an affordable computational cost and is easy to implement and incorporate into any advanced DInSAR processing chain for the identification of high-quality pixels.
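For readers who want the gist of the estimator, a minimal sketch of temporal coherence computed over an interferogram stack is given below; the array names, the use of a pre-computed spatially filtered phase as the model, and the selection threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def temporal_phase_coherence(ifg_phase, model_phase):
    """Temporal phase coherence (TPC) per pixel.

    ifg_phase, model_phase: (N, H, W) arrays holding the observed and the
    estimated (e.g., spatially filtered) phases of N interferograms.
    Returns an (H, W) map in [0, 1]; values near 1 mark phase-stable pixels.
    """
    residual = ifg_phase - model_phase   # per-interferogram phase residual
    return np.abs(np.mean(np.exp(1j * residual), axis=0))

# Pixels whose TPC exceeds a quality threshold are kept for DInSAR processing:
# tpc = temporal_phase_coherence(phases, filtered)
# selected = tpc > 0.7   # threshold value is an illustrative choice
```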
Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity
A general framework for solving image inverse problems is introduced in this
paper. The approach is based on Gaussian mixture models, estimated via a
computationally efficient MAP-EM algorithm. A dual mathematical interpretation
of the proposed framework with structured sparse estimation is described, which
shows that the resulting piecewise linear estimate stabilizes the estimation
when compared to traditional sparse inverse problem techniques. This
interpretation also suggests an effective dictionary-motivated initialization
for the MAP-EM algorithm. We demonstrate that in a number of image inverse
problems, including inpainting, zooming, and deblurring, the same algorithm
produces results that are comparable to, often significantly better than, and
at worst marginally inferior to the best published ones, at a lower computational cost.
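A minimal sketch of the piecewise linear estimator for the denoising case (identity degradation operator) is shown below, assuming the GMM parameters have already been produced by the EM iterations; function and variable names are illustrative. Each patch is assigned to the Gaussian model with the highest marginal likelihood and filtered with that model's Wiener (linear MAP) estimator.

```python
import numpy as np

def gmm_map_denoise(patches, means, covs, sigma):
    """Piecewise linear MAP estimate of clean patches under a GMM prior.

    patches: (M, d) noisy patches; means: (K, d) and covs: (K, d, d) are the
    Gaussian model parameters; sigma is the noise standard deviation. One
    linear (Wiener) filter per model, selected per patch, makes the overall
    estimator piecewise linear.
    """
    M, d = patches.shape
    K = means.shape[0]
    noise_cov = sigma ** 2 * np.eye(d)
    loglik = np.empty((K, M))
    estimates = np.empty((K, M, d))
    for k in range(K):
        S = covs[k] + noise_cov               # covariance of the noisy patch
        diff = patches - means[k]
        sol = np.linalg.solve(S, diff.T).T    # rows: S^{-1} (y - mu_k)
        _, logdet = np.linalg.slogdet(S)
        loglik[k] = -0.5 * (np.sum(diff * sol, axis=1) + logdet)
        estimates[k] = means[k] + sol @ covs[k]  # mu + Sigma S^{-1} (y - mu)
    best = np.argmax(loglik, axis=0)             # model selection per patch
    return estimates[best, np.arange(M)]
```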
JND-Based Perceptual Video Coding for 4:4:4 Screen Content Data in HEVC
The JCT-VC standardized Screen Content Coding (SCC) extension in the HEVC HM
RExt + SCM reference codec offers an impressive coding efficiency performance
when compared with HM RExt alone; however, it is not significantly perceptually
optimized. For instance, it does not include advanced HVS-based perceptual
coding methods, such as JND-based spatiotemporal masking schemes. In this
paper, we propose a novel JND-based perceptual video coding technique for HM
RExt + SCM. The proposed method is designed to further improve the compression
performance of HM RExt + SCM when applied to YCbCr 4:4:4 SC video data. In the
proposed technique, luminance masking and chrominance masking are exploited to
perceptually adjust the Quantization Step Size (QStep) at the Coding Block (CB)
level. Compared with HM RExt 16.10 + SCM 8.0, the proposed method considerably
reduces bitrates (kbps), with a maximum reduction of 48.3%. In addition,
subjective evaluations reveal that the proposed technique, SC-PAQ, achieves
visually lossless coding at very low bitrates.
Comment: Preprint, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018).
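As a rough illustration of JND-driven quantization control (not the SC-PAQ model itself), the sketch below scales a coding block's QStep by a toy luminance-masking factor: sensitivity is assumed highest at mid-grey and lower at the luminance extremes, so darker and brighter blocks can absorb a larger quantization step.

```python
import numpy as np

def luminance_masking_scale(cb_luma):
    """Toy luminance-masking factor for one coding block (luma in [0, 255]).

    Visibility thresholds are assumed lowest at mid-grey and higher at the
    luminance extremes. The curve and its constants are illustrative
    assumptions, not the model from the paper.
    """
    mean_luma = float(np.mean(cb_luma))
    return 1.0 + 0.5 * ((mean_luma - 128.0) / 128.0) ** 2   # 1.0 .. ~1.5

def perceptual_qstep(base_qstep, cb_luma):
    # Scale the encoder's QStep per coding block before quantization.
    return base_qstep * luminance_masking_scale(cb_luma)
```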
Subjective Annotation for a Frame Interpolation Benchmark using Artefact Amplification
Current benchmarks for optical flow algorithms evaluate the estimation either
directly by comparing the predicted flow fields with the ground truth or
indirectly by using the predicted flow fields for frame interpolation and then
comparing the interpolated frames with the actual frames. In the latter case,
objective quality measures such as the mean squared error are typically
employed. However, it is well known that for image quality assessment, the
actual quality experienced by the user cannot be fully deduced from such simple
measures. Hence, we conducted a subjective quality assessment crowdsourcing
study for the interpolated frames provided by one of the optical flow
benchmarks, the Middlebury benchmark. We collected forced-choice paired
comparisons between interpolated images and corresponding ground truth. To
increase the sensitivity of observers when judging minute differences in paired
comparisons we introduced a new method to the field of full-reference quality
assessment, called artefact amplification. From the crowdsourcing data, we
reconstructed absolute quality scale values according to Thurstone's model. As
a result, we obtained a re-ranking of the 155 participating algorithms w.r.t.
the visual quality of the interpolated frames. This re-ranking not only shows
the necessity of visual quality assessment as a further evaluation metric for
optical flow and frame interpolation benchmarks; the results also provide the
ground truth for designing novel image quality assessment (IQA) methods
dedicated to perceptual quality of interpolated images. As a first step, we
proposed such a new full-reference method, called WAE-IQA. By weighting the
local differences between an interpolated image and its ground truth, WAE-IQA
performed slightly better than the currently best FR-IQA approach from the
literature.
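Reconstructing absolute quality scale values from forced-choice paired comparisons is a standard procedure; a minimal Thurstone Case V sketch is given below, assuming a square win-count matrix as input. The clipping constant is an illustrative choice that keeps the z-scores finite.

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    """Quality scale values from a paired-comparison win-count matrix.

    wins[i, j] = number of times condition i was preferred over condition j.
    Win proportions are mapped to z-scores (Thurstone Case V) and averaged.
    """
    totals = wins + wins.T                            # comparisons per pair
    p = np.where(totals > 0, wins / np.maximum(totals, 1), 0.5)
    p = np.clip(p, 0.01, 0.99)                        # keep z-scores finite
    z = norm.ppf(p)
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)                             # higher = better quality
```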
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.
Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship.
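A minimal sketch of the cascade logic described above follows; the `Hypothesis` container, the relocaliser call signature, and the per-stage score thresholds are hypothetical stand-ins for the authors' components.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

import numpy as np

@dataclass
class Hypothesis:
    pose: np.ndarray   # 4x4 camera-to-world transform
    score: float       # geometric verification score (e.g., inlier count)

def relocalise_cascade(frame,
                       relocalisers: Sequence[Callable[[object], Optional[Hypothesis]]],
                       thresholds: Sequence[float]) -> Optional[Hypothesis]:
    """Run fast relocalisers first and fall back to slower, more accurate
    ones only when the geometrically verified score is not yet good enough."""
    best: Optional[Hypothesis] = None
    for relocalise, threshold in zip(relocalisers, thresholds):
        hypothesis = relocalise(frame)
        if hypothesis is not None and (best is None or hypothesis.score > best.score):
            best = hypothesis
        if best is not None and best.score >= threshold:
            return best        # early exit: this stage was good enough
    return best                # all stages tried; return the best found
```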
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
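The following sketch illustrates the gating idea with a hard, depth-derived selection among pooled feature maps; the paper's module is a learned soft gate inside the network, and the depth range and index mapping here are illustrative assumptions.

```python
import numpy as np

def depth_aware_gating(pyramid, depth, d_min=0.5, d_max=20.0):
    """Hard per-pixel gate over pooled feature maps of growing field size.

    pyramid: list of (C, H, W) feature maps at a common resolution, with
    pyramid[0] pooled over the smallest field and pyramid[-1] the largest.
    depth: (H, W) depth map. Nearby pixels (small depth, large apparent
    object scale) take coarsely pooled features; distant pixels keep fine
    detail.
    """
    n = len(pyramid)
    d = np.clip(depth, d_min, d_max)
    t = (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    scale = np.rint((1.0 - t) * (n - 1)).astype(int)   # near -> coarse scale
    out = np.empty_like(pyramid[0])
    for s in range(n):
        mask = scale == s
        out[:, mask] = pyramid[s][:, mask]
    return out
```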
Foreground Detection in Camouflaged Scenes
Foreground detection has been widely studied for decades due to its
importance in many practical applications. Most of the existing methods assume
foreground and background show visually distinct characteristics and thus the
foreground can be detected once a good background model is obtained. However,
there are many situations where this is not the case. Of particular interest in
video surveillance is the camouflage case. For example, an active attacker
camouflages by intentionally wearing clothes that are visually similar to the
background. In such cases, even given a decent background model, it is not
trivial to detect foreground objects. This paper proposes a texture guided
weighted voting (TGWV) method which can efficiently detect foreground objects
in camouflaged scenes. The proposed method employs the stationary wavelet
transform to decompose the image into frequency bands. We show that the small
and hardly noticeable differences between foreground and background in the
image domain can be effectively captured in certain wavelet frequency bands. To
make the final foreground decision, a weighted voting scheme is developed based
on intensity and texture of all the wavelet bands with weights carefully
designed. Experimental results demonstrate that the proposed method achieves
superior performance compared to the current state-of-the-art results.
Comment: IEEE International Conference on Image Processing, 201
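A toy version of band-wise voting on a stationary wavelet decomposition (using PyWavelets) is sketched below; the per-band vote rule and the weights are placeholders rather than the carefully designed TGWV weighting from the paper.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_foreground_vote(frame, background, wavelet="haar", level=2,
                            weights=None, vote_thresh=0.5):
    """Toy band-wise vote on a stationary wavelet transform (SWT).

    frame, background: greyscale images whose sides are divisible by
    2**level (an SWT requirement). Each approximation/detail band casts a
    vote where the frame departs strongly from the background model; the
    weighted vote is thresholded into a foreground mask.
    """
    f_bands = pywt.swt2(frame.astype(float), wavelet, level=level)
    b_bands = pywt.swt2(background.astype(float), wavelet, level=level)
    weights = weights or [1.0] * level
    votes = np.zeros(frame.shape)
    total = 0.0
    for (fa, fd), (ba, bd), w in zip(f_bands, b_bands, weights):
        for f_sub, b_sub in [(fa, ba)] + list(zip(fd, bd)):
            diff = np.abs(f_sub - b_sub)
            votes += w * (diff > diff.mean() + 2.0 * diff.std())
            total += w
    return (votes / total) > vote_thresh
```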
Geometry-based spherical JND modeling for 360° display
360° videos have received widespread attention due to their realistic
and immersive experiences for users. To date, how to accurately model user
perceptions on 360° displays is still a challenging issue. In this paper,
we exploit the visual characteristics of 360° projection and display and
extend the popular just noticeable difference (JND) model to spherical JND
(SJND). First, we propose a quantitative 2D-JND model by jointly considering
spatial contrast sensitivity, luminance adaptation and texture masking effect.
In particular, our model introduces an entropy-based region classification and
utilizes different parameters for different types of regions for better
modeling performance. Second, we extend our 2D-JND model to SJND by jointly
exploiting latitude projection and field of view during 360° display.
With this operation, SJND reflects the characteristics of both the human
visual system and the 360° display. Third, our SJND model is more consistent
with user perceptions in subjective tests and tolerates more distortion at
lower bit rates during 360° video compression. To further examine the
effectiveness of our SJND model, we embed it in Versatile Video Coding (VVC)
compression. Compared with the state of the art, our SJND-VVC framework
significantly reduces the bit rate with negligible loss in visual quality.
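As an example of the luminance-adaptation ingredient of a 2D-JND model, the sketch below implements the classic Chou-and-Li-style visibility threshold curve; the paper's full model additionally includes contrast sensitivity, texture masking, entropy-based region classification, and the spherical (SJND) extension.

```python
import numpy as np

def luminance_jnd(bg_luma):
    """Classic luminance-adaptation visibility threshold (Chou & Li style).

    bg_luma: mean background luminance in [0, 255] (scalar or array).
    Returns the JND threshold in grey levels: high in dark regions, minimal
    around mid-grey, and slowly rising for bright backgrounds.
    """
    bg = np.asarray(bg_luma, dtype=float)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # dark backgrounds
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0         # bright backgrounds
    return np.where(bg <= 127.0, dark, bright)
```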