Graph Spectral Image Processing
The recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
the graph spectral domain. In this article, we overview recent graph spectral
techniques in GSP specifically for image / video processing. The topics covered
include image compression, image restoration, image filtering, and image
segmentation.
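To make the idea of treating an image patch as a graph signal concrete, the following is a minimal sketch. It assumes a 4-connected pixel graph, Gaussian edge weights on intensity differences, and the combinatorial Laplacian; none of these choices is stated in the abstract, and the function names and the `sigma` parameter are hypothetical.

```python
import numpy as np

def patch_graph_laplacian(patch, sigma=0.1):
    """Combinatorial Laplacian L = D - W of a 4-connected pixel graph.

    Assumed edge weight: exp(-(x_i - x_j)^2 / sigma^2), so pixels with
    similar intensity are strongly connected.
    """
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            # Link each pixel to its right and bottom neighbors.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    diff = patch[r, c] - patch[rr, cc]
                    W[i, j] = W[j, i] = np.exp(-(diff ** 2) / sigma ** 2)
    return np.diag(W.sum(axis=1)) - W

def graph_fourier_transform(patch, sigma=0.1):
    """Return Laplacian eigenvalues, eigenvectors, and GFT coefficients."""
    L = patch_graph_laplacian(patch, sigma)
    eigvals, eigvecs = np.linalg.eigh(L)  # L is symmetric
    coeffs = eigvecs.T @ patch.reshape(-1)
    return eigvals, eigvecs, coeffs
```

The eigenvectors play the role of graph frequencies. Because they form an orthonormal basis, `eigvecs @ coeffs` recovers the flattened patch.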
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth-limited scenarios.
Comment: submitted to IEEE Transactions on Image Processing
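The intra/inter-frame idea for binary descriptors can be sketched as follows. This is an illustrative assumption, not the coding scheme of the paper: descriptors are held as Python integers, the inter mode stores the index of the closest descriptor from the previous frame plus an XOR residual, and the mode decision uses a hypothetical `max_flips` threshold.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")

def encode_descriptor(desc, prev_descs, n_bits, max_flips=None):
    """Return ("inter", ref_index, residual) or ("intra", desc).

    Inter mode is chosen when the closest previous-frame descriptor
    differs in at most max_flips bits (assumed default: n_bits // 4).
    """
    if max_flips is None:
        max_flips = n_bits // 4
    if prev_descs:
        ref = min(range(len(prev_descs)),
                  key=lambda i: hamming(desc, prev_descs[i]))
        residual = desc ^ prev_descs[ref]
        if bin(residual).count("1") <= max_flips:
            return ("inter", ref, residual)
    return ("intra", desc)

def decode_descriptor(code, prev_descs):
    """Invert encode_descriptor given the same previous-frame descriptors."""
    if code[0] == "inter":
        _, ref, residual = code
        return prev_descs[ref] ^ residual
    return code[1]
```

A residual with few set bits is sparse, which is what would make it cheaper to entropy code than the raw descriptor; the entropy coder itself is outside this sketch.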
Transforms for intra prediction residuals based on prediction inaccuracy modeling
In intra video coding and image coding, directional intra prediction is used to reduce spatial redundancy, and the intra prediction residuals are encoded with transforms. In this paper, we develop transforms for directional intra prediction residuals. Specifically, we observe that directional intra prediction is most effective in smooth regions and along edges with a particular direction. In the ideal case, edges can be predicted fairly accurately with an accurate prediction direction. In practice, an accurate prediction direction is hard to obtain. Based on the inaccuracy of the prediction direction that arises in the design of many practical video coding systems, we estimate the residual variance and propose a class of transforms based on the estimated variance function. The proposed method is evaluated by its energy compaction property. Experimental results show that with the proposed method, the same amount of energy in directional intra prediction residuals can be preserved with a significantly smaller number of transform coefficients.
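The energy compaction criterion mentioned above can be illustrated with a small sketch. It does not implement the proposed transforms; it only measures, for any orthonormal transform, what fraction of residual energy falls into the `k` strongest coefficients. The orthonormal DCT-II is used here as a stand-in transform, and all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    T[0, :] /= np.sqrt(2.0)
    return T

def energy_compaction(residuals, T, k):
    """Fraction of total energy held by the k highest-energy coefficients.

    residuals: array of shape (num_blocks, n), one 1-D residual per row.
    T: orthonormal transform matrix of shape (n, n).
    """
    coeffs = residuals @ T.T
    energy = (coeffs ** 2).mean(axis=0)   # average energy per coefficient
    top = np.sort(energy)[::-1][:k]
    return top.sum() / energy.sum()
```

A transform with better energy compaction reaches a given fraction with a smaller `k`, which is the comparison the abstract describes.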
Key-point Detection based Fast CU Decision for HEVC Intra Encoding
As the most recent video coding standard, High Efficiency Video Coding (HEVC) adopts various novel techniques, including a quad-tree based coding unit (CU) structure and additional angular modes for intra encoding. These new techniques achieve a notable improvement in coding efficiency at the cost of a significant increase in computational complexity. Thus, a fast HEVC coding algorithm is highly desirable. In this paper, we propose a fast intra CU decision algorithm for HEVC that reduces the coding complexity, based mainly on key-point detection. A CU block is considered to have multiple gradients and is split early if corner points are detected inside the block. On the other hand, a CU block without corner points is terminated early (not split further) when its RD cost is also small according to statistics of the previous frames. The proposed fast algorithm achieves over 62% encoding time reduction with, on average, 3.66%, 2.82%, and 2.53% BD-Rate loss for the Y, U, and V components, respectively. The experimental results show that the proposed method decides the CU size quickly and efficiently in HEVC intra coding, even though only static parameters are applied to all test sequences.
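The two decision rules described above can be written as a small function. This is a sketch under assumptions: the corner count and the RD-cost threshold are taken as inputs (the abstract does not say which detector or which statistics are used), and the name `decide_cu` and the `"full_search"` fallback are hypothetical.

```python
def decide_cu(num_corners, rd_cost, rd_threshold, depth, max_depth=3):
    """Return "split", "terminate", or "full_search" for one CU.

    num_corners:  corner points detected inside the CU block
    rd_cost:      rate-distortion cost of coding the CU unsplit
    rd_threshold: threshold assumed to come from previous-frame statistics
    depth:        quad-tree depth (0 for 64x64, 3 for 8x8)
    """
    if depth >= max_depth:
        return "terminate"      # smallest CU size, cannot split further
    if num_corners > 0:
        return "split"          # corners suggest multiple gradients
    if rd_cost <= rd_threshold:
        return "terminate"      # no corners and already cheap to code
    return "full_search"        # fall back to the regular RD search
```

For example, `decide_cu(2, 1500.0, 900.0, depth=1)` returns `"split"` because corners were found, regardless of the RD cost.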
Mitigation of H.264 and H.265 Video Compression for Reliable PRNU Estimation
The photo-response non-uniformity (PRNU) is a distinctive image sensor
characteristic, and an imaging device inadvertently introduces its sensor's
PRNU into all media it captures. Therefore, the PRNU can be regarded as a
camera fingerprint and used for source attribution. The imaging pipeline in a
camera, however, involves various processing steps that are detrimental to PRNU
estimation. In the context of photographic images, these challenges are
successfully addressed and the method for estimating a sensor's PRNU pattern is
well established. However, various additional challenges related to the generation
of videos remain largely unaddressed. From this perspective, this work introduces
methods to mitigate the disruptive effects of the widely deployed H.264 and H.265 video
compression standards on PRNU estimation. Our approach involves an intervention
in the decoding process to eliminate a filtering procedure applied at the
decoder to reduce blockiness. It also utilizes decoding parameters to develop a
weighting scheme and adjust the contribution of video frames at the macroblock
level to the PRNU estimation process. Results obtained on videos captured by 28
cameras show that our approach increases the PRNU matching metric by up to more
than five times over the conventional estimation method tailored for photos.
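A block-weighted PRNU estimate could look like the sketch below. It assumes the commonly used estimator form K = sum(W * I) / sum(I^2), where I is a frame and W its noise residual, and extends it with one weight per macroblock. How the weights are derived from decoding parameters is not given in the abstract, so they are simply inputs here; the function names and the 16x16 block size are assumptions.

```python
import numpy as np

def expand_block_weights(block_w, block=16):
    """Repeat one weight per macroblock over its block x block pixels."""
    return np.kron(block_w, np.ones((block, block)))

def weighted_prnu(frames, residuals, block_weights, block=16, eps=1e-12):
    """Weighted PRNU estimate: sum(w * W * I) / sum(w * I^2), per pixel.

    frames, residuals: lists of 2-D arrays of identical shape
    block_weights:     list of 2-D arrays, one weight per macroblock
    """
    num = 0.0
    den = 0.0
    for I, W, bw in zip(frames, residuals, block_weights):
        w = expand_block_weights(bw, block)
        num = num + w * W * I
        den = den + w * I * I
    return num / (den + eps)   # eps guards pixels with zero total weight
```

Setting a macroblock's weight to zero removes it from the estimate entirely, which is one way heavily compressed blocks could be discounted.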
A Deep-structured Conditional Random Field Model for Object Silhouette Tracking
In this work, we introduce a deep-structured conditional random field
(DS-CRF) model for the purpose of state-based object silhouette tracking. The
proposed DS-CRF model consists of a series of state layers, where each state
layer spatially characterizes the object silhouette at a particular point in
time. The interactions between adjacent state layers are established by
inter-layer connectivity dynamically determined based on inter-frame optical
flow. By incorporating both spatial and temporal context in a dynamic fashion
within such a deep-structured probabilistic graphical model, the proposed
DS-CRF model allows us to develop a framework that can accurately and
efficiently track object silhouettes that can change greatly over time, as well
as under different situations such as occlusion and multiple targets within the
scene. Experimental results using video surveillance datasets containing
different scenarios such as occlusion and multiple targets showed that the
proposed DS-CRF approach provides strong object silhouette tracking performance
when compared to baseline methods such as mean-shift tracking, as well as
state-of-the-art methods such as context tracking and boosted particle
filtering.
Comment: 17 pages
Quality Adaptive Least Squares Trained Filters for Video Compression Artifacts Removal Using a No-reference Block Visibility Metric
Compression artifact removal is a challenging problem because videos can be compressed at different qualities. In this paper, a least squares approach that is self-adaptive to the visual quality of the input sequence is proposed. For compression artifacts, the visual quality of an image is measured by a no-reference block visibility metric. According to the blockiness visibility of an input image, an appropriate set of filter coefficients trained beforehand is selected to remove coding artifacts and reconstruct object details. The performance of the proposed algorithm is evaluated on a variety of sequences compressed at different qualities and compared with several other deblocking techniques. The proposed method significantly outperforms the others both objectively and subjectively.
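The quality-adaptive selection step can be sketched as follows. The blockiness measure here is a simple stand-in (the ratio of horizontal pixel differences across 8x8 block boundaries to those inside blocks), not the no-reference metric of the paper, and `filter_bank` and `thresholds` are hypothetical placeholders for coefficient sets trained offline.

```python
import numpy as np

def blockiness(img, block=8):
    """Ratio of mean horizontal differences at block edges to those inside.

    Values near 1 suggest no visible block grid; larger values suggest
    stronger blocking artifacts.
    """
    d = np.abs(np.diff(img.astype(float), axis=1))
    cols = np.arange(d.shape[1])
    at_edge = (cols % block) == block - 1   # difference spans a block edge
    inner = d[:, ~at_edge].mean()
    return d[:, at_edge].mean() / (inner + 1e-9)

def select_filter(img, filter_bank, thresholds):
    """Pick the pre-trained filter set whose quality bin contains img.

    thresholds must be sorted ascending; filter_bank needs
    len(thresholds) + 1 entries, ordered from mild to strong.
    """
    level = int(np.searchsorted(thresholds, blockiness(img)))
    return filter_bank[level]
```

Each entry of `filter_bank` would hold coefficients fitted by least squares on training pairs of compressed and original images at the matching quality level.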