Steerable Discrete Cosine Transform
In image compression, classical block-based separable transforms tend to be
inefficient when image blocks contain arbitrarily shaped discontinuities. For
this reason, transforms incorporating directional information are an appealing
alternative. In this paper, we propose a new approach to this problem, namely a
discrete cosine transform (DCT) that can be steered in any chosen direction.
This transform, called the steerable DCT (SDCT), allows pairs of basis vectors
to be rotated flexibly, and enables precise matching of directionality in each
image block, achieving improved coding efficiency. The optimal rotation angles
for the SDCT can be represented as the solution of a suitable rate-distortion
(RD) problem. We propose iterative methods to search for such a solution, and
we develop a fully fledged image encoder to compare our techniques in practice
with other competing transforms. Analytical and numerical results prove that
the SDCT outperforms both the DCT and state-of-the-art directional transforms.
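The idea of rotating pairs of basis vectors can be illustrated with a minimal sketch. It is not the authors' encoder: the pairing of 2D DCT basis vectors as (k, l) with (l, k), the function names, and the fixed angle are assumptions made here for illustration, and the rate-distortion search for the angles is left out.

```python
import math

def dct_basis_1d(n):
    """Orthonormal DCT-II basis: row k is the k-th basis vector of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct_basis_2d(n):
    """Separable 2D basis: entry (k, l) is the flattened outer product of rows k and l."""
    b = dct_basis_1d(n)
    return {(k, l): [b[k][i] * b[l][j] for i in range(n) for j in range(n)]
            for k in range(n) for l in range(n)}

def steer(basis, n, theta):
    """Rotate each pair (k, l), (l, k) with k < l by theta[(k, l)] (assumed pairing)."""
    out = dict(basis)  # diagonal vectors (k, k) are left untouched
    for k in range(n):
        for l in range(k + 1, n):
            angle = theta.get((k, l), 0.0)
            c, s = math.cos(angle), math.sin(angle)
            u, v = basis[(k, l)], basis[(l, k)]
            out[(k, l)] = [c * a + s * b for a, b in zip(u, v)]
            out[(l, k)] = [-s * a + c * b for a, b in zip(u, v)]
    return out

def dot(u, v):
    """Inner product of two flattened basis vectors."""
    return sum(a * b for a, b in zip(u, v))

# Illustration only: one fixed angle for every pair of a 4x4 block.
n = 4
angles = {(k, l): 0.3 for k in range(n) for l in range(k + 1, n)}
steered = steer(dct_basis_2d(n), n, angles)
```

Because each pair is mixed by a plane rotation, the steered vectors remain orthonormal, so the transform stays invertible whatever angles are chosen; in the paper the angles would come from the RD optimisation rather than a constant.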
Steerable Discrete Fourier Transform
Directional transforms have recently raised a lot of interest thanks to their
numerous applications in signal compression and analysis. In this letter, we
introduce a generalization of the discrete Fourier transform, called steerable
DFT (SDFT). Since the DFT is used in numerous fields, it may be of interest in
a wide range of applications. Moreover, we also show that the SDFT is highly
related to other well-known transforms, such as the Fourier sine and cosine
transforms and the Hilbert transforms.
VLSI Architectures for the Steerable-Discrete-Cosine-Transform (SDCT)
Since the frame resolution of modern video streams is growing rapidly, the need for more complex and efficient video compression methods arises. H.265/HEVC represents the state of the art in video coding standards. Its architecture is however not completely standardized, as many parts are only described at software level to allow the designer to implement new compression techniques. This paper presents an innovative hardware architecture for the Steerable Discrete Cosine Transform (SDCT), which has been recently embedded into the HEVC standard, providing better compression ratios. This technique exploits a directional DCT using bases with different orientation angles, leading to a sparser representation, which translates to improved coding efficiency. The final design is able to work at a frequency of 188 MHz, reaching a throughput of 3.00 GSample/s. In particular, this architecture supports 8K Ultra High Definition (UHD) (7680 × 4320) with a frame rate of 60 Hz, which is one of the best resolutions supported by HEVC.
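The quoted throughput can be checked against the quoted resolution with a short calculation. The abstract does not state the chroma format, so the 4:2:0 subsampling used below is an assumption, as are the variable names.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT, FPS = 7680, 4320, 60
CLOCK_HZ = 188e6
THROUGHPUT = 3.00e9  # samples per second

# Assumption (not in the abstract): 4:2:0 chroma subsampling, so the two
# chroma planes together add half as many samples as the luma plane.
luma_rate = WIDTH * HEIGHT * FPS
total_rate_420 = luma_rate * 3 // 2

# Average number of samples the design must handle per clock cycle.
samples_per_cycle = THROUGHPUT / CLOCK_HZ

print(f"luma only : {luma_rate / 1e9:.3f} GSample/s")
print(f"4:2:0     : {total_rate_420 / 1e9:.3f} GSample/s")
print(f"samples per clock cycle: {samples_per_cycle:.2f}")
```

Under that assumption the required rate is roughly 2.99 GSample/s, just within the stated 3.00 GSample/s, and the stated clock implies close to 16 samples per cycle on average.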
On The Continuous Steering of the Scale of Tight Wavelet Frames
In analogy with steerable wavelets, we present a general construction of
adaptable tight wavelet frames, with an emphasis on scaling operations. In
particular, the derived wavelets can be "dilated" by a procedure comparable to
the operation of steering steerable wavelets. The fundamental aspects of the
construction are the same: an admissible collection of Fourier multipliers is
used to extend a tight wavelet frame, and the "scale" of the wavelets is
adapted by scaling the multipliers. As an application, the proposed wavelets
can be used to improve the frequency localization. Importantly, the localized
frequency bands specified by this construction can be scaled efficiently using
matrix multiplication.
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification. Comment: Accepted at ICCV 201
Design and Optimization of Graph Transform for Image and Video Compression
The main contribution of this thesis is the introduction of new methods for designing adaptive transforms for image and video compression. Exploiting graph signal processing techniques, we develop new graph construction methods targeted at image and video compression applications. In this way, we obtain a graph that is, at the same time, a good representation of the image and easy to transmit to the decoder. To do so, we investigate different research directions. First, we propose a new method for graph construction that employs innovative edge metrics, quantization and edge prediction techniques. Then, we propose to use a graph learning approach and we introduce a new graph learning algorithm targeted at image compression that defines the connectivities between pixels by taking into consideration the coding of the image signal and the graph topology in rate-distortion terms. Moreover, we also present a new superpixel-driven graph transform that uses clusters of superpixels as coding blocks and then computes the graph transform inside each region.
In the second part of this work, we exploit graphs to design directional transforms. In fact, an efficient representation of the image directional information is extremely important in order to obtain high performance image and video coding. In this thesis, we present a new directional transform, called Steerable Discrete Cosine Transform (SDCT). This new transform can be obtained by steering the 2D-DCT basis in any chosen direction. Moreover, we can also use more complex steering patterns than a single pure rotation. In order to show the advantages of the SDCT, we present a few image and video compression methods based on this new directional transform. The obtained results show that the SDCT can be efficiently applied to image and video compression and that it outperforms the classical DCT and other directional transforms. Along the same lines, we also present a new generalization of the DFT, called Steerable DFT (SDFT). Unlike the SDCT, the SDFT can be defined in one or two dimensions. The 1D-SDFT represents a rotation in the complex plane, whereas the 2D-SDFT performs a rotation in the 2D Euclidean space.
Graph Spectral Image Processing
The recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
graph spectral domain. In this article, we overview recent graph spectral
techniques in GSP specifically for image / video processing. The topics covered
include image compression, image restoration, image filtering and image
segmentation.
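Treating an image patch as a graph signal can be sketched as follows. This is only an illustration under stated assumptions: the 4-connected grid, the Gaussian weight on intensity differences, the parameter `sigma`, and the function names are choices made here, not taken from the article.

```python
import math

def grid_edges(patch, sigma=None):
    """4-connected grid graph over a patch (list of rows of intensities).

    With sigma=None every edge has weight 1; otherwise the weight is a
    Gaussian of the intensity gap, so edges across a discontinuity are weak.
    """
    h, w = len(patch), len(patch[0])
    edges = []
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    d = patch[r][c] - patch[rr][cc]
                    wgt = 1.0 if sigma is None else math.exp(-(d * d) / (sigma * sigma))
                    edges.append(((r, c), (rr, cc), wgt))
    return edges

def laplacian_quadratic_form(patch, edges):
    """x^T L x, computed as the sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(wgt * (patch[a[0]][a[1]] - patch[b[0]][b[1]]) ** 2
               for a, b, wgt in edges)

# A patch with one vertical step edge between the second and third columns.
step = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
plain = laplacian_quadratic_form(step, grid_edges(step))
adapted = laplacian_quadratic_form(step, grid_edges(step, sigma=5.0))
```

With unit weights the step edge makes the patch look rough on the graph, while image-adapted weights make the same patch smooth with respect to its graph, which is the property that graph spectral compression and restoration methods rely on.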
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share a hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes its
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding. Comment: 65 pages, 33 figures, 303 references