GLOBAL MOTION GUIDED ADAPTIVE TEMPORAL INTER-/EXTRAPOLATION FOR SIDE INFORMATION GENERATION IN DISTRIBUTED VIDEO CODING
The ongoing research on Distributed Video Coding (DVC) is focused on flexibility and increased rate-distortion (RD) performance. Besides Slepian-Wolf (SW) coding and coder control, good side information (SI) quality is essential for high RD performance. The SI is typically extracted by temporal inter-/extrapolation, and fast motion remains a challenge. Two global motion guided adaptive temporal inter-/extrapolation (GTIE/GpTIE) schemes are proposed. They accommodate fast camera motion through global motion estimation and subsequent refinement. The issue of occlusion and revelation at the frame border is solved by an adaptive temporal inter-/extrapolation method. Simulation results show an RD performance increase of up to 3.1 dB.
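The core idea of global-motion-guided interpolation can be illustrated with a minimal sketch: given a single global motion vector between two decoded reference frames, the side-information (middle) frame is predicted by averaging the two references shifted halfway along that vector. This is a toy stand-in for the proposed GTIE scheme, which additionally refines the prediction with local motion; the function name and wrap-around shifting are illustrative assumptions.

```python
import numpy as np

def interpolate_midframe(prev_frame, next_frame, gmv):
    """Predict the temporal midpoint frame from two references along a single
    global motion vector gmv = (dy, dx). Real codecs refine this globally
    compensated estimate with local block motion; this sketch does not."""
    dy, dx = gmv
    # Shift each reference halfway toward the temporal midpoint
    # (np.roll wraps at the border, standing in for border extrapolation).
    fwd = np.roll(prev_frame, (dy // 2, dx // 2), axis=(0, 1))
    bwd = np.roll(next_frame, (-(dy - dy // 2), -(dx - dx // 2)), axis=(0, 1))
    # Average the two motion-aligned references.
    return ((fwd.astype(np.int32) + bwd.astype(np.int32)) // 2).astype(prev_frame.dtype)
```

For a purely translating scene the interpolated frame is exact: a frame shifted by half the global motion of its neighbours.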
Fusion of Global and Local Motion Estimation Using Foreground Objects for Distributed Video Coding
The side information in distributed video coding is estimated using the available decoded frames, and exploited for the decoding and reconstruction of other frames. The quality of the side information has a strong impact on the performance of distributed video coding. Here we propose a new approach that combines both global and local side information to improve coding performance. Since the background pixels in a frame are assigned to global estimation and the foreground objects to local estimation, one needs to estimate the foreground objects in the side information using the backward and forward foreground objects. The background pixels are taken directly from the global side information. Specifically, elastic curves and local motion compensation are used to generate the foreground object masks in the side information. Experimental results show that, as far as rate-distortion performance is concerned, the proposed approach can achieve a PSNR improvement of up to 1.39 dB for a GOP size of 2, and up to 4.73 dB for larger GOP sizes, with respect to the reference DISCOVER codec. A. ABOU-ELAILAH, F. DUFAUX, M. CAGNAZZO, and B. PESQUET-POPESCU are with the Signal and Image Processing
Intelligent Side Information Generation in Distributed Video Coding
Distributed video coding (DVC) reverses the traditional coding paradigm of complex encoders allied with basic decoding to one where the computational cost is largely incurred by the decoder. This is attractive, as the theoretical work of Wyner-Ziv (WZ) and Slepian-Wolf (SW) shows that the performance of such a system should match that of a conventional coder. Despite the solid theoretical foundations, current DVC qualitative and quantitative performance falls short of existing conventional coders, and crucial limitations remain. A key constraint governing DVC performance is the quality of the side information (SI), a coarse representation of the original video frames, which are not available at the decoder. Techniques to generate SI have usually been based on linear motion compensated temporal interpolation (LMCTI), though these do not always produce satisfactory SI quality, especially in sequences exhibiting non-linear motion.
This thesis presents an intelligent higher order piecewise trajectory temporal interpolation (HOPTTI) framework for SI generation with original contributions that afford better SI quality than existing LMCTI-based approaches. The major elements in this framework are: (i) a cubic trajectory interpolation algorithm that significantly improves the accuracy of motion vector estimation; (ii) an adaptive overlapped block motion compensation (AOBMC) model which reduces both blocking and overlapping artefacts in the SI emanating from the block matching algorithm; (iii) an empirical mode switching algorithm; and (iv) an intelligent switching mechanism to construct SI by automatically selecting the best macroblock from the intermediate SI generated by the HOPTTI and AOBMC algorithms. Rigorous analysis and evaluation confirm that significant quantitative and perceptual improvements in SI quality are achieved with the new framework.
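The difference between linear MCTI and a higher-order trajectory can be sketched concisely: instead of assuming a block moves linearly between the two nearest frames, a cubic is fitted to the block's position in four reference frames and evaluated at the interpolation instant. This is a toy analogue of the cubic trajectory element of HOPTTI; the choice of time instants and the function name are illustrative assumptions.

```python
import numpy as np

def cubic_trajectory_position(positions, t_interp=0.5):
    """Fit one cubic per coordinate to a block's centre positions at
    t = -1, 0, 1, 2 and evaluate at t_interp. A linear MCTI scheme would
    instead use only the t = 0 and t = 1 positions."""
    t = np.array([-1.0, 0.0, 1.0, 2.0])
    xs = np.array([p[0] for p in positions], dtype=float)
    ys = np.array([p[1] for p in positions], dtype=float)
    px = np.polyfit(t, xs, 3)  # exact interpolating cubic (4 points, degree 3)
    py = np.polyfit(t, ys, 3)
    return float(np.polyval(px, t_interp)), float(np.polyval(py, t_interp))
```

For accelerating motion the cubic tracks the true trajectory where a straight-line interpolation between t = 0 and t = 1 would not.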
REGION-BASED ADAPTIVE DISTRIBUTED VIDEO CODING CODEC
The recently developed Distributed Video Coding (DVC) is typically suitable for applications where conventional video coding is not feasible because of its inherently high-complexity encoding. Examples include video surveillance using wireless/wired video sensor networks and applications using mobile cameras. With DVC, the complexity is shifted from the encoder to the decoder.
The practical application of DVC is referred to as Wyner-Ziv (WZ) video coding, where an estimate of the original frame, called "side information", is generated using motion compensation at the decoder. Compression is achieved by sending only the extra information needed to correct this estimate. An error-correcting code is used under the assumption that the estimate is a noisy version of the original frame, and the rate needed is a certain amount of parity bits. The side information is assumed to have become available at the decoder through a virtual channel. Due to the limitations of the compensation method, the predicted frame, or side information, is expected to have varying degrees of success. These limitations stem from location-specific, non-stationary estimation noise. To mitigate this, conventional video coders, such as MPEG, use frame partitioning to allocate the optimum coder to each partition and hence achieve better rate-distortion performance. The same, however, has not been used in DVC, as it increases the encoder complexity.
This work proposes partitioning the considered frame into many coding units (regions), where each unit is encoded differently. This partitioning is, however, done at the decoder while generating the side information, and the region map is sent to the encoder at very little rate penalty. The partitioning allows allocation of appropriate DVC coding parameters (virtual channel, rate, and quantizer) to each region. The resulting region map is compressed using a quadtree algorithm and communicated to the encoder via the feedback channel. Rate control in DVC is performed by channel coding techniques (turbo codes, LDPC, etc.). The performance of the channel code depends heavily on the accuracy of the virtual channel model that models the estimation error for each region. In this work, a turbo code has been used, and an adaptive WZ DVC is designed in both the transform domain and the pixel domain. Transform domain WZ video coding (TDWZ) has distinctly superior performance compared to normal pixel domain WZ coding (PDWZ), since it exploits spatial redundancy during encoding. The performance evaluations show that the proposed system is superior to existing distributed video coding solutions. Although the proposed system requires extra bits representing the "region map" to be transmitted, the rate gain is still noticeable, and it outperforms state-of-the-art frame-based DVC by 0.6-1.9 dB.
The feedback channel (FC) has the role of adapting the bit rate to the changing statistics between the side information and the frame to be encoded. In the unidirectional scenario, the encoder must perform the rate control. To correctly estimate the rate, the encoder must calculate typical side information. However, the rate cannot be calculated exactly at the encoder; it can only be estimated. This work also proposes a feedback-free region-based adaptive DVC solution in the pixel domain, based on a machine learning approach to estimate the side information. Although the performance evaluations show a rate penalty, it is acceptable considering the simplicity of the proposed algorithm.
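The quadtree compression of the region map can be sketched in a few lines: a homogeneous square block is emitted as a single leaf symbol, and a mixed block is emitted as a split symbol followed by its four quadrants, recursively. The symbol alphabet and traversal order below are illustrative assumptions, not the thesis' actual bitstream syntax.

```python
def quadtree_encode(m, r0, c0, size):
    """Encode the square region map m[r0:r0+size][c0:c0+size] as a quadtree:
    ('L', label) for a homogeneous block, ('S',) followed by the four
    children (NW, NE, SW, SE) for a mixed block."""
    vals = {m[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)}
    if len(vals) == 1:
        return [('L', vals.pop())]   # homogeneous block -> single leaf
    h = size // 2
    out = [('S',)]                   # mixed block -> split and recurse
    for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
        out += quadtree_encode(m, r0 + dr, c0 + dc, h)
    return out
```

Large homogeneous regions collapse to single symbols, which is why the region map costs only a small rate overhead on the feedback channel.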
A Contribution to Pixel-Based Distributed Video Coding: Side Information Generation, WZ Coding, and Flexible Decoding
Modern application scenarios, such as the individual transmission of video data between mobile devices, pose new challenges for the video transmission system. A particular focus here is on low video encoder complexity. This requirement can be met with the help of distributed video coding.
The focus of the present work is on very low encoder complexity, as well as on increasing the performance and improving the flexibility of the decoding process.
One of the main contributions of the work concerns the improvement of side information quality through temporal interpolation.
Highly efficient low-level feature extraction for video representation and retrieval.
Witnessing the omnipresence of digital video media, the research community has
raised the question of its meaningful use and management. Stored in immense
multimedia databases, digital videos need to be retrieved and structured in an
intelligent way, relying on the content and the rich semantics involved. Current
Content Based Video Indexing and Retrieval systems face the problem of the semantic
gap between the simplicity of the available visual features and the richness of user
semantics.
This work focuses on the issues of efficiency and scalability in video indexing and
retrieval to facilitate a video representation model capable of semantic annotation. A
highly efficient algorithm for temporal analysis and key-frame extraction is developed.
It is based on the prediction information extracted directly from the compressed domain
features and the robust scalable analysis in the temporal domain. Furthermore,
a hierarchical quantisation of the colour features in the descriptor space is presented.
Derived from the extracted set of low-level features, a video representation model that
enables semantic annotation and contextual genre classification is designed.
Results demonstrate the efficiency and robustness of the temporal analysis algorithm
that runs in real time maintaining the high precision and recall of the detection task.
Adaptive key-frame extraction and summarisation achieve a good overview of the
visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.
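The temporal analysis step described above can be illustrated with a minimal shot-boundary detector: consecutive frames are compared by the L1 distance between their normalised intensity histograms, and a large jump is flagged as a cut. This is a generic stand-in for the thesis' compressed-domain temporal analysis; the bin count and threshold are illustrative assumptions.

```python
import numpy as np

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i starts a new shot, judged by the L1
    distance between normalised 16-bin intensity histograms of frames
    i-1 and i. Real systems add adaptive thresholds and gradual-transition
    handling; this sketch flags hard cuts only."""
    hists = [np.histogram(f, bins=16, range=(0, 256))[0] / f.size for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)
    return cuts
```

A key frame per detected shot (e.g. the middle frame) then yields the summarisation the abstract mentions.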
Intra-Key-Frame Coding and Side Information Generation Schemes in Distributed Video Coding
This thesis investigates improved schemes for intra-key-frame coding and side information (SI) generation in a distributed video coding (DVC) framework. From the DVC developments of the last few years, it has been observed that schemes put more thrust on intra-frame coding and better-quality side information (SI) generation. In fact, both are interrelated, as SI generation depends on decoded key frame quality. Hence, superior-quality key frames generated through intra-key-frame coding are in turn utilized to generate good-quality SI frames. As a result, DVC needs fewer parity bits to reconstruct the WZ frames at the decoder. Keeping this in mind, we have proposed two schemes for intra-key-frame coding, namely:
(a) Burrows-Wheeler Transform based H.264/AVC (Intra) intra-frame coding
(BWT-H.264/AVC(Intra))
(b) Dictionary based H.264/AVC (Intra) intra-frame coding using orthogonal
matching pursuit (DBOMP-H.264/AVC (Intra))
BWT-H.264/AVC (Intra) scheme is a modified version of H.264/AVC (Intra)
scheme where a regularized bit stream is generated prior to compression. This
scheme results in higher compression efficiency as well as high quality decoded
key frames. DBOMP-H.264/AVC (Intra) scheme is based on an adaptive
dictionary and H.264/AVC (Intra) intra-frame coding. The traditional transform
is replaced with a dictionary trained with K-singular value decomposition (K-SVD)
algorithm. The dictionary elements are coded using orthogonal matching pursuit
(OMP).
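The OMP step used in the DBOMP scheme can be sketched generically: at each iteration the dictionary atom most correlated with the current residual is added to the support, all selected coefficients are re-fitted by least squares, and the residual is updated. This is a textbook OMP sketch, not the thesis' exact implementation; the dictionary here would come from K-SVD training.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily select k atoms of D (columns,
    assumed unit-norm) to approximate x, re-solving the least-squares fit
    over the whole support at every step."""
    residual = x.astype(float).copy()
    support, coeffs = [], None
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        # Orthogonal projection onto the span of the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    out = np.zeros(D.shape[1])
    out[support] = coeffs
    return out
```

The re-fitting over the full support is what distinguishes OMP from plain matching pursuit, which only updates the newest coefficient.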
Further, two side information generation schemes have been suggested namely,
(a) Multilayer Perceptron based side information generation (MLP - SI)
(b) Multivariable support vector regression based side information generation
(MSVR-SI)
MLP-SI scheme utilizes a multilayer perceptron (MLP) to estimate SI frames
from the decoded key frames block-by-block. The network is trained offline using
training patterns from different frames collected from standard video sequences.
MSVR-SI scheme uses an optimized multivariable support vector regression (M-SVR) to generate SI frames from decoded key frames block-by-block. Like MLP, the training for M-SVR is performed offline with training patterns known a priori.
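The block-by-block prediction in the MLP-SI scheme amounts to a forward pass through a small network that maps a flattened input (features from the co-located decoded key-frame blocks) to the predicted SI block. The layer sizes, activation, and weight shapes below are illustrative assumptions; the actual weights would be trained offline as the thesis describes.

```python
import numpy as np

def mlp_predict_block(features, W1, b1, W2, b2):
    """One forward pass of a single-hidden-layer perceptron: tanh hidden
    layer followed by a linear output layer producing the SI block."""
    h = np.tanh(features @ W1 + b1)  # hidden representation
    return h @ W2 + b2               # predicted (flattened) SI block
```

At decoding time this replaces motion-compensated interpolation for each block, at the cost of the offline training stage.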
Both intra-key-frame coding and SI generation schemes are embedded in
the Stanford based DVC architecture and studied individually to compare
performances with their competitive schemes. Visual as well as quantitative
evaluations have been made to show the efficacy of the schemes. To exploit the
usefulness of intra-frame coding schemes in SI generation, four hybrid schemes
have been formulated by combining the aforesaid suggested schemes as follows:
(a) BWT-MLP scheme that uses BWT-H.264/AVC (Intra) intra-frame
coding scheme and MLP-SI side information generation scheme.
(b) BWT-MSVR scheme, where we utilize BWT-H.264/AVC (Intra)
for intra-frame coding followed by MSVR-SI based side information
generation.
(c) DBOMP-MLP scheme is an outcome of putting DBOMP-H.264/AVC
(Intra) intra-frame coding and MLP-SI side information generation
schemes.
(d) DBOMP-MSVR scheme deals with DBOMP-H.264/AVC (Intra)
intra-frame coding and MSVR-SI side information generation together.
The hybrid schemes are also incorporated into the Stanford based DVC
architecture and simulation has been carried out on standard video sequences.
The performance analysis with respect to overall rate distortion, number of requests per SI frame, temporal evaluation, and decoding time requirement has been made to derive an overall conclusion.
Video modeling via implicit motion representations
Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks. Video modeling therefore provides a foundation to bridge video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.
Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially for handling complex motion or non-ideal observation data. In this thesis, we propose to investigate video modeling without explicit motion representation. Motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.
Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem, in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. Incorporating a spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the framework of STALL, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. The simulation results show that motion information can be efficiently exploited by our implicit motion representation, and that the resampling and fusion do help to enhance the modeling capability of STALL.
Secondly, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce the sparsity constraint on a higher-dimensional data array signal, which is generated by packing the patches in the similar-patch set. Then we solve the inference problem by updating the kNN array and the wanted signal iteratively. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability.
Finally, we summarize the two proposed video modeling approaches and point out the perspectives of implicit motion representations in applications ranging from low- to high-level problems.
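The kNN-based patch clustering that generalises block matching can be sketched directly: each query patch is flattened to a vector and its k nearest neighbours in Euclidean distance are collected, implicitly capturing motion without ever estimating a motion vector. The brute-force search below is an illustrative simplification; practical systems restrict the search to a space-time window.

```python
import numpy as np

def knn_patches(patches, query, k):
    """Return the indices of the k patches (rows of `patches`, flattened
    vectors) closest to `query` in Euclidean distance. Block matching is
    the special case k = 1 over a motion search window."""
    d = np.linalg.norm(patches - query, axis=1)  # distance to every patch
    return np.argsort(d)[:k]                     # indices of the k nearest
```

Stacking the k retrieved patches into an array then gives the higher-dimensional signal on which the sparsity constraint is enforced.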