8 research outputs found
Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics
Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of needing to achieve a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become more important over the past years, but not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed, that is able to extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, both in a global and in a local scale. Thus, scene texture, illumination, motion, and geometry properties may succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, that removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with, through manipulation of the video skim. At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue
Towards key-frame extraction methods for 3D video: a review
The increasing rate of creation and use of 3D video content leads to a pressing need for methods capable of lowering
the cost of 3D video searching, browsing and indexing operations, with improved content selection performance.
Video summarisation methods specifically tailored for 3D video content fulfil these requirements. This paper presents
a review of the state-of-the-art of a crucial component of 3D video summarisation algorithms: the key-frame
extraction methods. The methods reviewed cover 3D video key-frame extraction as well as shot boundary detection
methods specific for use in 3D video. The performance metrics used to evaluate the key-frame extraction methods
and the summaries derived from those key-frames are presented and discussed. The applications of these methods
are also presented and discussed, followed by an exposition about current research challenges on 3D video
summarisation methods
Video object segmentation.
Wei Wei.Thesis submitted in: December 2005.Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.Includes bibliographical references (leaves 112-122).Abstracts in English and Chinese.Abstract --- p.IIList of Abbreviations --- p.IVChapter Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Overview of Content-based Video Standard --- p.1Chapter 1.2 --- Video Object Segmentation --- p.4Chapter 1.2.1 --- Video Object Plane (VOP) --- p.4Chapter 1.2.2 --- Object Segmentation --- p.5Chapter 1.3 --- Problems of Video Object Segmentation --- p.6Chapter 1.4 --- Objective of the research work --- p.7Chapter 1.5 --- Organization of This Thesis --- p.8Chapter 1.6 --- Notes on Publication --- p.8Chapter Chapter 2 --- Literature Review --- p.10Chapter 2.1 --- What is segmentation? --- p.10Chapter 2.1.1 --- Manual Segmentation --- p.10Chapter 2.1.2 --- Automatic Segmentation --- p.11Chapter 2.1.3 --- Semi-automatic segmentation --- p.12Chapter 2.2 --- Segmentation Strategy --- p.14Chapter 2.3 --- Segmentation of Moving Objects --- p.17Chapter 2.3.1 --- Motion --- p.18Chapter 2.3.2 --- Motion Field Representation --- p.19Chapter 2.3.3 --- Video Object Segmentation --- p.25Chapter 2.4 --- Summary --- p.35Chapter Chapter 3 --- Automatic Video Object Segmentation Algorithm --- p.37Chapter 3.1 --- Spatial Segmentation --- p.38Chapter 3.1.1 --- k:-Medians Clustering Algorithm --- p.39Chapter 3.1.2 --- Cluster Number Estimation --- p.41Chapter 3.1.2 --- Region Merging --- p.46Chapter 3.2 --- Foreground Detection --- p.48Chapter 3.2.1 --- Global Motion Estimation --- p.49Chapter 3.2.2 --- Detection of Moving Objects --- p.50Chapter 3.3 --- Object Tracking and Extracting --- p.50Chapter 3.3.1 --- Binary Model Tracking --- p.51Chapter 3.3.1.2 --- Initial Model Extraction --- p.53Chapter 3.3.2 --- Region Descriptor Tracking --- p.59Chapter 3.4 --- Results and Discussions --- p.65Chapter 3.4.1 --- Objective Evaluation --- p.65Chapter 3.4.2 --- Subjective Evaluation --- p.66Chapter 3.5 --- Conclusion --- p.74Chapter Chapter 4 --- Disparity Estimation and its Application in Video Object Segmentation --- p.76Chapter 4.1 --- Disparity Estimation --- p.79Chapter 4.1.1. --- Seed Selection --- p.80Chapter 4.1.2. --- Edge-based Matching by Propagation --- p.82Chapter 4.2 --- Remedy Matching Sparseness by Interpolation --- p.84Chapter 4.2 --- Disparity Applications in Video Conference Segmentation --- p.92Chapter 4.3 --- Conclusion --- p.106Chapter Chapter 5 --- Conclusion and Future Work --- p.108Chapter 5.1 --- Conclusion and Contribution --- p.108Chapter 5.2 --- Future work --- p.109Reference --- p.11
A family of stereoscopic image compression algorithms using wavelet transforms
With the standardization of JPEG-2000, wavelet-based image and video
compression technologies are gradually replacing the popular DCT-based methods. In
parallel to this, recent developments in autostereoscopic display technology is now
threatening to revolutionize the way in which consumers are used to enjoying the
traditional 2-D display based electronic media such as television, computer and
movies. However, due to the two-fold bandwidth/storage space requirement of
stereoscopic imaging, an essential requirement of a stereo imaging system is efficient
data compression.
In this thesis, seven wavelet-based stereo image compression algorithms are
proposed, to take advantage of the higher data compaction capability and better
flexibility of wavelets. [Continues.
A family of stereoscopic image compression algorithms using wavelet transforms
With the standardization of JPEG-2000, wavelet-based image and video
compression technologies are gradually replacing the popular DCT-based methods. In
parallel to this, recent developments in autostereoscopic display technology is now
threatening to revolutionize the way in which consumers are used to enjoying the
traditional 2D display based electronic media such as television, computer and
movies. However, due to the two-fold bandwidth/storage space requirement of
stereoscopic imaging, an essential requirement of a stereo imaging system is efficient
data compression.
In this thesis, seven wavelet-based stereo image compression algorithms are
proposed, to take advantage of the higher data compaction capability and better
flexibility of wavelets. In the proposed CODEC I, block-based disparity
estimation/compensation (DE/DC) is performed in pixel domain. However, this
results in an inefficiency when DWT is applied on the whole predictive error image
that results from the DE process. This is because of the existence of artificial block
boundaries between error blocks in the predictive error image. To overcome this
problem, in the remaining proposed CODECs, DE/DC is performed in the wavelet
domain. Due to the multiresolution nature of the wavelet domain, two methods of
disparity estimation and compensation have been proposed. The first method is
performing DEJDC in each subband of the lowest/coarsest resolution level and then
propagating the disparity vectors obtained to the corresponding subbands of
higher/finer resolution. Note that DE is not performed in every subband due to the
high overhead bits that could be required for the coding of disparity vectors of all
subbands. This method is being used in CODEC II. In the second method, DEJDC is
performed m the wavelet-block domain. This enables disparity estimation to be
performed m all subbands simultaneously without increasing the overhead bits
required for the coding disparity vectors. This method is used by CODEC III.
However, performing disparity estimation/compensation in all subbands would result
in a significant improvement of CODEC III. To further improve the performance of
CODEC ill, pioneering wavelet-block search technique is implemented in CODEC
IV. The pioneering wavelet-block search technique enables the right/predicted image
to be reconstructed at the decoder end without the need of transmitting the disparity
vectors. In proposed CODEC V, pioneering block search is performed in all subbands
of DWT decomposition which results in an improvement of its performance. Further,
the CODEC IV and V are able to perform at very low bit rates(< 0.15 bpp). In
CODEC VI and CODEC VII, Overlapped Block Disparity Compensation (OBDC) is
used with & without the need of coding disparity vector. Our experiment results
showed that no significant coding gains could be obtained for these CODECs over
CODEC IV & V.
All proposed CODECs m this thesis are wavelet-based stereo image coding
algorithms that maximise the flexibility and benefits offered by wavelet transform
technology when applied to stereo imaging. In addition the use of a baseline-JPEG
coding architecture would enable the easy adaptation of the proposed algorithms
within systems originally built for DCT-based coding. This is an important feature
that would be useful during an era where DCT-based technology is only slowly being
phased out to give way for DWT based compression technology.
In addition, this thesis proposed a stereo image coding algorithm that uses JPEG-2000
technology as the basic compression engine. The proposed CODEC, named RASTER
is a rate scalable stereo image CODEC that has a unique ability to preserve the image
quality at binocular depth boundaries, which is an important requirement in the design
of stereo image CODEC. The experimental results have shown that the proposed
CODEC is able to achieve PSNR gains of up to 3.7 dB as compared to directly
transmitting the right frame using JPEG-2000