A Two-Stage Training Framework for Joint Speech Compression and Enhancement
This paper considers the joint compression and enhancement problem for speech
signals in the presence of noise. Recently, the SoundStream codec, which relies
on end-to-end joint training of an encoder-decoder pair and a residual vector
quantizer with a combination of adversarial and reconstruction losses, has shown
very promising performance, especially in subjective perceptual quality. In
this work, we provide a theoretical result showing that, to simultaneously
achieve low distortion and high perceptual quality in the presence of noise,
there exists an optimal two-stage optimization procedure for the joint
compression and enhancement problem. This procedure first optimizes an
encoder-decoder pair using only a distortion loss and then fixes the encoder to
optimize a perceptual decoder using a perception loss. Based on this result, we
construct a two-stage training framework for joint compression and enhancement
of noisy speech signals. Unlike existing training methods, which are heuristic,
the proposed two-stage training method has a theoretical foundation. Finally,
experimental results for various noise and bit-rate conditions are provided. The
results demonstrate that a codec trained with the proposed framework can
outperform SoundStream and other representative codecs in terms of both
objective and subjective evaluation metrics. Code is available at
https://github.com/jscscloris/SEStream
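To make the residual vector quantizer mentioned above concrete, here is a minimal NumPy sketch of residual vector quantization (RVQ). Each stage quantizes the residual left by the previous stages, and the decoder sums the selected codewords. This is an illustration under stated assumptions, not SoundStream's or SEStream's implementation. Here the codebooks are fitted with plain k-means on the residuals rather than learned end to end. The names `train_rvq`, `rvq_encode`, `rvq_decode` and `mse_by_stages` are hypothetical.

```python
import numpy as np


def _sq_dists(points, codebook):
    # Squared Euclidean distance from every point (N, D) to every codeword (K, D) -> (N, K)
    return ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)


def kmeans(data, k, iters=20, rng=None):
    """Plain Lloyd's k-means, used here only to fit one RVQ codebook (an assumption)."""
    rng = np.random.default_rng(0) if rng is None else rng
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = _sq_dists(data, centers).argmin(axis=1)
        for j in range(k):
            members = data[assign == j]
            if len(members):  # empty clusters keep their previous center
                centers[j] = members.mean(axis=0)
    return centers


def train_rvq(x, num_stages, k, seed=0):
    """Fit one codebook per stage on the residual left by the earlier stages."""
    rng = np.random.default_rng(seed)
    codebooks, residual = [], x.astype(float)
    for _ in range(num_stages):
        cb = kmeans(residual, k, rng=rng)
        codebooks.append(cb)
        residual = residual - cb[_sq_dists(residual, cb).argmin(axis=1)]
    return codebooks


def rvq_encode(x, codebooks):
    """Greedy RVQ encoding: one codeword index per stage, shape (N, num_stages)."""
    residual = x.astype(float)
    indices = []
    for cb in codebooks:
        idx = _sq_dists(residual, cb).argmin(axis=1)
        indices.append(idx)
        residual = residual - cb[idx]  # the next stage sees only what is left
    return np.stack(indices, axis=1)


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the chosen codeword of every stage provided."""
    out = np.zeros((indices.shape[0], codebooks[0].shape[1]))
    for stage, cb in enumerate(codebooks):
        out += cb[indices[:, stage]]
    return out


def mse_by_stages(x, codebooks):
    """Mean squared error when decoding with only the first n stages, n = 1..S."""
    idx = rvq_encode(x, codebooks)
    return [
        float(((x - rvq_decode(idx[:, :n], codebooks[:n])) ** 2).mean())
        for n in range(1, len(codebooks) + 1)
    ]
```

With `k` codewords per stage, each stage costs log2(k) bits per vector. Decoding with only the first n stages therefore trades bit rate for reconstruction error. On the data the codebooks were fitted to, the error returned by `mse_by_stages` does not increase as stages are added.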
The JPEG2000 still image compression standard
The development of standards (emerging and established) by the International Organization for Standardization (ISO), the International Telecommunication Union (ITU), and the International Electrotechnical Commission (IEC) for audio, image, and video, for both transmission and storage, has led to worldwide activity in developing hardware and software systems and products applicable to a number of diverse disciplines [7], [22], [23], [55], [56], [73]. Although the standards implicitly address the basic encoding operations, there is freedom and flexibility in the actual design and development of devices. This is because the standards specify only the syntax and semantics of the bit stream for decoding. Their main objective is compatibility and interoperability among systems (hardware/software) manufactured by different companies. There is, thus, much room for innovation and ingenuity. Since the mid-1980s, members from both the ITU and the ISO have been working together to establish a joint international standard for the compression of grayscale and color still images. This effort has been known as JPEG, the Join
Combined Source and Channel Strategies for Optimized Video Communications
ISBN 978-953-7619-70-
Computational Complexity Optimization on H.264 Scalable/Multiview Video Coding
The H.264/MPEG-4 Advanced Video Coding (AVC) standard is a highly efficient and flexible video coding standard compared with previous standards. Its high efficiency is achieved by utilizing a comprehensive full-search motion estimation method. Although the H.264 standard improves visual quality at low bitrates, it greatly increases the computational complexity. The research described in this thesis focuses on optimizing the computational complexity of H.264 scalable and multiview video coding.
Nowadays, video applications range from multimedia messaging and mobile applications to high-definition television, and they use different types of transmission systems. The Scalable Video Coding (SVC) extension of the H.264/AVC standard can scale the video stream to suit a variety of devices with different capabilities. Furthermore, a rate control scheme is used to improve visual quality under the constraints of device capability and channel bandwidth. However, this increases the computational complexity. A simplified rate control scheme is proposed to reduce that complexity. In the proposed scheme, the quantisation parameter can be computed directly instead of through the exhaustive Rate-Quantisation model. The linear Mean Absolute Distortion (MAD) prediction model is used to predict scene changes. When the scene changes abruptly, the quantisation parameter is increased directly by a threshold. Otherwise, the comprehensive Rate-Quantisation model is used. Results show that the optimized rate control scheme reduces encoding time.
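The decision logic described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation, and it rests on several assumptions:

- The linear MAD predictor takes the form `a1 * prev_mad + a2`.
- The Rate-Quantisation model is the quadratic form `R = c1*MAD/Qstep + c2*MAD/Qstep**2` commonly used in H.264 rate control.
- The scene-change test (a measured MAD exceeding `scene_ratio` times the prediction) is hypothetical, as are the default values of `scene_ratio` and `qp_jump`.

The Qstep-to-QP mapping uses the H.264 relation in which Qstep doubles every 6 QP steps, starting from Qstep = 0.625 at QP 0.

```python
import math


def qstep_to_qp(qstep):
    """H.264 Qstep doubles every 6 QP steps, with Qstep = 0.625 at QP 0."""
    qp = round(6 * math.log2(qstep / 0.625))
    return max(0, min(51, qp))


def qp_from_rq_model(target_bits, mad, c1, c2):
    """Solve target_bits = c1*mad/Q + c2*mad/Q**2 for Q, then map Q to a QP."""
    if c2 == 0:
        qstep = c1 * mad / target_bits
    else:
        a, b = c2 * mad, c1 * mad
        # Positive root of a*u**2 + b*u - target_bits = 0, where u = 1/Q
        u = (-b + math.sqrt(b * b + 4 * a * target_bits)) / (2 * a)
        qstep = 1.0 / u
    return qstep_to_qp(qstep)


def choose_qp(prev_qp, prev_mad, measured_mad, target_bits, c1, c2,
              a1=1.0, a2=0.0, scene_ratio=2.0, qp_jump=4):
    """Return (qp, scene_change) for the current frame.

    Hypothetical scene-change test: the measured MAD is far above the
    value predicted by the linear model a1*prev_mad + a2.
    """
    predicted_mad = a1 * prev_mad + a2
    if measured_mad > scene_ratio * predicted_mad:
        # Abrupt change: skip the R-Q model and raise the QP directly
        return min(51, prev_qp + qp_jump), True
    # Otherwise fall back to the full quadratic R-Q model
    return qp_from_rq_model(target_bits, predicted_mad, c1, c2), False
```

The saving comes from the first branch. On a detected scene change, no quadratic equation is solved and the QP is adjusted by a fixed step.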
Multiview Video Coding (MVC) is effective at reducing the huge amount of data in multiple-view video. In addition to temporal prediction, it exploits inter-view reference frames from adjacent views for prediction. However, the larger number of reference frames also increases the computational complexity. To manage reference frames efficiently, a phase correlation algorithm is used to remove ineffective inter-view reference frames from the reference list. The dependency between an inter-view reference frame and the current frame is determined from the phase correlation coefficients. If the inter-view reference frame is highly correlated with the current frame, it remains enabled in the reference list. Otherwise, it is disabled. The experimental results show that the proposed scheme saves encoding time without loss of visual quality or increase in bitrate.
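A minimal sketch of the reference-list pruning step follows. It assumes standard phase-only correlation: the normalized cross-power spectrum of the two frames is inverse-transformed, and the height of its peak serves as the correlation coefficient. The function names and the `threshold` value of 0.3 are hypothetical. The thesis's exact coefficient definition and threshold may differ.

```python
import numpy as np


def phase_correlation(ref, cur, eps=1e-12):
    """Return (peak_height, (dy, dx)) of the phase-only correlation surface."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(cur))
    cross /= np.abs(cross) + eps  # keep only the phase of the cross-power spectrum
    surface = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    return float(surface[peak]), tuple(int(p) for p in peak)


def prune_interview_refs(cur, interview_refs, threshold=0.3):
    """Indices of the inter-view reference frames kept in the reference list."""
    kept = []
    for i, ref in enumerate(interview_refs):
        peak, _ = phase_correlation(ref, cur)
        if peak >= threshold:  # strongly related: keep it enabled
            kept.append(i)
        # otherwise the reference is disabled for this frame
    return kept
```

Consider a view that is a pure translation of the current frame. It yields a peak near 1 at the displacement. An unrelated view yields a low, flat surface, so a single threshold separates the two cases.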
The proposed optimization algorithms are effective at reducing the computational complexity of the H.264/AVC extensions. Such low-complexity algorithms are useful in the design of future video coding standards, especially for low-power handheld devices.
Efficient compression of motion compensated residuals
EThOS - Electronic Theses Online Service (GB, United Kingdom)