3 research outputs found

    Squash: low latency multi-path video streaming using multi-bitrate encoding

    The demand for low latency video streaming has increased dramatically as live video streaming applications, such as Twitch and YouTube Live, become more popular. According to the 2021 Bitmovin video developer report, the biggest challenge that video developers face today is providing low latency video streaming. One of the most common on-site live streaming methods is to use a wireless LTE network. Many approaches have been proposed for characterizing wireless links and accurately measuring available bandwidth to provide low latency streaming over a wireless LTE link. However, even with fine-grained bandwidth estimation, video streaming on a single LTE link is still susceptible to unexpected network delay from a sudden drop in available bandwidth or a temporary disconnection. Multiple wireless LTE links can be used to overcome the limitations of a single LTE link for low latency video streaming. Using multiple links can enhance video quality through increased bandwidth and resilience. However, multi-homed low latency video streaming protocols may achieve lower video quality than single-homed protocols when a frame is split and sent over more than one link. If one of the links becomes congested or disconnected, the part of the frame sent on the stable links must wait until the packets sent on the problematic link are re-transmitted through another link. Re-transmission requires at least one extra round trip time. Because of this re-transmission delay, a video player may skip the late frame or serve only the received part of the frame. Ferlin et al. suggest using Forward Error Correction (FEC) on Multipath TCP (MPTCP) to reduce re-transmission delay. However, FEC is not helpful in the event of a significant bandwidth drop. If the sender does not use sufficient redundancy to handle a significant bandwidth drop, the receiver will not receive enough blocks to decode the video data. To handle significant bandwidth drops, FEC must devote a large portion of the network bandwidth to redundancy even when the links are stable.
In this thesis, I present Squash, a low latency video transport protocol that encodes each frame at multiple bitrates and sends the encoded copies across different links to minimize video stream disruption in the event of unexpected bandwidth drops. The encoder encodes each frame at multiple bitrates, a high bitrate and a low bitrate. When a high-bitrate frame cannot arrive on time due to congestion from an unexpected drop in available bandwidth, the low-bitrate frame is used to replace the missing frame. The replacement is likely to arrive on time because the low-bitrate frame is smaller and is sent on links that are disjoint from those used by the high-bitrate frame. To the best of my knowledge, Squash is the first architecture that uses multi-bitrate frames to increase resilience against unexpected bandwidth drops in low latency video streaming over multiple wireless LTE links. In an emulated wireless LTE network environment using Mahimahi network traces, the average SSIM of the video streamed on Squash is 13–58% higher than that of the video streamed on the baseline protocol, which is designed in the same manner as Squash except that it employs single-frame encoding.
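The fallback described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `FrameCopy` type, the `select_copy` function, and the per-frame playout deadline are hypothetical names and assumptions introduced here, and the abstract does not say how the receiver decides that a frame is late.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameCopy:
    """One encoded copy of a frame (hypothetical type, for illustration only)."""
    frame_id: int
    bitrate: str                 # "high" or "low"
    arrival_ms: Optional[float]  # None if the copy never arrived


def select_copy(high: FrameCopy, low: FrameCopy,
                deadline_ms: float) -> Optional[FrameCopy]:
    """Pick the copy to render, preferring the high-bitrate one.

    Assumption: a copy is usable only if it arrived by the playout deadline.
    Returns None when neither copy is usable, meaning the frame is skipped.
    """
    for copy in (high, low):  # the order encodes the preference
        if copy.arrival_ms is not None and copy.arrival_ms <= deadline_ms:
            return copy
    return None
```

For example, if the high-bitrate copy of a frame arrives at 130 ms, the low-bitrate copy arrives at 80 ms, and the deadline is 100 ms, `select_copy` returns the low-bitrate copy instead of skipping the frame.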

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. 
We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to substantially improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, end-users commonly receive video whose quality varies significantly over time due to variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting the subjective perceptual experience of time-varying video quality.
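For readers unfamiliar with the SSIM index that this abstract builds on, the following is a minimal sketch of the standard SSIM formula. It makes two simplifying assumptions that are not taken from the thesis: the statistics are computed over one global window rather than averaged over local windows, and population (divide-by-n) variances are used. The function name `global_ssim` is hypothetical; the stabilizing constants use the commonly cited values K1 = 0.01 and K2 = 0.03.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Simplification: practical SSIM averages this statistic over many local
    windows; this sketch treats the whole input as one window.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance (divide by n).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Constants that stabilize the ratio when the means or variances are small.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1, and any distortion lowers the score, which is what makes the index usable as an optimization target in place of MSE.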

    Análise do impacto de pel decimation na codificação de vídeos de alta resolução (Analysis of the impact of pel decimation on high-resolution video coding)

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014. As the number of pixels per frame tends to increase with the upcoming adoption of ultra-high resolutions, pixel subsampling, also known as pel decimation, appears as a viable means of improving the energy efficiency of video coding. This work investigates the impact on energy and quality when pel decimation is applied to the Sum of Absolute Differences (SAD) calculation, which is the most widely used similarity metric in the motion estimation step of video coding. First, a quality assessment of 15 pel decimation patterns is presented. The 10,680 experimental points used (10,860 according to the Portuguese-language abstract) provide statistical evidence that the proposed 4:3 ratio leads to an encoding speedup of more than two times in comparison with full sampling, losing only 5% in DSSIM and 1% in PSNR. Compared with lower sampling ratios, it presents a better trade-off between speedup and quality loss. To obtain estimates of silicon area and energy per block, five SAD architectures were designed and synthesized for an industrial standard cell library. Among those, one can be configured to operate with 1:1, 4:3, 2:1, or 4:1 sampling ratios, whereas the rest are each tailored to operate exclusively with one of these ratios. The configurable architecture consumes 3.54 pJ/block when operating in full sampling (60% lower than the non-configurable version). The energy can be further reduced to 1.34 pJ/block by using the 4:1 ratio, with losses of 2.8% in PSNR and 14.1% in DSSIM. Finally, it is shown that the speedup of a given subsampling pattern is due to the reduction of both the number of sampled pixels and the total number of SAD calculations. Therefore, by modeling the energy components of video coding, it is shown that the energy efficiency of the whole video compression process can be increased beyond the sampling ratio. By using a configurable SAD architecture operating at the 4:1 ratio, the energy savings are up to 95.11%.
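To make the idea of pel decimation concrete, here is a minimal sketch of a SAD computation that skips pixels according to a sampling pattern. The patterns shown are generic illustrations of 2:1 and 4:1 ratios chosen for this example. They are assumptions, not the 15 patterns evaluated in the dissertation, and the 4:3 pattern is omitted because the abstract does not describe its layout. All function names are hypothetical.

```python
def sad(cur, ref, keep=lambda r, c: True):
    """Sum of Absolute Differences over the pixels selected by `keep`.

    `cur` and `ref` are equally sized 2-D lists of pixel values.
    `keep(row, col)` returns True for pixels that take part in the sum.
    """
    return sum(
        abs(cur[r][c] - ref[r][c])
        for r in range(len(cur))
        for c in range(len(cur[0]))
        if keep(r, c)
    )


def full_sampling(r, c):
    """1:1 ratio: every pixel is sampled."""
    return True


def checkerboard_2to1(r, c):
    """Assumed 2:1 pattern: one of every two pixels, in a checkerboard."""
    return (r + c) % 2 == 0


def grid_4to1(r, c):
    """Assumed 4:1 pattern: one pixel per 2x2 group."""
    return r % 2 == 0 and c % 2 == 0
```

On a 4x4 block, the assumed 4:1 pattern evaluates 4 absolute differences instead of 16, which is the source of the per-block savings that decimation trades against matching accuracy.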