LSCD: A Large-Scale Screen Content Dataset for Video Compression
Multimedia compression allows us to watch videos, see pictures and hear
sounds within a limited bandwidth, which has helped the internet flourish.
Over the past decades, multimedia compression has achieved great success
using hand-crafted features and systems. With the development of artificial
intelligence, much research has emerged on applying neural networks to video
compression in order to replace these complicated hand-engineered systems.
Beyond producing advanced algorithms, researchers have also extended
compression research to different types of content, such as User Generated
Content (UGC). With the rapid development of mobile devices, screen content
videos have become an important part of multimedia data. However, we find
that the community lacks a large-scale dataset for screen content video
compression, which impedes the development of the corresponding
learning-based algorithms. To fill this gap and accelerate research on this
special type of video, we propose the Large-scale Screen Content Dataset
(LSCD), which contains 714 source sequences. We also provide an analysis of
the proposed dataset to highlight characteristics of screen content videos,
which will help researchers better understand how to design new algorithms.
Besides collecting and post-processing the data to organize the dataset, we
provide a benchmark containing the performance of both traditional codecs
and learning-based methods.
Dynamic adaptation of streamed real-time E-learning videos over the internet
Even though e-learning is becoming increasingly popular in the academic environment,
the quality of synchronous e-learning video is still substandard, and significant work needs to be
done to improve it. The improvements have to take into consideration both
the network requirements and the psychophysical aspects of the human visual system.
One of the problems of synchronous e-learning video is that mostly a head-and-shoulder
video of the instructor is transmitted. This presentation can be made more interesting by
transmitting shots from different angles and zooms. Unfortunately, the transmission of such
multi-shot videos will increase packet delay, jitter and other artifacts caused by frequent
scene changes. To some extent these problems may be reduced by a controlled reduction
of video quality so as to minimise uncontrolled corruption of the stream. Hence, there is a
need for controlled streaming of a multi-shot e-learning video in response to the changing
availability of bandwidth, while utilising the available bandwidth to the maximum.
The quality of transmitted video can be improved by removing redundant background
data and utilising the available bandwidth for sending high-resolution foreground information.
While a number of schemes exist to identify and separate the background from the foreground,
very few studies base this separation on an understanding of the human visual system.
Research has therefore been carried out to define foreground and background in the context of
e-learning video on the basis of human psychology, and the results have been
utilised to propose methods for improving the transmission of e-learning videos.
In order to transmit the video sequence efficiently, this research proposes the use of
Feed-Forward Controllers that dynamically characterise the ongoing scene and adjust the streaming
of video based on the availability of bandwidth. In order to satisfy a number of receivers
connected by links of varied bandwidth in a heterogeneous environment, the use of a Multi-Layer
Feed-Forward Controller has been researched. This controller dynamically characterises the
complexity (number of macroblocks per frame) of the ongoing video sequence and combines it
with knowledge of the bandwidth available to the various receivers to divide the video
sequence into layers in an optimal way before transmitting it into the network.
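The layering idea above can be sketched as follows. This is a minimal illustration, not the thesis's actual optimisation: it assumes layer rates are chosen as the differences between the sorted distinct receiver bandwidths, so each receiver can subscribe to a prefix of layers that fits its link.

```python
def layer_rates(receiver_kbps):
    """Split a stream into cumulative layers so that each receiver can
    subscribe to a prefix of layers whose total rate fits its bandwidth.

    Illustrative heuristic only: layer rates are the gaps between the
    sorted distinct receiver bandwidths; the thesis's optimal layering
    may use a different criterion.
    """
    rates = sorted(set(receiver_kbps))
    # Base layer serves the slowest receiver; each enhancement layer
    # adds just enough rate to reach the next-faster receiver class.
    return [rates[0]] + [b - a for a, b in zip(rates, rates[1:])]


def layers_for(receiver_kbps, layers):
    """Number of layers (base + enhancements) a receiver can take."""
    total, count = 0, 0
    for rate in layers:
        if total + rate > receiver_kbps:
            break
        total += rate
        count += 1
    return count
```

With receivers at 500, 1000 and 1500 kbps this yields layers of 500, 500 and 500 kbps, and the 1000 kbps receiver subscribes to the first two.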
The Single-Layer Feed-Forward Controller takes as input the complexity (Spatial Information
and Temporal Information) of the ongoing video sequence along with the bandwidth available
to a receiver, and adjusts the resolution and frame rate of individual scenes so that the
transmitted sequence gives the most acceptable perceptual quality within the bandwidth
constraints.
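The single-layer controller's inputs and decision can be sketched in a few lines. Spatial Information (SI) and Temporal Information (TI) are computed here in the style of ITU-T P.910 (SI from the Sobel-filtered frame, TI from frame differences); the `pick_settings` heuristic and its toy bit-cost model are purely illustrative assumptions, not the controller described in the thesis.

```python
import numpy as np

def sobel_magnitude(frame):
    # 3x3 Sobel gradient magnitude computed with plain numpy slicing.
    gx = (frame[:-2, :-2] + 2 * frame[1:-1, :-2] + frame[2:, :-2]
          - frame[:-2, 2:] - 2 * frame[1:-1, 2:] - frame[2:, 2:])
    gy = (frame[:-2, :-2] + 2 * frame[:-2, 1:-1] + frame[:-2, 2:]
          - frame[2:, :-2] - 2 * frame[2:, 1:-1] - frame[2:, 2:])
    return np.hypot(gx, gy)

def si_ti(frames):
    """SI/TI in the style of ITU-T P.910: SI is the max over frames of
    std(Sobel(frame)); TI is the max over successive frame differences
    of std(frame_n - frame_{n-1})."""
    si = max(float(np.std(sobel_magnitude(f))) for f in frames)
    ti = max(float(np.std(b - a)) for a, b in zip(frames, frames[1:]))
    return si, ti

def pick_settings(si, ti, bandwidth_kbps):
    """Choose (resolution, fps) for the scene. The candidate list and the
    bit-cost model are hypothetical: complex scenes (high SI/TI) are
    assumed to cost more bits, so resolution/frame rate drop first."""
    options = [((1280, 720), 30), ((640, 480), 30),
               ((640, 480), 15), ((320, 240), 15)]
    for (w, h), fps in options:
        est_kbps = 5e-5 * w * h * fps * (1 + (si + ti) / 100)  # toy model
        if est_kbps <= bandwidth_kbps:
            return (w, h), fps
    return options[-1]  # fall back to the cheapest setting
```

For a static scene (SI = TI = 0) and a 2000 kbps link this picks 1280x720 at 30 fps; on a starved link it falls back to 320x240 at 15 fps.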
The performance of the Feed-Forward Controllers has been evaluated under simulated
conditions, and they have been found to effectively regulate the streaming of real-time
e-learning videos, providing perceptually improved video quality within the constraints of
the available bandwidth.