290 research outputs found

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    In this thesis, different aspects of video and light field reconstruction are considered, namely super-resolution, inpainting, segmentation and codecs. Each of these strategies is analyzed with respect to a specific goal and a specific database. Accordingly, databases relevant to the film industry, sport videos, light fields and hyperspectral videos are used. The thesis is built around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. First, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, based on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation is to improve the visual quality of the video frames and to decrease the reconstruction error in comparison with earlier video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for video super-resolution. The Student-t mixture model has heavy tails, which make it robust and suitable as a video frame patch prior and rich in terms of log-likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used in Bayesian dictionary learning, and a sparse coding is generated for hyperspectral video super-resolution. 
The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two separately learned dictionaries: the first is trained for spatial super-resolution and the second for spectral restoration. Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos using deep learning strategies. For this purpose, a U-Net architecture (a convolutional neural network for image segmentation) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new content using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0, and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. The inverse operation is performed in the decoder. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0. 
The thread that ties these approaches together is the reconstruction of video and light field frames in the face of different problems: loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size and high computational overhead. In three of the proposed approaches, we use the Plug-and-Play ADMM model for the first time for the reconstruction of videos and light fields, in order to address information retrieval in the frames and noise/blur at the same time. In two of the proposed models, we apply sparse dictionary learning to reduce the data dimension and represent the data as an efficient linear combination of basis frame patches. Two of the proposed approaches were developed in collaboration with industry; in these, deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
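The compression step described in this abstract (keeping the video in Y'CbCr and downsampling the luma to the chroma size before merging the planes) can be illustrated with a minimal sketch. The abstract does not say which downsampling filter is used, so the box (block-average) filter below is an assumption, and the function names `downsample_luma` and `merge_planes` are hypothetical.

```python
def downsample_luma(y, fx, fy):
    """Average non-overlapping fy-by-fx blocks of the luma plane.

    The box filter is an assumption; the abstract does not name the filter.
    """
    height, width = len(y), len(y[0])
    if height % fy or width % fx:
        raise ValueError("luma size must be a multiple of the subsampling factors")
    out = []
    for r in range(0, height, fy):
        row = []
        for c in range(0, width, fx):
            block = [y[r + dr][c + dc] for dr in range(fy) for dc in range(fx)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def merge_planes(y, cb, cr, fx=2, fy=2):
    """Return a chroma-sized grid of (Y, Cb, Cr) triples.

    fx=2, fy=2 matches 4:2:0 subsampling; fx=2, fy=1 matches 4:2:2.
    """
    y_small = downsample_luma(y, fx, fy)
    rows, cols = len(cb), len(cb[0])
    if (len(y_small), len(y_small[0])) != (rows, cols) or (len(cr), len(cr[0])) != (rows, cols):
        raise ValueError("plane sizes are inconsistent with the subsampling factors")
    return [[(y_small[r][c], cb[r][c], cr[r][c]) for c in range(cols)]
            for r in range(rows)]


# Hypothetical 4x4 luma plane with 2x2 chroma planes (4:2:0 layout).
luma = [[0, 2, 4, 6],
        [2, 4, 6, 8],
        [10, 10, 20, 20],
        [10, 10, 20, 20]]
cb = [[128, 130], [132, 134]]
cr = [[120, 122], [124, 126]]
merged = merge_planes(luma, cb, cr)
```

The merged grid has the chroma resolution, so a network that consumes it processes a quarter of the luma samples in the 4:2:0 case, which is one plausible reading of where the reported computation saving comes from.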

    A study of the transmission of VBR encoded video over ATM networks.

    by Ngai Li. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 66-69).
    Chapter 1. Introduction
      1.1 Video Compression and Transport
      1.2 Research Contributions
        1.2.1 Joint Rate Control of VBR Encoded Video
        1.2.2 Transporting VBR Video on LB Controlled Channel
      1.3 Organization of Thesis
    Chapter 2. Preliminary
      2.1 Statistical Characteristics of MPEG-1 Encoded Video
      2.2 Temporal and Spatial Smoothing
        2.2.1 Temporal Smoothing
        2.2.2 Spatial Smoothing
      2.3 A Single Source Control-Theoretic Framework for VBR-to-CBR Video Adaptation
    Chapter 3. Joint Rate Control of VBR Encoded Video
      3.1 Analytical Models
      3.2 Analysis
        3.2.1 Stable Region
        3.2.2 Final Value of the State Variables
        3.2.3 Peak Values of Buffer-occupancy Deviation and Image-quality Fluctuation
        3.2.4 SAE of Buffer-occupancy Deviation and Image-quality Fluctuation
      3.3 Experimental Results
      3.4 Concluding Remarks
    Chapter 4. Transporting VBR Video on LB Controlled Channel
      4.1 Leaky Bucket Access Control
      4.2 Greedy Token-usage Strategy
      4.3 Non-greedy Token-usage Strategy
      4.4 Concluding Remarks
    Chapter 5. Conclusions
      5.1 Joint Rate Control of Multiple VBR Videos
      5.2 LB Video Compression
      5.3 Further Study
      5.4 Publications
    Bibliography

    Rate Control in Video Coding

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
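The idea of describing a parameter's evolution with a spline over the whole sound object, rather than frame by frame, can be sketched as follows. The abstract does not say which spline family the PSOS model uses, so the uniform Catmull-Rom spline here is an assumption, and the control values and the names `catmull_rom` and `render` are hypothetical. The point of the sketch is that the same fixed set of control points can be rendered at any number of frames.

```python
def catmull_rom(points, t):
    """Evaluate a uniform Catmull-Rom spline through `points` at normalized time t in [0, 1]."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two control points")
    t = min(max(t, 0.0), 1.0)
    pos = t * (n - 1)
    i = min(int(pos), n - 2)        # index of the segment's left control point
    u = pos - i                     # position inside the segment, 0..1
    p0 = points[max(i - 1, 0)]      # end points are repeated at the boundaries
    p1 = points[i]
    p2 = points[i + 1]
    p3 = points[min(i + 2, n - 1)]
    return 0.5 * (2 * p1
                  + (-p0 + p2) * u
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u * u
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3)


def render(points, num_frames):
    """Sample the trajectory at `num_frames` evenly spaced times (num_frames >= 2)."""
    return [catmull_rom(points, k / (num_frames - 1)) for k in range(num_frames)]


# Hypothetical amplitude envelope: fast attack, partial decay, release to silence.
envelope = [0.0, 1.0, 0.5, 0.0]
short_sound = render(envelope, 7)    # 7 frames
long_sound = render(envelope, 50)    # 50 frames from the same four parameters
```

Because time is normalized to the object's duration, the short and long renderings share the same shape, which is one simple way to obtain a mapping between sounds of different length.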

    A cross-layer quality-oriented energy-efficient scheme for multimedia delivery in wireless local area networks

    Wireless communication technologies, although they emerged only a few decades ago, have grown fast in both popularity and technical maturity. As a result, mobile devices such as Personal Digital Assistants (PDAs) and smartphones equipped with embedded wireless cards have seen remarkable growth in popularity and are quickly becoming one of the most widely used communication tools. This is mainly due to the flexibility, convenience and relatively low costs associated with these devices and with wireless communications. Multimedia applications have become by far one of the most popular application types among mobile users. However, this type of application has very high bandwidth requirements, which seriously restricts the usage of portable devices. Moreover, wireless technology involves increased energy consumption and consequently puts huge pressure on the limited battery capacity, which presents many design challenges in the context of battery-powered devices. As a consequence, power management has attracted attention in both the research and industrial communities, and huge efforts have been invested in energy conservation techniques and strategies deployed within different components of mobile devices. The research presented in this thesis focuses on energy-efficient data transmission in wireless local area networks and makes the following main contributions:
    1. Static STELA, a Medium Access Control (MAC) layer solution that adapts the sleep/wake-up schedule of the radio transceiver according to the bursty nature of data traffic and real-time observation of data packet arrival times. The algorithm involves three phases: a slow start phase, an exponential increase phase, and a linear increase phase. The initiation and termination of each phase are self-adapted to real-time traffic and user configuration. It is designed to provide either maximum energy efficiency or the best Quality of Service (QoS), according to user preference.
    2. Dynamic STELA, a MAC layer solution deployed on mobile devices that provides balanced performance between energy efficiency and QoS. Dynamic STELA consists of the three-phase algorithm used in static STELA and additionally employs a traffic modeling algorithm to analyze historical traffic data and estimate the arrival time of the next burst. Dynamic STELA achieves energy savings through an intelligent and adaptive increase of the Wireless Network Interface Card (WNIC) sleeping interval in the second and third phases, and at the same time guarantees delivery performance by waking the WNIC at an optimal time before the estimated arrival of a new data burst.
    3. Q-PASTE, a quality-oriented cross-layer solution for multimedia content delivery, with two components employed at different network layers. The first component, the Packet/ApplicaTion manager (PAT), is deployed at the application layer of both the service gateway and the client host. The gateway-level PAT uses fast start, a widely supported technique for multimedia content delivery, to achieve high QoS and shapes traffic into bursts to reduce the wireless transceiver’s duty cycle. Additionally, the gateway-side PAT informs the client host of the start and end times of fast start to assist parameter tuning. The client-side PAT monitors each active session and informs the MAC layer about its traffic-related behavior. The second component, dynamic STELA, is deployed at the MAC layer and adaptively adjusts the sleep/wake-up behavior of the mobile device's wireless interfaces in order to reduce energy consumption while maintaining high QoS levels.
    4. A comprehensive survey of energy-efficient standards and some of the most important state-of-the-art energy-saving technologies is also provided as part of the work.
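The three-phase sleep schedule named in the abstract (slow start, exponential increase, linear increase) can be illustrated with a small simulation. The abstract gives no thresholds, step sizes or reset rule, so every number below, the reset-on-data behavior, and the names `next_sleep_interval` and `simulate` are assumptions made for illustration only, not the thesis's actual algorithm.

```python
# All constants are hypothetical; the abstract does not specify any values.
BASE_MS = 10             # sleep interval during slow start
SLOW_START_WAKEUPS = 3   # idle wake-ups below this count stay in slow start
EXP_LIMIT_MS = 80        # interval at which exponential growth stops
LINEAR_STEP_MS = 20      # increment used in the linear phase
MAX_MS = 200             # upper bound on the sleep interval


def next_sleep_interval(current_ms, idle_wakeups):
    """Pick the next sleep interval from the count of consecutive idle wake-ups."""
    if idle_wakeups < SLOW_START_WAKEUPS:
        return BASE_MS                                  # phase 1: slow start
    if current_ms < EXP_LIMIT_MS:
        return min(current_ms * 2, EXP_LIMIT_MS)        # phase 2: exponential increase
    return min(current_ms + LINEAR_STEP_MS, MAX_MS)     # phase 3: linear increase


def simulate(traffic):
    """`traffic` is a sequence of booleans: True if data was waiting at a wake-up."""
    interval, idle = BASE_MS, 0
    schedule = []
    for had_data in traffic:
        if had_data:
            interval, idle = BASE_MS, 0   # assumed reset when a burst arrives
        else:
            idle += 1
            interval = next_sleep_interval(interval, idle)
        schedule.append(interval)
    return schedule


# Nine idle wake-ups followed by the arrival of a data burst.
schedule = simulate([False] * 9 + [True])
```

In this sketch the interval stays short while traffic may still be arriving, grows quickly once the link looks idle, and then grows cautiously so that a new burst is not delayed too long. A dynamic variant, as described in item 2, would additionally use the predicted burst arrival time to cap the interval.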