Video modeling via implicit motion representations
Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. Based on the analytical representation, we can develop algorithms for accomplishing particular video-related tasks. Video modeling therefore provides a foundation that bridges video data and related tasks. Although many video models have been proposed in the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.
Most existing video modeling approaches are based on explicit motion representations, where motion information is explicitly expressed by correspondence-based representations (i.e., motion velocity or displacement). Although this is conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially when handling complex motion or non-ideal observed video data. In this thesis, we propose to investigate video modeling without explicit motion representation. Motion information is implicitly embedded into the spatio-temporal dependency among pixels or patches instead of being explicitly described by motion vectors.
Firstly, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem, in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the framework of STALL, we can develop video processing algorithms for a variety of applications by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. The simulation results show that motion information can be efficiently exploited by our implicit motion representation and that the resampling and fusion do help to enhance the modeling capability of STALL.
Secondly, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we propose to embed motion-related information into the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We propose to enforce the sparsity constraint on a higher-dimensional data array, which is generated by packing the patches in the similar-patch set. Then we solve the inference problem by iteratively updating the kNN array and the desired signal. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results in video error concealment, denoising, and deartifacting are reported to demonstrate its modeling capability.
Finally, we summarize the two proposed video modeling approaches. We also point out the prospects of implicit motion representations in applications ranging from low-level to high-level problems.
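The regression view described in the abstract above can be illustrated with a small sketch. This is not the thesis's STALL implementation: it assumes 1-D "frames", a three-pixel model support in the previous frame, and an ordinary least-squares fit over a local training window as a sample-based stand-in for the LMMSE criterion. The names `learn_coefficients` and `solve_linear` are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def learn_coefficients(prev_frame, cur_frame, centre, half_window):
    """Fit cur[i] ~ w0*prev[i-1] + w1*prev[i] + w2*prev[i+1] over a local window."""
    rows, targets = [], []
    for i in range(centre - half_window, centre + half_window + 1):
        rows.append([prev_frame[i - 1], prev_frame[i], prev_frame[i + 1]])
        targets.append(cur_frame[i])
    # Normal equations (A^T A) w = A^T b of the least-squares problem.
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    atb = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(3)]
    return solve_linear(ata, atb)


# Assumed toy data: the current frame is the previous frame shifted right by one pixel.
random.seed(0)
prev_frame = [random.random() for _ in range(64)]
cur_frame = [prev_frame[0]] + prev_frame[:-1]

weights = learn_coefficients(prev_frame, cur_frame, centre=32, half_window=8)

# Predict a pixel outside the training window with the learned coefficients.
i = 50
predicted = sum(w * p for w, p in zip(weights, prev_frame[i - 1:i + 2]))
```

In this toy case the learned weights concentrate on the left neighbour, so the one-pixel shift is captured by the regression coefficients without any motion vector being estimated.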
Robust density modelling using the student's t-distribution for human action recognition
The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments over two well-known datasets (Weizmann, MuHAVi) yield a remarkable improvement in classification accuracy. © 2011 IEEE
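The robustness argument above can be made concrete by comparing log-densities at an outlying observation. The sketch below is only an illustration under assumed parameters (zero mean, unit scale, 3 degrees of freedom); it does not reproduce the paper's HMM or its mixture model, and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean=0.0, scale=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mean) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mean=0.0, scale=1.0, dof=3.0):
    """Log-density of a univariate location-scale Student's t-distribution."""
    z = (x - mean) / scale
    return (math.lgamma((dof + 1.0) / 2.0) - math.lgamma(dof / 2.0)
            - 0.5 * math.log(dof * math.pi) - math.log(scale)
            - (dof + 1.0) / 2.0 * math.log1p(z * z / dof))


# An observation eight scale units from the mean, standing in for an outlier.
outlier = 8.0
gaussian_penalty = gaussian_logpdf(outlier)
student_penalty = student_t_logpdf(outlier)
```

The Gaussian log-density falls off quadratically with distance from the mean, while the t log-density falls off only logarithmically, so a single outlier pulls a t-based likelihood far less than a Gaussian one.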
Time-frequency analysis based on split spectrum applied to audio and ultrasonic signals
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Signal processing is a large subject with applications integral to a number of technological fields such as communication, audio, Voice over IP (VoIP), pattern recognition, sonar, radar, ultrasound and medical imaging. Techniques exist for the analysis, modelling, extraction, recognition and synthesis of signals of interest. The focus of this thesis is signal processing for acoustics (both sonic and ultrasonic). In the applications examined, signals of interest are usually incomplete, distorted and/or noisy. Therefore, signal reconstruction, noise reduction and the removal of any distortion or interference are the main goals of the signal processing techniques presented. The primary aim is to study and develop an advanced time-frequency signal processing technique for acoustic applications to enhance the quality of the signals. In the first part of the thesis, a technique is presented that models and maintains the correlation between temporal and spectral parameters of audio signals. A novel Packet Loss Concealment (PLC) method is developed with applications to VoIP, audio broadcasting, and streaming. The problem of modelling the time-varying frequency spectrum in the context of PLC is addressed, and a novel solution is proposed for tracking and using the temporal motion of spectral flow to reconstruct the signal. The proposed method utilises a Time-Frequency Motion (TFM) matrix representation of the audio signal, where each frequency is tagged with a motion vector estimate that is assessed by cross-correlation of the movement of spectral energy within sub-bands across time frames. The missing packets are estimated by extrapolation or interpolation algorithms using the TFM matrix and then inverse transformed to the time domain for reconstruction of the signal. The proposed method is compared with conventional approaches using objective Perceptual Evaluation of Speech Quality (PESQ) and subjective Mean Opinion Scores (MOS) over a range of packet loss from 5% to 20%. The evaluation results demonstrate that the proposed algorithm substantially improves performance by an average of 2.85% and 5.9% in terms of PESQ and MOS respectively. In the second part of the thesis, the proposed method is extended and modified to address the challenges of excessive coherent noise arising from ultrasonic signals gathered during Guided Wave Testing (GWT). GWT is an advanced non-destructive testing technique used across several branches of industry to inspect large structures for defects where structural integrity is of concern. In such systems, signal interpretation can often be challenging due to the multi-modal and dispersive propagation of Ultrasonic Guided Waves (UGWs). The multi-modal and dispersive nature of the received signals hampers the ability to detect defects in a given structure. The Split-Spectrum Processing (SSP) method applied to such signals has been studied and reviewed quantitatively to measure the enhancement in terms of Signal-to-Noise Ratio (SNR) and spatial resolution. In this thesis, the influence of SSP filter bank parameters on these signals is studied and optimised to improve SNR and spatial resolution considerably. The proposed method is compared analytically and experimentally with conventional approaches. The proposed SSP algorithm substantially improves SNR by an average of 30 dB. The conclusions reached in this thesis will contribute to the progression of the GWT technique through considerable improvement in defect detection capability.
Centre for Electronic Systems Research (CESR) of Brunel University London, The National Structural Integrity Research Centre (NSIRC) and TWI Ltd
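The general idea of split-spectrum processing mentioned in the abstract above can be sketched as follows. This is a generic textbook-style illustration, not the thesis's optimised algorithm: it assumes rectangular, non-overlapping sub-bands taken from a naive DFT and recombination by polarity thresholding, and all function names, the toy signal, and the band edges are hypothetical choices.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def split_bands(x, edges):
    """Split x into real sub-band signals; edges are DFT bin boundaries."""
    n = len(x)
    spec = dft(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = [0j] * n
        for k in range(lo, hi):
            masked[k] = spec[k]
            # Keep the mirrored bin as well so the sub-band signal is real.
            masked[(n - k) % n] = spec[(n - k) % n]
        bands.append(idft_real(masked))
    return bands


def polarity_threshold(x, bands):
    """Keep a sample only where all sub-bands agree in sign, else output zero."""
    out = []
    for t, value in enumerate(x):
        signs = [b[t] > 0 for b in bands]
        out.append(value if all(signs) or not any(signs) else 0.0)
    return out


# Assumed toy signal: a short echo-like pulse plus uniform noise.
random.seed(1)
n = 64
signal = [math.exp(-((t - 32) / 3.0) ** 2) * math.cos(0.9 * (t - 32))
          + 0.2 * random.uniform(-1.0, 1.0) for t in range(n)]

bands = split_bands(signal, edges=[0, 8, 16, 24, 33])
output = polarity_threshold(signal, bands)
```

Because the assumed bands partition the whole spectrum, the sub-band signals sum back to the input. The number, width and overlap of the bands are the kind of filter bank parameters whose influence the thesis reports studying.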
Mathematical Approaches for Image Enhancement Problems
This thesis develops novel techniques for solving several image enhancement problems using well-established mathematical tools for image processing, such as wavelet transforms, partial differential equations, and variational models. Three subtopics are mainly covered. First, a color image denoising framework is introduced that achieves high-quality denoising results by considering correlations between color components, while allowing existing denoising approaches to be plugged in flexibly. Second, a new and efficient framework for image contrast and color enhancement in the compressed wavelet domain is proposed. The proposed approach is capable of enhancing both global and local contrast and brightness as well as preserving color consistency. The framework does not require an inverse transform for image enhancement since linear scale factors are applied directly to both scaling and wavelet coefficients in the compressed domain, which results in high computational efficiency. Contaminating noise in the image can also be efficiently reduced by introducing wavelet shrinkage terms adaptively at different scales. The proposed method is able to enhance a wavelet-coded image efficiently, with high image quality and less noise or other artifacts. The experimental results show that the proposed method produces encouraging results both visually and numerically compared to some existing approaches. Finally, the image inpainting problem is discussed. A literature review, psychological analysis, and the challenges of the image inpainting problem and related topics are described. An inpainting algorithm using energy minimization and texture mapping is proposed. The Mumford-Shah energy minimization model detects and preserves edges in the inpainting domain by detecting both the main structure and the detailed edges. This approach utilizes a faster hierarchical level set method and guarantees convergence independent of initial conditions. The estimated segmentation results in the inpainting domain are stored in a segmentation map, which is referenced by a texture mapping algorithm for filling textured regions. We also propose an inpainting algorithm using the wavelet transform, which can be expected to give better global structure estimation of the unknown region in addition to shape and texture properties, since wavelet transforms have been used for various image analysis problems owing to their multi-resolution properties and decoupling characteristics.
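The combination of linear scale factors and wavelet shrinkage described in the abstract above can be illustrated on a 1-D signal. The sketch below is an assumption-laden toy, not the thesis's framework: it uses a single-level orthonormal Haar transform, one global gain, and soft thresholding of the detail coefficients, and it applies the inverse transform only to inspect the result. The function names are hypothetical.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t, keeping its sign."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def enhance(x, gain, threshold):
    """Scale the coefficients by gain and shrink small detail coefficients."""
    approx, detail = haar_forward(x)
    approx = [gain * a for a in approx]
    detail = [gain * soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)


# Small fluctuations within each pair are removed; the overall level is doubled.
noisy = [1.0, 1.2, 3.0, 2.9]
result = enhance(noisy, gain=2.0, threshold=0.5)
```

All operations act directly on the transform coefficients, which is the property the abstract credits for the method's computational efficiency when the image is already stored in wavelet-coded form.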