
    Detection and Classification of Multiple Person Interaction

    Institute of Perception, Action and Behaviour

    This thesis investigates the classification of the behaviour of multiple persons viewed from a video camera. Work on a constrained case of multiple-person interaction, in the form of team games, is presented first. A comparison is given between modelling individual features (using a hierarchical dynamic model) and modelling the team as a whole (using a support vector machine). It is shown that for team games such as handball it is preferable to model the whole team; in such instances correct classification rates of over 80% are attained. A more general case of interaction is then considered: the classification of interacting people in a surveillance setting, evaluated over several datasets. We introduce a new feature set, compare several methods with the previous best published method (Oliver 2000), and demonstrate an improvement in performance, with classification rates of over 95% on real video sequences. An investigation into the effect of the length of time a sequence is observed follows; this yields an improved classifier (by over 2%) that uses a class-dependent window size. Finally, the question of detecting pre-fight, post-fight and actual fighting situations is addressed. A hierarchical AdaBoost classifier is shown to classify 91% of fighting situations correctly.
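As a toy illustration of the whole-team representation described above (as opposed to per-player modelling), the sketch below pools player positions into a single team-level descriptor and assigns it with a nearest-centroid rule standing in for the thesis's SVM; the formations, features and class labels are invented for illustration.

```python
import math

def team_descriptor(positions):
    """Pool (x, y) player positions into one team-level feature vector:
    centroid and spread along each axis."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    sx = math.sqrt(sum((x - cx) ** 2 for x, _ in positions) / n)
    sy = math.sqrt(sum((y - cy) ** 2 for _, y in positions) / n)
    return (cx, cy, sx, sy)

def nearest_centroid(desc, class_centroids):
    """Assign the descriptor to the closest class centroid (a stand-in
    for a trained SVM decision rule)."""
    return min(class_centroids,
               key=lambda c: sum((a - b) ** 2
                                 for a, b in zip(desc, class_centroids[c])))

# Two invented team formations: "attack" (spread wide, advanced up the
# court) vs "defence" (compact, deep).
attack = [(8.0, 1.0), (9.0, 5.0), (8.5, 9.0), (7.0, 3.0), (7.5, 7.0)]
defence = [(2.0, 4.0), (2.5, 5.0), (2.0, 6.0), (1.5, 4.5), (1.8, 5.5)]
centroids = {"attack": team_descriptor(attack),
             "defence": team_descriptor(defence)}

print(nearest_centroid(team_descriptor(attack), centroids))   # attack
```

The point of the whole-team view is that one fixed-length descriptor summarises the interaction, so a single discriminative classifier can be trained on it directly.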

    Efficient duration modelling in the hierarchical hidden semi-Markov models and their applications

    Modelling patterns in temporal data has arisen as an important problem in engineering and science, and has led to the popularity of several dynamic models, in particular the renowned hidden Markov model (HMM) [Rabiner, 1989]. Despite its widespread success, the standard HMM often fails to model more complex data whose elements are correlated hierarchically or over long periods, although such problems are frequently encountered in practice. Existing efforts to overcome this weakness usually address only one of these two aspects, mainly due to computational intractability. Motivated by this modelling challenge in many real-world problems, in particular video surveillance and segmentation, this thesis aims to develop tractable probabilistic models that jointly model duration and hierarchical information in a unified framework; we believe that jointly exploiting the statistical strength of both properties leads to more accurate and robust models for the task at hand. To tackle the modelling aspect, we base our work on the intersection of dynamic graphical models and the statistics of lifetime modelling. Realising that the key bottleneck in existing work lies in the choice of the duration distribution for a state, we integrate the discrete Coxian distribution [Cox, 1955], a special class of phase-type distributions, into the HMM to form a novel and powerful stochastic model termed the Coxian Hidden Semi-Markov Model (CxHSMM). We show that this model can still be expressed as a dynamic Bayesian network, and that inference and learning can be derived analytically. Most importantly, it has four advantages over existing semi-Markov models: the parameter space is compact, computation is fast (almost as fast as for the HMM), closed-form estimation can be derived, and the Coxian is flexible enough to approximate a large class of distributions.
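As a rough illustration of the duration model involved, the sketch below computes the probability mass function of a discrete Coxian distribution by propagating phase probabilities through an absorbing Markov chain. The phase count and parameters are invented for illustration, not taken from the thesis: from phase k the chain absorbs (the state's duration ends) with probability mu[k], advances to phase k+1 with probability lam[k], and otherwise stays in phase k.

```python
# Toy pmf of a discrete Coxian duration distribution (illustrative
# parameterisation; the thesis's exact formulation may differ).
def coxian_pmf(p, lam, mu, horizon):
    """Return P(duration = d) for d = 1..horizon.
    p: initial phase probabilities; lam/mu: per-phase advance/absorb
    probabilities (lam of the last phase must be 0)."""
    state = list(p)              # probability of being in each phase
    pmf = []
    for _ in range(horizon):
        pmf.append(sum(s * m for s, m in zip(state, mu)))
        nxt = [state[k] * (1.0 - mu[k] - lam[k]) for k in range(len(state))]
        for k in range(1, len(state)):
            nxt[k] += state[k - 1] * lam[k - 1]
        state = nxt
    return pmf

# Two phases: a short first phase feeding a longer-lived second phase.
pmf = coxian_pmf(p=[1.0, 0.0], lam=[0.4, 0.0], mu=[0.3, 0.2], horizon=200)
print(round(sum(pmf), 6))   # ≈ 1.0: nearly all mass within the horizon
```

Note how compact the parameter space is: a handful of per-phase probabilities, rather than one free parameter per possible duration value.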
    Next, we exploit hierarchical decomposition in the data, drawing on the hierarchical hidden Markov model of [Fine et al., 1998, Bui et al., 2004], and introduce a new type of shallow structured graphical model that combines duration and hierarchical modelling in a unified framework, termed the Coxian Switching Hidden Semi-Markov Model (CxSHSMM). The top layer is a Markov sequence of switching variables, while the bottom layer is a sequence of concatenated CxHSMMs whose parameters are determined by the switching variable at the top. Again, we provide a thorough analysis along with the inference and learning machinery, and show that semi-Markov models of arbitrary depth can easily be developed. In all cases we further address two practical issues: missing observations due to unstable tracking, and the use of partially labelled data to improve training accuracy. Motivated by real-world problems, our application contribution is a framework that recognises complex activities of daily living (ADLs) and detects anomalies, to provide better intelligent caring services for the elderly. Coarse activities, each with its own duration distribution, are represented using the CxHSMM; complex activities, composed of sequences of coarse activities, are represented at the top level of the CxSHSMM. Intensive experiments evaluate our solutions against existing methods, and in many cases confirm the superiority of the joint modelling and the Coxian parameterisation over traditional methods. The robustness of the proposed models is further demonstrated in a series of more challenging experiments in which tracking is often lost and activities overlap considerably. Our final contribution is an application of the switching Coxian model to segmenting education-oriented videos into coherent topical units; the results again demonstrate that such segmentation can benefit greatly from the joint modelling of duration and hierarchy.
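The two-layer structure can be pictured with a toy generative sketch: a top-level Markov chain over "complex activities" switches the duration parameters of the bottom-level state sequence. Geometric durations stand in for the Coxian phases here, and all activity names, transition probabilities and duration parameters are invented for illustration.

```python
import random

# Toy two-layer generative sketch (not the thesis model itself).
random.seed(0)

TOP_TRANS = {"cooking": {"cooking": 0.7, "eating": 0.3},
             "eating":  {"cooking": 0.4, "eating": 0.6}}
# Per-activity duration parameter for the bottom layer: probability of
# the coarse activity ending at each tick (geometric stand-in for the
# Coxian duration model).
END_PROB = {"cooking": 0.2, "eating": 0.5}

def sample(n_segments, top="cooking"):
    """Sample (activity, duration) segments; the top-level state picks
    which duration parameters the bottom level uses."""
    timeline = []
    for _ in range(n_segments):
        d = 1
        while random.random() > END_PROB[top]:   # geometric duration
            d += 1
        timeline.append((top, d))
        r, acc = random.random(), 0.0
        for nxt, pr in TOP_TRANS[top].items():   # top-level switch
            acc += pr
            if r < acc:
                top = nxt
                break
    return timeline

print(sample(5))
```

The key property mirrored here is that duration statistics are not global: they change whenever the switching variable at the top changes.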

    Driver lane change intention inference using machine learning methods.

    The lane change manoeuvre on highways is a highly interactive task for human drivers. Intelligent vehicles and advanced driver assistance systems (ADAS) need proper awareness of the traffic context as well as of the driver. The ADAS also needs to understand the driver's potential intent correctly, since it shares control authority with the human driver. This study presents research on driver intention inference, with a particular focus on the lane change manoeuvre on highways. The report is organised on a paper basis, with each chapter corresponding to a publication that has been submitted or is to be submitted. Part Ⅰ introduces the motivation and the general methodological framework for the thesis. Part Ⅱ covers the literature survey and the state of the art of driver intention inference. Part Ⅲ contains the techniques for traffic context perception, focusing on lane detection: a literature review of lane detection techniques and their integration with the parallel driving framework is given, and a novel integrated lane detection system is then designed. Part Ⅳ comprises two parts, covering driver behaviour monitoring for normal driving and for secondary-task detection; the first is based on conventional feature selection methods, while the second introduces an end-to-end deep learning framework. The design and analysis of the driver lane change intention inference system is presented in Part Ⅴ. Finally, discussions and conclusions are given in Part Ⅵ. A major contribution of this project is a set of novel algorithms that accurately model the driver intention inference process. Lane change intention is recognised using machine learning (ML) methods, chosen for their reasoning and generalisation characteristics. Sensors in the vehicle capture traffic context information, vehicle dynamics and driver behaviour.
    Machine learning and image processing are the techniques used to recognise human driver behaviour.
    PhD in Transpor
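A minimal sketch of intention inference as classification, under invented assumptions: a handful of context, vehicle-dynamics and driver-behaviour features are pooled into one vector and scored by a logistic model. The feature names and weights below are illustrative placeholders; in the thesis's setting a trained ML model would supply them.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature vector:
# (lateral_offset_m, steering_rate, mirror_glances_in_5s, gap_available)
WEIGHTS = (2.0, 1.5, 0.8, 1.2)   # invented, standing in for learned weights
BIAS = -3.0

def lane_change_prob(features):
    """Score P(lane change intended) with a logistic model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return logistic(z)

keeping = (0.1, 0.0, 0, 1)    # small drift, no mirror checks
changing = (0.8, 1.0, 3, 1)   # drifting out, repeated mirror checks
print(lane_change_prob(keeping) < 0.5 < lane_change_prob(changing))  # True
```

The design point this illustrates is fusion at the feature level: context, dynamics and driver-monitoring signals enter one classifier rather than being judged separately.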

    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of the objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information have not been applied together for solving such a complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions, with a trained classifier, to recognise the identity of the objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis we use a hierarchical, multi-stage classification approach, so that difficult classification tasks can be decomposed into simpler, smaller ones. When combining the two modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of both feature- and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work.
    Finally, we propose a decision-correction algorithm, showing that further steps towards effectively combining multi-modal classification information with semantic knowledge generate the best results.
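The PCFS algorithm itself is not specified in this abstract, so as context the sketch below shows plain sequential forward selection, the greedy core that SFFS extends: repeatedly add the feature that most improves a scoring function, stopping when nothing helps. The features and the redundancy-penalised score are invented for illustration.

```python
def forward_select(features, score, k):
    """Greedily pick up to k features maximising score(subset)."""
    chosen = []
    remaining = list(features)
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda f: score(chosen + [f]))
        if score(chosen + [best]) <= score(chosen):
            break                      # no remaining feature helps
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy score: each feature's standalone value minus a penalty when a
# feature from the same correlated group is already chosen.
VALUE = {"colour": 3.0, "texture": 2.5, "shape": 2.0, "hue": 2.9}
GROUP = {"colour": "c", "hue": "c", "texture": "t", "shape": "s"}

def score(subset):
    total, seen = 0.0, set()
    for f in subset:
        total += VALUE[f] - (2.5 if GROUP[f] in seen else 0.0)
        seen.add(GROUP[f])
    return total

print(forward_select(VALUE, score, 3))
```

Note how "hue" is skipped despite its high standalone value, because it is redundant given "colour"; SFFS adds backtracking (conditional exclusion) on top of this greedy loop.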

    Neighborhood-level learning techniques for nonparametric scene models

    Scene-model-based segmentation of video into foreground and background structure has long been an important and ongoing research topic in image processing and computer vision. Segmentation of complex video scenes into binary foreground/background label images is often the first step in a wide range of video processing applications; common examples include surveillance, traffic monitoring, people tracking, activity recognition and event detection. A wide range of scene modeling techniques has been proposed for identifying foreground pixels or regions in surveillance video. Broadly speaking, the purpose of a scene model is to characterize the distribution of features in an image block or pixel over time. In the majority of cases, the scene model represents the distribution of background features (background modeling) and the distribution of foreground features is assumed to be uniform or Gaussian; in other cases, the model characterizes the distributions of both foreground and background values, and segmentation is performed by maximum likelihood. Pixel-level scene models characterize the distributions of spatiotemporally localized image features centered about each pixel location in video over time. Individual video frames are segmented into foreground and background regions by comparing pixel-level features from the frame under segmentation with the corresponding elements of the scene model at each pixel location. Prominent pixel-level scene models include the single Gaussian, the Gaussian mixture model and kernel density estimation. Recently reported advancements in scene modeling have largely been based on exploiting the local coherency of natural imagery by integrating neighborhood information among nonparametric pixel-level scene models. The earliest scene models inadvertently made use of neighborhood information because they modeled images at the block level.
    As the resolution of scene models progressed, textural image features such as the spatial derivative, the local binary pattern (LBP) or wavelet coefficients were employed to provide neighborhood-level structural information in pixel-level models. Most recently, Barnich and Van Droogenbroeck proposed the Visual Background Extractor (ViBe), in which neighborhood-level information is incorporated into the scene model in the learning step: the learning function is distributed over a small region, so that new background information is absorbed at both the pixel and the neighborhood level. In this dissertation, I present a nonparametric pixel-level scene model based on several recently reported stochastic video segmentation algorithms. I propose new stochastic techniques for updating scene models over time, focused on incorporating neighborhood-level features into the model learning process, and demonstrate the effectiveness of the system on a wide range of challenging visual tasks. Specifically, I propose a model maintenance policy based on replacing outliers within each nonparametric pixel-level model through kernel density estimation (KDE), and a neighborhood diffusion procedure in which information sharing between adjacent models having significantly different shapes is discouraged. Quantitative results are compared using the well-known percentage correct classification (PCC) metric and a new probability correct classification (PrCC) metric, in which the underlying models are scrutinized prior to the application of a final segmentation threshold. In all cases considered, the superiority of the proposed model over existing state-of-the-art techniques is well established.
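The ViBe-style learning step described above can be sketched as follows: each pixel holds a small set of background samples, and when a new value is classified as background it randomly replaces one of the pixel's own samples and is also diffused into a random neighbour's model. Grid size, sample count, matching radius and the subsampling factor below are illustrative (real ViBe uses a larger subsampling factor; it is 1 here so the toy run is deterministic in effect).

```python
import random

random.seed(1)
W, H, N, RADIUS, MATCHES = 4, 4, 5, 15, 2

# Each pixel's model is a small bag of past background values.
model = {(x, y): [120] * N for x in range(W) for y in range(H)}

def is_background(p, value):
    """Background if enough stored samples are close to the value."""
    return sum(abs(value - s) <= RADIUS for s in model[p]) >= MATCHES

def absorb(p, value, subsample=1):
    """ViBe-like update: replace a random own sample, then diffuse the
    value into a random 4-neighbour's model."""
    if random.randrange(subsample) == 0:
        model[p][random.randrange(N)] = value
        x, y = p
        nx, ny = random.choice([(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)])
        if (nx, ny) in model:
            model[(nx, ny)][random.randrange(N)] = value

p = (1, 1)
if is_background(p, 125):
    absorb(p, 125)
print(125 in model[p])   # True: the pixel's own model absorbed the value
```

The diffusion step is what makes the learning neighborhood-level: background evidence observed at one pixel leaks into adjacent models, the behaviour the dissertation's diffusion procedure then regulates when adjacent models differ too much in shape.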