
    Discovery of Shared Semantic Spaces for Multiscene Video Query and Summarization.

    The growing rate of public-space CCTV installations has created a need for automated methods to exploit video surveillance data, including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is generally not exploited to improve automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing supervised annotations, between different scenes is nevertheless challenging: some scenes are totally unrelated, so any information sharing between them would be detrimental, while others may share only a subset of common activities, so information sharing is useful only if it is selective. Moreover, semantically similar activities that should be modelled together and shared across scenes may have quite different pixel-level appearance in each scene. To address these issues, we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours and further discovers which subset of activities are shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks, including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case, we demonstrate how our multi-scene model improves on a collection of standard single-scene models and a flat model of all scenes. Comment: Multi-Scene Traffic Behaviour Analysis. Accepted at IEEE Transactions on Circuits and Systems for Video Technology.
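
    The abstract does not specify how cross-scene explanation is scored or how scenes are clustered. As a hypothetical illustration only, the sketch below assumes a precomputed matrix explain[i][j] (for example, how well a model trained on scene i explains held-out behaviour in scene j) and groups scenes with thresholded connected components, a simple stand-in for the paper's actual clustering. The function name cluster_scenes, the threshold and the scores are invented for this example.

```python
from typing import List


def cluster_scenes(explain: List[List[float]], threshold: float) -> List[List[int]]:
    """Group scenes whose models explain each other well (hypothetical sketch).

    explain[i][j] is an assumed score of how well a model trained on scene i
    explains the behaviour observed in scene j; higher means better.
    """
    n = len(explain)

    def related(i: int, j: int) -> bool:
        # Assumption: link two scenes only if each explains the other well.
        return min(explain[i][j], explain[j][i]) >= threshold

    seen = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search over the thresholded affinity graph.
        stack, members = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if j != i and not seen[j] and related(i, j):
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


scores = [
    [1.0, 0.8, 0.1, 0.0],
    [0.7, 1.0, 0.2, 0.1],
    [0.1, 0.3, 1.0, 0.1],
    [0.0, 0.1, 0.2, 1.0],
]
print(cluster_scenes(scores, threshold=0.5))  # [[0, 1], [2], [3]]
```

    Requiring both directions to pass the threshold leaves an unrelated scene in its own singleton cluster, reflecting the concern above that sharing information with unrelated scenes is detrimental.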

    Semantic Spaces for Video Analysis of Behaviour

    PhD thesis. There is ever-growing interest in the computer vision community in human behaviour analysis based on visual sensors. This interest generally covers (1) behaviour recognition: given a video clip or a specific spatio-temporal volume of interest, classify it into one or more of a set of predefined categories; (2) behaviour retrieval: given a video or a textual description as a query, search for video clips with related behaviour; and (3) behaviour summarisation: given a number of video clips, summarise the representative and distinct behaviours. Although countless efforts have been dedicated to these problems, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. individual words, phrases and visual events, can be represented as vectors or distributions, referred to as semantic representations. Within a semantic space, texts and visual events can be compared quantitatively by inner product, distance and divergence. Introducing semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantically meaningful video summarisation, retrieval and anomaly detection. A semantic space can also seamlessly bridge categories and datasets that are conventionally treated independently. This encourages the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space makes it possible to generalise learned models beyond the known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because (1) a semantic space is hard to define manually. Humans have a good sense of the semantic relatedness between visual and textual instances.
    However, a measurable and finite semantic space can be difficult to construct with limited manual supervision. As a result, the semantic space is instead constructed from data in an unsupervised manner; (2) it is hard to build a universal semantic space, because such a space is always context-dependent, so it is important to build the semantic space upon selected data so that it remains meaningful within its context. Even with a well-constructed semantic space, further challenges remain, including (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment between visual feature and semantic spaces across categories and even datasets when knowledge or data are generalised. This thesis tackles these challenges by exploiting data from different sources and building contextual semantic spaces with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis. To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques tailored to the nature of each problem. Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner, because unlabelled visual data are abundant in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events, this thesis studies a geometric alignment between the semantic representations of events across scenes. Semantic actions can thus be transferred to new scenes, and abnormal events can be detected in an unsupervised way.
    To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling, we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on a large text corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We note that blindly applying the mapping learned from known categories to novel categories can introduce bias and degrade performance, a problem termed domain shift. To solve this problem, we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation. Combined, these methods achieve state-of-the-art performance on the zero-shot human action recognition task. Finally, we study the possibility of reusing known, manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours, a task termed zero-shot crowd behaviour recognition. Crucially, we point out that, given the multi-label nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploring semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
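
    A minimal sketch of the zero-shot pipeline described above, assuming a linear visual-to-semantic mapping fitted by closed-form ridge regression and nearest-prototype classification by cosine similarity (an inner-product comparison in the semantic space). The function names, dimensions and synthetic data are hypothetical, and the thesis's domain-shift remedies (self-training, hubness correction and so on) are not reproduced here.

```python
import numpy as np


def fit_visual_to_semantic(X, Z, lam=1.0):
    # Closed-form ridge regression: W = (X^T X + lam * I)^{-1} X^T Z
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)


def cosine_similarity(A, B):
    # Row-normalise both matrices so that the inner product equals the cosine.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def zero_shot_predict(X_test, W, unseen_prototypes):
    # Embed visual features in the semantic space, then return the index of
    # the most similar unseen-class prototype (e.g. a class-name word vector).
    return np.argmax(cosine_similarity(X_test @ W, unseen_prototypes), axis=1)


# Synthetic demo: a hidden linear map M from 5-d "visual" features to 3-d "word vectors".
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))
X_seen = rng.normal(size=(50, 5))       # features of seen-class samples
Z_seen = X_seen @ M                     # their semantic targets
W = fit_visual_to_semantic(X_seen, Z_seen, lam=1e-6)

unseen = rng.normal(size=(4, 3))        # prototypes of 4 classes never seen in training
X_test = unseen @ np.linalg.pinv(M)     # one test sample per unseen class
print(zero_shot_predict(X_test, W, unseen))
```

    On this noise-free toy data the learned mapping recovers M almost exactly, so each test sample lands on its own class prototype. On real data, the gap between seen and unseen classes (domain shift) is what the additional techniques listed above are meant to address.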

    Video Content Summarization

    Security cameras produce an enormous amount of video footage every day, far too much for human operators to analyse. A video summarization system that processes and refines this footage would be highly beneficial in many cases. This work defines the video summarization problem in terms of its inputs, outputs and sub-problems, identifies suitable techniques and existing work on the topic, and presents the design of a suitable system. The proposed system was implemented and its results evaluated.

    DragonflEYE: a passive approach to aerial collision sensing

    This dissertation describes the design, development and testing of a passive, wide-field optical aircraft collision-sensing instrument titled 'DragonflEYE'. Such a "sense-and-avoid" instrument is desired for autonomous unmanned aerial systems operating in civilian airspace. The instrument was configured as a network of smart camera nodes and implemented using commercial off-the-shelf components. An end-to-end imaging-train model was developed and important figures of merit were derived. Transfer functions arising from intermediate media were discussed and their impact assessed. Multiple prototypes were developed. The expected performance of the instrument was iteratively evaluated on the prototypes, beginning with modeling activities followed by laboratory tests, ground tests and flight tests. A prototype was mounted on a Bell 205 helicopter for flight tests, with a Bell 206 helicopter acting as the target. Raw imagery was recorded alongside ancillary aircraft data and stored for offline performance assessment. The "range at first detection" (R0), based on a suitably defined signal-to-noise ratio, is presented as a robust measure of sensor performance. The analysis treats target radiance fluctuations, ground clutter, atmospheric effects, platform motion and random noise. Under the measurement conditions, R0 exceeded flight-crew acquisition ranges. Secondary figures of merit are also discussed, including time to impact, target size and growth, and the impact of resolution on detection range. The hardware was structured to support a real-time hierarchical image-processing pipeline, and selected image-processing techniques are introduced. In particular, measuring an observed event's height above the horizon compensates for the angular motion of the helicopter platform.

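    The abstract does not give the dissertation's SNR definition or detection rule. Assuming a time-ordered track of (range, SNR) samples for an approaching target and a fixed detection threshold, R0 can be read off as the range at the first sample where a detection is declared. The function name and the optional persistence parameter are hypothetical additions; persistence requires several consecutive above-threshold samples so that a single noise spike is not counted as a detection.

```python
from typing import Optional, Sequence, Tuple


def range_at_first_detection(
    track: Sequence[Tuple[float, float]],
    snr_threshold: float,
    persistence: int = 1,
) -> Optional[float]:
    """Return R0 for a time-ordered track of (range, snr) samples (sketch).

    A detection is declared once `persistence` consecutive samples reach the
    (assumed) SNR threshold; R0 is the range at the start of that run.
    Returns None if the target is never detected.
    """
    run = 0
    for i, (_, snr) in enumerate(track):
        run = run + 1 if snr >= snr_threshold else 0
        if run == persistence:
            return track[i - persistence + 1][0]
    return None


# Hypothetical approaching-target track: (range in metres, SNR).
track = [(3000.0, 4.5), (2800.0, 1.0), (2500.0, 4.1), (2200.0, 5.0), (2000.0, 6.0)]
print(range_at_first_detection(track, snr_threshold=4.0))                 # 3000.0
print(range_at_first_detection(track, snr_threshold=4.0, persistence=2))  # 2500.0
```
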
    A Survey on Video-based Graphics and Video Visualization

    KOLAM: human computer interfaces for visual analytics in big data imagery

    In the present day, we are faced with a deluge of disparate and dynamic information from multiple heterogeneous sources. Among these are the big data imagery datasets that are rapidly being generated via mature acquisition methods in the geospatial, surveillance (specifically, Wide Area Motion Imagery or WAMI) and biomedical domains. The need to interactively visualize these imagery datasets by using multiple types of views (as needed) into the data is common to these domains. Furthermore, researchers in each domain have additional needs: users of WAMI datasets also need to interactively track objects of interest using algorithms of their choice, visualize the resulting object trajectories and interactively edit these results as needed. While software tools that fulfill each of these requirements individually are available and well-used at present, there is still a need for tools that can combine the desired aspects of visualization, human computer interaction (HCI), data analysis, data management, and (geo-)spatial and temporal data processing into a single flexible and extensible system. KOLAM is an open, cross-platform, interoperable, scalable and extensible framework for visualization and analysis that we have developed to fulfil the above needs. 
The novel contributions in this thesis are the following: 1) spatio-temporal caching for animating both gigapixel and Full Motion Video (FMV) imagery; 2) human computer interfaces purposefully designed to accommodate big data visualization; 3) human-in-the-loop interactive video object tracking, i.e. ground-truthing of moving objects in wide area imagery using algorithm-assisted, human-in-the-loop coupled tracking; 4) coordinated visualization using stacked layers, side-by-side layers/video sub-windows and embedded imagery; 5) efficient one-click manual tracking, editing and data management of trajectories; 6) efficient labeling of image segmentation regions and passing these results to desired modules; 7) visualization of image processing results generated by non-interactive operators using layers; 8) extension of interactive imagery and trajectory visualization to multi-monitor wall display environments; 9) geospatial applications: rapid roam, zoom and hyper-jump spatial operations, interactive blending, colormap and histogram enhancement, spherical projection and terrain maps; 10) biomedical applications: visualization and target tracking of cell motility in time-lapse cell imagery, collecting ground truth from experts on whole-slide imagery (WSI) for developing histopathology analytic algorithms and computer-aided diagnosis for cancer grading, and easy-to-use tissue annotation features. Includes bibliographical references.
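
    The abstract does not describe how KOLAM's spatio-temporal cache is built. As an illustrative assumption only, caching for animating tiled imagery can be sketched as a least-recently-used cache keyed by frame, pyramid level and tile position, with a look-ahead that warms the same tile in the next frame during playback. TileCache, its key layout and the loader callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical tile key: (frame index, pyramid level, tile column, tile row).
TileKey = Tuple[int, int, int, int]


class TileCache:
    """Least-recently-used (LRU) cache of image tiles across space and time."""

    def __init__(self, capacity: int, loader: Callable[[TileKey], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader                  # fetches a tile from storage on a miss
        self._tiles: "OrderedDict[TileKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __contains__(self, key: TileKey) -> bool:
        return key in self._tiles

    def get(self, key: TileKey) -> bytes:
        if key in self._tiles:
            self._tiles.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._tiles[key]
        self.misses += 1
        tile = self.loader(key)
        self._tiles[key] = tile
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)   # evict the least recently used tile
        return tile

    def prefetch_next_frame(self, key: TileKey) -> None:
        # Temporal look-ahead for playback: warm the same tile in the next frame.
        frame, level, col, row = key
        self.get((frame + 1, level, col, row))


cache = TileCache(capacity=2, loader=lambda k: repr(k).encode())
cache.get((0, 0, 0, 0))   # miss
cache.get((0, 0, 1, 0))   # miss
cache.get((0, 0, 0, 0))   # hit; (0, 0, 1, 0) is now least recently used
cache.get((0, 0, 2, 0))   # miss; evicts (0, 0, 1, 0)
print(cache.hits, cache.misses, (0, 0, 1, 0) in cache)  # 1 3 False
```

    Prefetches go through get and therefore count as misses; a real viewer would more likely issue them asynchronously so playback never blocks on storage.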

    Articulated human tracking and behavioural analysis in video sequences

    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance; however, applications in entertainment, intelligent homes and medicine are also increasing. This thesis examines articulated human tracking and the classification of human movement, first separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This likelihood is defined on the articulated limbs as seen from one or more cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows a multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while the MCM recovers both periodic and non-periodic activities, defined either on the whole body or at the limb level. Tracking results are comparable with the state of the art; however, the integrated behaviour analysis adds to the value of the approach. Overseas Research Students Awards Scheme (ORSAS)
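
    The HPPF itself is not specified in the abstract. The sketch below shows only the generic partitioned-sampling idea that such a filter builds on, under stated assumptions: the pose vector is split into partitions processed in a fixed order (for example, torso before limbs), and each partition is diffused, weighted by its own likelihood and resampled before the next is processed. It omits the MCM motion prediction and the real edge, silhouette and colour likelihoods; the Gaussian likelihood, target pose and all parameter values are hypothetical.

```python
import numpy as np


def partitioned_pf_step(particles, partitions, log_likelihoods, noise_std, rng):
    """One time step of a generic partitioned particle filter (sketch).

    particles: (N, D) array of pose hypotheses.
    partitions: list of index lists, processed in order (e.g. torso first).
    log_likelihoods: one function per partition, mapping (N, D) -> (N,) log-weights.
    """
    n = particles.shape[0]
    particles = particles.copy()
    for dims, log_lik in zip(partitions, log_likelihoods):
        # Diffuse only the dimensions belonging to the current partition.
        particles[:, dims] += rng.normal(scale=noise_std, size=(n, len(dims)))
        # Normalised weights from log-likelihoods (max-subtraction for stability).
        logw = log_lik(particles)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample so later partitions start from well-supported hypotheses.
        particles = particles[rng.choice(n, size=n, p=w)]
    return particles


rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 0.5])    # hypothetical "true" pose parameters
partitions = [[0], [1, 2]]             # e.g. root/torso first, then one limb


def make_log_lik(dims, sigma=0.2):
    # Hypothetical Gaussian observation model restricted to one partition.
    return lambda p: -0.5 * np.sum((p[:, dims] - target[dims]) ** 2, axis=1) / sigma**2


log_liks = [make_log_lik(d) for d in partitions]
particles = np.zeros((500, 3))
for _ in range(20):
    particles = partitioned_pf_step(particles, partitions, log_liks, noise_std=0.3, rng=rng)
print(particles.mean(axis=0))          # should lie close to the target pose
```

    Weighting and resampling each partition separately keeps the number of particles needed manageable for high-dimensional articulated poses, which is the usual motivation for partitioned sampling over a single joint search.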

    Discovering visual attributes from image and video data
