
    Cognitive visual tracking and camera control

    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene and to generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, serves to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
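
    The paper gives no implementation detail beyond the idea of SQL tables as virtual communication channels, but a minimal sketch of that pattern might look as follows (the sqlite backend, table schema and column names are all assumptions for illustration, not the authors' system):

```python
# Sketch: a reasoning module enqueues PTZ commands into an SQL table and a
# camera agent polls for them. Schema and names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("surveillance.db")
conn.execute("""CREATE TABLE IF NOT EXISTS ptz_commands (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   camera_id INTEGER, pan REAL, tilt REAL, zoom REAL,
                   handled INTEGER DEFAULT 0)""")

def send_command(camera_id, pan, tilt, zoom):
    """High-level controller: publish a command by inserting a row."""
    conn.execute("INSERT INTO ptz_commands (camera_id, pan, tilt, zoom) "
                 "VALUES (?, ?, ?, ?)", (camera_id, pan, tilt, zoom))
    conn.commit()

def poll_commands(camera_id):
    """Camera agent: fetch and mark unhandled commands for this camera."""
    rows = conn.execute("SELECT id, pan, tilt, zoom FROM ptz_commands "
                        "WHERE camera_id = ? AND handled = 0",
                        (camera_id,)).fetchall()
    conn.executemany("UPDATE ptz_commands SET handled = 1 WHERE id = ?",
                     [(r[0],) for r in rows])
    conn.commit()
    return rows

send_command(1, pan=30.0, tilt=-10.0, zoom=2.0)  # e.g. re-centre on a target
print(poll_commands(1))
```

    The appeal of this design is that any process with database access can publish or consume commands, which matches the distributed camera setting described in the abstract.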

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making these hypotheses and constraints explicit makes the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: preprint submitted to CVIU; survey paper, 46 pages, 2 figures, 4 tables.

    Interpretation of complex situations in a semantic-based surveillance framework

    The integration of cognitive capabilities in computer vision systems requires both high semantic expressiveness and the ability to cope with the high computational costs that arise as large amounts of data are analysed. This contribution describes a cognitive vision system conceived to automatically provide high-level interpretations of complex real-time situations in outdoor and indoor scenarios, and eventually to maintain communication with casual end users in multiple languages. The main contributions are: (i) the design of an integrative multilevel architecture for cognitive surveillance purposes; (ii) the proposal of a coherent taxonomy of knowledge to guide the process of interpretation, which leads to the conception of a situation-based ontology; (iii) the use of situational analysis for content detection and a progressive interpretation of semantically rich scenes, by managing incomplete or uncertain knowledge; and (iv) the use of such an ontological background to enable multilingual capabilities and advanced end-user interfaces. Experimental results are provided to show the feasibility of the proposed approach. This work was supported by the project 'CONSOLIDER-INGENIO 2010 Multimodal interaction in pattern recognition and computer vision' (V-00069), by EC Grants IST-027110 (HERMES project) and IST-045547 (VIDI-video project), and by the Spanish MEC under projects TIN2006-14606 and CONSOLIDER-INGENIO 2010 (CSD2007-00018). Jordi Gonzàlez also acknowledges the support of a Juan de la Cierva postdoctoral fellowship from the Spanish MEC. Peer reviewed.
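
    The abstract stays at the architectural level; purely as an illustration, one way a situation entry in a situation-based ontology could carry graded confidence for incomplete or uncertain knowledge is sketched below (all class and field names are assumptions, not the authors' schema):

```python
# Illustrative sketch only: a situation node whose confidence is bounded by
# the confidence of the lower-level situations it builds on.
from dataclasses import dataclass

@dataclass
class Situation:
    name: str            # e.g. "pedestrian_crossing_road"
    agents: list         # entities taking part in the situation
    preconditions: list  # lower-level situations that must already hold
    confidence: float    # degraded when knowledge is incomplete or uncertain

walking = Situation("agent_walking", agents=["person_3"],
                    preconditions=[], confidence=0.92)
crossing = Situation("pedestrian_crossing_road", agents=["person_3"],
                     preconditions=[walking],
                     confidence=min(0.87, walking.confidence))
print(crossing.name, crossing.confidence)
```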

    Detection and representation of moving objects for video surveillance

    In this dissertation two new approaches are introduced for the automatic detection of moving objects (such as people and vehicles) in video surveillance sequences. The first technique analyses the original video and exploits spatial and temporal information to find those pixels in the images that correspond to moving objects. The second technique analyses video sequences that have been encoded according to a recent video coding standard (H.264/AVC); only the compressed features are analysed to find moving objects, which results in very fast and accurate detection (up to 20 times faster than related work). Lastly, we investigated how different XML-based metadata standards can be used to represent information about these moving objects, and proposed the use of Semantic Web technologies to combine information described according to different metadata standards.
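
    As a rough illustration of the compressed-domain idea, moving regions can be flagged by thresholding the magnitude of the motion vectors already carried in the H.264/AVC stream, with no pixel decoding. The sketch below assumes the vectors have already been parsed from the stream; the threshold value is an illustrative assumption, not the dissertation's actual method:

```python
# Sketch: per-macroblock motion mask from compressed-domain motion vectors.
import numpy as np

def moving_blocks(motion_vectors, threshold=2.0):
    """motion_vectors: (rows, cols, 2) array of per-macroblock (dx, dy)
    displacements parsed from the stream. Returns a boolean motion mask."""
    magnitude = np.linalg.norm(motion_vectors, axis=2)
    return magnitude > threshold

# Toy example: a 4x4 grid of macroblocks with one moving region.
mv = np.zeros((4, 4, 2))
mv[1:3, 1:3] = [3.0, 1.0]          # blocks displaced between frames
print(moving_blocks(mv).astype(int))
```

    Because the expensive step (motion estimation) was already paid for at encoding time, this kind of analysis is what makes the reported 20x speed-up plausible.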

    Semantic Spaces for Video Analysis of Behaviour

    PhD thesis. There is ever-growing interest in the computer vision community in human behaviour analysis based on visual sensors. This interest generally includes: (1) behaviour recognition - given a video clip or a specific spatio-temporal volume of interest, discriminate it into one or more of a set of pre-defined categories; (2) behaviour retrieval - given a video or a textual description as query, search for video clips with related behaviour; (3) behaviour summarisation - given a number of video clips, summarise representative and distinct behaviours. Although countless efforts have been dedicated to the problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. an individual word, phrase or visual event, can be represented as vectors or distributions, referred to as semantic representations. Within a semantic space, semantic texts and visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantically meaningful video summarisation, retrieval and anomaly detection. A semantic space can also seamlessly bridge categories and datasets which are conventionally treated as independent; this encourages the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space makes it possible to generalise a learned model beyond known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because (1) a semantic space is hard to define manually: humans have a good sense of the semantic relatedness between visual and textual instances, but a measurable and finite semantic space is difficult to construct with limited manual supervision, so we construct the semantic space from data in an unsupervised manner; and (2) it is hard to build a universal semantic space, since such a space is always context-dependent, so it is important to build it upon selected data such that it remains meaningful within the context. Even with a well-constructed semantic space, challenges remain: (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge and data are generalised.

    This thesis tackles the above challenges by exploiting data from different sources and building a contextual semantic space with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis. To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques tailored to the nature of each problem. Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner, owing to the large amount of unlabelled visual data available in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events this thesis studies a geometrical alignment between the semantic representations of events across scenes; semantic actions can thus be transferred to new scenes and abnormal events detected in an unsupervised way. To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We notice that blindly applying the mapping learned from known categories to novel categories introduces a bias that deteriorates performance, which is termed domain shift. To solve this problem we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation; in combination, these methods achieve state-of-the-art performance on the zero-shot human action task. Lastly, we study the possibility of re-using known and manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours, a task termed zero-shot crowd behaviour recognition. Crucially, we point out that, given the multi-labelled nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploiting semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
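
    A minimal sketch of the zero-shot mechanism described above - mapping visual features into a word-vector space learned on known classes, then labelling a novel class by its nearest class-name vector - might look like this (data is synthetic; the ridge-regression mapping and cosine scoring are common choices assumed here for illustration, not taken from the thesis):

```python
# Sketch: zero-shot recognition via a semantic (word-vector) space.
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_sem = 64, 50
# In practice word vectors come from a model trained on a large text corpus;
# here they are random stand-ins. "wave" is the unseen (zero-shot) class.
word_vec = {c: rng.standard_normal(d_sem)
            for c in ["walk", "run", "jump", "wave"]}

# Training pairs from seen classes only: visual feature -> class word vector.
seen = ["walk", "run", "jump"]
X = rng.standard_normal((300, d_vis))
Y = np.stack([word_vec[seen[i % 3]] for i in range(300)])

# Closed-form ridge regression: W = (X^T X + lambda I)^-1 X^T Y.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(d_vis), X.T @ Y)

def classify(x, candidates):
    """Map a visual feature into the semantic space, then return the
    candidate class whose name vector is closest by cosine similarity."""
    s = x @ W
    cos = {c: s @ word_vec[c] /
              (np.linalg.norm(s) * np.linalg.norm(word_vec[c]))
           for c in candidates}
    return max(cos, key=cos.get)

print(classify(rng.standard_normal(d_vis), candidates=list(word_vec)))
```

    The domain-shift problem the thesis addresses arises exactly here: W is fitted only on seen classes, so its projections of unseen-class samples are systematically biased, motivating corrections such as self-training or hubness correction.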

    Automated Knowledge Generation with Persistent Surveillance Video

    The Air Force has increasingly invested in persistent surveillance platforms that gather large amounts of surveillance video. Ordinarily, intelligence analysts watch the video to determine whether suspicious activities are occurring, a very time- and manpower-intensive process. Instead, this thesis proposes that, by using tracks generated from persistent video, we can build a model that detects events for an intelligence analyst. The event we chose to detect is a suspicious surveillance activity known as a casing event. To test our model we used Global Positioning System (GPS) tracks generated from vehicles driving in an urban area. The results show that over 400 vehicles can be monitored simultaneously in real time and that casing events are detected with high probability (43 of 43 events detected, with only 4 false positives). Casing event detections are augmented by determining which buildings are being targeted. In addition, persistent surveillance video is used to construct a social network from vehicle tracks based on the interactions of those tracks. The constructed social networks give us further information about the suspicious actors flagged by the casing event detector, telling us who a suspicious actor has interacted with and which buildings they have visited. The end result is a process that automatically generates information from persistent surveillance video, providing intelligence analysts with additional knowledge and understanding of terrorist activities.
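
    The abstract does not specify the detector, but one plausible cue for a casing event - repeated slow passes of the same vehicle near one building - can be sketched as follows (the radius, speed threshold and pass count are illustrative assumptions, not the thesis's parameters):

```python
# Sketch: count separate slow passes of a GPS track near a building.
import math

def casing_score(track, building, radius=50.0, slow=5.0):
    """track: list of (t, x, y, speed) samples; building: (x, y) in metres.
    Each entry into the zone around the building at low speed counts as
    one pass; a high pass count is a casing cue."""
    passes, inside = 0, False
    for _, x, y, speed in track:
        near = math.hypot(x - building[0], y - building[1]) <= radius
        if near and speed <= slow and not inside:
            passes += 1            # entering the zone starts a new pass
        inside = near
    return passes

# Toy track: a slow loop that repeatedly swings past the building at (0, 0).
track = [(t, 60 + 80 * math.cos(t / 10), 80 * math.sin(t / 10), 4.0)
         for t in range(0, 400, 5)]
print(casing_score(track, building=(0.0, 0.0)))   # several passes detected
```

    Running a cheap per-track score like this over every live track is consistent with the reported ability to monitor hundreds of vehicles simultaneously in real time.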

    Creative Machine

    Curators: William Latham, Atau Tanaka and Frederic Fol Leymarie. A major exhibition exploring the twilight world of human/machine creativity, including installations, video and computer art, artificial intelligence, robotics and apps by leading artists from Goldsmiths and international artists by invitation. The vision for organising the Creative Machine exhibition is to show exciting works by key international artists, Goldsmiths staff and selected students who use original software and hardware development in the creative production of their work. The range of work on show, which could be broadly termed Computer Art, includes mechanical drawing devices, kinetic sculpture driven by fuzzy logic, images produced using machine learning, simulated cellular growth forms, and self-generating works using automated aesthetics, VR, 3D printing and social telephony networks. Traditionally, Computer Art has held a maverick position on the edge of mainstream contemporary culture, with its origins in Russian Constructivist art, biological systems, "geeky" software conferences, rave/techno music and indie computer games. These artists have defined their own channels for exhibiting their work, organised conferences, and at times been entrepreneurial in building collaborations with industry at both corporate and startup level (the early computer artists of the 1970s and 1980s needed to work with computer corporations to get access to computers). Alongside this, interactive media art drew upon McLuhan's notion of technology as an extension of the human to create participatory, interactive artworks using the novel interface technology developed since the 1980s. However, with new techniques such as 3D printing, the massive spread of sophisticated sensors in consumer devices like smartphones, and the use of robotics by artists, digital art would appear to have an opportunity to come more to the fore in public consciousness. This exhibition is timely in that it coincides with an apparent wider growth of public interest in digital art, as shown by the Digital Revolution exhibition at the Barbican, London, and the recent emergence of commercial galleries such as Bitforms in New York and Carroll / Fletcher in London, which acquire and show technology-based art. The Creative Machine exhibition is the first event to make use of Goldsmiths' new Sonics Immersive Media Lab (SIML) Chamber. This advanced surround audiovisual projection space is a key part of the St James-Hatcham refurbishment. The facility was funded by capital funding from the Engineering & Physical Sciences Research Council (EPSRC) and Goldsmiths, as well as research funding from the European Research Council (ERC), connected respectively to the Intelligent Games/Game Intelligence (IGGI) Centre for Doctoral Training and Atau Tanaka's MetaGesture Music (MGM) ERC grant. The space was built by the SONICS, a cross-departmental research special interest group at Goldsmiths that brings together the departments of Computing, Music, Media & Communications, Sociology, Visual Cultures, and Cultural Studies. It was designed in consultation with the San Francisco-based curator Naut Humon to be compatible with the Cinechamber system there. During Creative Machine we shall see, in the SIML space, multiscreen screenings of work by Yoichiro Kawaguchi, Naoko Tosa and Vesna Petresin, as well as a new immersive media work by IGGI researcher Memo Akten.

    A cognitive stylistic analysis of a selection of contemporary Egyptian novels

    This doctoral thesis offers a comprehensive cognitive analysis of three Egyptian novels by contemporary authors: Book of Sands (2015) by Karim Alrawi, Taxi (2016) by Khalid Al Khamissi, and The Day the Leader Was Killed (1985) by Naguib Mahfouz. By applying the theoretical frameworks of Text World Theory (Werth, 1999; Gavins, 2007) and Blending Theory (also known as conceptual integration theory) (Fauconnier and Turner, 2002), this thesis pursues three main objectives. First, to demonstrate how Text World Theory helps the reader understand the narrative as a conceptual structure made up of three interrelated conceptual layers: the discourse-world, text-worlds and sub-worlds. Second, to show the fundamental role Blending Theory plays in the correct interpretation of metaphors at the sentence level, and to expose how the irony and humour arising from the collision of incongruous elements in these metaphorical constructs are used to criticise political and socio-cultural aspects of Egypt. Finally, this thesis seeks to illustrate how the combination of Text World Theory and Blending Theory represents an effective method for understanding the novels at both micro- and macro-textual levels. To this end, Text World Theory was selected as the discourse framework for the macro-analysis of Book of Sands, while Blending Theory was used for the detailed analysis of sentence-level metaphors in Taxi. These analyses reveal that each theory addresses specific aspects of the literary text, underlining the value of their combination as an effective strategy that enables a holistic investigation of the selected Egyptian novels, thereby exposing their complexities and intersections. For this reason, both approaches were integrated in the analysis of The Day the Leader Was Killed. This integration proved to be a useful analytical tool, allowing both domestic and international readers to fully comprehend the narrative and to uncover hidden messages and concealed realities of Egyptian society.

    Acta Cybernetica: Volume 25, Number 2.


    Biometric and behavioural mass surveillance in EU member states: report for the Greens/EFA in the European Parliament

    The aim of this report is to establish a problematised overview of what we know about what is currently being done in Europe when it comes to remote biometric identification (RBI), and to assess in which cases we could potentially fall into forms of biometric mass surveillance.
    Institutions, Decisions and Collective Behaviour