
    Reproducible Evaluation of Pan-Tilt-Zoom Tracking

    Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. However, it is very difficult to assess the progress that has been made on this topic because there is no standard evaluation methodology. The difficulty in evaluating PTZ tracking algorithms arises from their dynamic nature. In contrast to other forms of tracking, PTZ tracking involves both locating the target in the image and controlling the motors of the camera to aim it so that the target stays in its field of view. This type of tracking can only be performed online. In this paper, we propose a new evaluation framework based on a virtual PTZ camera. With this framework, tracking scenarios do not change for each experiment and we are able to replicate online PTZ camera control and behavior, including camera positioning delays, tracker processing delays, and numerical zoom. We tested our evaluation framework with the Camshift tracker to show its viability and to establish baseline results. Comment: This is an extended version of the 2015 ICIP paper "Reproducible Evaluation of Pan-Tilt-Zoom Tracking".
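    The replay idea above, in which scenarios stay fixed while camera and tracker delays are simulated, can be pictured with a short sketch. The Python below is a hypothetical illustration, not the authors' framework: the class, the motor model, and the assumption that the tracker reports its own latency are all ours.

```python
import numpy as np

class VirtualPTZ:
    """Toy virtual PTZ camera: crops a view out of pre-recorded panoramic
    frames so every experiment replays the same scenario (hypothetical
    sketch, not the paper's actual framework)."""

    def __init__(self, frames, fps=30.0, view_size=(480, 640)):
        self.frames = frames                         # pre-recorded panoramic frames
        self.fps = fps
        self.view_h, self.view_w = view_size
        self.cx, self.cy, self.zoom = 320, 240, 1.0  # current camera state

    def capture(self, t_seconds):
        """Return the view visible at simulated time t_seconds."""
        idx = min(int(t_seconds * self.fps), len(self.frames) - 1)
        frame = self.frames[idx]
        h, w = frame.shape[:2]
        # numerical zoom: a zoom of 2 halves the crop window (upscaling the
        # crop back to view_size is omitted for brevity)
        crop_w = max(1, int(self.view_w / self.zoom))
        crop_h = max(1, int(self.view_h / self.zoom))
        x0 = int(np.clip(self.cx - crop_w // 2, 0, w - crop_w))
        y0 = int(np.clip(self.cy - crop_h // 2, 0, h - crop_h))
        return frame[y0:y0 + crop_h, x0:x0 + crop_w]

def run_episode(cam, tracker, motor_delay=0.1):
    """Online loop: tracker latency and motor delay both advance the clock,
    so a slow tracker sees an older world, as on a real PTZ camera."""
    t = 0.0
    while int(t * cam.fps) < len(cam.frames) - 1:
        view = cam.capture(t)
        (tx, ty), processing_time = tracker(view)  # (tx, ty) in panoramic coords
        t += processing_time                       # frames pass while we compute
        cam.cx, cam.cy = tx, ty                    # command the virtual motors
        t += motor_delay                           # and while the camera repositions
```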

    The kindest cut: Enhancing the user experience of mobile TV through adequate zooming

    The growing market of Mobile TV requires automated adaptation of standard TV footage to small-size displays. Especially extreme long shots (XLS) depicting distant objects can spoil the user experience, e.g. in soccer content. Automated zooming schemes can improve the visual experience if the resulting footage meets user expectations in terms of visual detail and quality but does not omit valuable context information. Current zooming schemes are ignorant of beneficial zoom ranges for a given target size when applied to standard-definition TV footage. In two experiments, 84 participants were able to switch between original and zoom-enhanced soccer footage at three sizes, from 320x240 (QVGA) down to 176x144 (QCIF). Eye tracking and subjective ratings showed that zoom factors between 1.14 and 1.33 were preferred for all sizes. Interviews revealed that a zoom factor of 1.6 was too high for QVGA content due to low perceived video quality, but beneficial for QCIF size. The optimal zoom depended on the target display size. We include a function to compute the optimal zoom for XLS depending on the target device size. It can be applied in automatic content adaptation schemes and should stimulate further research on the requirements of different shot types in video coding.
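    The data points reported above invite a simple interpolation. The paper derives its own function, which is not reproduced here; the sketch below merely interpolates an optimal zoom between the two display sizes and preferred factors mentioned in the abstract.

```python
# Hypothetical interpolation of an "optimal zoom" for extreme long shots,
# anchored on the two data points from the abstract: zoom ~1.6 works at
# QCIF (176 px wide), while ~1.33 is the upper preference at QVGA (320 px).
def optimal_zoom(display_width_px: int) -> float:
    qcif_w, qvga_w = 176, 320           # widths of the two tested sizes
    zoom_qcif, zoom_qvga = 1.6, 1.33    # preferred zoom at each size
    if display_width_px <= qcif_w:
        return zoom_qcif
    if display_width_px >= qvga_w:
        return zoom_qvga
    # linear interpolation between the two measured points
    a = (display_width_px - qcif_w) / (qvga_w - qcif_w)
    return zoom_qcif + a * (zoom_qvga - zoom_qcif)

print(optimal_zoom(240))  # a 240-px-wide display -> zoom of about 1.48
```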

    A distributed camera system for multi-resolution surveillance

    We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor. Asynchronous interprocess communication and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database. Visual tracking data from static views are stored dynamically into tables in the database via client calls to the SQL server. A supervisor process running on the SQL server determines whether active zoom cameras should be dispatched to observe a particular target; this message is effected by writing demands into another database table. We show results from a real implementation of the system comprising one static camera overlooking the environment under consideration and a PTZ camera operating under closed-loop velocity control, which uses a fast and robust level-set-based region tracker. Experiments demonstrate the effectiveness of our approach and its applicability to multi-camera systems for intelligent surveillance.
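    The repository pattern described here, observations flowing into one table and dispatch demands into another, can be sketched in a few lines. The example below uses SQLite as a stand-in for the paper's SQL server; the table layouts, column names, and the toy image-to-angle mapping are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect("surveillance.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS tracks  (cam_id INT, target_id INT, x REAL, y REAL, ts REAL);
CREATE TABLE IF NOT EXISTS demands (ptz_id INT, target_id INT, pan REAL, tilt REAL, ts REAL);
""")

def report_track(cam_id, target_id, x, y, ts):
    """Static-camera client: push one observation into the central repository."""
    db.execute("INSERT INTO tracks VALUES (?,?,?,?,?)", (cam_id, target_id, x, y, ts))
    db.commit()

def dispatch(ptz_id, since):
    """Supervisor: read the newest observation and write a demand for the
    PTZ camera into its own table (the communication channel)."""
    row = db.execute(
        "SELECT target_id, x, y, ts FROM tracks WHERE ts > ? "
        "ORDER BY ts DESC LIMIT 1", (since,)).fetchone()
    if row:
        target_id, x, y, ts = row
        pan, tilt = x * 0.1, y * 0.1  # toy image-to-angle mapping
        db.execute("INSERT INTO demands VALUES (?,?,?,?,?)",
                   (ptz_id, target_id, pan, tilt, ts))
        db.commit()
```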

    Evaluation of trackers for Pan-Tilt-Zoom Scenarios

    Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. Compared to tracking with a still camera, the images captured with a PTZ camera are highly dynamic in nature because the camera can perform large motions, resulting in quickly changing capture conditions. Furthermore, tracking with a PTZ camera involves camera control to position the camera on the target. For successful tracking and camera control, the tracker must be fast enough, or must be able to accurately predict the next position of the target. Therefore, standard benchmarks do not allow a proper assessment of the quality of a tracker for the PTZ scenario. In this work, we use a virtual PTZ framework to evaluate different tracking algorithms and compare their performances. We also extend the framework to add target position prediction for the next frame, accounting for camera motion and processing delays. By doing this, we can assess whether prediction can make long-term tracking more robust, as it may help slower algorithms keep the target in the field of view of the camera. Results confirm that both speed and robustness are required for tracking under the PTZ scenario. Comment: 6 pages, 2 figures, International Conference on Pattern Recognition and Artificial Intelligence 201
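    A minimal version of the prediction step might assume a constant-velocity target and known latencies; the function below is that sketch, with illustrative names, and is not necessarily the predictor used in the paper.

```python
def predict_position(prev_xy, curr_xy, dt, latency, cam_shift_px):
    """Extrapolate the target's image position over the expected latency
    (processing + motor delays), then compensate for the camera's own
    commanded motion, given in pixels of apparent image shift."""
    vx = (curr_xy[0] - prev_xy[0]) / dt     # target velocity from the last
    vy = (curr_xy[1] - prev_xy[1]) / dt     # two observations
    px = curr_xy[0] + vx * latency          # where the target will be when
    py = curr_xy[1] + vy * latency          # the next frame is captured
    return px - cam_shift_px[0], py - cam_shift_px[1]

# e.g. target moving right at 100 px/s, 0.2 s latency, camera panning 15 px:
print(predict_position((100, 50), (110, 50), 0.1, 0.2, (15, 0)))  # (115.0, 50.0)
```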

    Self-organising zooms for decentralised redundancy management in visual sensor networks

    When visual sensor networks are composed of cameras which can adjust the zoom factor of their own lens, one must determine the optimal zoom levels for the cameras for a given task. This gives rise to an important trade-off between the overlap of the different cameras’ fields of view, which provides redundancy, and image quality. In an object tracking task, having multiple cameras observe the same area allows for quicker recovery when a camera fails. In contrast, narrow zooms allow for a higher pixel count on regions of interest, leading to increased tracking confidence. In this paper, we propose an approach for the self-organisation of redundancy in a distributed visual sensor network, based on decentralised multi-objective online learning using only local information to approximate the global state. We explore the impact of different zoom levels on these trade-offs when tasking omnidirectional cameras, which have a perfect 360-degree view, with keeping track of a varying number of moving objects. We further show how employing decentralised reinforcement learning enables zoom configurations to be achieved dynamically at runtime according to an operator’s preference for maximising either the proportion of objects tracked, the confidence associated with tracking, or redundancy in expectation of camera failure. We show that explicitly taking account of the level of overlap, even based only on local knowledge, improves resilience when cameras fail. Our results illustrate the trade-off between maintaining high confidence and object coverage, and maintaining redundancy in anticipation of future failure. Our approach provides a fully tunable decentralised method for the self-organisation of redundancy in a changing environment, according to an operator’s preferences.
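    One way to picture the decentralised learning component is a per-camera learner over discrete zoom levels with a scalarised multi-objective reward. The sketch below deliberately simplifies the paper's approach to a bandit-style update; the zoom levels, weights, and learning rates are illustrative assumptions.

```python
import random

class ZoomLearner:
    """Runs independently on each camera, using only locally observable
    reward signals (a simplification of the paper's method)."""

    def __init__(self, zooms=(1.0, 2.0, 4.0), alpha=0.1, eps=0.1):
        self.q = {z: 0.0 for z in zooms}   # running value estimate per zoom
        self.alpha, self.eps = alpha, eps

    def choose(self):
        if random.random() < self.eps:             # occasionally explore
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)         # otherwise exploit

    def update(self, zoom, coverage, confidence, redundancy, w=(0.4, 0.3, 0.3)):
        # scalarise the three objectives with operator-chosen weights
        reward = w[0] * coverage + w[1] * confidence + w[2] * redundancy
        self.q[zoom] += self.alpha * (reward - self.q[zoom])
```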

    Face tracking using a hyperbolic catadioptric omnidirectional system

    In the first part of this paper, we present a brief review of catadioptric omnidirectional systems. The special case of the hyperbolic omnidirectional system is analysed in depth. The literature shows that a hyperboloidal mirror has two clear advantages over alternative geometries. Firstly, a hyperboloidal mirror has a single projection centre [1]. Secondly, the image resolution is uniformly distributed along the mirror’s radius [2]. In the second part of this paper, we show empirical results for the detection and tracking of faces from the omnidirectional images using the Viola-Jones method. Both panoramic and perspective projections, extracted from the omnidirectional image, were used for that purpose. The omnidirectional image size was 480x480 pixels, in greyscale. The tracking method used regions of interest (ROIs) set as the result of the detections of faces from a panoramic projection of the image. In order to avoid losing or duplicating detections, the panoramic projection was extended horizontally. Duplications were eliminated based on the ROIs established by previous detections. After a confirmed detection, faces were tracked from perspective projections (which are called virtual cameras), each one associated with a particular face. The zoom, pan and tilt of each virtual camera were determined by the ROIs previously computed on the panoramic image. The results show that, when using a careful combination of the two projections, good frame rates can be achieved in the task of tracking faces reliably.
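    The horizontal extension and de-duplication step lends itself to a short sketch. The function below is a hypothetical illustration assuming axis-aligned detection boxes and a simple IoU test; it is not the paper's implementation.

```python
def dedup_wraparound(boxes, pano_width, iou_thresh=0.5):
    """boxes: (x, y, w, h) detections found on a horizontally extended
    panorama. Maps each box back onto [0, pano_width) and drops boxes that
    duplicate an already-kept one (seam-straddling boxes are handled only
    approximately by the modulo mapping)."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        return inter / float(aw * ah + bw * bh - inter) if inter else 0.0

    kept = []
    for (x, y, w, h) in boxes:
        cand = (x % pano_width, y, w, h)  # fold back onto the base panorama
        if all(iou(cand, k) < iou_thresh for k in kept):
            kept.append(cand)
    return kept
```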

    Automated Top View Registration of Broadcast Football Videos

    In this paper, we propose a novel method to register football broadcast video frames on a static top-view model of the playing surface. The proposed method is fully automatic, in contrast to the current state of the art, which requires manual initialization of point correspondences between the image and the static model. Automatic registration using existing approaches has been difficult due to the lack of sufficient point correspondences. We investigate an alternate approach exploiting the edge information from the line markings on the field. We formulate the registration problem as a nearest neighbour search over a synthetically generated dictionary of edge map and homography pairs. The synthetic dictionary generation allows us to exhaustively cover a wide variety of camera angles and positions and reduce this problem to a minimal per-frame edge map matching procedure. We show that the per-frame results can be improved in videos using an optimization framework for temporal camera stabilization. We demonstrate the efficacy of our approach by presenting extensive results on a dataset collected from matches of the 2014 Football World Cup.
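    The dictionary-lookup formulation can be pictured with a small sketch: pre-render binary edge maps for many synthetic camera poses, then register a frame by finding its nearest neighbour among them. The flattened-pixel L2 distance and function names below are illustrative assumptions, not the paper's exact matching procedure.

```python
import numpy as np

def build_dictionary(render_edges, homographies):
    """render_edges(H) -> binary edge image (same shape for every H).
    Stacks the synthetic edge maps as flat feature vectors."""
    return np.stack([render_edges(H).astype(np.float32).ravel()
                     for H in homographies])

def register_frame(edge_map, feats, homographies):
    """Return the homography whose synthetic edge map is closest to the
    frame's edge map (brute-force L2 nearest neighbour)."""
    q = edge_map.astype(np.float32).ravel()
    dists = np.linalg.norm(feats - q, axis=1)
    return homographies[int(np.argmin(dists))]
```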

    Cognitive visual tracking and camera control

    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene and to generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, serves to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
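    A minimal sketch of such a feedback loop, polling inferred situations from one SQL table and writing PTZ commands into another, is given below. The table names, schema, and the situation-to-command rules are hypothetical placeholders; in the paper, the inference itself is performed with Situation Graph Trees.

```python
import time

# hypothetical mapping from an inferred situation to a camera command
RULES = {"loitering": "zoom_on_target", "entering_zone": "pan_to_zone"}

def control_loop(db, poll_s=0.5):
    """Poll new rows from a 'situations' table and translate them into PTZ
    commands, using SQL tables as the inter-process channel. db is an open
    sqlite3 connection with 'situations' and 'ptz_commands' tables."""
    last_ts = 0.0
    while True:
        rows = db.execute(
            "SELECT ts, target_id, situation FROM situations WHERE ts > ?",
            (last_ts,)).fetchall()
        for ts, target_id, situation in rows:
            command = RULES.get(situation)
            if command:
                db.execute("INSERT INTO ptz_commands VALUES (?,?,?)",
                           (ts, target_id, command))
            last_ts = max(last_ts, ts)
        db.commit()
        time.sleep(poll_s)
```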