7,099 research outputs found

    Task-Driven Video Collection

    Get PDF
    Vision systems are increasingly being deployed to perform complex surveillance tasks. While improved algorithms are being developed to perform these tasks, it is also important that data suitable for these algorithms be acquired - a non-trivial task in a dynamic and crowded scene viewed by multiple PTZ cameras. In this paper, we describe a multi-camera system that collects images and videos of moving objects in such scenes, subject to task constraints. The system constructs "task visibility intervals" that contain information about what can be sensed in future time intervals. Constructing these intervals requires prediction of future object motion and consideration of several factors such as object occlusion and camera control parameters. Using a plane-sweep algorithm, these atomic intervals can be combined to form multi-task intervals, during which a single camera can collect videos suitable for multiple tasks simultaneously. Although cameras can then be scheduled based on the constructed intervals, finding an optimal schedule is a typical NP-hard problem. Due to this, and the lack of exact future information in a dynamic environment, we propose several methods for fast camera scheduling that yield solutions within a small constant factor of optimal. Experimental results illustrate system capabilities for both real and more complicated simulated scenarios

    Systems and algorithms for autonomously simultaneous observation of multiple objects using robotic PTZ cameras assisted by a wide-angle camera

    Full text link
    Abstract — We report an autonomous observation system with multiple pan-tilt-zoom (PTZ) cameras assisted by a fixed wide-angle camera. The wide-angle camera provides large but low resolution coverage and detects and tracks all moving objects in the scene. Based on the output of the wide-angle camera, the system generates spatiotemporal observation requests for each moving object, which are candidates for close-up views using PTZ cameras. Due to the fact that there are usually much more objects than the number of PTZ cameras, the system first assigns a subset of the requests/objects to each PTZ camera. The PTZ cameras then select the parameter settings that best satisfy the assigned competing requests to provide high resolution views of the moving objects. We solve the request assignment and the camera parameter selection problems in real time. The effectiveness of the proposed system is validated in comparison with an existing work using simulation. The simulation results show that in heavy traffic scenarios, our algorithm increases the number of observed objects by over 200%. I

    Bullying10K: A Large-Scale Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

    Full text link
    The prevalence of violence in daily life poses significant threats to individuals' physical and mental well-being. Using surveillance cameras in public spaces has proven effective in proactively deterring and preventing such incidents. However, concerns regarding privacy invasion have emerged due to their widespread deployment. To address the problem, we leverage Dynamic Vision Sensors (DVS) cameras to detect violent incidents and preserve privacy since it captures pixel brightness variations instead of static imagery. We introduce the Bullying10K dataset, encompassing various actions, complex movements, and occlusions from real-life scenarios. It provides three benchmarks for evaluating different tasks: action recognition, temporal action localization, and pose estimation. With 10,000 event segments, totaling 12 billion events and 255 GB of data, Bullying10K contributes significantly by balancing violence detection and personal privacy persevering. And it also poses a challenge to the neuromorphic dataset. It will serve as a valuable resource for training and developing privacy-protecting video systems. The Bullying10K opens new possibilities for innovative approaches in these domains.Comment: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmark

    Development of an Active Vision System for the Remote Identification of Multiple Targets

    Get PDF
    This thesis introduces a centralized active vision system for the remote identification of multiple targets in applications where the targets may outnumber the active system resources. Design and implementation details of a modular active vision system are presented, from which a prototype has been constructed. The system employs two different, yet complimentary, camera technologies. Omnidirectional cameras are used to detect and track targets at a low resolution, while perspective cameras mounted to pan-tilt stages are used to acquire high resolution images suitable for identification. Five greedy-based scheduling policies have been developed and implemented to manage the active system resources in an attempt to achieve optimal target-to-camera assignments. System performance has been evaluated using both simulated and real-world experiments under different target and system configurations for all five scheduling policies. Parameters affecting performance that were considered include: target entry conditions, congestion levels, target to camera speeds, target trajectories, and number of active cameras. An overall trend in the relative performance of the scheduling algorithms was observed. The Least System Reconfiguration and Future Least System Reconfiguration scheduling policies performed the best for the majority of conditions investigated, while the Load Sharing and First Come First Serve policies performed the poorest. The performance of the Earliest Deadline First policy was seen to be highly dependent on target predictability

    Development of an Active Vision System for the Remote Identification of Multiple Targets

    Get PDF
    This thesis introduces a centralized active vision system for the remote identification of multiple targets in applications where the targets may outnumber the active system resources. Design and implementation details of a modular active vision system are presented, from which a prototype has been constructed. The system employs two different, yet complimentary, camera technologies. Omnidirectional cameras are used to detect and track targets at a low resolution, while perspective cameras mounted to pan-tilt stages are used to acquire high resolution images suitable for identification. Five greedy-based scheduling policies have been developed and implemented to manage the active system resources in an attempt to achieve optimal target-to-camera assignments. System performance has been evaluated using both simulated and real-world experiments under different target and system configurations for all five scheduling policies. Parameters affecting performance that were considered include: target entry conditions, congestion levels, target to camera speeds, target trajectories, and number of active cameras. An overall trend in the relative performance of the scheduling algorithms was observed. The Least System Reconfiguration and Future Least System Reconfiguration scheduling policies performed the best for the majority of conditions investigated, while the Load Sharing and First Come First Serve policies performed the poorest. The performance of the Earliest Deadline First policy was seen to be highly dependent on target predictability

    Wide-Area Surveillance System using a UAV Helicopter Interceptor and Sensor Placement Planning Techniques

    Get PDF
    This project proposes and describes the implementation of a wide-area surveillance system comprised of a sensor/interceptor placement planning and an interceptor unmanned aerial vehicle (UAV) helicopter. Given the 2-D layout of an area, the planning system optimally places perimeter cameras based on maximum coverage and minimal cost. Part of this planning system includes the MATLAB implementation of Erdem and Sclaroff’s Radial Sweep algorithm for visibility polygon generation. Additionally, 2-D camera modeling is proposed for both fixed and PTZ cases. Finally, the interceptor is also placed to minimize shortest-path flight time to any point on the perimeter during a detection event. Secondly, a basic flight control system for the UAV helicopter is designed and implemented. The flight control system’s primary goal is to hover the helicopter in place when a human operator holds an automatic-flight switch. This system represents the first step in a complete waypoint-navigation flight control system. The flight control system is based on an inertial measurement unit (IMU) and a proportional-integral-derivative (PID) controller. This system is implemented using a general-purpose personal computer (GPPC) running Windows XP and other commercial off-the-shelf (COTS) hardware. This setup differs from other helicopter control systems which typically use custom embedded solutions or micro-controllers. Experiments demonstrate the sensor placement planning achieving \u3e90% coverage at optimized-cost for several typical areas given multiple camera types and parameters. Furthermore, the helicopter flight control system experiments achieve hovering success over short flight periods. However, the final conclusion is that the COTS IMU is insufficient for high-speed, high-frequency applications such as a helicopter control system

    Learning Visual Patterns: Imposing Order on Objects, Trajectories and Networks

    Get PDF
    Fundamental to many tasks in the field of computer vision, this work considers the understanding of observed visual patterns in static images and dynamic scenes . Within this broad domain, we focus on three particular subtasks, contributing novel solutions to: (a) the subordinate categorization of objects (avian species specifically), (b) the analysis of multi-agent interactions using the agent trajectories, and (c) the estimation of camera network topology. In contrast to object recognition, where the presence or absence of certain parts is generally indicative of basic-level category, the problem of subordinate categorization rests on the ability to establish salient distinctions amongst the characteristics of those parts which comprise the basic-level category. Focusing on an avian domain due to the fine-grained structure of the category taxonomy, we explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies, facilitating fine-grained categorization with relatively few training examples. We next examine the problem of leveraging trajectories to understand interactions in dynamic multi-agent environments. We focus on perceptual tasks, those for which an agent's behavior is governed largely by the individuals and objects around them. We introduce kinetic accessibility, a model for evaluating the perceived, and thus anticipated, movements of other agents. This new model is then applied to the analysis of basketball footage. The kinetic accessibility measures are coupled with low-level visual cues and domain-specific knowledge for determining which player has possession of the ball and for recognizing events such as passes, shots and turnovers. Finally, we present two differing approaches for estimating camera network topology. The first technique seeks to partition a set of observations made in the camera network into individual object trajectories. As exhaustive consideration of the partition space is intractable, partitions are considered incrementally, adding observations while pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model, balancing the consistency of appearances across a hypothesized trajectory with the latest predictions of camera adjacency. A primarily benefit of estimating object trajectories is that higher-order statistics, as opposed to just first-order adjacency, can be derived, yielding resilience to camera failure and the potential for improved tracking performance between cameras. Unlike the former centralized technique, the latter takes a decentralized approach, estimating the global network topology with local computations using sequential Bayesian estimation on a modified multinomial distribution. Key to this method is an information-theoretic appearance model for observation weighting. The inherently distributed nature of the approach allows the simultaneous utilization of all sensors as processing agents in collectively recovering the network topology

    Analyzing Structured Scenarios by Tracking People and Their Limbs

    Get PDF
    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment
    • …
    corecore