10 research outputs found

    Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences

    We present and evaluate a person re-identification scheme for a multi-camera surveillance system. Our approach matches signatures based on interest-point descriptors collected over short video sequences. One originality of our method is to accumulate interest points over several sufficiently time-spaced images during person tracking within each camera, in order to capture appearance variability. A first experimental evaluation, conducted on a publicly available set of low-resolution videos from a commercial mall, shows very promising inter-camera person re-identification performance (a precision of 82% for a recall of 78%). Our matching method is also very fast: ~1/8 s to re-identify one target person among 10 previously seen persons, with a logarithmic dependence on the number of stored person models, making re-identification among hundreds of persons computationally feasible in under ~1/5 s.
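    As a hedged illustration of the accumulation step, the following minimal Python sketch (not the authors' code) pools interest-point descriptors from time-spaced frames of one person track into a single signature; the ORB detector, the sampling interval, and the per-frame point budget are assumptions.

```python
# Minimal sketch: build a person "signature" by accumulating interest-point
# descriptors over time-spaced frames of a track, so the signature captures
# appearance variability. Detector and parameters are illustrative.
import cv2
import numpy as np

def build_signature(track_frames, sample_every=10, max_points_per_frame=50):
    """Collect ORB descriptors from every `sample_every`-th cropped frame
    of one tracked person."""
    orb = cv2.ORB_create(nfeatures=max_points_per_frame)
    descriptors = []
    for i, crop in enumerate(track_frames):
        if i % sample_every != 0:        # keep only sufficiently time-spaced images
            continue
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, desc = orb.detectAndCompute(gray, None)
        if desc is not None:
            descriptors.append(desc)
    # One signature = the pooled descriptor set for this person track.
    return np.vstack(descriptors) if descriptors else np.empty((0, 32), np.uint8)
```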

    Interest points harvesting in video sequences for efficient person identification

    We propose and evaluate a new approach to person identification based on harvesting interest-point descriptors in video sequences. By accumulating interest points over several sufficiently time-spaced images during person-silhouette or face tracking within each camera, the collected interest points capture appearance variability. Our method can in particular be applied to global person re-identification in a network of cameras. We present a first experimental evaluation conducted on a publicly available set of videos from a commercial mall, with very promising inter-camera pedestrian re-identification performance (a precision of 82% for a recall of 78%). Our matching method is very fast: ~1/8 s to re-identify one target person among 10 previously seen persons, with a logarithmic dependence on the number of stored person models, making re-identification among hundreds of persons computationally feasible in under ~1/5 s. Finally, we also present a first feasibility test for on-the-fly face recognition, with encouraging results.
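    To illustrate how matching can scale roughly logarithmically with the number of stored person models, here is a hedged sketch of a descriptor gallery indexed by a KD-tree; the float descriptors, the nearest-neighbour voting rule, and the distance gate are assumptions, not the paper's exact matching scheme.

```python
# Minimal sketch (not the paper's exact scheme) of a signature gallery
# whose per-descriptor queries scale roughly logarithmically with the
# number of stored descriptors, via a KD-tree.
import numpy as np
from scipy.spatial import cKDTree

class SignatureGallery:
    def __init__(self):
        self._blocks = []   # per-person descriptor arrays, in insertion order
        self._owner = []    # person id owning each stored descriptor row
        self._tree = None

    def add_person(self, person_id, signature):
        """signature: (n, d) float array of descriptors for one person."""
        self._blocks.append(np.asarray(signature, float))
        self._owner.extend([person_id] * len(signature))
        self._tree = cKDTree(np.vstack(self._blocks))   # rebuild the index

    def reidentify(self, query_signature, max_dist=0.7):
        """Vote each query descriptor for the person owning its nearest
        stored descriptor; return the person with the most votes."""
        if self._tree is None:
            return None
        dists, idx = self._tree.query(np.asarray(query_signature, float))
        votes = {}
        for d, i in zip(dists, idx):
            if d < max_dist:                 # discard weak matches
                pid = self._owner[i]
                votes[pid] = votes.get(pid, 0) + 1
        return max(votes, key=votes.get) if votes else None
```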

    PEOPLE REIDENTIFICATION IN A DISTRIBUTED CAMERA NETWORK

    This paper presents an approach to the people re-identification problem in a distributed camera network system. The re-identification, or reacquisition, problem consists essentially of matching images acquired from different cameras. The work applies to environments monitored by cameras, an application important to modern security systems: identifying a target's presence in the environment expands security agents' capacity to act in real time and provides important parameters such as each target's localization. We use the target's color and the target's interest points as features for re-identification. Satisfactory results were obtained in real experiments on public video datasets and on synthetic images with noise.
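    A rough sketch of how a color cue and an interest-point cue might be fused into a single re-identification score follows; the HSV hue histogram, the ORB ratio test, and the equal weighting are illustrative choices, not the paper's exact formulation.

```python
# Hedged sketch: fuse a color cue and an interest-point cue into one score.
import cv2

def color_similarity(img_a, img_b, bins=32):
    """Correlation between HSV hue histograms of two person crops."""
    def hue_hist(img):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
        return cv2.normalize(h, h).flatten()
    return float(cv2.compareHist(hue_hist(img_a), hue_hist(img_b),
                                 cv2.HISTCMP_CORREL))

def keypoint_similarity(img_a, img_b, ratio=0.75):
    """Fraction of ORB descriptors in img_a passing Lowe's ratio test."""
    orb = cv2.ORB_create()
    _, da = orb.detectAndCompute(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), None)
    _, db = orb.detectAndCompute(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), None)
    if da is None or db is None:
        return 0.0
    pairs = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(da, db, k=2)
    good = [p for p in pairs if len(p) == 2
            and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(da), 1)

def reid_score(img_a, img_b, w_color=0.5):
    # Equal weighting is an illustrative choice, not from the paper.
    return (w_color * color_similarity(img_a, img_b)
            + (1 - w_color) * keypoint_similarity(img_a, img_b))
```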

    Detecting, Tracking, And Recognizing Activities In Aerial Video

    In this dissertation, we address the problem of detecting humans and vehicles, tracking them in crowded scenes, and finally determining their activities in aerial video. Even though this is a well-explored problem in the field of computer vision, many challenges still remain when one is presented with realistic data. These challenges include large camera motion, strong scene parallax, fast object motion, large object density, strong shadows, and insufficiently large action datasets. Therefore, we propose a number of novel methods based on exploiting scene constraints from the imagery itself to aid in the detection and tracking of objects. We show, via experiments on several datasets, that superior performance is achieved with the use of the proposed constraints.

    First, we tackle the problem of detecting moving, as well as stationary, objects in scenes that contain parallax and shadows. We do this on both regular aerial video and the new and challenging domain of wide-area surveillance. This problem poses several challenges: large camera motion, strong parallax, a large number of moving objects, few pixels on target, single-channel data, and a low video frame rate. We propose a method for detecting moving and stationary objects that overcomes these challenges, and evaluate it on the CLIF and VIVID datasets. To find moving objects, we use median background modelling, which requires few frames to obtain a workable model and is very robust when a large number of objects move through the scene while the model is being constructed (see the sketch below). We then remove false detections caused by parallax and registration errors using gradient information from the background image.

    Relying merely on motion to detect objects in aerial video may not be sufficient to provide complete information about the observed scene. First, objects that are permanently stationary may be of interest as well, for example to determine how long a particular vehicle has been parked at a certain location. Second, moving vehicles that are being tracked through the scene may sometimes stop and remain stationary at traffic lights and railroad crossings; these prolonged periods of non-motion make it very difficult for the tracker to maintain the identities of the vehicles. There is therefore a clear need for a method that can detect stationary pedestrians and vehicles in UAV imagery. This is a challenging problem due to the small number of pixels on target, which makes it difficult to distinguish objects from background clutter and results in a much larger search space. We propose a method for constraining the search based on a number of geometric constraints obtained from the metadata. Specifically, we obtain the orientation of the ground-plane normal, the orientation of the shadows cast by out-of-plane objects in the scene, and the relationship between object heights and the size of their corresponding shadows. We utilize the above information in a geometry-based shadow and ground-plane-normal blob detector, which provides an initial estimate of the locations of shadow-casting out-of-plane (SCOOP) objects in the scene. These SCOOP candidate locations are then classified as either human or clutter using a combination of wavelet features and a Support Vector Machine. Additionally, we combine regular SCOOP and inverted SCOOP candidates to obtain vehicle candidates. We show impressive results on sequences from the VIVID and CLIF datasets, and provide comparative quantitative and qualitative analysis.
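    As a hedged illustration of the median background modelling step described above (buffer length and threshold are assumptions), the following minimal sketch computes a per-pixel temporal median over a short buffer of registered frames and thresholds the newest frame against it.

```python
# Minimal sketch of median background modelling: the per-pixel temporal
# median over a short buffer gives a workable background model even while
# many objects are moving; thresholding the newest frame against it yields
# a foreground mask. Parameters are illustrative assumptions.
import numpy as np

def detect_moving(frames, threshold=25):
    """frames: list of registered grayscale frames (H x W uint8 arrays).
    Returns a binary foreground mask for the most recent frame."""
    stack = np.stack(frames).astype(np.int16)    # (N, H, W) buffer
    background = np.median(stack, axis=0)        # per-pixel temporal median
    diff = np.abs(stack[-1] - background)        # deviation of newest frame
    return (diff > threshold).astype(np.uint8)   # 1 = moving pixel
```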
    We also show that we can extend the SCOOP detection method to automatically estimate the orientation of the shadow in the image without relying on metadata. This is useful in cases where metadata is either unavailable or erroneous.

    Simply detecting objects in every frame does not provide sufficient understanding of the nature of their existence in the scene. It may be necessary to know how the objects have travelled through the scene over time and which areas they have visited. Hence, there is a need to maintain the identities of the objects across different time instances. The task of object tracking can be very challenging in videos that have a low frame rate, high density, and a very large number of objects, as is the case in WAAS data. Therefore, we propose a novel method for tracking a large number of densely moving objects in aerial video. To keep the complexity of the tracking problem manageable when dealing with a large number of objects, we divide the scene into grid cells, solve the tracking problem optimally within each cell using bipartite graph matching, and then link the tracks across the cells. Besides tractability, grid cells also allow us to define a set of local scene constraints, such as road orientation and object context. We use these constraints as part of the cost function for the tracking problem; this allows us to track fast-moving objects in low-frame-rate videos.

    In addition to moving through the scene, the humans that are present may be performing individual actions that should be detected and recognized by the system. A number of different approaches exist for action recognition in both aerial and ground-level video. One requirement of the majority of these approaches is a sizeable dataset of examples of a particular action, from which a model of the action can be constructed. Such a luxury is not always available in aerial scenarios, since it may be difficult to fly a large number of missions to observe a particular event multiple times. Therefore, we propose a method for recognizing human actions in aerial video from as few examples as possible (a single example in the extreme case). We use the bag-of-words action representation and a one-vs-all multi-class classification framework. We assume that most of the classes have many examples, and construct Support Vector Machine models for each class. Then, we use the Support Vector Machines trained on classes with many examples to improve the decision function of the Support Vector Machine trained with few examples, via late weighted fusion of decision values.
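    The per-cell data association can be sketched with an off-the-shelf bipartite matcher; the plain Euclidean cost and distance gate below are illustrative stand-ins for the dissertation's richer cost function, which also folds in local constraints such as road orientation and object context.

```python
# Minimal sketch of per-cell data association by optimal bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_positions, curr_positions, gate=30.0):
    """Match object centroids in one grid cell across consecutive frames.
    Returns (prev_index, curr_index) pairs whose cost passed the gate."""
    if len(prev_positions) == 0 or len(curr_positions) == 0:
        return []
    prev = np.asarray(prev_positions, float)     # (M, 2) x, y centroids
    curr = np.asarray(curr_positions, float)     # (N, 2)
    cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)     # optimal assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
```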

    Object Reacquisition and Tracking in Large-Scale Smart Camera Networks


    A Scalable Edge-Centric System Design for Camera Networks to aid Situation Awareness Applications

    The ubiquity of cameras in our environment, coupled with advances in computer vision and machine learning, has enabled several novel applications combining sensing, processing, and actuation. Often referred to as situation awareness applications, they span a variety of domains including safety (e.g., surveillance), retail (e.g., drone delivery), and transportation (e.g., assisted/autonomous driving). A perfect storm of technology enablers has come together, making it a ripe time for realizing a smart camera system at the edge of the network to aid situation awareness applications.

    There are two types of smart camera systems: live processing at ingestion time and post-mortem video analysis. Live processing offers a more timely response when the queries are known ahead of time, while post-mortem analysis fits exploratory analysis, where the queries (or their parameters) are not known in advance. Various situation awareness applications can benefit from either type of smart camera system, or even both. The prior art consists mostly of standalone techniques for facilitating camera processing. For example, efficient live camera-processing frameworks partition video analysis tasks and place them across the Edge and the Cloud, while databases for building efficient query-processing systems on archived videos feature modern techniques (e.g., filters) for accelerating video analytics.

    This dissertation research investigates both types of smart camera systems (i.e., live processing at ingestion time and post-mortem exploratory video analysis) for various situation awareness applications. Precisely, it seeks to fill the void left by prior art by asking these questions:
    1. What are the necessary system components for a geo-distributed camera system, and how best to architect them for scalability?
    2. Given the limited resource capacity of the edge, how best to orchestrate resources for live camera processing at video ingestion time?
    3. How best to leverage traditional database-management optimization techniques for post-mortem video analysis?

    To aid various situation awareness applications, this dissertation proposes a "Scalable-by-Design" approach to designing edge-centric systems for camera networks, efficient resource orchestration for live camera processing at ingestion time, and a post-mortem video engine featuring reuse for exploratory video analytics in a scalable edge-centric system for camera networks.
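    As a hedged sketch of the filter-then-detect pattern such post-mortem engines use to accelerate queries, with result reuse across exploratory queries, consider the following; `cheap_filter` and `expensive_detector` are placeholder callables, not components of the actual system described in the dissertation.

```python
# Hedged sketch: run the expensive detector only on frames that pass a
# cheap per-frame test, and cache per-frame results so repeated queries
# over the same footage reuse earlier work.
def query_archive(frames, cheap_filter, expensive_detector, cache=None):
    cache = {} if cache is None else cache
    hits = []
    for idx, frame in enumerate(frames):
        if not cheap_filter(frame):      # e.g., frame differencing or a tiny CNN
            continue                     # detector never runs on this frame
        if idx not in cache:             # reuse across repeated queries
            cache[idx] = expensive_detector(frame)
        if cache[idx]:
            hits.append((idx, cache[idx]))
    return hits
```

    Repeated exploratory queries over the same footage then pay the detector cost at most once per frame.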