2,694 research outputs found

    Multi-Person Tracking Based on Faster R-CNN and Deep Appearance Features

    Most computer vision problems related to crowd analytics depend heavily on multi-object tracking (MOT) systems. The design of an MOT system involves two major steps: object detection and association. In the first step, the desired objects are detected in every frame of the video stream. Detection quality directly influences tracking performance. The second step matches the objects detected in the current frame with those in the previous frame to obtain their trajectories. High accuracy in object detection results in fewer missed detections and ultimately produces less fragmented tracks. Better object association increases the affinity between objects in different frames. This paper presents a novel algorithm for improved object detection followed by enhanced object tracking. Object detection accuracy is increased by employing the deep learning-based Faster region convolutional neural network (Faster R-CNN) algorithm. Object association is carried out using appearance and improved motion features. Evaluation results show that we have enhanced the performance of current state-of-the-art work by reducing identity switches and fragmentation.

    Instance Segmentation with Mask R-CNN Applied to Loose-Housed Dairy Cows in a Multi-Camera Setting

    With increasing herd sizes has come a greater need for automated systems that support farmers in monitoring the health and welfare status of their livestock. Cattle are a highly sociable species, and the herd structure has an important impact on animal welfare. As the behaviour of the animals and their social interactions can be influenced by the presence of a human observer, a camera-based system that automatically detects the animals would be beneficial for analysing dairy cattle herd activity. In the present study, eight surveillance cameras were mounted above the barn area of a group of thirty-six lactating Holstein Friesian dairy cows at the Chamber of Agriculture in Futterkamp in Northern Germany. Mask R-CNN, a state-of-the-art convolutional neural network model, was trained to determine pixel-level segmentation masks for the cows in the video material. The model was pre-trained on the Microsoft Common Objects in Context (COCO) data set, and transfer learning was carried out using annotated image material from the recordings as the training data set. In addition, the relationship between the size of the training data set and the performance of the model after transfer learning was analysed. The trained model achieved an average precision (intersection over union, IoU = 0.5) of 91% and 85% for the detection of bounding boxes and segmentation masks of the cows, respectively, thereby laying a solid technical basis for an automated analysis of herd activity and the use of resources in loose-housing.
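
    As a reading aid for the IoU = 0.5 criterion quoted in the abstract above, here is a minimal sketch of how intersection over union is commonly computed for axis-aligned bounding boxes. The function names and the (x1, y1, x2, y2) box layout are assumptions made for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed layout, chosen for this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


def is_match(predicted_box, ground_truth_box, threshold=0.5):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(predicted_box, ground_truth_box) >= threshold
```

    Under this criterion a predicted box counts as correct only if it overlaps an annotated box with an IoU of at least 0.5; average precision is then computed over such matches.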

    A Novel Technique to Detect and Track Multiple Objects in Dynamic Video Surveillance Systems

    Video surveillance is an important state-of-the-art technology used to monitor different areas of modern society, for example in general public surveillance, city traffic monitoring, and forest monitoring systems. Hence, surveillance systems have become especially relevant in the digital era. Video surveillance systems and their video analytics have become indispensable due to an increase in crime and unethical behavior. Enabling the tracking of individual objects in video surveillance is therefore an essential part of modern society. With the advent of video surveillance, the performance of such systems also needs to improve to keep up with ever-increasing crime rates. So far, many video surveillance methodologies have been introduced, ranging from single object detection with single or multiple cameras to multiple object detection with single or multiple cameras. Despite this, performance benchmarks and metrics need further improvement. While mechanisms exist for single or multiple object detection and prediction on videos or images, none can meet the criteria of detecting and tracking multiple objects in static as well as dynamic environments. Thus, real-world multiple object detection and prediction systems are needed that are both accurate and fast and can be adopted in static and dynamic environments. This paper introduces the Densely Feature selection Convolutional neural Network – Hyper Parameter tuning (DFCNHP), a hybrid protocol with faster prediction time and high accuracy. The proposed system has successfully tracked multiple objects from multiple channels and combines dense blocks, feature selection, background subtraction and Bayesian methods. The experimental results demonstrated an accuracy of 98% and a prediction time of 1.11, and these results have also been compared with existing methods such as Kalman Filtering (KF) and Deep Neural Network (DNN).

    Distributed Dynamic Sensor Assignment of Multiple Mobile Targets

    Distributed scalable algorithms are sought in many multi-robot contexts. In this work we address the dynamic optimal linear assignment problem, exemplified as a target tracking mission in which mobile robots visually track mobile targets in a one-to-one capacity. We adapt our previous work on formation achievement by means of a distributed simplex variant, which results in a conceptually simple consensus solution that is asynchronous in nature and requires only local broadcast communications. This approach seamlessly tackles dynamic changes in both costs and network topology. Improvements designed to accelerate global convergence in the face of dynamically evolving task rewards are described and evaluated with simulations that highlight the efficiency and scalability of the proposal. Finally, experiments with a team of three Turtlebot robots are shown to validate the applicability of the algorithm.
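
    To make the linear assignment problem named in the abstract above concrete, the following is a minimal centralized sketch that finds the cheapest one-to-one pairing of robots and targets by exhaustive search. It is not the distributed simplex variant the paper describes; the function name and the example cost matrix are hypothetical, and brute force is only practical for very small teams.

```python
from itertools import permutations


def optimal_assignment(cost):
    """Return (assignment, total_cost) minimizing the summed cost.

    cost[r][t] is the cost of robot r tracking target t (square matrix).
    assignment[r] is the target index given to robot r.
    Exhaustive search over all permutations: O(n!), illustration only.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[robot][perm[robot]] for robot in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical costs (e.g. distances) for three robots and three targets.
example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = optimal_assignment(example_cost)
```

    In the dynamic setting the abstract describes, the cost matrix changes as robots and targets move, so the assignment has to be recomputed or incrementally repaired over time.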

    Discriminatively Trained Latent Ordinal Model for Video Classification

    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.
    Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150