
    Autonomous real-time surveillance system with distributed IP cameras

    An autonomous Internet Protocol (IP) camera-based object tracking and behaviour identification system, capable of running in real time on an embedded system with limited memory and processing power, is presented in this paper. The main contribution of this work is the integration of processor-intensive image processing algorithms on an embedded platform capable of running in real time for monitoring the behaviour of pedestrians. The Algorithm Based Object Recognition and Tracking (ABORAT) system architecture presented here was developed on an Intel PXA270-based development board clocked at 520 MHz. The platform was connected to a commercial stationary IP-based camera in a remote monitoring station for intelligent image processing. The system is capable of detecting moving objects and their shadows in a complex environment with varying lighting intensity and moving foliage. Objects moving close to each other are also detected, and their trajectories are extracted and fed into an unsupervised neural network for autonomous classification. The novel intelligent video system presented is also capable of performing simple analytic functions such as tracking and generating alerts when objects enter/leave regions or cross tripwires superimposed on the live video by the operator.
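    The abstract above describes a classic detect-and-alert pipeline. As a rough illustration, the sketch below implements one plausible reading of it with OpenCV: background subtraction with shadow detection, blob extraction, and a region-entry alert. The stream URL, region coordinates and thresholds are placeholder assumptions; this is not the authors' ABORAT implementation.

```python
# A minimal sketch, assuming an RTSP-capable IP camera and an operator-drawn
# region of interest. OpenCV's MOG2 subtractor stands in for the paper's
# detection algorithms; shadow pixels (marked 127 by MOG2) are discarded.
import cv2

STREAM_URL = "rtsp://camera.example/stream"   # hypothetical IP-camera URL
REGION = (200, 100, 440, 360)                 # hypothetical region (x1, y1, x2, y2)

def inside(pt, region):
    x, y = pt
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
cap = cv2.VideoCapture(STREAM_URL)
was_occupied = False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Keep only confident foreground (255); shadows are 127 and are dropped.
    _, fg = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, None)  # suppress noise/foliage flicker
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if cv2.contourArea(c) < 500:          # ignore small blobs
            continue
        m = cv2.moments(c)
        centroids.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    occupied = any(inside(p, REGION) for p in centroids)
    if occupied and not was_occupied:
        print("ALERT: object entered region")
    was_occupied = occupied
```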

    Salient object subitizing

    We study the problem of salient object subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1–4). To this end, we present a salient object subitizing image dataset of about 14,000 everyday images which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained convolutional neural network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also provides significantly better than chance performance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval. This research was supported in part by US NSF Grants 0910908 and 1029430, and gifts from Adobe and NVIDIA. https://arxiv.org/abs/1607.07525 https://arxiv.org/pdf/1607.07525.pdf (accepted manuscript)
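    Since the abstract frames subitizing as holistic prediction of a small count, one natural formulation is image classification over the labels 0, 1, 2, 3 and 4+. The sketch below shows that formulation in PyTorch; the ResNet-18 backbone, optimiser and dummy batch are illustrative assumptions, not the paper's actual setup.

```python
# A hedged sketch of subitizing as 5-way classification (counts 0-3 and "4+").
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # labels: 0, 1, 2, 3, and "4+"

model = models.resnet18(weights=None)                     # assumed backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # count-label head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One illustrative training step on a dummy batch of (images, count labels).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```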

    Convolutional Neural Networks for Counting Fish in Fisheries Surveillance Video

    We present a computer vision tool that analyses video from a CCTV system installed on fishing trawlers to monitor discarded fish catch. The system aims to support expert observers who review the footage and verify numbers, species and sizes of discarded fish. The operational environment presents a significant challenge for these tasks. Fish are processed below deck under fluorescent lights, they are randomly oriented and there are multiple occlusions. The scene is unstructured and complicated by the presence of fishermen processing the catch. We describe an approach to segmenting the scene and counting fish that exploits the N^4-Fields algorithm. We performed extensive tests of the algorithm on a data set comprising 443 frames from 6 belts. Results indicate the relative count error (for individual fish) ranges from 2% to 16%. We believe this is the first system that is able to handle footage from operational trawlers.
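    For concreteness, the quoted relative count error can be read as the absolute difference between predicted and observed counts, normalised by the observed count. A minimal sketch of that metric follows; the counts used are made-up placeholders, not the paper's data.

```python
# Relative count error between a predicted and an observed fish count.
def relative_count_error(predicted: int, actual: int) -> float:
    """Absolute relative error of a count, as a fraction of the true count."""
    return abs(predicted - actual) / actual

# Hypothetical example: 47 fish counted where observers verified 50.
print(f"{relative_count_error(47, 50):.0%}")  # -> 6%
```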

    Understanding Traffic Density from Large-Scale Web Camera Data

    Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion and large perspective variation. To deeply understand traffic density, we explore both deep-learning-based and optimization-based methods. To avoid individual vehicle detection and tracking, both methods map the image to a vehicle density map, one based on rank-constrained regression and the other based on fully convolutional networks (FCN). The regression-based method learns different weights for different blocks in the image to increase the degrees of freedom of the weights and embed perspective information. The FCN-based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and use insights from the optimization-based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. The FCN-based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline. Comment: Accepted by CVPR 2017. A preprint version was uploaded at http://welcome.isr.tecnico.ulisboa.pt/publications/understanding-traffic-density-from-large-scale-web-camera-data
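    The core idea of the FCN method, as described, is that the network outputs a per-pixel density map whose integral (sum) is the vehicle count, which is what permits arbitrary input resolution. A minimal sketch of that idea follows; the tiny architecture is an illustrative stand-in, not the paper's residual-learning FCN.

```python
# Sketch: a fully convolutional net maps a frame to a non-negative density
# map; the count is the sum of the map. No fixed input size is required.
import torch
import torch.nn as nn

class TinyDensityFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),              # 1-channel density map
        )

    def forward(self, x):
        density = torch.relu(self.features(x))   # enforce non-negative density
        count = density.sum(dim=(1, 2, 3))        # count = integral of density
        return density, count

model = TinyDensityFCN()
frames = torch.randn(2, 3, 240, 352)              # arbitrary resolution works
density, count = model(frames)
print(density.shape, count.shape)                 # [2, 1, 240, 352], [2]
```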

    Real-time crowd density mapping using a novel sensory fusion model of infrared and visual systems

    Crowd dynamics management has received significant attention in recent years in both research and industry, in an attempt to improve the safety and management of large-scale events and of large public places such as stadiums, theatres, railway stations, subways and other places where a high flow of people at high densities is expected. Failure to detect crowd behaviour at the right time could lead to unnecessary injuries and fatalities. Over the past decades there have been many crowd incidents that caused major injuries, fatalities and physical damage. Examples of crowd disasters in past decades include the Hillsborough football stadium tragedy in Sheffield, where at least 93 football supporters were killed and 400 injured in 1989 in Britain's worst-ever sporting disaster (BBC, 1989). More recently, in Cambodia, a pedestrian stampede during the Water Festival celebration resulted in 345 deaths and 400 injuries (BBC, 2010), and in 2011 at least 16 people were killed and 50 others injured in a stampede in the northern Indian town of Haridwar (BBC, 2011). Such disasters could be avoided, or their losses reduced, by using appropriate technologies. Crowd simulation models have been found effective in predicting potential crowd hazards in critical situations and can thus help reduce fatalities. However, there is a need to combine advances in simulation with real-time crowd characterisation, such as real-time density estimation, in order to provide accurate prognoses of crowd behaviour and enhance crowd management and safety, particularly at mega-events such as the Hajj. This paper addresses the use of novel sensory technology to estimate people's dynamic density during one of the Hajj activities. The ultimate goal is that accurate real-time estimation of density in different areas within the crowd could help improve the decision-making process and provide more accurate prediction of crowd dynamics. This paper investigates the use of infrared and visual cameras, supported by auxiliary sensors and artificial intelligence, to evaluate the accuracy of estimating crowd density in an open space during the Muslim pilgrimage to Makkah (Mecca).
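    The abstract does not detail the fusion model itself, but one simple instance of fusing two co-registered density estimates is a confidence-weighted average, sketched below. The weights, grid size and units are assumptions for illustration only, not the paper's method.

```python
# Hedged sketch: weighted fusion of infrared and visual crowd-density maps.
import numpy as np

def fuse_density(ir_map: np.ndarray, visual_map: np.ndarray,
                 w_ir: float = 0.6, w_vis: float = 0.4) -> np.ndarray:
    """Weighted fusion of two co-registered density maps (people per m^2)."""
    assert ir_map.shape == visual_map.shape, "maps must be co-registered"
    return w_ir * ir_map + w_vis * visual_map

# Example: hypothetical 4x4 grid of density estimates over an open area.
ir = np.random.uniform(0, 6, size=(4, 4))
vis = np.random.uniform(0, 6, size=(4, 4))
print(fuse_density(ir, vis).round(2))
```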