Available seat counting in public rail transport
Surveillance cameras are found almost everywhere today, including in vehicles for public transport. A lot of research has already been done on video analysis in open spaces. However, the conditions in a vehicle for public transport differ from those in open spaces, as described in detail in this paper. A use case described in this paper is counting the available seats in a vehicle using surveillance cameras. We propose an algorithm based on Laplace edge detection, combined with background subtraction
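A minimal sketch of this idea, assuming per-seat regions of interest (ROIs) and a background frame of the empty vehicle. The ROIs, thresholds, and function names below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

# Combine background subtraction with a Laplace edge response per seat ROI.
LAPLACE_KERNEL = np.array([[0, 1, 0],
                           [1, -4, 1],
                           [0, 1, 0]], dtype=float)

def laplacian(img):
    """Naive 2-D Laplace filter via explicit convolution with zero padding."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACE_KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def seat_is_free(frame, background, roi, diff_thresh=25.0, edge_thresh=5.0):
    """A seat ROI counts as free when both the background difference and the
    Laplace edge activity inside it stay below (assumed) thresholds."""
    y0, y1, x0, x1 = roi
    patch = frame[y0:y1, x0:x1].astype(float)
    bg = background[y0:y1, x0:x1].astype(float)
    return (np.abs(patch - bg).mean() < diff_thresh
            and np.abs(laplacian(patch)).mean() < edge_thresh)

def count_free_seats(frame, background, rois):
    return sum(seat_is_free(frame, background, r) for r in rois)

# Synthetic example: two seat ROIs, one occupied by a bright "passenger" blob.
background = np.zeros((40, 40))
frame = background.copy()
frame[5:15, 5:15] = 200  # occupant in the left seat ROI
rois = [(0, 20, 0, 20), (0, 20, 20, 40)]
print(count_free_seats(frame, background, rois))  # 1
```

In a real deployment the background model would have to adapt to lighting changes; a static background frame is the simplest possible stand-in.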
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short-term memory (LSTM) networks in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training
process is accelerated by a factor of 5 on average.
Comment: Accepted by the International Conference on Computer Vision (ICCV), 2017.
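The residual count formulation above can be illustrated numerically: the base prediction is the integral of the per-frame density map, and the LSTM branch supplies a residual correction on top of it. The values below are synthetic and only show the arithmetic, not the learned networks:

```python
import numpy as np

# Numeric sketch of the residual count regression: count_t = sum(D_t) + r_t,
# where D_t is the frame's density map and r_t is the residual the LSTM
# branch would learn. All values here are synthetic.

def residual_count(density_map, residual):
    """Vehicle count as the density integral plus a learned residual."""
    return density_map.sum() + residual

density = np.zeros((4, 4))
density[1, 1] = 0.9  # a nearly complete vehicle blob
density[2, 3] = 0.6  # a partially occluded vehicle
r = 0.5              # residual correction (would come from the LSTM)
print(round(residual_count(density, r), 6))  # 2.0
```

Because the network only has to learn the (small) residual rather than the full count, gradients stay well scaled, which is the intuition behind the reported training speed-up.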
Leveraging Traffic and Surveillance Video Cameras for Urban Traffic
The objective of this project was to investigate the use of existing video resources, such as traffic cameras, police cameras, red-light cameras, and security cameras, for the long-term, real-time collection of traffic statistics. An additional objective was to gather similar statistics for pedestrians and bicyclists. Throughout the course of the project, we investigated several methods for tracking vehicles under challenging conditions. The initial plan called for tracking based on optical flow. However, it was found that current optical-flow estimation algorithms are not well suited to low-quality video; hence, developing optical flow methods for low-quality video has been one aspect of this project. The method eventually used combines basic optical flow tracking with a learning detector for each tracked object; that is, the object is tracked both by its apparent movement and by its appearance, should it temporarily disappear from or be obscured in the frame. We have produced prototype software that allows the user to specify the vehicle trajectories of interest by drawing their shapes superimposed on a video frame. The software then tracks each vehicle as it travels through the frame, matches the vehicle's movements to the most closely matching trajectory, and increases the vehicle count for that trajectory. In terms of pedestrian and bicycle counting, the system is capable of tracking these "objects" as well, though at present it is not capable of distinguishing between the three classes automatically. Continuing research by the principal investigator under a different grant will establish this capability as well.
Illinois Department of Transportation, R27-131
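The trajectory-matching step described above can be sketched as follows. Each tracked path is compared to the user-drawn template trajectories and the count of the closest template is incremented; the resampling scheme and distance metric below are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def resample(path, n=20):
    """Resample a polyline (list of (x, y) points) to n evenly spaced points."""
    path = np.asarray(path, dtype=float)
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    ts = np.linspace(0.0, t[-1], n)
    return np.column_stack([np.interp(ts, t, path[:, i]) for i in range(2)])

def match_trajectory(track, templates):
    """Index of the template with the smallest mean point-to-point distance."""
    r = resample(track)
    dists = [np.linalg.norm(r - resample(tp), axis=1).mean() for tp in templates]
    return int(np.argmin(dists))

# Two user-drawn templates: straight through, and a right turn.
templates = [[(0, 0), (100, 0)],
             [(0, 0), (50, 0), (50, 80)]]
counts = [0, 0]

# A noisy tracked vehicle path that follows the turn.
track = [(2, 1), (48, 2), (52, 75)]
counts[match_trajectory(track, templates)] += 1
print(counts)  # [0, 1]
```

Mean distance between resampled polylines is the simplest plausible metric; a production system might instead use a direction-aware distance so that opposite-direction traversals of the same shape are not confused.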
Understanding Traffic Density from Large-Scale Web Camera Data
Understanding traffic density from large-scale web camera (webcam) videos is
a challenging problem because such videos have low spatial and temporal
resolution, high occlusion and large perspective. To deeply understand traffic
density, we explore both deep learning based and optimization based methods. To
avoid individual vehicle detection and tracking, both methods map the image
into a vehicle density map, one based on rank-constrained regression and the
other on fully convolutional networks (FCN). The regression-based method
learns different weights for different blocks of the image to increase the
degrees of freedom of the weights and to embed perspective information. The FCN-based
method jointly estimates vehicle density map and vehicle count with a residual
learning framework to perform end-to-end dense prediction, allowing arbitrary
image resolution, and adapting to different vehicle scales and perspectives. We
analyze and compare both methods, and draw insights from the optimization-based
method to improve the deep model. Since existing datasets do not cover all the
challenges in our work, we collected and labelled a large-scale traffic video
dataset, containing 60 million frames from 212 webcams. Both methods are
extensively evaluated and compared on different counting tasks and datasets.
The FCN-based method significantly reduces the mean absolute error from 10.99 to
5.31 on the public TRANCOS dataset compared with the state-of-the-art baseline.
Comment: Accepted by CVPR 2017. Preprint version was uploaded at
http://welcome.isr.tecnico.ulisboa.pt/publications/understanding-traffic-density-from-large-scale-web-camera-data
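The block-wise regression idea above can be sketched numerically: the image is split into blocks and each block gets its own weight, so near regions (large vehicles) and far regions (many small vehicles) contribute to the count differently. The features and weights below are synthetic assumptions, not learned values:

```python
import numpy as np

def blockwise_count(feature_map, weights, block=4):
    """Sum block features scaled by per-block weights (embeds perspective)."""
    h, w = feature_map.shape
    total = 0.0
    for i, y in enumerate(range(0, h, block)):
        for j, x in enumerate(range(0, w, block)):
            total += weights[i, j] * feature_map[y:y + block, x:x + block].sum()
    return total

features = np.ones((8, 8))          # uniform foreground response (synthetic)
weights = np.array([[0.01, 0.01],   # far blocks: small per-pixel contribution
                    [0.04, 0.04]])  # near blocks: larger per-pixel contribution
print(round(blockwise_count(features, weights), 2))  # 1.6
```

Giving each block its own weight is what the abstract means by increasing the degrees of freedom: a single global weight could not account for the perspective-induced change in vehicle size across the image.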
Measuring traffic flow and lane changing from semi-automatic video processing
Comprehensive databases are needed in order to extend our knowledge of the behavior of vehicular traffic. Nevertheless, data coming from common traffic detectors is incomplete. Detectors only provide vehicle count, detector occupancy, and speed at discrete locations. To enrich these databases, additional measurements from other data sources, like video recordings, are used. Extracting data from videos by actually watching the entire length of the recordings and manually counting is extremely time-consuming. The alternative is to set up an automatic video detection system. This is also costly in terms of money and time, and generally does not pay off for sporadic usage in a pilot test. An adaptation of the semi-automatic video processing methodology proposed by Patire (2010) is presented here. It makes it possible to count flow and lane changes 90% faster than actually counting them by watching the video. The method consists of selecting some specific lines of pixels in the video and converting them into a set of space-time images. Manual time is only spent in counting from these images. The method is adaptive, in the sense that the counting is always done at the maximum speed, not constrained by the video playback speed. This allows going faster when there are few counts and slower when many counts happen. This methodology has been used for measuring off-ramp flows and lane changing at several locations on the B-23 freeway (Soriguera & Sala, 2014). Results show that, as long as the video recordings fulfill some minimum requirements in framing and quality, the method is easy to use, fast, and reliable. This method is intended for research purposes, when some hours of video recording have to be analyzed, not for long-term use in a Traffic Management Center.
Postprint (published version)
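The space-time image construction described above can be sketched in a few lines: one fixed line of pixels (e.g. across a lane) is sampled from every frame and the samples are stacked over time, so each vehicle crossing the line appears as a compact blob in the resulting image. The frame data here is synthetic:

```python
import numpy as np

def space_time_image(frames, row, x0, x1):
    """Stack a fixed pixel line from each frame into a (T x width) image."""
    return np.stack([f[row, x0:x1] for f in frames])

# Synthetic video: 6 frames, a bright "vehicle" crosses the line in frames 2-3.
frames = [np.zeros((10, 20)) for _ in range(6)]
frames[2][5, 8:12] = 255
frames[3][5, 8:12] = 255

sti = space_time_image(frames, row=5, x0=0, x1=20)
print(sti.shape)                          # (6, 20)
print(int((sti > 0).any(axis=1).sum()))   # 2  (rows of the crossing event)
```

This is why the method is so much faster than watching the video: an hour of footage collapses into a single image in which each crossing is one blob to count, at whatever pace the operator likes.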
Detecting animals in African Savanna with UAVs and the crowds
Unmanned aerial vehicles (UAVs) offer new opportunities for wildlife
monitoring, with several advantages over traditional field-based methods. They
have readily been used to count birds, marine mammals and large herbivores in
different environments, tasks which are routinely performed through manual
counting in large collections of images. In this paper, we propose a
semi-automatic system able to detect large mammals in semi-arid Savanna. It
relies on an animal-detection system based on machine learning, trained with
crowd-sourced annotations provided by volunteers who manually interpreted
sub-decimeter resolution color images. The system achieves a high recall rate
and a human operator can then eliminate false detections with limited effort.
Our system provides good prospects for the development of data-driven
management practices in wildlife conservation. It shows that the detection of
large mammals in semi-arid Savanna can be approached by processing data
provided by standard RGB cameras mounted on affordable fixed-wing UAVs.
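The semi-automatic review loop described above can be sketched as a threshold choice: the detector's operating point is set to favour recall, and the surviving false positives are left for a human operator to discard. The scores, labels, and function name below are synthetic assumptions:

```python
import numpy as np

def pick_high_recall_threshold(scores, labels, target_recall=0.95):
    """Highest score threshold that still reaches the target recall."""
    for t in sorted(set(scores), reverse=True):
        preds = scores >= t
        recall = (preds & labels).sum() / labels.sum()
        if recall >= target_recall:
            return t
    return min(scores)

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3])       # detector confidences
labels = np.array([1, 1, 0, 1, 0, 0], dtype=bool)        # animal present or not
t = pick_high_recall_threshold(scores, labels, target_recall=1.0)
print(t)                            # 0.6
print(int((scores >= t).sum()))     # 4 detections kept for human review
```

At this operating point every animal is kept (full recall) at the cost of one false positive, which is exactly the trade-off the paper exploits: the human effort of rejecting a few false detections is far smaller than counting every image by hand.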
The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for
detecting, tracking, and counting fish in sonar videos. We identify sonar
videos as a rich source of data for advancing low signal-to-noise computer
vision applications and tackling domain generalization in multiple-object
tracking (MOT) and counting. In comparison to existing MOT and counting
datasets, which are largely restricted to videos of people and vehicles in
cities, CFC is sourced from a natural-world domain where targets are not easily
resolvable and appearance features cannot be easily leveraged for target
re-identification. With over half a million annotations in over 1,500 videos
sourced from seven different sonar cameras, CFC allows researchers to train MOT
and counting algorithms and evaluate generalization performance at unseen test
locations. We perform extensive baseline experiments and identify key
challenges and opportunities for advancing the state of the art in
generalization in MOT and counting.
Comment: ECCV 2022. 33 pages, 12 figures.