The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting
We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for
detecting, tracking, and counting fish in sonar videos. We identify sonar
videos as a rich source of data for advancing low signal-to-noise computer
vision applications and tackling domain generalization in multiple-object
tracking (MOT) and counting. In comparison to existing MOT and counting
datasets, which are largely restricted to videos of people and vehicles in
cities, CFC is sourced from a natural-world domain where targets are not easily
resolvable and appearance features cannot be easily leveraged for target
re-identification. With over half a million annotations in over 1,500 videos
sourced from seven different sonar cameras, CFC allows researchers to train MOT
and counting algorithms and evaluate generalization performance at unseen test
locations. We perform extensive baseline experiments and identify key
challenges and opportunities for advancing the state of the art in
generalization in MOT and counting. Comment: ECCV 2022. 33 pages, 12 figures
Available seat counting in public rail transport
Surveillance cameras are found almost everywhere today, including in public transport vehicles. A lot of research has already been done on video analysis in open spaces. However, the conditions in a public transport vehicle differ from those in open spaces, as described in detail in this paper. One use case described in this paper is counting the available seats in a vehicle using surveillance cameras. We propose an algorithm based on Laplace edge detection, combined with background subtraction.
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process
is accelerated by 5 times on average. Comment: Accepted by International Conference on Computer Vision (ICCV), 201
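The residual formulation described in this abstract can be illustrated with a minimal sketch: the per-frame count is the sum of the estimated density map plus a learned residual. This is not the authors' implementation. The function names are hypothetical, and the residual values are hand-supplied placeholders standing in for what the LSTM would output.

```python
from typing import List, Sequence

DensityMap = Sequence[Sequence[float]]


def density_sum(density_map: DensityMap) -> float:
    """Sum every pixel of one density map (the FCN-style base count)."""
    return sum(sum(row) for row in density_map)


def residual_counts(density_maps: Sequence[DensityMap],
                    residuals: Sequence[float]) -> List[float]:
    """Per-frame count = sum of densities + residual correction.

    `residuals` is an assumption of this sketch: in the paper's design
    the correction is learned by an LSTM, here it is simply passed in.
    """
    if len(density_maps) != len(residuals):
        raise ValueError("need exactly one residual per frame")
    return [density_sum(d) + r for d, r in zip(density_maps, residuals)]


def mean_absolute_error(predicted: Sequence[float],
                        truth: Sequence[float]) -> float:
    """MAE, the metric the abstract reports on TRANCOS and WebCamT."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)
```

Because the residual only has to correct the density sum rather than regress the full count from scratch, the target it learns is small, which is the intuition the abstract gives for faster training.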
Pedestrian Counting Based on Piezoelectric Vibration Sensor
Pedestrian counting has attracted much interest from the academic and industrial communities because of its wide application in many real-world scenarios. While many recent studies have focused on computer vision-based solutions to the problem, the deployment of cameras raises concerns about privacy invasion. This paper proposes a novel indoor pedestrian counting approach based on footstep-induced structural vibration signals captured with piezoelectric sensors. The approach is privacy-protecting because no audio or video data is acquired. Our approach analyzes the space-differential features of the vibration signals caused by pedestrian footsteps and outputs the number of pedestrians. The proposed approach supports multiple pedestrians walking together with signal mixture. Moreover, it places no requirement on the number of groups of walking people in the detection area. The experimental results show that the averaged F1-score of our approach is over 0.98, which is better than the vibration signal-based state-of-the-art methods. Peer reviewed
Modeling people flow in buildings using edge and cloud computing
In recent years, significant progress has been made in computer vision regarding object detection and tracking, which has enabled the emergence of various applications. These often focus on identifying and tracking people in different environments such as buildings.
Detecting people allows us to get a more comprehensive view of people flow as traditional IoT data from elevators cannot track individual people and their trajectories. In this thesis, we concentrate on people detection in elevator lobbies which we can use to improve the efficiency of the elevators and the convenience of the building. We compare the performance and speed of various object detection algorithms. Additionally, we research an edge device's capability to run an object detection model on multiple cameras and whether a single device can cover the target building.
We were able to train an object detection algorithm suitable for our application. This allowed accurate people detection that can be used for people counting. We found that out of the three object detection algorithms we trained, YOLOv3 was the only one capable of generalizing to unseen environments, which is essential for general-purpose application. The performance of the other two models (SSD and Faster R-CNN) was poor in terms of either accuracy or speed. Based on these results, we chose to deploy YOLOv3 to the edge device. We found that the edge device's inference time is linearly dependent on the number of cameras. Therefore, we can conclude that one edge device should be sufficient for our target building, allowing two cameras for each floor. We also demonstrated that the edge device allows easy addition of an object tracking layer, which is required for the solution to work in a real-life office building.