
    The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

    We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting. Comment: ECCV 2022. 33 pages, 12 figures.

    Available seat counting in public rail transport

    Surveillance cameras are found almost everywhere today, including in vehicles for public transport. A lot of research has already been done on video analysis in open spaces. However, the conditions in a vehicle for public transport differ from those in open spaces, as described in detail in this paper. The use case described in this paper is counting the available seats in a vehicle using surveillance cameras. We propose an algorithm based on Laplace edge detection, combined with background subtraction.
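    The combination the abstract names, Laplacian edge detection plus comparison against an empty-vehicle background, can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' algorithm: the 3x3 kernel, the thresholds, and the `seat_is_free` decision rule are all assumptions made for the example.

```python
import numpy as np

def laplacian_edges(frame, threshold=20):
    """Convolve with a 3x3 Laplacian kernel and threshold to get an edge map."""
    kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=np.float64)
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.abs(out) > threshold

def seat_is_free(background_roi, current_roi, edge_change_ratio=0.1):
    """A seat region is considered free if its edge map barely differs
    from the edge map of the empty-vehicle background (hypothetical rule)."""
    bg_edges = laplacian_edges(background_roi)
    cur_edges = laplacian_edges(current_roi)
    changed = np.logical_xor(bg_edges, cur_edges).mean()
    return bool(changed < edge_change_ratio)

# Toy example: flat background vs. a frame with a bright "occupant" blob.
background = np.full((20, 20), 100, dtype=np.uint8)
occupied = background.copy()
occupied[5:15, 5:15] = 200  # the occupant introduces strong new edges

print(seat_is_free(background, background.copy()))  # True  (seat free)
print(seat_is_free(background, occupied))           # False (seat taken)
```

    Background subtraction on the raw intensities would also work here; edge maps are typically preferred in vehicles because they are less sensitive to the strong lighting changes a moving vehicle experiences.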

    FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

    In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low-quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion, and large perspective, making most existing methods lose their efficacy. To overcome the limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short-term memory networks (LSTM) in a residual learning fashion. Such a design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of the networks. To preserve feature map resolution, we propose a Hyper-Atrous combination to integrate atrous convolution in the FCN and combine feature maps of different convolution layers. FCN-rLSTM enables refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating its effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process is accelerated by 5 times on average. Comment: Accepted by the International Conference on Computer Vision (ICCV), 2017.
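    The residual reformulation described above can be illustrated with a toy calculation: the count regression target becomes the (small) difference between the true count and the sum of the predicted density map. All numbers below are hypothetical, and in the actual network the residual would be produced by the LSTM branch, not hard-coded.

```python
import numpy as np

# Suppose the FCN stage has predicted a per-pixel vehicle density map for
# one frame (hypothetical values); summing it gives a base count estimate.
density_map = np.zeros((8, 8))
density_map[2, 3] = 0.9   # one vehicle, slightly under-counted
density_map[5, 6] = 1.1   # another vehicle, slightly over-counted
base_count = density_map.sum()

# With the residual connection, the temporal branch only has to learn a
# small, near-zero correction to the density sum instead of regressing the
# raw count (which grows with scene crowding) -- hence the faster training.
true_count = 2.3
residual_target = true_count - base_count
final_count = base_count + residual_target

print(base_count, residual_target, final_count)
```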

    Pedestrian Counting Based on Piezoelectric Vibration Sensor

    Pedestrian counting has attracted much interest from the academic and industry communities for its widespread application in many real-world scenarios. While many recent studies have focused on computer vision-based solutions to the problem, the deployment of cameras raises concerns about privacy invasion. This paper proposes a novel indoor pedestrian counting approach based on footstep-induced structural vibration signals from piezoelectric sensors. The approach is privacy-protecting because no audio or video data is acquired. Our approach analyzes the space-differential features of the vibration signals caused by pedestrian footsteps and outputs the number of pedestrians. The proposed approach supports multiple pedestrians walking together, whose footstep signals mix, and it makes no assumption about the number of groups of walking people in the detection area. The experimental results show that the average F1-score of our approach is over 0.98, which is better than state-of-the-art vibration-signal-based methods. Peer reviewed.
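    The core sensing idea, detecting footstep impulses in a structural vibration trace, can be sketched with a simple threshold-crossing event counter on synthetic data. This is a toy stand-in only: the paper's space-differential features and multi-pedestrian separation are not reproduced, and the threshold and impulse shapes are assumptions.

```python
import numpy as np

def count_footstep_events(signal, threshold=0.5):
    """Count footstep impulses as upward crossings of an amplitude threshold."""
    above = np.abs(signal) > threshold
    # An event starts wherever the signal rises from below to above threshold.
    rising = above[1:] & ~above[:-1]
    return int(rising.sum())

# Synthetic vibration trace: quiet floor noise plus three footstep impulses.
rng = np.random.default_rng(42)
t = np.arange(0, 3.0, 0.01)                 # 3 s sampled at 100 Hz
trace = 0.05 * rng.standard_normal(t.size)  # low-amplitude floor noise
for impact in (0.5, 1.4, 2.2):              # seconds at which a foot strikes
    idx = int(impact / 0.01)
    trace[idx:idx + 10] += 2.0 * np.exp(-np.arange(10) * 0.5)  # decaying hit

print(count_footstep_events(trace))  # 3
```

    Counting events in a single channel like this cannot separate pedestrians walking in step; that is precisely the gap the paper's space-differential features across multiple sensors are meant to close.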

    Modeling people flow in buildings using edge and cloud computing

    In recent years, significant progress has been made in computer vision regarding object detection and tracking, which has allowed the emergence of various applications. These often focus on identifying and tracking people in different environments such as buildings. Detecting people gives us a more comprehensive view of people flow, as traditional IoT data from elevators cannot track individual people and their trajectories. In this thesis, we concentrate on people detection in elevator lobbies, which we can use to improve the efficiency of the elevators and the convenience of the building. We compare the performance and speed of various object detection algorithms. Additionally, we investigate an edge device's capability to run an object detection model on multiple cameras and whether a single device can cover the target building. We were able to train an object detection algorithm suitable for our application, enabling accurate people detection that can be used for people counting. We found that, of the three object detection algorithms we trained, YOLOv3 was the only one capable of generalizing to unseen environments, which is essential for a general-purpose application. The other two models (SSD and Faster R-CNN) performed poorly in terms of either accuracy or speed. Based on these results, we chose to deploy YOLOv3 to the edge device. We found that the edge device's inference time is linearly dependent on the number of cameras. Therefore, we conclude that one edge device should be sufficient for our target building, allowing two cameras for each floor. We also demonstrated that the edge device allows easy addition of an object tracking layer, which is required for the solution to work in a real-life office building.
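    Because inference time was found to grow linearly with the number of attached cameras, the device-sizing question reduces to a one-line budget calculation. The sketch below uses hypothetical timing numbers; neither `max_cameras` nor the 80 ms figure comes from the thesis.

```python
def max_cameras(per_camera_inference_s, target_fps_per_camera):
    """If total inference time grows linearly with the number of cameras,
    the device keeps up while
        n * per_camera_inference_s <= 1 / target_fps_per_camera."""
    frame_budget_s = 1.0 / target_fps_per_camera
    # Small epsilon guards against float rounding at exact multiples.
    return int(frame_budget_s / per_camera_inference_s + 1e-9)

# Hypothetical numbers: 80 ms YOLOv3 inference per camera frame, and each
# camera sampled twice per second, which is plenty for people counting.
print(max_cameras(0.08, 2.0))  # 6
```

    With two cameras per floor, an estimate like this translates directly into how many floors one device can serve, which is the coverage question the thesis answers for its target building.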