8 research outputs found

    DETECTION OF PERSONS AND HEIGHT ESTIMATION IN VIDEO SEQUENCE

    Get PDF
    The principal goal of this paper is the design and subsequent development of a solution for the visual monitoring of a specific area. Monitoring includes the detection of movement and the detection of persons in the video sequence. Additional information is also extracted, namely the number of persons in the area and their heights. Based on a prior comparative analysis of existing work, the authors propose their own mobile solution, in which a development board handles all the data processing; the Intel Galileo development board was selected. Implementation and subsequent testing prove the hardware and software solution to be fully functional.
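    A minimal sketch of such a pipeline, using OpenCV's stock MOG2 background subtractor and HOG person detector as stand-ins (the abstract does not name the algorithms used on the Galileo board; the video path, motion threshold, and pixels-per-metre calibration constant below are hypothetical):

```python
# Illustrative only: motion-gated person detection with a naive
# bounding-box-based height estimate.
import cv2

PIXELS_PER_METRE = 120.0  # hypothetical calibration constant

bg_subtractor = cv2.createBackgroundSubtractorMOG2()
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("area.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Step 1: detect movement; skip person detection on static frames.
    motion_mask = bg_subtractor.apply(frame)
    if cv2.countNonZero(motion_mask) < 500:  # hypothetical threshold
        continue
    # Step 2: detect persons and count them.
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    print(f"persons in area: {len(boxes)}")
    # Step 3: crude height estimate from bounding-box height,
    # assuming a calibrated, fronto-parallel camera.
    for (x, y, w, h) in boxes:
        print(f"estimated height: {h / PIXELS_PER_METRE:.2f} m")
cap.release()
```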

    Segmentation-Based Bounding Box Generation for Omnidirectional Pedestrian Detection

    Full text link
    We propose a segmentation-based bounding box generation method for omnidirectional pedestrian detection that enables detectors to tightly fit bounding boxes to pedestrians without requiring omnidirectional images for training. Due to their wide angle of view, omnidirectional cameras are more cost-effective than standard cameras and hence suitable for large-scale monitoring. The problem with using omnidirectional cameras for pedestrian detection is that the performance of standard pedestrian detectors is likely to be substantially degraded, because pedestrians' appearance in omnidirectional images may be rotated to any angle. Existing methods mitigate this issue by transforming images during inference. However, the transformation substantially degrades detection accuracy and speed. A recently proposed method obviates the transformation by training detectors with omnidirectional images, which instead incurs huge annotation costs. To obviate both the transformation and the annotation work, we leverage an existing large-scale object detection dataset. We train a detector with rotated images and tightly fitted bounding box annotations generated from the segmentation annotations in the dataset, so that it detects pedestrians in omnidirectional images with tightly fitted bounding boxes. We also develop pseudo-fisheye distortion augmentation, which further enhances performance. Extensive analysis shows that our detector successfully fits bounding boxes to pedestrians and demonstrates substantial performance improvement. (Comment: pre-print submitted to Multimedia Tools and Applications.)
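    The core idea can be sketched as follows: rather than rotating a box's corners (which yields a loose axis-aligned box), the segmentation mask is rotated together with the image and a tight box is recomputed from the rotated mask's pixels. A minimal illustration with OpenCV and NumPy, with function and parameter names of our own choosing:

```python
import cv2
import numpy as np

def rotated_tight_bbox(image, mask, angle_deg):
    """Rotate image and binary mask, return the rotated image and a tight
    axis-aligned bbox (x0, y0, x1, y1) around the rotated mask."""
    h, w = mask.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rot_img = cv2.warpAffine(image, M, (w, h))
    # Nearest-neighbour interpolation keeps the mask binary.
    rot_mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    ys, xs = np.nonzero(rot_mask)
    if xs.size == 0:
        return rot_img, None  # instance rotated out of the frame
    # The box hugs the rotated silhouette instead of a rotated rectangle.
    return rot_img, (xs.min(), ys.min(), xs.max(), ys.max())
```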

    People counting using an overhead fisheye camera

    Full text link
    As climate change concerns grow, reducing energy consumption is seen as one of many potential solutions. In the US, a considerable amount of energy is wasted in commercial buildings due to sub-optimal heating, ventilation and air conditioning that operates with no knowledge of the occupancy level in various rooms and open areas. In this thesis, I develop an approach to passive occupancy estimation that does not require occupants to carry any type of beacon, but instead uses an overhead camera with a fisheye lens (360 by 180 degree field of view). The difficulty with fisheye images is that occupants may appear not only upright but also upside-down, horizontal and diagonal, so algorithms developed for typical side-mounted, standard-lens cameras tend to fail. As the top-performing people-detection algorithms today use deep learning, a logical step would be to develop and train a new neural-network model. However, no large fisheye-image datasets with person annotations exist to facilitate training such a model. Therefore, I developed two people-counting methods that leverage YOLO (version 3), a state-of-the-art object detection method trained on standard datasets. In one approach, YOLO is applied to 24 rotated and highly overlapping windows, and the results are post-processed to produce a people count. In the other approach, regions of interest are first extracted via background subtraction, and only windows that include such regions are supplied to YOLO and post-processed. I carried out an extensive experimental evaluation of both algorithms and showed their superior performance compared to a benchmark method.
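    A rough sketch of the rotating-window strategy: rotate the fisheye image so that each window direction becomes upright, run a standard detector on the de-rotated crop, and merge detections afterwards. The window geometry and the `detect_people` callback below are placeholders standing in for YOLOv3, not the thesis's exact layout:

```python
import cv2
import numpy as np

NUM_WINDOWS = 24  # as in the thesis: 24 rotated, highly overlapping windows

def count_people(fisheye_img, detect_people):
    h, w = fisheye_img.shape[:2]
    cx, cy = w / 2, h / 2
    all_boxes = []
    for k in range(NUM_WINDOWS):
        angle = k * 360.0 / NUM_WINDOWS
        # Rotate the whole image so this window's direction points up.
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        rotated = cv2.warpAffine(fisheye_img, M, (w, h))
        # In an overhead fisheye image a standing person's head points
        # toward the image centre, so people in the strip below the centre
        # appear upright (an assumption of this sketch's window geometry).
        window = rotated[int(cy):, :]
        for box in detect_people(window):
            all_boxes.append((angle, box))
    # Post-processing (e.g., rotation-aware non-maximum suppression) would
    # merge duplicate detections of the same person across windows here.
    return all_boxes
```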

    Human detection in fish-eye images using HOG-based detectors over rotated windows

    No full text

    Deep learning algorithms for background subtraction and people detection

    Get PDF
    Video cameras are commonly used today in surveillance and security, autonomous driving and flying, manufacturing and healthcare. While different applications seek different types of information from video streams, detecting changes and finding people are two key enablers for many of them. This dissertation focuses on both of these tasks: change detection, also known as background subtraction, and people detection from overhead fisheye cameras, an emerging research topic. Background subtraction has been thoroughly researched to date, and the top-performing algorithms are data-driven and supervised. Crucially, during training these algorithms rely on the availability of some annotated frames from the very video being tested. Instead, we propose a novel, supervised background-subtraction algorithm for unseen videos based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales, along with their semantic segmentation maps. To reduce the chance of overfitting, we introduce novel temporal and spatio-temporal data-augmentation methods. We also propose a cross-validation training/evaluation strategy for the largest change-detection dataset, CDNet-2014, that allows a fair and video-agnostic performance comparison of supervised algorithms. Overall, our algorithm achieves significant gains over the state of the art in terms of F-measure, recall and precision. Furthermore, we develop a real-time variant of our algorithm with performance close to that of the state of the art.

    Owing to their large field of view, fisheye cameras mounted overhead are becoming a surveillance modality of choice for large indoor spaces. However, due to their top-down viewpoint and unique optics, standing people appear radially oriented and radially distorted in fisheye images. Therefore, traditional people detection, tracking and recognition algorithms developed for standard cameras do not perform well on fisheye images. To address this, we introduce several novel people-detection algorithms for overhead fisheye cameras. Our first two algorithms address the issue of radial body orientation with a rotating-window approach: they leverage a state-of-the-art object-detection algorithm trained on standard images and apply additional pre- and post-processing to detect radially-oriented people. Our third algorithm addresses both the radial body orientation and the distortion by applying an end-to-end neural network with a novel angle-aware loss function, trained on fisheye images. This algorithm outperforms the first two approaches and is two orders of magnitude faster. Finally, we introduce three spatio-temporal extensions of the end-to-end approach to deal with intermittent misses and false detections. To evaluate the performance of our algorithms, we collected, annotated and made publicly available four datasets of overhead fisheye videos. We provide a detailed analysis of our algorithms on these datasets and show that they significantly outperform the current state of the art.
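    The abstract does not spell out the angle-aware loss, but one plausible form of the general idea is a regression loss made periodic, so that predicted box angles differing by a full turn incur no penalty. The following is a hypothetical illustration of that idea, not the dissertation's actual loss function:

```python
import torch

def periodic_angle_loss(pred_angle, target_angle):
    """pred_angle, target_angle: tensors of box angles in radians."""
    diff = pred_angle - target_angle
    # Wrap the difference to (-pi, pi] so that, e.g., 359 deg vs. 1 deg
    # is penalized as a 2-degree error rather than a 358-degree one.
    wrapped = torch.atan2(torch.sin(diff), torch.cos(diff))
    return torch.mean(wrapped ** 2)
```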

    Excavator Pose Estimation for Safety Monitoring by Fusing Computer Vision and RTLS Data

    Get PDF
    The construction industry is considered hazardous because of its high accident and fatality rates. Safety is one of the main requirements on construction sites, since an unsafe site lowers workers' morale, which can also result in lower productivity. To address safety issues, many proactive methods have been introduced by researchers and equipment manufacturers. A study of these methods shows that most of them use radio-based technologies that rely on the locations of sensors attached to moving objects, which can be expensive and impractical for the large fleet of construction equipment in use. Safety monitoring is a sensitive task, and avoiding collisions requires detailed information about articulated equipment (e.g. excavators) and the motion of each of its parts. Installing location sensors on every moving part of the equipment to estimate its pose is therefore necessary, but difficult, time-consuming, and expensive. On the other hand, Computer Vision (CV) techniques are becoming more practical and affordable. However, most available CV-based techniques evaluate the proximity of resources by treating each object as a single point, regardless of its shape and pose. Moreover, manually collecting and annotating a large image dataset of different pieces of equipment is one of the most time-consuming tasks. Furthermore, relying on a single source of data may not only decrease the accuracy of the pose estimation system because of missing data or calculation errors, but may also increase the computation time. When multiple objects and pieces of equipment are in a camera's field of view, CV-based algorithms also run a higher risk of falsely recognizing the equipment and its parts. Fusing camera data with data from a Real-Time Location System (RTLS) can therefore help the pose estimation system by limiting the search area for the part detectors, reducing processing time and improving accuracy through fewer false detections.

    This research aims to estimate the excavator pose by fusing CV and RTLS data for safety monitoring, with the following objectives: (1) improving CV training by developing a method to automatically generate and annotate around-view synthetic images of equipment and their parts, using the 3D model of the equipment and real images of construction sites as background; (2) developing a high-level guideline for applying a stereo vision system on construction sites using regular surveillance cameras with a long baseline; (3) improving the accuracy and speed of CV detection by fusing RTLS data with camera data; and (4) estimating the 3D pose of the equipment for detecting potential collisions based on a pair of Two-Dimensional (2D) skeletons of the parts from the views of two cameras.

    To support these objectives, a comprehensive database of synthetic images of the excavator and its parts is generated, and multiple detectors from multiple views are trained for each part of the excavator using this image database. The RTLS data, providing the location of the equipment, are linked with the corresponding video frames from two cameras to fuse the location data with the video data. Knowing the overall size of the equipment and its RTLS-provided location, a virtual cylinder defined around the equipment is projected onto the video frames to limit the search scope of the object detection algorithm, resulting in faster processing and higher detection accuracy. Additionally, knowing the equipment ID assigned to each RTLS device and the cameras' locations and heights, it is possible to select the suitable detectors for each piece of equipment. After a part is detected, the background of the detected bounding box is removed to estimate the location and orientation of that part. The final skeleton of the excavator is derived by connecting the start and end points of the parts to their adjacent parts, using the kinematic information of the excavator. The skeleton estimated in each camera view, together with the extrinsic and intrinsic parameters of all available cameras on the construction site, is used to estimate the 3D pose by triangulating the skeletons across cameras. To make use of available collision avoidance systems, the 3D pose of the excavator is sent to a game environment, where potential collisions are detected and a warning is generated.

    The contributions of this research are: (1) a method for creating and annotating synthetic images of construction equipment and their parts using the equipment's 3D models and real images of construction sites; (2) HOG-based detectors for the excavator's parts, created and trained on the database of synthetic images, with negative samples produced automatically from the other excavator parts and from real images of different construction sites with the target object cut out; (3) a data fusion framework, built after calibrating two regular surveillance cameras with a long baseline, that integrates the RTLS data received from GPS with the video data from the cameras, decreasing the processing effort for detecting excavator parts while increasing detection accuracy by limiting the detectors' search scope; (4) a clustering technique for subtracting parts' background, extracting the 2D skeleton of the excavator in each camera's view, and estimating the 3D pose of the excavator; and (5) transfer of the excavator's 3D pose data to the game environment over a TCP/IP connection and visualization of the near real-time pose in the game engine for detecting potential collisions.
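    The triangulation step at the end can be illustrated with OpenCV's standard routine: given matched 2D skeleton joints from two calibrated views and their 3x4 projection matrices, each joint is lifted to 3D. This is a generic sketch, not the dissertation's exact implementation:

```python
import cv2
import numpy as np

def triangulate_skeleton(P1, P2, joints_cam1, joints_cam2):
    """P1, P2: 3x4 camera projection matrices (intrinsics x extrinsics).
    joints_cam*: Nx2 arrays of matching 2D joint positions in each view.
    Returns an Nx3 array of 3D joint positions."""
    pts1 = np.asarray(joints_cam1, dtype=np.float64).T  # 2xN
    pts2 = np.asarray(joints_cam2, dtype=np.float64).T  # 2xN
    homog = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous
    return (homog[:3] / homog[3]).T                     # back to Euclidean
```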