
    Fast semantic region analysis for surveillance & video databases

    Video databases are broadly applied in both consumer and professional domains. The importance of real-time surveillance video monitoring has increased for security reasons, while consumer video databases are also growing rapidly. Quick searching in databases is facilitated by region analysis, as it provides an indication of the contents of the video. For real-time and cost-efficient implementations, it is important to develop algorithms with low computational complexity. In this paper, we analyze the complexity of a newly developed semantic region labeling approach [2], which aims at extracting spatial contextual information (e.g. road, sky) from a video. In the analyzed semantic region labeling approach, color and texture features are combined with the vertical position to label the key regions. The algorithm is analyzed in terms of its native DSP computations and memory usage to prove its practical feasibility. The analysis results show that the system has low complexity while offering high-accuracy region labeling. A comparison with the state-of-the-art algorithm convincingly reveals that our system outperforms the state-of-the-art with fewer computations.
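The labeling idea above, combining color, texture, and vertical position into a key-region decision, can be sketched as a nearest-centroid classifier. This is a minimal illustration, not the paper's actual algorithm; the feature layout and centroid values are hypothetical.

```python
import math

# Illustrative feature vector per region: (mean hue, texture energy, vertical
# position in the frame, all normalized to [0, 1]). Centroids are invented
# example values, not taken from the paper.
CENTROIDS = {
    "sky":  (0.60, 0.05, 0.15),   # bluish, smooth, near the top of the frame
    "road": (0.10, 0.20, 0.85),   # grayish, mildly textured, near the bottom
    "tree": (0.30, 0.70, 0.45),   # greenish, highly textured, mid-frame
}

def label_region(features):
    """Assign the key-region label whose centroid is nearest in feature space."""
    return min(CENTROIDS, key=lambda c: math.dist(features, CENTROIDS[c]))

print(label_region((0.58, 0.07, 0.10)))  # -> sky
```

A lookup like this stays cheap in DSP terms: a handful of multiply-accumulates per region, in line with the low-complexity goal of the paper.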

    ASTEROIDS: A Stixel Tracking Extrapolation-based Relevant Obstacle Impact Detection System

    This paper presents a vision-based collision-warning system for ADAS in intelligent vehicles, with a focus on urban scenarios. In most current systems, collision warnings are based on radar, or on monocular vision using pattern recognition. Since detecting collisions is a core functionality of intelligent vehicles, redundancy is essential, so we explore the use of stereo vision. First, our approach is generic and class-agnostic, since it can detect general obstacles that are on a colliding path with the ego-vehicle without relying on semantic information. The framework estimates disparity and flow from a stereo video stream and calculates stixels. Then, the second contribution is the use of the new asteroids concept as a consecutive step. This step samples particles based on a probabilistic uncertainty analysis of the measurement process to model potential collisions. Third, this is all enclosed in a Bayesian histogram filter around a newly introduced time-to-collision versus angle-of-impact state space. The evaluation shows that the system correctly avoids any false warnings on the real-world KITTI dataset, detects all collisions in a newly simulated dataset when the obstacle is higher than 0.4 m, and performs excellently on our new qualitative real-world data with near-collisions, both in daytime and nighttime conditions.
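The time-to-collision versus angle-of-impact state mentioned above can be illustrated with a simple kinematic projection. This is a conceptual sketch only; the paper's filter operates on stixel measurements with full uncertainty modeling, whereas the function and variable names here are assumptions.

```python
import math

def ttc_and_impact_angle(obstacle_pos, obstacle_vel):
    """Project an obstacle's relative motion (ego-vehicle frame, metres and
    m/s) onto a (time-to-collision, angle-of-impact) state. Returns None when
    the obstacle is not closing in on the ego-vehicle."""
    x, z = obstacle_pos      # lateral offset, longitudinal distance
    vx, vz = obstacle_vel    # relative velocity components
    if vz >= 0:              # receding or parallel: no collision state
        return None
    ttc = z / -vz                                  # seconds until contact
    angle = math.degrees(math.atan2(vx, -vz))      # 0 deg = head-on approach
    return ttc, angle

print(ttc_and_impact_angle((0.0, 20.0), (0.0, -10.0)))  # -> (2.0, 0.0)
```

A histogram filter over such a state accumulates evidence per (ttc, angle) bin across frames, which is what lets the system suppress one-frame false alarms.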

    Infant monitoring system for real-time and remote discomfort detection

    Discomfort detection for young infants is essential, since they lack the ability to verbalize their pain and discomfort. In this paper, we propose a novel infant monitoring system, enabling continuous monitoring for infant discomfort detection. The proposed algorithm is robust to arbitrary head rotations, occlusions and face profiles. For this purpose, a Faster RCNN architecture is first pre-trained with the ImageNet dataset, and then fine-tuned with a training dataset of different infant expressions. Our proposed method obtains a mean average precision of 74.4% and 87.4% for classifying infant expressions. The presented system enables reflux disease analysis and remote home monitoring in a more relaxed environment, which is largely preferred by pediatricians and parents.
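The mean-average-precision figures quoted above are built from per-class average precision over ranked detections. The sketch below shows a simplified AP computation (it divides by the number of retrieved true positives rather than by all ground-truth positives, as full detection benchmarks do); it is illustrative, not the paper's evaluation code.

```python
def average_precision(ranked_correct):
    """Simplified AP for one class from a confidence-ordered detection list.
    `ranked_correct` holds True where the detection at that rank is a true
    positive; AP averages the precision at each true-positive rank."""
    hits, precisions = 0, []
    for rank, correct in enumerate(ranked_correct, start=1):
        if correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

# Precision is 1/1 at rank 1, 2/3 at rank 3, 3/4 at rank 4:
print(round(average_precision([True, False, True, True]), 4))  # -> 0.8056
```

Mean AP is then the plain average of this value over all expression classes.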

    Double-Camera Fusion System for Animal-Position Awareness in Farming Pens

    In livestock breeding, continuous and objective monitoring of animals is manually unfeasible due to the large scale of breeding and expensive labour. Computer vision technology can generate accurate and real-time individual animal or animal group information from video surveillance. However, the frequent occlusion between animals and changes in appearance features caused by varying lighting conditions make single-camera systems less attractive. We propose a double-camera system and image registration algorithms to spatially fuse the information from different viewpoints to solve these issues. This paper presents a deformable learning-based registration framework, where the input image pairs are initially linearly pre-registered. Then, an unsupervised convolutional neural network is employed to fit the mapping from one view to another, using a large number of unlabelled samples for training. The learned parameters are then used in a semi-supervised network and fine-tuned with a small number of manually annotated landmarks. The actual pixel displacement error is introduced as a complement to an image similarity measure. The performance of the proposed fine-tuned method is evaluated on real farming datasets and demonstrates significantly lower registration errors than commonly used feature-based and intensity-based methods. This approach also reduces the registration time of an unseen image pair to less than 0.5 s. The proposed method provides a high-quality reference processing step for improving subsequent tasks such as multi-object tracking and behaviour recognition of animals for further analysis.
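The pixel displacement error used above as a complement to the image similarity measure reduces, at evaluation time, to the mean Euclidean distance between registered landmarks and their annotations. A minimal sketch, assuming landmarks are given as (x, y) tuples:

```python
import math

def mean_landmark_error(warped, annotated):
    """Mean Euclidean pixel displacement between landmarks mapped through the
    registration and their manually annotated counterparts."""
    assert len(warped) == len(annotated) and warped, "need matched landmark lists"
    return sum(math.dist(p, q) for p, q in zip(warped, annotated)) / len(warped)

# One landmark off by a 3-4-5 triangle (5 px), one exact: mean error 2.5 px.
print(mean_landmark_error([(10, 10), (20, 24)], [(13, 14), (20, 24)]))  # -> 2.5
```

Using this metric directly during fine-tuning ties the loss to the quantity the evaluation actually reports, rather than to intensity similarity alone.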

    Towards multi-class detection: a self-learning approach to reduce inter-class noise from training dataset

    This paper proposes a novel self-learning framework, which converts a noisy, pre-labeled multi-class object dataset into a purified multi-class object dataset with object bounding-box annotations, by iteratively removing noise samples from the low-quality dataset, which may contain a high level of inter-class noise samples. The framework iteratively purifies the noisy training datasets for each class and updates the classification model for multiple classes. The procedure starts with a generic single-class object model, which changes to a multi-class model in an iterative procedure in which the F-1 score is evaluated until it reaches a sufficiently high value. The proposed framework is based on learning the used models with CNNs. As a result, we obtain a purified multi-class dataset and, as a spin-off, the updated multi-class object model. The proposed framework is evaluated on maritime surveillance, where vessels need to be classified into eight different types. The experimental results on the evaluation dataset show that the proposed framework improves the F-1 score by approximately 5% and 25% at the end of the third iteration, while the initial training datasets contain 40% and 60% inter-class noise samples (erroneously classified labels of vessels and without annotations), respectively. Additionally, the recall rate increases by nearly 38% (for the more challenging 60% inter-class noise case), while the mean Average Precision (mAP) rate remains stable.
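The iterate-purify-retrain loop described above can be caricatured on 1-D toy features: fit a per-class model, drop the samples that fit their own class worst, and refit. This is a toy stand-in for the paper's CNN-based framework; the class names and data are invented.

```python
from statistics import mean

def purify(dataset, rounds=2, keep_ratio=0.8):
    """Toy iterative purification on 1-D features. `dataset` maps a class name
    to a list of feature values, possibly contaminated with inter-class noise.
    Each round fits a per-class centroid, then keeps only the samples closest
    to their own centroid, tightening the model iteratively."""
    for _ in range(rounds):
        centroids = {c: mean(v) for c, v in dataset.items()}
        for c, values in dataset.items():
            values.sort(key=lambda x: abs(x - centroids[c]))
            dataset[c] = values[:max(1, int(len(values) * keep_ratio))]
    return dataset

# 5.0 is a "ferry-like" outlier in the tug class and vice versa for 1.2:
noisy = {"tug": [1.0, 1.1, 0.9, 5.0], "ferry": [5.2, 4.8, 5.1, 1.2]}
clean = purify(noisy)
print(clean)  # the inter-class noise samples 5.0 and 1.2 are pruned first
```

In the real framework the "centroid" is a CNN classifier and the stopping rule is the F-1 score on a validation set rather than a fixed round count.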

    Gender classification in low-resolution surveillance video: In-depth comparison of random forests and SVMs

    This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intraclass variation and interclass correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets that realistically consider the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, demonstrating a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9±0.2%, while classification computation time is negligible compared to the detection time of pedestrians.

    Towards accurate camera geopositioning by image matching

    In this work, we present a camera geopositioning system based on matching a query image against a database with panoramic images. For matching, our system uses memory vectors aggregated from global image descriptors based on convolutional features to facilitate fast searching in the database. To speed up searching, a clustering algorithm is used to balance geographical positioning and computation time. We refine the obtained position from the query image using a new outlier removal algorithm. The matching of the query image is obtained with a recall@5 larger than 90% for panorama-to-panorama matching. We cluster available panoramas from geographically adjacent locations into a single compact representation and observe computational gains of approximately 50% at the cost of only a small (approximately 3%) recall loss. Finally, we present a coordinate estimation algorithm that reduces the median geopositioning error by up to 20%.
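The recall@5 metric quoted above counts a query as correct when its true database panorama ranks among the five most similar descriptors. A minimal sketch, assuming descriptors are plain float lists compared by dot product (the paper's memory vectors and similarity are more involved):

```python
def recall_at_k(query_descs, db_descs, ground_truth, k=5):
    """Fraction of queries whose correct database index (per `ground_truth`)
    appears in the top-k matches by dot-product similarity."""
    def sim(a, b):
        return sum(x * y for x, y in zip(a, b))
    hits = 0
    for qi, q in enumerate(query_descs):
        ranked = sorted(range(len(db_descs)),
                        key=lambda di: sim(q, db_descs[di]), reverse=True)
        if ground_truth[qi] in ranked[:k]:
            hits += 1
    return hits / len(query_descs)
```

Clustering adjacent panoramas into one representative shrinks `db_descs`, which is where the reported ~50% computational gain comes from, at a small recall cost.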

    R3P: real-time RGB-D registration pipeline

    Applications based on colored 3-D data sequences suffer from a lack of efficient algorithms for transformation estimation and key-point extraction to perform accurate registration and sensor localization in either the 2-D or 3-D domain. Therefore, we propose a real-time RGB-D registration pipeline, named R3P, presented in processing layers. In this paper, we present an evaluation of several algorithm combinations for each layer, to optimize the registration and sensor localization for specific applications. The resulting dynamic reconfigurability of R3P makes it suitable as a front-end system for any SLAM reconstruction algorithm. Evaluation results on several public datasets reveal that R3P delivers real-time registration at 59 fps and high accuracy, with a relative pose error (for a time span of 40 frames) for rotation and translation of 0.5° and 8 mm, respectively. All the heterogeneous datasets and implementations are publicly available under an open-source license [21].
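The rotational relative pose error quoted above is the angle of the relative rotation between the estimated and ground-truth poses over the time span. A self-contained sketch of that angle computation (the paper's full RPE also covers translation and frame pairing):

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Angular difference between two 3x3 rotation matrices (nested lists),
    from the trace of the relative rotation R_err = R_est^T @ R_gt."""
    # trace(R_est^T @ R_gt) = sum over i, j of R_est[i][j] * R_gt[i][j]
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for rounding
    return math.degrees(math.acos(cos_angle))

# Identity pose vs. a 0.5-degree rotation about the z-axis:
a = math.radians(0.5)
Rz = [[math.cos(a), -math.sin(a), 0], [math.sin(a), math.cos(a), 0], [0, 0, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(round(rotation_error_deg(I, Rz), 3))  # -> 0.5
```

The clamp on the cosine guards against floating-point trace values marginally outside [-1, 1], which would otherwise make `acos` raise a domain error.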

    Enhanced face alignment using an unsupervised roll estimation initialization

    We propose a novel and efficient initialization method for generalized facial landmark localization with an unsupervised roll-angle estimation based on B-spline models. We first show that the roll angle is crucial for accurate landmark localization. Therefore, we develop an unsupervised roll-angle estimation by adopting a joint 1st-order B-spline model, which is robust to intensity variations and generic for application to various face detectors. The method consists of three steps. First, the scale-normalized Laplacian of Gaussian operator is applied to a bounding box generated by a face detector for extracting facial feature segments. Second, a joint 1st-order B-spline model is fitted to the extracted facial feature segments, using an iterative optimization method. Finally, the roll angle is estimated through the aligned segments. We evaluate four state-of-the-art landmark localization schemes with the proposed roll-angle estimation initialization on the benchmark dataset. The proposed method boosts the performance of landmark localization in general, especially for cases with large head poses. Moreover, the proposed unsupervised roll-angle estimation method outperforms standard supervised methods, such as random forest and support vector regression, by 41.6% and 47.2%, respectively.
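The final step above, recovering the roll angle from aligned linear segments, can be illustrated with a plain least-squares line fit (a degenerate stand-in for the joint 1st-order B-spline model; the function is hypothetical, not the paper's implementation):

```python
import math

def estimate_roll_deg(points):
    """Estimate an in-plane roll angle from facial feature-segment points
    (x, y) by fitting a least-squares line and taking its slope angle."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)   # covariance term
    den = sum((x - mx) ** 2 for x, _ in points)         # x variance term
    return math.degrees(math.atan2(num, den))

# Points sampled along a line tilted 10 degrees from horizontal:
pts = [(x, x * math.tan(math.radians(10))) for x in range(5)]
print(round(estimate_roll_deg(pts), 1))  # -> 10.0
```

De-rotating the detector's bounding box by this angle before landmark localization is what the initialization contributes: the localizer then only has to handle near-upright faces.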