
    An evaluation of the pedestrian classification in a multi-domain multi-modality setup

    The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth, and motion). In recent years, there have been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR, containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare state-of-the-art features in a multi-modality setup (intensity, depth, and flow) across the far-infrared and visible domains. The experiments show that the feature families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP), and histogram of oriented gradients (HOG), computed from the FIR and visible domains are highly complementary, but their relative performance varies across modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain, multi-modality, multi-feature fusion.
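    A minimal sketch of the kind of feature-plus-classifier pipeline this abstract describes, assuming aligned 128x64 visible/FIR pedestrian crops: HOG descriptors from each domain are concatenated (early fusion) and fed to a linear SVM. Only the HOG family is shown; ISS, LBP, and LGP, the dataset loading, and all parameter values here are assumptions, not the authors' setup.

```python
# Sketch of multi-domain HOG + linear SVM pedestrian classification.
# All names and parameters are illustrative assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(crop):
    """HOG features for one 128x64 grayscale crop."""
    return hog(crop, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def fused_descriptor(visible_crop, fir_crop):
    """Early fusion: concatenate per-domain HOG vectors."""
    return np.concatenate([hog_descriptor(visible_crop),
                           hog_descriptor(fir_crop)])

# X_vis, X_fir: lists of aligned crops; y: 1 = pedestrian, 0 = background
def train(X_vis, X_fir, y):
    X = np.stack([fused_descriptor(v, f) for v, f in zip(X_vis, X_fir)])
    return LinearSVC(C=0.01).fit(X, y)
```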

    Discrete features for rapid pedestrian detection in infrared images

    Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 7-12, 2012, Vilamoura, Algarve, Portugal. In this paper the authors propose a pedestrian detection system based on discrete features in infrared images. Unique keypoints are searched for in the images, around which a descriptor based on the histogram of phase congruency orientation is extracted. These descriptors are matched with defined regions of the body of a pedestrian. In case of a match, a region of interest is created in the image, which is classified as pedestrian / non-pedestrian by an SVM classifier. The pedestrian detection system has been tested in an advanced driver assistance system for urban driving. This work was supported by the Spanish Government through the CICYT projects FEDORA (grant TRA2010-20225-C03-01) and Driver Distraction Detector System (grant TRA2011-29454-C03-02), and by the Comunidad de Madrid through the project SEGVAUTO (S2009/DPI-1509).
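    An illustrative sketch of the keypoint-descriptor pipeline outlined above. The paper builds its descriptor from the histogram of phase congruency orientation; as a simpler stand-in, this sketch histograms plain gradient orientations around each detected keypoint, so every name and parameter here is an assumption rather than the authors' method.

```python
# Keypoints described by local orientation histograms (a stand-in for
# the paper's phase-congruency-orientation descriptor).
import cv2
import numpy as np

def keypoint_descriptors(ir_image, patch=16, bins=8):
    """Detect keypoints and describe each by an orientation histogram."""
    kps = cv2.ORB_create(nfeatures=200).detect(ir_image, None)
    gx = cv2.Sobel(ir_image, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(ir_image, cv2.CV_32F, 0, 1)
    angle = np.arctan2(gy, gx)  # per-pixel gradient orientation
    descs = []
    for kp in kps:
        x, y = map(int, kp.pt)
        win = angle[max(0, y - patch):y + patch, max(0, x - patch):x + patch]
        hist, _ = np.histogram(win, bins=bins, range=(-np.pi, np.pi))
        descs.append((kp.pt, hist / max(hist.sum(), 1)))
    # Downstream (not shown): match descriptors against body-part
    # models and pass matching regions of interest to an SVM.
    return descs
```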

    Facial Component Detection in Thermal Imagery

    This paper studies the problem of detecting facial components in thermal imagery (specifically eyes, nostrils, and mouth). One of the immediate goals is to enable the automatic registration of facial thermal images. The detection of eyes and nostrils is performed using Haar features and the GentleBoost algorithm, which are shown to provide superior detection rates. The detection of the mouth is based on the detections of the eyes and nostrils and is performed using measures of entropy and self-similarity. The results show that reliable facial component detection is feasible using this methodology, achieving a correct detection rate of 0.8 for both eyes and nostrils. When the eyes and nostrils are correctly detected, the mouth is correctly detected in 65% of closed-mouth test images and in 73% of open-mouth test images.
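    A hedged sketch of training a boosted classifier on Haar-like features for eye/nostril patches, in the spirit of the method above. scikit-learn ships AdaBoost rather than GentleBoost, so AdaBoostClassifier stands in for it; the patch size, feature type, and estimator count are assumptions.

```python
# Boosted classifier over Haar-like features of fixed-size patches.
# AdaBoost is used as a stand-in for the paper's GentleBoost.
import numpy as np
from skimage.transform import integral_image
from skimage.feature import haar_like_feature
from sklearn.ensemble import AdaBoostClassifier

def haar_features(patch):
    """All type-2-x Haar-like features of one grayscale patch."""
    ii = integral_image(patch)
    return haar_like_feature(ii, 0, 0, patch.shape[1], patch.shape[0],
                             feature_type='type-2-x')

# patches: equal-size grayscale arrays; y: 1 = facial component, 0 = not
def train(patches, y):
    X = np.stack([haar_features(p) for p in patches])
    return AdaBoostClassifier(n_estimators=100).fit(X, y)
```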

    C²Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection

    Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors become more reliable and robust in practical applications by using RGB-IR combined information. However, existing methods still suffer from modality miscalibration and fusion imprecision problems. Since transformers have a powerful capability to model the pairwise correlations between different features, in this paper we propose a novel Calibrated and Complementary Transformer, called C²Former, to address these two problems simultaneously. In C²Former, we design an Inter-modality Cross-Attention (ICA) module to obtain calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modalities. To reduce the computational cost caused by computing global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the dimension of the feature maps. Because C²Former operates in the feature domain, it can be embedded into existing RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector, both incorporating our C²Former, are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detection.git
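    A hedged sketch of inter-modality cross-attention between RGB and IR feature maps, illustrating the general mechanism an ICA-style module builds on. This is not the paper's implementation (see the linked repository); the embedding size, head count, and additive fusion step are assumptions.

```python
# Generic RGB<->IR cross-attention over flattened feature-map tokens.
import torch
import torch.nn as nn

class InterModalityCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # RGB queries attend to IR keys/values, and vice versa
        self.rgb_from_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ir_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb_feat, ir_feat):
        """rgb_feat, ir_feat: (B, C, H, W) feature maps of equal shape."""
        b, c, h, w = rgb_feat.shape
        rgb = rgb_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens
        ir = ir_feat.flatten(2).transpose(1, 2)
        rgb_cal, _ = self.rgb_from_ir(rgb, ir, ir)  # calibrate RGB with IR
        ir_cal, _ = self.ir_from_rgb(ir, rgb, rgb)  # calibrate IR with RGB
        fused = rgb_cal + ir_cal                    # simple additive fusion
        return fused.transpose(1, 2).reshape(b, c, h, w)
```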

    Face Detection of Thermal Images in Various Standing Body-Pose using Facial Geometry

    Automatic face detection in frontal view for thermal images is a primary task in a health system (e.g., febrile identification) or a security system (e.g., intruder recognition). In everyday situations, however, the scanned person does not always face the camera. This paper develops an algorithm to identify a frontal face across various standing body poses. The algorithm uses an image processing method that first segments the face based on the temperature of human skin. Since some exposed non-face body parts can also be included in the segmentation result, discriminant shape features of a face are applied, based on the characteristics of a frontal face: (1) the size of the face, (2) the facial golden ratio, and (3) the oval shape of the face. The algorithm was tested on various standing body poses rotated through 360° at camera-to-object distances of 2 meters and 4 meters. The accuracy of the algorithm for face detection in a controlled environment is 95.8%. It detected the face whether or not the person was wearing glasses.
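    An illustrative sketch of the segment-then-filter idea: threshold the thermal image around skin temperature, then keep blobs whose height-to-width ratio is close to the facial golden ratio. The temperature-to-pixel mapping, minimum blob area, and tolerance are assumptions, and the paper's full oval-shape test is not reproduced.

```python
# Skin-temperature segmentation followed by golden-ratio blob filtering.
import cv2

GOLDEN_RATIO = 1.618

def detect_frontal_face(thermal, skin_lo, skin_hi, tol=0.25):
    """thermal: single-channel image; skin_lo/skin_hi: skin-range pixel values."""
    mask = cv2.inRange(thermal, skin_lo, skin_hi)  # skin-temperature blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    faces = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h < 400:  # too small to be a face (assumed threshold)
            continue
        if abs(h / float(w) - GOLDEN_RATIO) < tol:  # golden-ratio-like blob
            faces.append((x, y, w, h))
    return faces
```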

    Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison

    Other grants: DGT (SPIP2014-01352). Despite all the significant advances in pedestrian detection brought by computer vision for driving assistance, it is still a challenging problem. One reason is the extremely varying lighting conditions under which such a detector should operate, namely day and nighttime. Recent research has shown that the combination of visible and non-visible imaging modalities may increase detection accuracy, where the infrared spectrum plays a critical role. The goal of this paper is to assess the accuracy gain of different pedestrian models (holistic, part-based, patch-based) when training with images in the far-infrared spectrum. Specifically, we want to compare detection accuracy on test images recorded at day and nighttime when trained (and tested) using (a) plain color images; (b) just infrared images; and (c) both of them. To obtain results for the last item, we propose an early fusion approach to combine features from both modalities. We base the evaluation on a new dataset that we have built for this purpose as well as on the publicly available KAIST multispectral dataset.
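    A minimal sketch of the early-fusion idea, assuming registered color and FIR frames: the two modalities are stacked into one multi-channel array so that a single downstream detector sees both at once. The registration step and the detector itself are not shown, and this channel-level stacking is only one possible reading of "early fusion".

```python
# Channel-level early fusion of registered color and FIR frames.
import numpy as np

def early_fusion(color_bgr, fir):
    """color_bgr: (H, W, 3) uint8; fir: (H, W) registered FIR frame."""
    fir = fir.astype(np.uint8)[..., None]              # add channel axis
    fused = np.concatenate([color_bgr, fir], axis=-1)  # (H, W, 4)
    return fused  # fed to one detector instead of two per-modality ones
```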

    Disentangled Contrastive Image Translation for Nighttime Surveillance

    Nighttime surveillance suffers from degradation due to poor illumination and arduous human annotation. It remains challenging and a security risk at night. Existing methods rely on multi-spectral images to perceive objects in the dark, which are troubled by low resolution and color absence. We argue that the ultimate solution for nighttime surveillance is night-to-day translation, or Night2Day, which aims to translate a surveillance scene from nighttime to daytime while maintaining semantic consistency. To achieve this, this paper presents a Disentangled Contrastive (DiCo) learning method. Specifically, to address the poor and complex illumination in nighttime scenes, we propose a learnable physical prior, i.e., the color invariant, which provides a stable perception of a highly dynamic night environment and can be incorporated into the learning pipeline of neural networks. Targeting surveillance scenes, we develop a disentangled representation, an auxiliary pretext task that separates surveillance scenes into foreground and background with contrastive learning. Such a strategy can extract the semantics without supervision and boost our model to achieve instance-aware translation. Finally, we incorporate all the modules above into generative adversarial networks and achieve high-fidelity translation. This paper also contributes a new surveillance dataset called NightSuR. It includes six scenes to support the study of nighttime surveillance. This dataset collects nighttime images with different properties of nighttime environments, such as flare and extreme darkness. Extensive experiments demonstrate that our method outperforms existing works significantly. The dataset and source code will be released on GitHub soon. Comment: Submitted to TI
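    A hedged sketch of an InfoNCE-style contrastive loss of the kind such a foreground/background pretext task could build on; the paper's exact loss, the sampling of foreground and background patches, and the temperature value are assumptions.

```python
# Generic InfoNCE contrastive loss over embedded patches.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """anchor, positive: (B, D); negatives: (B, K, D) embeddings."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True)      # (B, 1) similarity
    neg = torch.einsum('bd,bkd->bk', anchor, negatives)  # (B, K) similarities
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long,
                         device=anchor.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```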