Discrete features for rapid pedestrian detection in infrared images
Proceedings of: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 7-12, 2012, Vilamoura, Algarve, Portugal. In this paper the authors propose a pedestrian detection system based on discrete features in infrared images. Unique keypoints are searched for in the images, and around each one a descriptor based on the histogram of the phase congruency orientation is extracted. These descriptors are matched with defined regions of the body of a pedestrian. When a match is found, a region of interest is created in the image and classified as pedestrian or non-pedestrian by an SVM classifier. The pedestrian detection system has been tested in an advanced driver assistance system for urban driving. This work was supported by the Spanish Government through the Cicyt projects FEDORA (grant TRA2010-20225-C03-01) and Driver Distraction Detector System (grant TRA2011-29454-C03-02), and by the Comunidad de Madrid through the project SEGVAUTO (S2009/DPI-1509).
WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for Fast Multispectral Pedestrian Detection
Multispectral pedestrian detection achieves better visibility in challenging
conditions and thus has a broad application in various tasks, for which both
the accuracy and computational cost are of paramount importance. Most existing
approaches treat RGB and infrared modalities equally, typically adopting two
symmetrical CNN backbones for multimodal feature extraction, which ignores the
substantial differences between modalities and brings great difficulty for the
reduction of the computational cost as well as effective crossmodal fusion. In
this work, we propose a novel and efficient framework named WCCNet that is able
to differentially extract rich features of different spectra with lower
computational complexity and semantically rearranges these features for
effective crossmodal fusion. Specifically, the discrete wavelet transform (DWT),
which allows fast inference and training, is embedded to construct a
dual-stream backbone for efficient feature extraction. The DWT layers of WCCNet
extract frequency components for infrared modality, while the CNN layers
extract spatial-domain features for RGB modality. This methodology not only
significantly reduces the computational complexity, but also improves the
extraction of infrared features to facilitate the subsequent crossmodal fusion.
Based on the well extracted features, we elaborately design the crossmodal
rearranging fusion module (CMRF), which can mitigate spatial misalignment and
merge semantically complementary features of spatially-related local regions to
amplify the crossmodal complementary information. We conduct comprehensive
evaluations on KAIST and FLIR benchmarks, in which WCCNet outperforms
state-of-the-art methods with considerable computational efficiency and
competitive accuracy. We also perform the ablation study and analyze thoroughly
the impact of different components on the performance of WCCNet. Comment: Submitted to TPAM
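As a rough illustration of the frequency decomposition a DWT layer performs, the following sketch computes a single-level 2D Haar transform in plain Python. The Haar wavelet, the function name `haar_dwt2`, and the sub-band names are assumptions chosen for illustration; the abstract does not state which wavelet WCCNet uses or how its layers are arranged.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level 2D Haar transform (orthonormal scaling).

    Hypothetical helper for illustration only. The image must have even
    height and width. Returns four half-resolution sub-bands:
    approximation, horizontal difference, vertical difference, diagonal.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            p00, p01 = image[r][c], image[r][c + 1]
            p10, p11 = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p00 + p01 + p10 + p11) / 2.0)  # low-pass on both axes
            h_row.append((p00 - p01 + p10 - p11) / 2.0)  # left minus right column
            v_row.append((p00 + p01 - p10 - p11) / 2.0)  # top minus bottom row
            d_row.append((p00 - p01 - p10 + p11) / 2.0)  # diagonal contrast
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag
```

Each sub-band has half the resolution of the input, which is one reason wavelet front ends can reduce the cost of later layers; a flat image yields zero detail coefficients.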
Calibration-free Pedestrian Partial Pose Estimation Using a High-mounted Kinect
Applications of human behavior analysis have developed rapidly over the last decades, from entertainment systems to professional ones such as Human Robot Interaction (HRI), Advanced Driver Assistance Systems (ADAS), and Pedestrian Protection Systems (PPS). This thesis addresses the problem of recognizing pedestrians and estimating their body orientation in 3D, on the premise that knowing a person's orientation helps in analyzing and predicting their behavior. A new method is proposed in which a pedestrian detection module and an orientation estimation module are integrated sequentially. For pedestrian detection, a cascade classifier is designed that automatically draws a bounding box around each detected pedestrian. Regions are then extracted from a 3D point cloud and given to a discrete orientation classifier that estimates the orientation of the pedestrian's torso. This classification is based on a coarse, rasterized depth image simulating a virtual camera placed directly above the detected pedestrian, and uses a support vector machine classifier trained to distinguish 10 discrete orientations (30-degree increments). To test the performance of the approach, a new benchmark database containing 764 point clouds for body-orientation classification was captured. For this benchmark, a Microsoft Kinect recorded the point clouds of 30 participants, and a marker-based motion capture system (Vicon) provided the ground truth on their orientation. Finally, we demonstrate the improvements brought by our system: it detects pedestrians with an accuracy of 95.29% and estimates the body orientation (within a 30-degree interval) with an accuracy of 88.88%. We hope these results can provide a starting point for future research.
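The top-view rasterization step can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the axis convention (x lateral, y height, z depth), the grid size, the extent, and the function name `rasterize_top_view` are all hypothetical, and the SVM classification stage is left out.

```python
from typing import List, Sequence, Tuple

# Assumed axis convention: x is lateral, y is height, z is depth.
Point = Tuple[float, float, float]


def rasterize_top_view(points: Sequence[Point],
                       grid: int = 8,
                       extent: float = 0.8) -> List[List[float]]:
    """Project a point cloud onto a coarse grid seen from directly above.

    Hypothetical helper for illustration only. The grid covers a square of
    side `extent` centered on the centroid of the points. Each cell stores
    the greatest height above the lowest point among the points falling
    into it; empty cells stay at 0.0.
    """
    if not points:
        raise ValueError("empty point cloud")

    center_x = sum(p[0] for p in points) / len(points)
    center_z = sum(p[2] for p in points) / len(points)
    lowest_y = min(p[1] for p in points)
    cell = extent / grid

    image = [[0.0] * grid for _ in range(grid)]
    for x, y, z in points:
        col = int((x - center_x + extent / 2) // cell)
        row = int((z - center_z + extent / 2) // cell)
        if 0 <= row < grid and 0 <= col < grid:  # ignore points outside the grid
            image[row][col] = max(image[row][col], y - lowest_y)
    return image
```

The resulting grid, flattened into a vector, is the kind of fixed-length input a support vector machine could be trained on to separate discrete orientation classes.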
Robust pedestrian detection and tracking in crowded scenes
In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions, and scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results point to the extremely accurate performance of the proposed approach in all cases.
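To make the track matching formulation concrete, the sketch below finds, among all matchings between tracks and detections that pair as many as possible, the one with the largest total weight. It uses exhaustive search, which is only practical for a handful of tracks, and is not the paper's matching scheme; the function name `best_matching` and the weight table are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Pairs = List[Tuple[int, int]]


def best_matching(weights: List[List[Optional[float]]]) -> Pairs:
    """Maximum-cardinality matching with the largest total weight.

    Hypothetical helper for illustration only. weights[t][d] is the
    affinity between track t and detection d, or None when that pairing is
    not allowed. Returns (track, detection) pairs in track order.
    Exhaustive search: exponential time, small inputs only.
    """
    n_tracks = len(weights)
    n_dets = len(weights[0]) if weights else 0
    best_key = (0, 0.0)  # (number of pairs, total weight)
    best_pairs: Pairs = []

    def search(track: int, used: Set[int], pairs: Pairs, total: float) -> None:
        nonlocal best_key, best_pairs
        if track == n_tracks:
            # Cardinality is compared first, total weight breaks ties.
            if (len(pairs), total) > best_key:
                best_key, best_pairs = (len(pairs), total), list(pairs)
            return
        search(track + 1, used, pairs, total)  # leave this track unmatched
        for det in range(n_dets):
            w = weights[track][det]
            if w is not None and det not in used:
                used.add(det)
                pairs.append((track, det))
                search(track + 1, used, pairs, total + w)
                pairs.pop()
                used.remove(det)

    search(0, set(), [], 0.0)
    return best_pairs
```

With `weights = [[0.9, 0.8], [0.85, None]]`, a greedy matcher would take the 0.9 edge and leave the second track unmatched, whereas the function above returns `[(0, 1), (1, 0)]`, which pairs both tracks.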
Fourier-based Rotation-invariant Feature Boosting: An Efficient Framework for Geospatial Object Detection
Geospatial object detection in remote sensing imagery has been attracting
increasing interest in recent years, due to the rapid development of spaceborne
imaging. Most previously proposed object detectors are very sensitive to
object deformations, such as scaling and rotation. To address this, we propose a
novel and efficient framework for geospatial object detection in this letter,
called Fourier-based rotation-invariant feature boosting (FRIFB). A
Fourier-based rotation-invariant feature is first generated in polar
coordinates. Then, the extracted features are further structurally refined
using aggregate channel features. This leads to faster feature computation
and a more robust feature representation, which is well suited to the subsequent
boosting stage. Finally, in the test phase, we achieve fast pyramid
feature extraction by estimating a scale factor instead of directly collecting
all features from the image pyramid. Extensive experiments are conducted on two
subsets of the NWPU VHR-10 dataset, demonstrating the superiority and effectiveness
of FRIFB compared to previous state-of-the-art methods.
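The general principle behind Fourier-based rotation invariance in polar coordinates can be shown in a few lines: rotating an object shifts its samples circularly along the angular axis, and the magnitudes of the discrete Fourier transform are unchanged by a circular shift. The sketch below demonstrates only this property; it is not the FRIFB feature itself, and the function name `dft_magnitudes` is hypothetical.

```python
import cmath
from typing import List, Sequence


def dft_magnitudes(ring: Sequence[float]) -> List[float]:
    """Magnitudes of the discrete Fourier transform of angular samples.

    Hypothetical helper for illustration only. `ring` holds values sampled
    at equally spaced angles around a center point. A rotation by a whole
    number of angular steps is a circular shift of `ring`, which changes
    only the phase of each coefficient and leaves the magnitudes intact.
    """
    n = len(ring)
    magnitudes = []
    for k in range(n):
        coeff = sum(value * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, value in enumerate(ring))
        magnitudes.append(abs(coeff))
    return magnitudes
```

For any list `ring` and shift `s`, `dft_magnitudes(ring)` and `dft_magnitudes(ring[s:] + ring[:s])` agree up to floating-point rounding.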
Automated Complexity-Sensitive Image Fusion
To construct a complete representation of a scene with environmental obstacles such as fog, smoke, darkness, or textural homogeneity, multisensor video streams captured in different modalities are considered. A computational method for automatically fusing multimodal image streams into a highly informative and unified stream is proposed. The method consists of the following steps:
1. Image registration is performed to align video frames in the visible band over time, adapting to the nonplanarity of the scene by automatically subdividing the image domain into regions approximating planar patches.
2. Wavelet coefficients are computed for each of the input frames in each modality.
3. Corresponding regions and points are compared using spatial and temporal information across various scales.
4. Decision rules based on the results of multimodal image analysis are used to combine the wavelet coefficients from different modalities.
5. The combined wavelet coefficients are inverted to produce an output frame containing useful information gathered from the available modalities.
Experiments show that the proposed system is capable of producing fused output containing the characteristics of color visible-spectrum imagery while adding information exclusive to infrared imagery, with attractive visual and informational properties.
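Steps 2, 4, and 5 above can be sketched on one-dimensional signals. The decision rule used here (average the approximation coefficients, keep the detail coefficient with the larger magnitude) is a common textbook choice and an assumption for illustration; the abstract does not specify the actual rules, which also draw on spatial and temporal analysis. The function names are hypothetical, and registration (step 1) and region comparison (step 3) are omitted.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-level 1D Haar transform of an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert haar_forward, rebuilding the original samples pair by pair."""
    signal: List[float] = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def fuse(visible: Sequence[float], infrared: Sequence[float]) -> List[float]:
    """Fuse two aligned signals in the wavelet domain (assumed rule)."""
    approx_v, detail_v = haar_forward(visible)
    approx_i, detail_i = haar_forward(infrared)
    # Assumed rule: average the coarse content of both modalities.
    approx = [(v + i) / 2.0 for v, i in zip(approx_v, approx_i)]
    # Assumed rule: keep whichever modality has the stronger local detail.
    detail = [v if abs(v) >= abs(i) else i for v, i in zip(detail_v, detail_i)]
    return haar_inverse(approx, detail)
```

For a flat visible signal `[4, 4, 4, 4]` and an infrared signal `[0, 8, 0, 0]`, the fused output is approximately `[0, 8, 2, 2]`: the sharp infrared feature is carried over through its detail coefficient, while the region without detail blends the two coarse levels.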