20 research outputs found

    Human Detection using Feature Fusion Set of LBP and HOG

    Human detection has become one of the major aspects of modern real-time systems, whether in driverless vehicles, disaster management, or surveillance. Multiple machine learning approaches have been used to find an efficient and effective way of detecting humans. The proposed method mainly addresses the pose-variation problem of human detection and reduces the feature redundancy that slows such systems down. To solve the pose-variation and redundancy problems, mutation and crossover concepts are applied over the Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) feature sets to generate the final feature set. The resulting feature fusion set of LBP and HOG is then fed into a Support Vector Machine (SVM) for classification. To improve the detector's performance, an unsupervised framework is used for learning, and non-maximal suppression is applied as a post-processing step to suppress overlapping and redundant windows. The INRIA dataset is used for training and testing. The proposed method is compared with the HOG, LBP, and HOG-LBP techniques, and the results show that it outperforms them.
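
    The pipeline this abstract describes (fused LBP and HOG features classified by an SVM) can be sketched roughly as below. This is a minimal illustration using scikit-image and scikit-learn only; plain concatenation stands in for the paper's mutation/crossover fusion step, and the sliding-window scanning, unsupervised learning and non-maximal suppression stages are omitted.

```python
# Minimal sketch: HOG + LBP features for a detection window, classified with a linear SVM.
# Plain concatenation stands in for the paper's mutation/crossover fusion step.
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import LinearSVC

def window_features(gray_window):
    """Extract a fused HOG + LBP descriptor from a grayscale detection window."""
    hog_vec = hog(gray_window, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), block_norm='L2-Hys')
    lbp = local_binary_pattern(gray_window, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)  # 10 uniform patterns
    return np.concatenate([hog_vec, lbp_hist])

def train_detector(windows, labels):
    """windows: list of equally sized grayscale crops; labels: 1 = person, 0 = background."""
    X = np.stack([window_features(w) for w in windows])
    clf = LinearSVC(C=0.01)
    clf.fit(X, labels)
    return clf
```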

    Visual Place Recognition in Changing Environments Utilising Sequence-Based Filtering and Extremely JPEG Compressed Images

    Visual Place Recognition (VPR), part of Simultaneous Localisation and Mapping (SLAM), is an essential task in the localisation process, where a robotic platform must successfully navigate through its environment using visual information gathered from its on-board camera. Despite the recent efforts of the research community, VPR remains an open research problem. To this end, a large number of deep-learning-based and handcrafted VPR techniques (also referred to as learnt and non-learnt VPR techniques) have been proposed to overcome the challenges in this field, such as viewpoint, illumination and seasonal variations. While Convolutional Neural Network (CNN)-based VPR techniques have significant computational requirements that may restrict their applicability on resource-constrained platforms, handcrafted VPR techniques struggle with appearance changes. In this thesis, two largely unexplored avenues of research are investigated, namely sequence-based filtering and JPEG compression. To overcome the previously mentioned challenges, this thesis proposes a handcrafted VPR technique based on HOG descriptors, paired with an adaptive sequence-based filtering schema, to perform VPR in scenarios where the appearance of the environment changes drastically across traversals. The technique, entitled ConvSequential-SLAM, achieves place matching performance comparable with state-of-the-art VPR techniques at reduced computational cost. The sequence-matching approach used in this technique is then employed to investigate both the improvement in VPR performance and the computational effort required when a sequence-based filtering approach is utilised. As CNNs are computationally demanding, this thesis shows that VPR can be performed more efficiently using lightweight techniques. Furthermore, this thesis also investigates the effects of JPEG compression for VPR applications, where substantial reductions in both transmission and storage requirements can be achieved. As VPR performance drops drastically, especially at high compression ratios, this thesis shows how a fine-tuned CNN can achieve more consistent VPR performance on highly JPEG-compressed data (i.e. above 90% JPEG compression). Sequence-based filtering is introduced to overcome the performance loss due to JPEG compression. This thesis shows that the size of a JPEG-compressed image is often smaller than the size of the image descriptor, and therefore the image should be transferred instead. Furthermore, our experiments also show that the amount of data required for transfer decreases as JPEG compression increases, even when an increased number of images per sequence is required. This thesis also analyses the effects of image resolution on the performance of handcrafted techniques, to enable efficient deployment of VPR solutions on commercial products. The analysis performed in this thesis confirms that local feature descriptors are unable to operate on low-resolution images, as no keypoints (salient information) are detected. Moreover, this thesis also shows that the time required to perform VPR is reduced as image resolution decreases.
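
    As a rough illustration of the kind of sequence-based filtering over HOG descriptors the thesis builds on, the sketch below matches a fixed-length query sequence against every reference window by summed cosine similarity. The fixed length K stands in for the thesis' adaptive sequence-length schema, and all parameter values are illustrative.

```python
# Minimal sketch: sequence-based place matching with whole-image HOG descriptors.
# A fixed sequence length K stands in for the thesis' adaptive sequence-length schema.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def describe(image, shape=(128, 256)):
    """Whole-image HOG descriptor, L2-normalised (assumes a grayscale input image)."""
    d = hog(resize(image, shape), orientations=8,
            pixels_per_cell=(16, 16), cells_per_block=(1, 1))
    return d / (np.linalg.norm(d) + 1e-12)

def match_sequence(query_descs, ref_descs, K=5):
    """Match the last K query descriptors against every K-long window of the reference map."""
    q = np.stack(query_descs[-K:])                     # (K, D)
    best_idx, best_score = -1, -np.inf
    for start in range(len(ref_descs) - K + 1):
        r = np.stack(ref_descs[start:start + K])       # (K, D)
        score = float(np.sum(q * r))                   # summed cosine similarity over the sequence
        if score > best_score:
            best_idx, best_score = start + K - 1, score
    return best_idx, best_score
```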

    Improving Visual Place Recognition in Changing Environments

    For many years, the research community has been highly interested in autonomous robotics and its various applications, from healthcare to manufacturing, transportation to construction, and more. A key challenge for an autonomous robot is the ability to determine its own location. A fundamental research topic in localization is Visual Place Recognition (VPR), the task of detecting a previously visited location through visual input alone. One specific challenge in VPR is dealing with a place's appearance variation across different visits, which can occur due to viewpoint and environmental changes such as illumination, weather, and seasonal variations. While appearance changes already make VPR challenging, a further difficulty is posed by the resource constraints of many robots employed in real-world applications, which limit the usability of learning-based techniques that enable state-of-the-art performance but are computationally expensive. This thesis aims to combine the need for accurate place recognition in changing environments with low resource usage. The work presented here explores different approaches, from local image feature descriptors to Binary Neural Networks (BNNs), to improve the computational and energy efficiency of VPR. The best BNN-based VPR descriptor obtained runs up to one order of magnitude faster than many CNN-based and hand-crafted approaches while maintaining comparable performance and expending little energy per processed image. Specifically, the proposed BNN can process an image 7 to 14 times faster than AlexNet while drawing at most 13% of the power when deployed on a low-end ARM platform. The results in this manuscript are presented using a new performance metric and an evaluation framework designed explicitly for VPR applications, with the two-fold purpose of providing meaningful insights into VPR performance and making results easily comparable across the chapters.
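
    The efficiency argument for BNN descriptors rests on the fact that, once weights and activations are constrained to {-1, +1}, a dot product collapses to an XNOR followed by a popcount. The NumPy snippet below only demonstrates that arithmetic identity; it is not the thesis' BNN descriptor or its evaluation framework.

```python
# Minimal sketch of why binarised networks are cheap: with weights and activations in {-1, +1},
# a dot product reduces to XNOR + popcount on the packed sign bits. Pure NumPy illustration of
# the arithmetic identity only, not the thesis' BNN descriptor.
import numpy as np

rng = np.random.default_rng(0)
w = np.where(rng.standard_normal(256) >= 0, 1.0, -1.0)   # binarised weights in {-1, +1}
x = np.where(rng.standard_normal(256) >= 0, 1.0, -1.0)   # binarised activations in {-1, +1}

# Reference result with floating-point arithmetic.
dot_fp = float(w @ x)

# Same result with bit operations: take sign bits, XNOR them, count matching bits.
wb = w > 0
xb = x > 0
matches = np.count_nonzero(~np.logical_xor(wb, xb))      # popcount of XNOR
dot_bits = 2 * matches - len(w)                          # dot = matches - mismatches

assert dot_fp == dot_bits
```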

    The Benefits of Dense Stereo for Pedestrian Detection


    Dynamic Time Warping of Deep Features for Place Recognition in Visually Varying Conditions

    This paper presents a new visual place recognition (VPR) method based on dynamic time warping (DTW) and deep convolutional neural networks (CNNs). The proposal considers visual place recognition in environments that exhibit changes in several visual conditions, such as appearance and viewpoint. The proposed VPR method belongs to the sequence-matching category, i.e., it uses sequence-to-sequence image matching to recognize the best match for the current test image. The approach extracts image features from a deep CNN; different layers of two selected CNNs are investigated, and the best-performing layer is identified and paired with DTW. The performance of the deep features is also compared with that of classical handcrafted features (SIFT, HOG and LDB). Our experiments also compare the performance with other state-of-the-art visual place recognition algorithms, in particular Holistic, Only look once, NetVLAD and SeqSLAM. © 2021, King Fahd University of Petroleum & Minerals.
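
    The sequence-to-sequence matching step can be illustrated with a standard DTW recursion over per-frame feature vectors. The sketch below uses cosine distance as the local cost and leaves the CNN, the layer selection and the handcrafted baselines outside its scope.

```python
# Minimal sketch: dynamic time warping between two sequences of (assumed L2-normalised)
# deep feature vectors, using cosine distance as the local cost.
import numpy as np

def dtw_distance(seq_a, seq_b):
    """seq_a: (N, D), seq_b: (M, D) arrays of L2-normalised per-frame feature vectors."""
    cost = 1.0 - seq_a @ seq_b.T                     # (N, M) pairwise cosine distances
    N, M = cost.shape
    acc = np.full((N + 1, M + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    return acc[N, M]

# Place recognition then reduces to picking the reference sequence with the
# smallest DTW distance to the query sequence.
```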

    Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an ever-changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD outperforms both global and local feature descriptor-based methods with comparable compute, achieving state-of-the-art visual place recognition results on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems. Accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021).
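
    A heavily simplified sketch of the patch-level idea is given below: descriptors are pooled over patches of a CNN feature-space grid at several patch sizes and compared by best-match cosine similarity. Average pooling stands in for the NetVLAD residual aggregation and the patch sizes are illustrative, so this is not the authors' implementation.

```python
# Minimal sketch: patch-level descriptors over a CNN feature-space grid at multiple
# patch sizes, compared by cosine similarity. Average pooling stands in for the
# NetVLAD residual aggregation used by Patch-NetVLAD.
import numpy as np

def patch_descriptors(feature_map, patch_size, stride=1):
    """feature_map: (H, W, D) CNN activations; returns (num_patches, D) L2-normalised descriptors."""
    H, W, D = feature_map.shape
    descs = []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patch = feature_map[y:y + patch_size, x:x + patch_size].reshape(-1, D)
            d = patch.mean(axis=0)
            descs.append(d / (np.linalg.norm(d) + 1e-12))
    return np.stack(descs)

def multi_scale_similarity(map_q, map_r, patch_sizes=(2, 5, 8)):
    """Score two feature maps by matching patch descriptors at several scales."""
    score = 0.0
    for p in patch_sizes:
        q, r = patch_descriptors(map_q, p), patch_descriptors(map_r, p)
        sims = q @ r.T
        score += float(sims.max(axis=1).mean())   # best-match similarity per query patch
    return score / len(patch_sizes)
```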

    GSAP: A Global Structure Attention Pooling Method for Graph-Based Visual Place Recognition

    The Visual Place Recognition problem aims to use an image to recognize a location that has been visited before. In many revisited scenes, the appearance and viewpoint are drastically different. Most previous works focus on 2-D image-based deep learning methods; however, convolutional features are not robust enough to the challenging scenes mentioned above. In this paper, in order to exploit the information that helps the Visual Place Recognition task in these challenging scenes, we propose a new graph construction approach that extracts useful information from an RGB image and a depth image and fuses it into graph data. We then treat the Visual Place Recognition problem as a graph classification problem and propose a new global pooling method, Global Structure Attention Pooling (GSAP), which improves classification accuracy by improving the expressive ability of the global pooling component. The experiments show that our GSAP method improves the accuracy of graph classification by approximately 2–5%, that the graph construction method improves it by approximately 4–6%, and that the whole Visual Place Recognition model is robust to appearance and viewpoint change.
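
    As an illustration of attention-based global pooling for graph classification, the sketch below computes a softmax attention weight per node from a single learned scoring vector and returns the weighted sum of node embeddings. It is a generic stand-in, not the GSAP formulation itself.

```python
# Minimal sketch: attention-weighted global pooling of node features for graph
# classification. A single learned scoring vector stands in for the paper's
# Global Structure Attention Pooling (GSAP) component; NumPy only, no training loop.
import numpy as np

def attention_pool(node_features, score_weights):
    """node_features: (N, D) per-node embeddings; score_weights: (D,) attention parameters.
    Returns a single (D,) graph-level embedding."""
    scores = node_features @ score_weights           # (N,) unnormalised attention scores
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                      # softmax over nodes
    return alpha @ node_features                     # attention-weighted sum of node embeddings

# The pooled vector would then feed a classifier head that predicts the place label.
```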