    Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles

    This thesis presents the use of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles, or self-driving cars, has stimulated significant research into advanced driver assistance systems (ADAS), particularly their perception systems. Object detection plays a crucial role in perception systems by providing spatial information to subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks operate on single frames and therefore do not utilize the temporal information associated with objects or scenes in sequence data. The present research thus hypothesizes that multimodal temporal information can help bridge the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimation. First, the thesis examines multimodal data representations and hyper-parameter selection using public datasets such as KITTI and nuScenes, with Frustum-ConvNet as the baseline architecture. Secondly, an attention mechanism was employed along with a convolutional LSTM to extract spatio-temporal information from sequence data, improving 3D estimates and helping the architecture focus on salient lidar point cloud features. Finally, various fusion strategies were applied to fuse the modalities and temporal information into the architecture to assess their effect on performance and computational complexity. Overall, this thesis establishes the importance and utility of multimodal systems for refined 3D object detection and proposes a pipeline incorporating spatial, temporal and attention mechanisms to improve both class-specific and overall accuracy, demonstrated on key autonomous driving datasets.
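
    As an illustrative aside, the sketch below shows a minimal convolutional LSTM cell in PyTorch of the kind used to carry spatial feature maps across a frame sequence; the channel sizes, tensor shapes and the omission of the attention and fusion stages are assumptions for illustration, not the thesis architecture.

```python
# Minimal ConvLSTM cell sketch: convolutional recurrence over per-frame feature maps.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # One convolution produces the input, forget, output and candidate gates.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state                              # hidden and cell feature maps
        gates = self.gates(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_next = f * c + i * g                    # update the cell state
        h_next = o * torch.tanh(c_next)           # emit the new hidden map
        return h_next, c_next

# Roll the cell over a short sequence of per-frame feature maps (toy shapes).
seq = torch.randn(5, 1, 64, 32, 32)               # (time, batch, channels, H, W)
cell = ConvLSTMCell(in_channels=64, hidden_channels=64)
h = torch.zeros(1, 64, 32, 32)
c = torch.zeros(1, 64, 32, 32)
for frame in seq:
    h, c = cell(frame, (h, c))                    # h now summarises the sequence
```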

    Urban footpath image dataset to assess pedestrian mobility

    This paper presents an urban footpath image dataset captured through crowdsourcing using the Mapillary service (mobile application) and demonstrates its use for data analytics applications employing object detection and image segmentation. The study was motivated by the unique, individual mobility challenges that many people face in navigating public footpaths, in particular those who use mobility aids such as a long cane, guide dogs, crutches or wheelchairs, when faced with changes in pavement surface (tactile pavements) or obstacles such as bollards and other street furniture. Existing image datasets are generally captured from an instrumented vehicle and do not provide sufficient or adequate images of footpaths from the pedestrian perspective. A citizen science project (Crowd4Access) worked with user groups and volunteers to gather a sample image dataset, resulting in 39,642 images collected in a range of different conditions. Preliminary studies to detect tactile pavements and perform semantic segmentation using state-of-the-art computer vision models demonstrate the utility of this dataset for better understanding urban mobility issues.
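
    As a hedged illustration of how such footpath images might be analysed, the sketch below runs an off-the-shelf torchvision semantic segmentation model on a single photo; the model choice (DeepLabV3), the preprocessing and the file name footpath.jpg are assumptions, not the study's actual setup.

```python
# Per-pixel semantic segmentation of one crowdsourced footpath photo (illustrative only).
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights
from PIL import Image

model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("footpath.jpg").convert("RGB")   # hypothetical crowdsourced photo
with torch.no_grad():
    out = model(preprocess(image).unsqueeze(0))["out"]
labels = out.argmax(dim=1)                           # per-pixel class predictions
```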

    Object polygonization in traffic scenes using small Eigenvalue analysis

    Shape polygonization is an effective and convenient method to compress the storage requirements of a shape curve. Polygonal approximation offers an invariant representation of local properties even after digitization of a shape curve. In this paper, we propose to use a universal threshold for the polygonal approximation of any two-dimensional object boundary by exploiting the strength of small eigenvalues. We also propose to adapt the Jaccard Index as a metric to measure the effectiveness of shape polygonization. We have conducted extensive experiments on semantically segmented images from the Cityscapes dataset to polygonize the objects in traffic scenes. Further, to corroborate the efficacy of the proposed method, experiments on the MPEG-7 shape database were conducted. The results obtained by the proposed technique are encouraging and can enable greater compression of annotation documents. This is particularly critical in the domain of instrumented vehicles, where large volumes of high-quality video must be exhaustively annotated without loss of accuracy and with minimal man-hours.
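
    The Jaccard Index used as the evaluation metric is the intersection-over-union of the original shape region and its polygonal approximation. Below is a minimal sketch assuming rasterised binary masks and toy shapes rather than the paper's actual Cityscapes or MPEG-7 data.

```python
# Jaccard Index (IoU) between an original shape mask and its approximation.
import numpy as np

def jaccard_index(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU of two boolean masks of the same size."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union) if union else 1.0

# Toy example: a filled disc versus a coarse square approximation of it.
yy, xx = np.mgrid[0:100, 0:100]
disc = (xx - 50) ** 2 + (yy - 50) ** 2 <= 40 ** 2
square = (np.abs(xx - 50) <= 36) & (np.abs(yy - 50) <= 36)
print(f"Jaccard index: {jaccard_index(disc, square):.3f}")
```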

    Utilising visual attention cues for vehicle detection and tracking

    Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behaviour while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) how visual attention maps, such as a subjectness attention (saliency) map and an objectness attention map, can facilitate region proposal generation in a two-stage object detector; 2) how a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on the KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of ≈8% in object detection, which effectively increased tracking performance by ≈4% on the KITTI dataset.
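
    One plausible (assumed) way an attention map can support region proposal generation is to re-rank proposal scores by the mean saliency inside each candidate box. The sketch below illustrates that idea on toy data; it is not the paper's exact fusion scheme, and the box format and weighting are assumptions.

```python
# Re-rank region proposals by blending objectness scores with mean in-box saliency.
import numpy as np

def rerank_proposals(scores, boxes, saliency, weight=0.5):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; saliency: (H, W) map in [0, 1]."""
    reranked = []
    for score, (x1, y1, x2, y2) in zip(scores, boxes.astype(int)):
        mean_sal = saliency[y1:y2, x1:x2].mean() if x2 > x1 and y2 > y1 else 0.0
        reranked.append((1 - weight) * score + weight * mean_sal)
    return np.asarray(reranked)

saliency = np.random.rand(480, 640)                 # stand-in attention map
boxes = np.array([[10, 20, 110, 220], [300, 40, 420, 200]])
scores = np.array([0.7, 0.6])                       # original objectness scores
print(rerank_proposals(scores, boxes, saliency))
```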

    A deep residual architecture for skin lesion segmentation

    In this paper, we propose an automatic approach to skin lesion region segmentation based on a deep learning architecture with multi-scale residual connections. The architecture of the proposed model is based on UNet [22], with residual connections to maximise the learning capability and performance of the network. The information lost in the encoder stages due to the max-pooling layer at each level is preserved through the multi-scale residual connections. To corroborate the efficacy of the proposed model, extensive experiments were conducted on the ISIC 2017 challenge dataset without using any external dermatological image set. An extensive comparative analysis with contemporary methodologies is presented to highlight the promising performance of the proposed approach.
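
    A minimal sketch of a residual convolutional block of the kind that could form one stage of a UNet-style encoder; the channel sizes, normalisation choices and depth are illustrative assumptions rather than the paper's exact configuration.

```python
# Residual encoder block: two convolutions plus an identity/projection skip path.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection keeps the skip path compatible when channel counts change.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

x = torch.randn(1, 3, 128, 128)          # stand-in for a dermoscopic image tensor
features = ResidualBlock(3, 32)(x)       # pre-pooling features, usable as a skip link
print(features.shape)
```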

    Saliency guided 2D-object annotation for instrumented vehicles

    Instrumented vehicles can produce huge volumes of video data per vehicle per day that must be analysed automatically, often in real time. This analysis should include identifying the presence of objects and tagging these with semantic concepts such as car, pedestrian, etc. An important element in achieving this is the annotation of training data for machine learning algorithms, which requires accurate labels at a high level of granularity. Current practice is to use trained human annotators, who can annotate only a limited volume of video per day. In this paper, we demonstrate how a generic human saliency classifier can provide visual cues for object detection using deep learning approaches. Our work is applied to datasets for autonomous driving. Our experiments show that utilising visual saliency improves the detection of small objects and increases overall accuracy compared with a standalone single shot multibox detector.
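
    One simple (assumed) way saliency cues could be supplied to a detector is to stack the saliency map onto the image as an extra input channel. The sketch below shows only that preprocessing step with toy data; it is not the paper's actual saliency-guided detector.

```python
# Stack a saliency map as a fourth channel alongside the RGB image.
import numpy as np

def stack_saliency_channel(image: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """image: (H, W, 3) uint8; saliency: (H, W) float map in [0, 1]."""
    saliency_u8 = (saliency * 255).astype(np.uint8)[..., None]
    return np.concatenate([image, saliency_u8], axis=-1)   # (H, W, 4)

image = np.zeros((300, 300, 3), dtype=np.uint8)            # toy frame
saliency = np.random.rand(300, 300)                        # toy saliency map
print(stack_saliency_channel(image, saliency).shape)       # (300, 300, 4)
```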