
    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning has not only revealed the challenges in transfer learning for visual recognition, but also the problems (eight of the seventeen) that have scarcely been studied. This survey not only presents an up-to-date technical review for researchers, but also offers a systematic approach and a reference for machine learning practitioners to categorise a real problem and look up a possible solution accordingly.
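    As a toy illustration of the attribute-driven lookup the survey proposes, one could describe a cross-dataset problem by a few data/label attributes and map it to a taxonomy entry. The attribute names and category labels below are invented for the example; they are not the survey's actual seventeen-problem taxonomy.

```python
# Toy sketch of attribute-based problem categorisation (invented categories).
PROBLEM_TAXONOMY = {
    # (source labelled?, target labelled?, shared label space?) -> problem family
    (True,  False, True):  "unsupervised domain adaptation",
    (True,  True,  True):  "supervised domain adaptation",
    (True,  False, False): "unsupervised transfer across label spaces",
}

def categorise(src_labelled: bool, tgt_labelled: bool, shared_label_space: bool) -> str:
    """Look up the problem family for a given attribute combination."""
    key = (src_labelled, tgt_labelled, shared_label_space)
    return PROBLEM_TAXONOMY.get(key, "less-studied problem (see survey)")

print(categorise(True, False, True))  # -> "unsupervised domain adaptation"
```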

    U3DS³: Unsupervised 3D Semantic Scene Segmentation

    Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes, remains under-investigated. This paper presents U3DS³ as a step towards completely unsupervised point cloud segmentation for any holistic 3D scene. To achieve this, U3DS³ provides a generalised unsupervised segmentation method for both objects and background across indoor and outdoor static 3D point clouds, with no requirement for model pre-training, by leveraging only the inherent information of the point cloud to achieve full 3D scene segmentation. The initial step of the proposed approach generates superpoints based on the geometric characteristics of each scene. Subsequently, the model undergoes a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids. Moreover, by leveraging the invariance and equivariance of the volumetric representations, we apply geometric transformations to the voxelised features to provide two sets of descriptors for robust representation learning. Finally, our evaluation provides state-of-the-art results on the ScanNet and SemanticKITTI benchmark datasets, and competitive results on S3DIS. Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024
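    A minimal sketch of the cluster-then-train loop the abstract describes, not the authors' implementation: per-point descriptors are clustered, the cluster assignments serve as pseudo-labels, and the network is retrained against them. The network (PointNetLike), feature sizes, and cluster count are all placeholder assumptions.

```python
# Iterative clustering -> pseudo-labels -> training, as a rough sketch.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

N_POINTS, N_FEATS, N_CLUSTERS = 4096, 32, 20

class PointNetLike(nn.Module):
    """Stand-in per-point feature extractor and classifier head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                      nn.Linear(64, N_FEATS))
        self.head = nn.Linear(N_FEATS, N_CLUSTERS)
    def forward(self, xyz):
        feats = self.backbone(xyz)       # per-point descriptors
        return self.head(feats), feats

model = PointNetLike()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xyz = torch.randn(N_POINTS, 3)           # placeholder point cloud

for round_idx in range(5):               # alternating rounds
    with torch.no_grad():
        _, feats = model(xyz)
    # 1) spatial clustering of current descriptors -> pseudo-labels
    pseudo = KMeans(n_clusters=N_CLUSTERS, n_init=10).fit_predict(feats.numpy())
    pseudo = torch.from_numpy(pseudo).long()
    # 2) train against the pseudo-labels derived from the cluster centroids
    for _ in range(100):
        logits, _ = model(xyz)
        loss = nn.functional.cross_entropy(logits, pseudo)
        opt.zero_grad(); loss.backward(); opt.step()
```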

    Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime

    This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it mainly concerns daytime scenarios. This paper proposes a novel method to progressively adapt semantic models trained on daytime scenes, along with the large-scale annotations therein, to nighttime scenes via the bridge of twilight -- the period between dawn and sunrise, or between sunset and dusk. The goal of the method is to alleviate the cost of human annotation for nighttime images by transferring knowledge from standard daytime conditions. In addition to the method, a new dataset of road scenes is compiled; it consists of 35,000 images ranging from daytime through twilight to nighttime. Also, a subset of the nighttime images is densely annotated for method evaluation. Our experiments show that our method is effective for model adaptation from daytime scenes to nighttime scenes without using extra human annotation. Comment: Accepted to International Conference on Intelligent Transportation Systems (ITSC 2018)
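    A hedged sketch of the gradual "bridge through twilight" idea: a segmenter trained on daytime data pseudo-labels progressively darker images and is fine-tuned on its own confident predictions. The confidence threshold, toy model, and loaders below are illustrative assumptions, not the paper's recipe.

```python
# Curriculum-style self-training over progressively darker imagery.
import torch
import torch.nn.functional as F

def adapt_stage(model, optimizer, loader, n_epochs=1, conf_thresh=0.9):
    """One curriculum stage: pseudo-label, then fine-tune on confident pixels."""
    for _ in range(n_epochs):
        for images in loader:                      # unlabelled images of this stage
            with torch.no_grad():
                probs = F.softmax(model(images), dim=1)
                conf, pseudo = probs.max(dim=1)    # per-pixel label + confidence
            logits = model(images)
            loss = F.cross_entropy(logits, pseudo, reduction="none")
            loss = (loss * (conf > conf_thresh)).mean()   # mask unsure pixels
            optimizer.zero_grad(); loss.backward(); optimizer.step()
    return model

# Toy demo: a 1x1-conv "segmenter" over 3-channel images, 19 classes.
model = torch.nn.Conv2d(3, 19, kernel_size=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
twilight_loader = [torch.randn(2, 3, 64, 64)]      # stand-in for a real loader
model = adapt_stage(model, opt, twilight_loader)   # then repeat for nighttime
```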

    DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams

    In this paper, we propose a new software tool called DALES to extract semantic information from multi-view videos based on an analysis of their visual content. Our system is fully automatic and well suited to multi-camera environments. Once the multi-view video sequences are loaded into DALES, our software performs detection, counting, and segmentation of the visual objects evolving in the provided video streams. These objects of interest are then processed to be labelled, and the related frames are annotated with the corresponding semantic content. Moreover, a textual script containing the video annotations is automatically generated. The DALES system shows excellent performance in terms of accuracy and computational speed and is robustly designed to ensure view synchronisation.
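    The detect-label-annotate flow described above might look roughly like the following sketch, where detect_objects is a stand-in for DALES's actual detection and segmentation stages and the textual script format is invented for illustration.

```python
# Sketch of a per-frame detect -> label -> annotate pipeline emitting a script.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    bbox: tuple  # (x, y, w, h) in pixels

def detect_objects(frame):
    """Placeholder: a real system runs detection + segmentation here."""
    return [Detection("person", (10, 20, 50, 120))]

def annotate_streams(streams):
    """streams: dict camera_id -> iterable of frames (assumed synchronised)."""
    script_lines = []
    for cam_id, frames in streams.items():
        for t, frame in enumerate(frames):
            for det in detect_objects(frame):
                script_lines.append(f"cam={cam_id} frame={t} "
                                    f"label={det.label} bbox={det.bbox}")
    return "\n".join(script_lines)

print(annotate_streams({"cam0": [None]}))  # toy run with one empty frame
```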

    Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data

    Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is particularly important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotation needed. Previous work has focused on pre-training with point clouds exclusively; while useful, this approach often requires two or more registered views. In the present work, we combine the image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in 3D datasets, our pre-training method requires only a single scan of a scene and can be applied to cases where localization information is unavailable. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point-cloud-only methods. Comment: In Proceedings of the Conference on Robots and Vision (CRV'23), Montreal, Canada, Jun. 6-8, 2023. arXiv admin note: substantial text overlap with arXiv:2211.1180
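    A minimal sketch, under assumed names and shapes, of the general idea of pairing 3D point features with 2D image features via camera projection and training with an InfoNCE-style contrastive loss; the paper's actual objective and feature extraction may differ.

```python
# Contrastive 2D -> 3D feature pairing via pinhole projection (sketch).
import torch
import torch.nn.functional as F

def project(points, K):
    """Pinhole projection of Nx3 camera-frame points with intrinsics K (3x3)."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]             # Nx2 pixel coordinates

def info_nce(feat3d, feat2d, temperature=0.07):
    """Matched rows of feat3d/feat2d are positives; all other rows negatives."""
    f3 = F.normalize(feat3d, dim=1)
    f2 = F.normalize(feat2d, dim=1)
    logits = f3 @ f2.T / temperature            # NxN similarity matrix
    targets = torch.arange(f3.size(0))          # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

N, D = 256, 64
pts = torch.rand(N, 3) + torch.tensor([0., 0., 1.])   # points in front of camera
K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
uv = project(pts, K)                 # where the 2D feature map would be sampled
feat3d = torch.randn(N, D, requires_grad=True)  # from the 3D network
feat2d = torch.randn(N, D)           # in practice: frozen 2D features at uv
loss = info_nce(feat3d, feat2d)
loss.backward()
```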

    A review on deep learning techniques for 3D sensed data classification

    Over the past decade, deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatically understanding 3D sensed data, such as point clouds, are comparatively immature. However, with a range of important applications from indoor robotic navigation to national-scale remote sensing, there is high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We then review the current main approaches, including RGB-D, multi-view, volumetric, and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using the literature to justify the areas where future research would be most valuable. Comment: 25 pages, 9 figures. Review paper
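    To make the "volumetric" family of approaches concrete, here is a tiny illustration, with arbitrary example values for grid size and extent, of rasterising an unstructured point cloud into an occupancy grid that a 3D CNN could consume.

```python
# Point cloud -> fixed-resolution occupancy grid (volumetric representation).
import numpy as np

def voxelize(points, grid=32, extent=1.0):
    """points: Nx3 array in [-extent, extent]^3 -> (grid, grid, grid) occupancy."""
    idx = ((points + extent) / (2 * extent) * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)              # keep indices inside the grid
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied cells
    return vol

cloud = np.random.uniform(-1, 1, size=(1024, 3))
print(voxelize(cloud).sum())                     # number of occupied voxels
```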

    Learning Off-Road Terrain Traversability with Self-Supervisions Only

    Estimating terrain traversability reliably and accurately across diverse conditions is essential for autonomous driving in off-road environments. However, learning-based approaches often yield unreliable results when confronted with unfamiliar contexts, and it is challenging to obtain manual annotations frequently enough for new circumstances. In this paper, we introduce a method for learning traversability from images that uses only self-supervision and no manual labels, enabling it to easily learn traversability in new circumstances. To this end, we first generate self-supervised traversability labels from past driving trajectories by marking regions traversed by the vehicle as highly traversable. Using these self-supervised labels, we then train a neural network that identifies terrain that is safe to traverse from an image, using a one-class classification algorithm. Additionally, we mitigate the limitations of the self-supervised labels by incorporating methods from self-supervised learning of visual representations. To conduct a comprehensive evaluation, we collect data in a variety of driving environments and perceptual conditions and show that our method produces reliable estimates in various environments. The experimental results also validate that our method outperforms other self-supervised traversability estimation methods and achieves performance comparable to supervised learning methods trained on manually labelled data. Comment: Accepted to IEEE Robotics and Automation Letters. Our video can be found at https://bit.ly/3YdKan
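    A hedged sketch of the self-labelling step the abstract describes: past trajectory points are projected into the image and the touched pixels are marked traversable, yielding positives for a one-class classifier. The camera intrinsics, footprint radius, and function names are assumptions, not the authors' code.

```python
# Trajectory -> binary traversability mask via pinhole projection (sketch).
import numpy as np

def trajectory_to_mask(traj_cam, K, img_hw, radius=4):
    """traj_cam: Mx3 trajectory points in the camera frame (z > 0).
    Returns a binary HxW mask of pixels the vehicle traversed."""
    h, w = img_hw
    uvw = traj_cam @ K.T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)   # projected pixel coordinates
    mask = np.zeros((h, w), dtype=np.uint8)
    for u, v in uv:
        if 0 <= v < h and 0 <= u < w:             # stamp a small footprint
            mask[max(v - radius, 0):v + radius,
                 max(u - radius, 0):u + radius] = 1
    return mask  # positives for a one-class traversability classifier

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
traj = np.array([[0.0, 1.0, z] for z in np.linspace(2, 20, 50)])
mask = trajectory_to_mask(traj, K, (480, 640))
print(mask.sum(), "pixels labelled traversable")
```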