Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning has not only revealed the challenges in
transfer learning for visual recognition, but also highlighted the problems
(eight of the seventeen) that have scarcely been studied. This survey not only
presents an up-to-date technical review for researchers, but also offers a
systematic approach and a reference for machine learning practitioners to
categorise a real problem and look up a possible solution accordingly.
U3DS: Unsupervised 3D Semantic Scene Segmentation
Contemporary point cloud segmentation approaches largely rely on richly
annotated 3D training data. However, it is both time-consuming and challenging
to obtain consistently accurate annotations for such 3D scene data. Moreover,
there is still a lack of investigation into fully unsupervised scene
segmentation for point clouds, especially for holistic 3D scenes. This paper
presents U3DS as a step towards completely unsupervised point cloud
segmentation for any holistic 3D scene. U3DS applies a generalized
unsupervised segmentation method to both objects and background across indoor
and outdoor static 3D point clouds, requires no model pre-training, and relies
only on the inherent information of the point cloud to achieve full 3D scene
segmentation. The initial step of our proposed
approach involves generating superpoints based on the geometric characteristics
of each scene. Subsequently, it undergoes a learning process through a spatial
clustering-based methodology, followed by iterative training using
pseudo-labels generated in accordance with the cluster centroids. Moreover, by
leveraging the invariance and equivariance of the volumetric representations,
we apply the geometric transformation on voxelized features to provide two sets
of descriptors for robust representation learning. Finally, our evaluation
achieves state-of-the-art results on the ScanNet and SemanticKITTI benchmark
datasets, and competitive results on S3DIS.
Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV) 202
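The alternating clustering-and-pseudo-label step the U3DS abstract describes can be sketched as a k-means-style loop over per-point features. The feature dimensions, cluster count, and update rule below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def assign_pseudo_labels(features, centroids):
    # pseudo-label of each point = index of its nearest cluster centroid
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def update_centroids(features, labels, centroids):
    # recompute each centroid as the mean of its assigned points,
    # keeping the old centroid if a cluster went empty
    new = centroids.copy()
    for c in range(len(centroids)):
        members = features[labels == c]
        if len(members):
            new[c] = members.mean(axis=0)
    return new

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))    # stand-in for per-point voxel features
centroids = features[rng.choice(200, size=4, replace=False)]  # k = 4 seeds

for _ in range(5):                      # alternate assignment and update
    labels = assign_pseudo_labels(features, centroids)
    centroids = update_centroids(features, labels, centroids)
```

In the real method the features would come from the segmentation network and be refreshed each round, so clustering and network training reinforce each other.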
Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime
This work addresses the problem of semantic image segmentation of nighttime
scenes. Although considerable progress has been made in semantic image
segmentation, it is mainly related to daytime scenarios. This paper proposes a
novel method to progressively adapt the semantic models trained on daytime
scenes, along with large-scale annotations therein, to nighttime scenes via the
bridge of twilight time -- the time between dawn and sunrise, or between sunset
and dusk. The goal of the method is to alleviate the cost of human annotation
for nighttime images by transferring knowledge from standard daytime
conditions. In addition to the method, a new dataset of road scenes is
compiled; it consists of 35,000 images ranging from daytime to twilight time
and to nighttime. Also, a subset of the nighttime images is densely annotated
for method evaluation. Our experiments show that our method is effective for
model adaptation from daytime scenes to nighttime scenes, without using extra
human annotation.
Comment: Accepted to the International Conference on Intelligent Transportation
Systems (ITSC 2018)
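The twilight-bridged adaptation can be sketched as a generic curriculum loop: train on labelled daytime data, then repeatedly pseudo-label and retrain on successively darker stages. The `train`/`predict` interface and the brightness-threshold toy model below are illustrative assumptions, not the paper's actual segmentation network.

```python
def progressive_adaptation(model, train, predict, daytime, stages):
    """Curriculum-style adaptation: train on labelled daytime data, then
    pseudo-label and retrain on each successively darker stage
    (daytime -> twilight -> nighttime)."""
    model = train(model, daytime["images"], daytime["labels"])
    for stage_images in stages:
        pseudo = predict(model, stage_images)       # pseudo-labels, no humans
        model = train(model, stage_images, pseudo)  # adapt to the darker stage
    return model

# Toy stand-in model: a single brightness threshold (purely illustrative).
def train(model, images, labels):
    lit = [p for img, lab in zip(images, labels) for p, l in zip(img, lab) if l]
    unlit = [p for img, lab in zip(images, labels) for p, l in zip(img, lab) if not l]
    return (sum(lit) / len(lit) + sum(unlit) / len(unlit)) / 2

def predict(model, images):
    return [[p > model for p in img] for img in images]

daytime = {"images": [[0.9, 0.8, 0.2, 0.1]], "labels": [[1, 1, 0, 0]]}
twilight = [[0.6, 0.55, 0.1, 0.05]]
nighttime = [[0.4, 0.35, 0.02, 0.01]]

adapted = progressive_adaptation(None, train, predict, daytime, [twilight, nighttime])
```

The point of the intermediate twilight stage is that pseudo-labels stay reliable across each small domain gap, whereas jumping straight from daytime to nighttime would propagate many more errors.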
DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams
In this paper, we propose a new software tool called DALES to extract semantic information
from multi-view videos based on the analysis of their visual content. Our system is fully automatic
and is well suited for multi-camera environments. Once the multi-view video sequences are
loaded into DALES, our software performs the detection, counting, and segmentation of the visual
objects evolving in the provided video streams. Then, these objects of interest are processed
in order to be labelled, and the related frames are thus annotated with the corresponding semantic
content. Moreover, a textual script is automatically generated with the video annotations.
The DALES system shows excellent performance in terms of accuracy and computational speed and
is robustly designed to ensure view synchronization.
Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data
Reducing the quantity of annotations required for supervised training is
vital when labels are scarce and costly. This reduction is particularly
important for semantic segmentation tasks involving 3D datasets, which are
often significantly smaller and more challenging to annotate than their
image-based counterparts. Self-supervised pre-training on unlabelled data is
one way to reduce the amount of manual annotations needed. Previous work has
focused on pre-training with point clouds exclusively. While useful, this
approach often requires two or more registered views. In the present work, we
combine image and point cloud modalities by first learning self-supervised
image features and then using these features to train a 3D model. By
incorporating image data, which is often included in many 3D datasets, our
pre-training method only requires a single scan of a scene and can be applied
to cases where localization information is unavailable. We demonstrate that our
pre-training approach, despite using single scans, achieves comparable
performance to other multi-scan, point cloud-only methods.Comment: In Proceedings of the Conference on Robots and Vision (CRV'23),
Montreal, Canada, Jun. 6-8, 2023. arXiv admin note: substantial text overlap
with arXiv:2211.1180
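The image-to-point objective this abstract describes can be sketched with an InfoNCE contrastive loss that pulls each 3D point feature toward the 2D image feature at its projected pixel. The pairing-by-index convention, feature sizes, and temperature below are illustrative assumptions.

```python
import numpy as np

def info_nce(point_feats, pixel_feats, temperature=0.07):
    """Contrastive loss: point i's positive is pixel feature i (its projected
    pixel); every other pixel feature in the batch is a negative."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    q = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    logits = p @ q.T / temperature                 # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()              # -log p(positive | point)

rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(32, 16))   # frozen self-supervised 2D features
aligned = info_nce(pixel_feats, pixel_feats)                    # matched pairs
shuffled = info_nce(pixel_feats, rng.permutation(pixel_feats))  # mispaired
```

Because the 2D features are learned first and then held fixed as targets, a single scan with its camera image suffices; no second registered point cloud view is needed.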
A review on deep learning techniques for 3D sensed data classification
Over the past decade, deep learning has driven progress in 2D image
understanding. Despite these advancements, techniques for the automatic
understanding of 3D sensed data, such as point clouds, are comparatively immature. However,
with a range of important applications from indoor robotics navigation to
national-scale remote sensing, there is a high demand for algorithms that can
learn to automatically understand and classify 3D sensed data. In this paper we
review the current state-of-the-art deep learning architectures for processing
unstructured Euclidean data. We begin by addressing the background concepts and
traditional methodologies. We review the current main approaches, including:
RGB-D, multi-view, volumetric and fully end-to-end architecture designs.
Datasets for each category are documented and explained. Finally, we give a
detailed discussion about the future of deep learning for 3D sensed data, using
literature to justify the areas where future research would be most valuable.
Comment: 25 pages, 9 figures. Review paper
Learning Off-Road Terrain Traversability with Self-Supervisions Only
Estimating the traversability of terrain should be reliable and accurate in
diverse conditions for autonomous driving in off-road environments. However,
learning-based approaches often yield unreliable results when confronted with
unfamiliar contexts, and it is challenging to obtain manual annotations
frequently for new circumstances. In this paper, we introduce a method for
learning traversability from images that utilizes only self-supervision and no
manual labels, enabling it to easily learn traversability in new circumstances.
To this end, we first generate self-supervised traversability labels from past
driving trajectories by labeling regions traversed by the vehicle as highly
traversable. Using the self-supervised labels, we then train a neural network
that identifies terrains that are safe to traverse from an image using a
one-class classification algorithm. Additionally, we compensate for the
limitations of self-supervised labels by incorporating methods of
self-supervised learning of visual representations. To conduct a comprehensive evaluation, we collect
data in a variety of driving environments and perceptual conditions and show
that our method produces reliable estimations in various environments. In
addition, the experimental results validate that our method outperforms other
self-supervised traversability estimation methods and achieves comparable
performance to supervised learning methods trained on manually labeled data.
Comment: Accepted to IEEE Robotics and Automation Letters. Our video can be
found at https://bit.ly/3YdKan
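The two-stage recipe in this abstract (label driven-over regions as traversable, then fit a one-class model to them) can be sketched on a toy grid. The scalar roughness feature and the Gaussian threshold rule are illustrative stand-ins for the paper's learned visual features and one-class classifier.

```python
import numpy as np

def label_traversed(grid_shape, trajectory):
    # cells the vehicle drove over become positives; the rest stay unlabelled
    labels = np.zeros(grid_shape, dtype=int)
    for r, c in trajectory:
        labels[r, c] = 1
    return labels

def one_class_traversable(features, positive_mask, k=2.0):
    # toy one-class rule: model positives as a 1-D Gaussian and accept
    # anything within k standard deviations of their mean
    pos = features[positive_mask.astype(bool)]
    mu, sigma = pos.mean(), pos.std() + 1e-8
    return np.abs(features - mu) <= k * sigma

roughness = np.array([[0.10, 0.12, 0.11],     # smooth track (driven over)
                      [0.105, 0.90, 0.88]])   # smooth verge, then rocks
trajectory = [(0, 0), (0, 1), (0, 2)]         # past wheel positions

labels = label_traversed(roughness.shape, trajectory)
traversable = one_class_traversable(roughness, labels)
```

The one-class framing matters because the trajectory only ever yields positive examples; untraversed regions are unlabelled, not known to be unsafe, so a standard binary classifier would be mis-specified.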