Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap
Global point cloud registration is essential in many robotics tasks like loop
closing and relocalization. Unfortunately, the registration often suffers from
the low overlap between point clouds, a frequent occurrence in practical
applications due to occlusion and viewpoint change. In this paper, we propose a
graph-theoretic framework to address the problem of global point cloud
registration with low overlap. To this end, we construct a consistency graph to
facilitate robust data association and employ graduated non-convexity (GNC) for
reliable pose estimation, following the state-of-the-art (SoTA) methods.
Unlike previous approaches, we use semantic cues to scale down the dense
point clouds, thus reducing the problem size. Moreover, we address the
ambiguity arising from the consistency threshold by constructing a pyramid
graph with multi-level consistency thresholds. We then propose a cascaded
gradient ascent method to solve the resulting densest-clique problem and obtain
multiple pose candidates for every consistency threshold. Finally, fast
geometric verification is employed to select the optimal estimation from
multiple pose candidates. Our experiments, conducted on a self-collected indoor
dataset and the public KITTI dataset, demonstrate that our method achieves the
highest success rate despite the low overlap of point clouds and low semantic
quality. We have open-sourced our code
https://github.com/HKUST-Aerial-Robotics/Pagor for this project.Comment: Accepted by IROS202
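The core of the pipeline above -- scoring putative correspondences with a consistency graph and extracting its densest clique -- can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are invented, and a simple replicator-style update on the Motzkin-Straus relaxation stands in for the paper's cascaded gradient ascent.

```python
import numpy as np

def consistency_graph(src, dst, tau):
    """Adjacency matrix over correspondences: i and j are consistent if a
    rigid motion could map both, i.e. the pairwise distance is preserved
    within threshold tau (the paper varies tau over pyramid levels)."""
    d_src = np.linalg.norm(src[:, None] - src[None, :], axis=-1)
    d_dst = np.linalg.norm(dst[:, None] - dst[None, :], axis=-1)
    A = (np.abs(d_src - d_dst) < tau).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def densest_clique_relaxation(A, iters=200):
    """Gradient-style ascent on x^T A x over the probability simplex
    (Motzkin-Straus relaxation of max clique) via replicator updates;
    mass concentrates on the largest mutually consistent set."""
    n = len(A)
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        g = A @ x
        x = x * g                 # multiplicative (replicator) update
        s = x.sum()
        if s == 0:
            break
        x /= s
    return x
```

Inlier correspondences preserve pairwise distances and so form a large clique; outliers are mostly isolated, and their weight decays to zero under the update.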
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
This book offers a gentle introduction to the field of Facial Micro
Expressions Recognition (FMER) using color and depth images, with the aid of
the MATLAB programming environment. FMER is a subset of image processing and a
multidisciplinary topic to analyze, so it requires familiarity with other
topics in Artificial Intelligence (AI) such as machine learning, digital image
processing, and psychology. This makes it a great opportunity to write a book
that covers all of these topics for readers in the field of AI, from beginners
to professionals, even those without a background in AI. Our goal is to provide
a standalone introduction to FMER analysis in the form of theoretical
descriptions for readers with no background in image processing, with
reproducible MATLAB practical examples. We also describe the basic definitions
for FMER analysis and the MATLAB libraries used in the text, which helps the
reader apply the experiments in real-world applications. We believe that this
book is suitable for students, researchers, and professionals alike who need
to develop practical skills, along with a basic understanding of the field. We
expect that, after reading this book, the reader will feel comfortable with
key stages such as color and depth image processing, color and depth image
representation, classification, machine learning, facial micro-expressions
recognition, feature extraction, and dimensionality reduction.
Comment: This is the second edition of the boo
Object Segmentation and Reconstruction Using Infrastructure Sensor Nodes for Autonomous Mobility
This thesis focuses on LiDAR point cloud processing for infrastructure sensor nodes that serve as the perception system for autonomous robots with general mobility in indoor applications. Compared with typical schemes that mount sensors on the robots, this method acquires data from infrastructure sensor nodes, providing a more comprehensive view of the environment, which benefits the robots' navigation. The number of sensors need not increase even for multiple robots, significantly reducing costs. In addition, with a central perception system that uses the infrastructure sensor nodes to navigate every robot, a more comprehensive understanding of the current environment and of all the robots' locations can be obtained for the control and operation of the autonomous robots.
For a robot in the detection range of the sensor node, the sensor node can detect and segment obstacles in the robot's drivable area and reconstruct the incomplete, sparse point cloud of objects as they move. The complete shape produced by the reconstruction benefits the localization and path planning that follow the perception stage of the robot's system.
Considering the sparse LiDAR data and the variety of object categories in the environment, a model-free scheme is selected for object segmentation. Point segmentation starts with background filtering. Considering the complexity of the indoor environment, a depth-matching-based background removal approach is first proposed. However, later tests showed that the method is adequate but not time-efficient. Therefore, based on the depth-matching-based method, a process that focuses only on the drivable area of the robot is proposed, and the computational complexity is significantly reduced. With optimization, the computation time for processing one frame of data is greatly reduced, from 0.2 seconds with the first approach to 0.01 seconds with the second. After background filtering, the remaining points belonging to objects are segmented into separate clusters using an object clustering algorithm.
With independent clusters of objects, an object tracking algorithm is then applied to assign IDs to the point clusters and arrange them in a time sequence. With a stream of clusters for a specific object over time, point registration is deployed to aggregate the clusters into a complete shape. As noticed during the experiments, one difference between indoor and outdoor environments is that contact between objects is much more common indoors. Objects in contact are likely to be segmented as a single cluster by the model-free clustering algorithm, which must be avoided in the reconstruction process. Therefore, the tracking algorithm is improved to handle cases where contact happens. The algorithms in this thesis have been experimentally evaluated and presented.
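The drivable-area filtering and model-free clustering steps described above can be sketched roughly as follows; the function names and the region limits are illustrative, not taken from the thesis.

```python
import numpy as np

def filter_drivable_area(points, x_lim, y_lim, z_ground, z_max):
    """Keep only points inside the robot's drivable area and above the
    ground plane -- a cheap region-of-interest crop that replaces full
    depth-matching background removal."""
    m = ((points[:, 0] >= x_lim[0]) & (points[:, 0] <= x_lim[1])
         & (points[:, 1] >= y_lim[0]) & (points[:, 1] <= y_lim[1])
         & (points[:, 2] > z_ground) & (points[:, 2] < z_max))
    return points[m]

def euclidean_cluster(points, radius):
    """Model-free segmentation: flood-fill points into clusters whenever
    they lie within `radius` of an existing cluster member (naive O(n^2)
    version for clarity)."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cur
        while stack:
            j = stack.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            for k in np.where((d < radius) & (labels == -1))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    return labels
```

Restricting the crop to the drivable area is what yields the reported order-of-magnitude speed-up over whole-scene background matching.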
3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving
For SLAM systems in robotics and autonomous driving, the accuracy of
front-end odometry and back-end loop-closure detection determines the
performance of the whole intelligent system. However, LiDAR SLAM can be
disturbed by moving objects in the current scene, resulting in drift errors and
even loop-closure failure. Thus, the ability to detect and segment moving
objects is essential
for high-precision positioning and building a consistent map. In this paper, we
address the problem of moving object segmentation from 3D LiDAR scans to
improve the odometry and loop-closure accuracy of SLAM. We propose a novel 3D
Sequential Moving-Object-Segmentation (3D-SeqMOS) method that can accurately
segment the scene into moving and static objects, such as moving and static
cars. Unlike existing projected-image methods, we process the raw 3D
point cloud and build a 3D convolutional neural network for the MOS task. In
addition, to make full use of the spatio-temporal information of the point
cloud, we propose a point cloud residual mechanism that uses the spatial
features of the current scan and the temporal features of previous residual
scans. We also
build a complete SLAM framework to verify the effectiveness and accuracy of
3D-SeqMOS. Experiments on SemanticKITTI dataset show that our proposed
3D-SeqMOS method can effectively detect moving objects and improve the accuracy
of LiDAR odometry and loop-closure detection. The test results show our
3D-SeqMOS outperforms the state-of-the-art method by 12.4%. We extend the
proposed method to the SemanticKITTI Moving Object Segmentation competition
and achieve 2nd place on the leaderboard, showing its effectiveness.
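A point cloud residual of the kind described above can be sketched as a nearest-neighbor distance between the current scan and a previous scan aligned into the same frame; this is an illustrative simplification (names invented, not the paper's exact mechanism), but it conveys why residuals highlight moving objects.

```python
import numpy as np

def point_residual(current, previous, pose_delta=np.eye(4)):
    """Per-point temporal residual: distance from each point of the
    current scan to the nearest point of a previous scan, after the
    previous scan is aligned by the relative pose `pose_delta`.
    Static structure yields near-zero residuals; moving objects leave
    large ones."""
    prev_h = np.hstack([previous, np.ones((len(previous), 1))])
    prev_aligned = (pose_delta @ prev_h.T).T[:, :3]
    d = np.linalg.norm(current[:, None] - prev_aligned[None, :], axis=-1)
    return d.min(axis=1)
```

Stacking such residuals from several previous scans gives the temporal channel that complements the spatial features of the current scan.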
Visually Adversarial Attacks and Defenses in the Physical World: A Survey
Although Deep Neural Networks (DNNs) have been widely applied in various
real-world scenarios, they are vulnerable to adversarial examples. The current
adversarial attacks in computer vision can be divided into digital attacks and
physical attacks according to their different attack forms. Compared with
digital attacks, which generate perturbations in the digital pixels, physical
attacks are more practical in the real world. Owing to the serious security
problem caused by physically adversarial examples, many works have been
proposed to evaluate the physically adversarial robustness of DNNs in the past
years. In this paper, we present a survey of current physical adversarial
attacks and physical adversarial defenses in computer vision. To
establish a taxonomy, we organize the current physical attacks from attack
tasks, attack forms, and attack methods, respectively. Thus, readers can have a
systematic knowledge of this topic from different aspects. For the physical
defenses, we establish the taxonomy from pre-processing, in-processing, and
post-processing for the DNN models to achieve full coverage of the adversarial
defenses. Based on the above survey, we finally discuss the challenges of this
research field and offer an outlook on future directions.
Stellar: A Large Satellite Stereo Dataset for Digital Surface Model Generation
Stellar is a large satellite stereo dataset. It contains rectified stereo pairs of terrain captured by satellite image sensors, along with corresponding true disparity maps and semantic segmentation. Unlike stereo vision in autonomous driving and mobile imaging, a satellite stereo pair is not captured simultaneously; thus, the same object in a satellite stereo pair is more likely to have a varied visual appearance. Stellar provides flexible access to such stereo pairs to train methods to be robust to this appearance variation. We use publicly available data sources and developed several techniques for data registration, rectification, and semantic segmentation to build Stellar. In a preliminary experiment, we fine-tuned two deep-learning stereo methods on Stellar. The results demonstrate that, most of the time, these methods generate denser and more accurate disparity maps for satellite stereo after fine-tuning on Stellar, compared to no fine-tuning on satellite stereo data or fine-tuning on previous, smaller satellite stereo datasets. Stellar is available for download at https://github.com/guo-research-group/Stellar
Deep Learning for Scene Flow Estimation on Point Clouds: A Survey and Prospective Trends
Aiming at obtaining structural information and 3D motion of dynamic scenes, scene flow estimation has long been a research interest in computer vision and computer graphics. It is also a fundamental task for various applications such as autonomous driving. Compared to previous methods that utilize image representations, much recent research builds upon the power of deep learning and focuses on point cloud representations to conduct 3D flow estimation. This paper comprehensively reviews the pioneering literature on scene flow estimation based on point clouds. It also delves into the details of learning paradigms and presents insightful comparisons between state-of-the-art methods using deep learning for scene flow estimation. Furthermore, this paper investigates various higher-level scene understanding tasks, including object tracking and motion segmentation, and concludes with an overview of foreseeable research trends for scene flow estimation.
Quantum Annealing for Single Image Super-Resolution
This paper proposes a quantum computing-based algorithm to solve the single
image super-resolution (SISR) problem. One of the well-known classical
approaches for SISR relies on the well-established patch-wise sparse modeling
of the problem. Yet, this field's current state of affairs is that deep neural
networks (DNNs) have demonstrated far superior results than traditional
approaches. Nevertheless, quantum computing is expected to become increasingly
prominent for machine learning problems soon. As a result, in this work, we
take the opportunity to perform an early exploration of applying a quantum
computing algorithm to this important image enhancement problem, i.e., SISR.
Among the two paradigms of quantum computing, namely universal gate quantum
computing and adiabatic quantum computing (AQC), the latter has been
successfully applied to practical computer vision problems, in which quantum
parallelism has been exploited to solve combinatorial optimization efficiently.
This work demonstrates formulating quantum SISR as a sparse coding optimization
problem, which is solved using quantum annealers accessed via the D-Wave Leap
platform. The proposed AQC-based algorithm is demonstrated to achieve improved
speed-up over a classical analog while maintaining comparable SISR accuracy.
Comment: Accepted to IEEE/CVF CVPR 2023, NTIRE Challenge and Workshop. Draft
info: 10 pages, 6 Figures, 2 Table
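Patch-wise sparse coding maps naturally onto the QUBO form that quantum annealers minimize: with binary code b over a dictionary D, expanding ||y - Db||^2 + lam*sum(b) gives a quadratic form in b. A minimal sketch under that assumption follows; the brute-force solver merely stands in for the D-Wave hardware, and the names are illustrative.

```python
import numpy as np
from itertools import product

def sisr_qubo(D, y, lam):
    """QUBO matrix for binary sparse coding of a patch y:
    minimize ||y - D b||^2 + lam * sum(b) over b in {0,1}^k.
    Expanding gives Q = D^T D with the linear terms (lam - 2 D^T y)
    folded onto the diagonal, since b_i^2 = b_i for binary b."""
    Q = D.T @ D
    Q[np.diag_indices_from(Q)] += lam - 2.0 * (D.T @ y)
    return Q

def brute_force_anneal(Q):
    """Stand-in for the annealer: exhaustively minimize b^T Q b
    (only feasible for small k; the hardware handles larger codes)."""
    k = len(Q)
    best, best_e = None, np.inf
    for bits in product([0, 1], repeat=k):
        b = np.array(bits, float)
        e = b @ Q @ b
        if e < best_e:
            best, best_e = b, e
    return best
```

The high-resolution patch is then reconstructed from the selected atoms, exactly as in classical patch-wise sparse-coding SISR.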
Point cloud registration: a mini-review of current state, challenging issues and future directions
A point cloud is a set of data points in space. Point cloud registration is the process of aligning two or more 3D point clouds collected from different locations of the same scene. Registration enables point cloud data to be transformed into a common coordinate system, forming an integrated dataset representing the scene surveyed. Besides methods reliant on targets placed in the scene before data capture, various registration methods are available that use only the captured point cloud data. Until recently, cloud-to-cloud registration methods have generally centered on a coarse-to-fine optimization strategy. The challenges and limitations inherent in this process have shaped the development of point cloud registration and the associated software tools over the past three decades. Building on the success of deep learning methods applied to imagery data, attempts to apply these approaches to point cloud datasets have received much attention. This study reviews and comments on recent developments in targetless point cloud registration and explores remaining issues, based on which recommendations for potential future studies on this topic are made.
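The fine stage of the coarse-to-fine strategy mentioned above is classically the ICP algorithm. A minimal point-to-point sketch, illustrative rather than drawn from any specific tool in the review:

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration: match each source point to its
    nearest target point, then solve the optimal rigid transform in
    closed form (Kabsch / SVD)."""
    d = np.linalg.norm(src[:, None] - dst[None, :], axis=-1)
    matched = dst[d.argmin(axis=1)]
    mu_s, mu_m = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_m)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T           # reflection-corrected rotation
    t = mu_m - R @ mu_s
    return R, t

def icp(src, dst, iters=20):
    """Iterate matching and alignment, composing the incremental
    transforms into one total (R, t)."""
    cur = src.copy()
    R_tot, t_tot = np.eye(3), np.zeros(3)
    for _ in range(iters):
        R, t = icp_step(cur, dst)
        cur = cur @ R.T + t
        R_tot = R @ R_tot
        t_tot = R @ t_tot + t
    return R_tot, t_tot
```

ICP only converges from a good initial guess, which is precisely why a coarse alignment stage precedes it in the classical pipeline.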
An Improved eXplainable Point Cloud Classifier (XPCC)
Classification of objects from 3D point clouds has become an increasingly relevant task across many computer vision applications. However, few studies have investigated explainable methods. In this paper, a new prototype-based and explainable classification method called the eXplainable Point Cloud Classifier (XPCC) is proposed. The XPCC method offers several advantages over previous explainable and non-explainable methods. First, the XPCC method uses local densities and global multivariate generative distributions, and therefore provides comprehensive and interpretable object-based classification. Furthermore, the proposed method is built on recursive calculations and is thus computationally very efficient. Second, the model learns continuously without the need for complete re-training and is domain transferable. Third, the proposed XPCC expands on the underlying learning method, xDNN, and is specific to 3D. As such, three new layers are added to the original xDNN architecture: i) 3D point cloud feature extraction, ii) global compound prototype weighting, and iii) the SoftMax function. Experiments on the ModelNet40 benchmark demonstrated that XPCC is the only explainable point cloud classifier to increase classification accuracy relative to the base algorithm when applied to the same problem. Additionally, this paper proposes a novel prototype-based visual representation that provides model- and object-based explanations. The prototype objects are superimposed to create a prototypical class representation of their data density within the feature space, called the Compound Prototype Cloud. This allows a user to visualize the explainable aspects of the model and identify object regions that contribute to the classification in a human-understandable way.
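The recursive, retraining-free learning style that prototype-based classifiers such as XPCC rely on can be illustrated with a running-mean prototype per class. This toy sketch is not the XPCC algorithm itself (which uses local densities and generative distributions over learned 3D features); it only shows the recursive-update idea that makes such models efficient and continuously trainable.

```python
import numpy as np

class RecursivePrototype:
    """A per-class prototype (running mean of feature vectors) updated
    recursively, one sample at a time, with no stored history and no
    retraining."""
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.count = 0

    def update(self, x):
        self.count += 1
        # Recursive mean: mean_k = mean_{k-1} + (x - mean_{k-1}) / k
        self.mean += (x - self.mean) / self.count

def classify(protos, x):
    """Assign x to the class whose prototype is nearest."""
    return min(protos, key=lambda c: np.linalg.norm(protos[c].mean - x))
```

Because each update touches only the running statistics, adding a new training sample (or even a new class) costs O(dim), which is the efficiency property the abstract highlights.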