
    Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap

    Full text link
    Global point cloud registration is essential in many robotics tasks like loop closing and relocalization. Unfortunately, the registration often suffers from the low overlap between point clouds, a frequent occurrence in practical applications due to occlusion and viewpoint change. In this paper, we propose a graph-theoretic framework to address the problem of global point cloud registration with low overlap. To this end, we construct a consistency graph to facilitate robust data association and employ graduated non-convexity (GNC) for reliable pose estimation, following the state-of-the-art (SoTA) methods. Unlike previous approaches, we use semantic cues to scale down the dense point clouds, thus reducing the problem size. Moreover, we address the ambiguity arising from the consistency threshold by constructing a pyramid graph with multi-level consistency thresholds. Then we propose a cascaded gradient ascent method to solve the resulting densest clique problem and obtain multiple pose candidates for every consistency threshold. Finally, fast geometric verification is employed to select the optimal estimate from the pose candidates. Our experiments, conducted on a self-collected indoor dataset and the public KITTI dataset, demonstrate that our method achieves the highest success rate despite the low overlap of point clouds and low semantic quality. We have open-sourced our code at https://github.com/HKUST-Aerial-Robotics/Pagor for this project. Comment: Accepted by IROS202
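
    The pairwise-consistency test at the core of such graph-theoretic registration is easy to sketch. The snippet below is a minimal illustration of building one consistency graph per threshold level, not the authors' released code; the correspondence arrays and threshold values are hypothetical.

    ```python
    import numpy as np

    def consistency_graphs(src, dst, thresholds):
        """Build one adjacency matrix per consistency threshold.

        src, dst: (N, 3) arrays of putatively corresponding points.
        Correspondences i and j are mutually consistent if the distance
        between the two source points matches the distance between the
        two destination points up to the threshold (rigidity check).
        """
        d_src = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
        d_dst = np.linalg.norm(dst[:, None, :] - dst[None, :, :], axis=-1)
        err = np.abs(d_src - d_dst)
        # One graph per pyramid level; looser thresholds give denser graphs.
        return [(err < t) & ~np.eye(len(src), dtype=bool) for t in thresholds]

    graphs = consistency_graphs(np.random.rand(50, 3), np.random.rand(50, 3),
                                thresholds=[0.1, 0.3, 0.9])
    ```

    Each adjacency matrix would then feed a densest-clique search, and the resulting pose candidates are ranked by geometric verification.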

    Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)

    Full text link
    The book provides a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic to analyze, so it requires familiarity with other topics of Artificial Intelligence (AI) such as machine learning, digital image processing, and psychology. This makes it a great opportunity to write a book that covers all of these topics for beginner to professional readers in the field of AI, even those without a background in AI. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB libraries used in the text, which helps the reader apply the experiments to real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with different key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expressions recognition, feature extraction, and dimensionality reduction. Comment: This is the second edition of the book

    Object Segmentation and Reconstruction Using Infrastructure Sensor Nodes for Autonomous Mobility

    Get PDF
    This thesis focuses on Lidar point cloud processing for infrastructure sensor nodes that serve as the perception system for autonomous robots with general mobility in indoor applications. Compared with typical schemes mounting sensors on the robots, this method acquires data from infrastructure sensor nodes, providing a more comprehensive view of the environment, which benefits the robots' navigation. The number of sensors does not need to increase even for multiple robots, significantly reducing costs. In addition, with a central perception system using the infrastructure sensor nodes to navigate every robot, a more comprehensive understanding of the current environment and all the robots' locations can be obtained for the control and operation of the autonomous robots. For a robot in the detection range of a sensor node, the node can detect and segment obstacles in the robot's drivable area and reconstruct the incomplete, sparse point cloud of objects as they move. The complete shape obtained by the reconstruction benefits the localization and path planning that follow the perception part of the robot's system. Considering the sparse Lidar data and the variety of object categories in the environment, a model-free scheme is selected for object segmentation. Point segmentation starts with background filtering. Considering the complexity of the indoor environment, a depth-matching-based background removal approach is first proposed. However, later tests imply that the method is adequate but not time-efficient. Therefore, based on the depth-matching-based method, a process that only considers the drivable area of the robot is proposed, and the computational complexity is significantly reduced. With this optimization, the computation time for processing one frame of data is greatly reduced, from 0.2 seconds with the first approach to 0.01 seconds with the second. After background filtering, the remaining points belonging to objects are segmented into separate clusters using an object clustering algorithm. With independent clusters of objects, an object tracking algorithm is then applied to assign IDs to the point clusters and arrange the clusters in a time sequence. Given a stream of clusters for a specific object over time, point registration is deployed to aggregate the clusters into a complete shape. As noticed during the experiments, one difference between indoor and outdoor environments is that contact between objects is much more common indoors. Objects in contact are likely to be segmented as a single cluster by the model-free clustering algorithm, which must be avoided in the reconstruction process. Therefore, an improvement is made to the tracking algorithm for when contact happens. The algorithms in this thesis have been experimentally evaluated and presented.
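
    A minimal sketch of the drivable-area-focused segmentation step described above, assuming an axis-aligned drivable area and DBSCAN as the clustering algorithm (the abstract does not name the exact clustering method):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree
    from sklearn.cluster import DBSCAN

    def segment_objects(points, area_min, area_max, background, eps=0.2):
        """points: (N, 3) Lidar points from one frame.
        area_min/area_max: hypothetical bounds of the drivable area.
        background: (M, 3) reference scan of the empty scene.
        """
        # Restrict processing to the drivable area only (the key speed-up).
        in_area = np.all((points >= area_min) & (points <= area_max), axis=1)
        pts = points[in_area]
        # Depth-matching-style background removal: drop points that lie
        # close to any point of the pre-recorded background scan.
        dist, _ = cKDTree(background).query(pts)
        fg = pts[dist > eps]
        # Model-free clustering of the remaining foreground points.
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(fg)
        return [fg[labels == k] for k in set(labels) if k != -1]
    ```

    Each returned cluster can then be handed to the tracking and registration stages for shape reconstruction.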

    3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving

    Full text link
    For SLAM systems in robotics and autonomous driving, the accuracy of front-end odometry and back-end loop-closure detection determines the performance of the whole intelligent system. However, LiDAR SLAM can be disturbed by moving objects in the current scene, resulting in drift errors and even loop-closure failure. Thus, the ability to detect and segment moving objects is essential for high-precision positioning and building a consistent map. In this paper, we address the problem of moving object segmentation from 3D LiDAR scans to improve the odometry and loop-closure accuracy of SLAM. We propose a novel 3D Sequential Moving-Object-Segmentation (3D-SeqMOS) method that can accurately segment the scene into moving and static objects, such as moving and static cars. Different from existing projected-image methods, we process the raw 3D point cloud and build a 3D convolutional neural network for the MOS task. In addition, to make full use of the spatio-temporal information of the point cloud, we propose a point cloud residual mechanism using the spatial features of the current scan and the temporal features of previous residual scans. Besides, we build a complete SLAM framework to verify the effectiveness and accuracy of 3D-SeqMOS. Experiments on the SemanticKITTI dataset show that the proposed 3D-SeqMOS method can effectively detect moving objects and improve the accuracy of LiDAR odometry and loop-closure detection. The test results show that 3D-SeqMOS outperforms the state-of-the-art method by 12.4%. We extend the proposed method to the SemanticKITTI Moving Object Segmentation competition and achieve 2nd place on the leaderboard, showing its effectiveness.
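
    The residual idea can be pictured as an occupancy difference between the current scan and an earlier one. The sketch below is an illustrative simplification of the paper's point cloud residual mechanism, with a hypothetical voxel size:

    ```python
    import numpy as np

    def voxel_residual(curr, prev, voxel=0.5):
        """Occupancy residual between two (N, 3) LiDAR scans.

        Voxels occupied in the current scan but free in the previous one
        (and vice versa) hint at motion; an MOS network can consume a
        stack of such residuals as its temporal channel.
        """
        to_keys = lambda pts: {tuple(v) for v in np.floor(pts / voxel).astype(int)}
        k_curr, k_prev = to_keys(curr), to_keys(prev)
        appeared = k_curr - k_prev   # newly occupied: candidate moving points
        vanished = k_prev - k_curr   # freed space behind a moving object
        return appeared, vanished
    ```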

    Visually Adversarial Attacks and Defenses in the Physical World: A Survey

    Full text link
    Although Deep Neural Networks (DNNs) have been widely applied in various real-world scenarios, they are vulnerable to adversarial examples. Current adversarial attacks in computer vision can be divided into digital attacks and physical attacks according to their attack forms. Compared with digital attacks, which generate perturbations in the digital pixels, physical attacks are more practical in the real world. Owing to the serious security problems caused by physically adversarial examples, many works have been proposed to evaluate the physically adversarial robustness of DNNs in the past years. In this paper, we present a survey of current physically adversarial attacks and physically adversarial defenses in computer vision. To establish a taxonomy, we organize the current physical attacks by attack task, attack form, and attack method, so that readers can gain systematic knowledge of this topic from different aspects. For the physical defenses, we establish the taxonomy across pre-processing, in-processing, and post-processing of the DNN models to achieve full coverage of the adversarial defenses. Based on this survey, we finally discuss the challenges of this research field and offer an outlook on future directions.
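
    To make the digital-versus-physical distinction concrete: physical attacks are typically realized as printable patches or textures rather than per-pixel perturbations. A minimal, hypothetical sketch of digitally simulating a patch attack (patch contents and placement are placeholders):

    ```python
    import numpy as np

    def apply_patch(image, patch, top, left):
        """Paste an adversarial patch into an (H, W, 3) image in [0, 1].

        Physical attacks optimize `patch` to remain effective after
        printing, under varying viewpoints and lighting; digital attacks
        instead perturb every pixel by a small, imperceptible amount.
        """
        out = image.copy()
        h, w = patch.shape[:2]
        out[top:top + h, left:left + w] = patch
        return out
    ```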

    STELLAR: A LARGE SATELLITE STEREO DATASET FOR DIGITAL SURFACE MODEL GENERATION

    Get PDF
    Stellar is a large satellite stereo dataset. It contains rectified stereo pairs of terrain captured by satellite image sensors, together with the corresponding true disparity maps and semantic segmentation. Unlike stereo vision in autonomous driving and mobile imaging, a satellite stereo pair is not captured simultaneously; thus, the same object in a satellite stereo pair is more likely to have a varied visual appearance. Stellar provides flexible access to such stereo pairs so that methods can be trained to be robust to this appearance variation. We use publicly available data sources and developed several techniques for data registration, rectification, and semantic segmentation to build Stellar. In our preliminary experiment, we fine-tuned two deep-learning stereo methods on Stellar. The results demonstrate that, most of the time, these methods generate denser and more accurate disparity maps for satellite stereo after fine-tuning on Stellar, compared to no fine-tuning on satellite stereo data or fine-tuning on previous, smaller satellite stereo datasets. Stellar is available for download at https://github.com/guo-research-group/Stellar
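
    A sketch of how such a stereo dataset would typically be consumed for fine-tuning; the directory layout and file names below are assumptions for illustration, not Stellar's documented format:

    ```python
    import os
    import numpy as np
    from PIL import Image

    def load_pair(root, idx):
        """Load one rectified satellite stereo pair with its ground-truth
        disparity (hypothetical layout: left/, right/, disp/ subfolders)."""
        left = np.asarray(Image.open(os.path.join(root, "left", f"{idx:06d}.png")))
        right = np.asarray(Image.open(os.path.join(root, "right", f"{idx:06d}.png")))
        disp = np.load(os.path.join(root, "disp", f"{idx:06d}.npy"))
        return left, right, disp
    ```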

    Deep Learning for Scene Flow Estimation on Point Clouds: A Survey and Prospective Trends

    Get PDF
    Aiming at obtaining structural information and the 3D motion of dynamic scenes, scene flow estimation has long been a research interest in computer vision and computer graphics. It is also a fundamental task for various applications such as autonomous driving. Compared to previous methods that rely on image representations, much recent research builds on the power of deep learning and focuses on point cloud representations for 3D flow estimation. This paper comprehensively reviews the pioneering literature on scene flow estimation based on point clouds. It examines learning paradigms in detail and presents insightful comparisons between state-of-the-art methods that use deep learning for scene flow estimation. Furthermore, this paper investigates various higher-level scene understanding tasks, including object tracking and motion segmentation, and concludes with an overview of foreseeable research trends for scene flow estimation.
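
    Scene flow on point clouds assigns each 3D point a motion vector, and the accuracy measures most commonly reported in the surveyed literature are the 3D end-point error (EPE3D) and strict accuracy. A minimal numpy sketch:

    ```python
    import numpy as np

    def epe3d(pred_flow, gt_flow):
        """Mean 3D end-point error between predicted and ground-truth
        per-point flow vectors, both of shape (N, 3), in metres."""
        return np.linalg.norm(pred_flow - gt_flow, axis=1).mean()

    def acc3d_strict(pred_flow, gt_flow):
        """Fraction of points with EPE < 0.05 m or relative error < 5%."""
        err = np.linalg.norm(pred_flow - gt_flow, axis=1)
        rel = err / (np.linalg.norm(gt_flow, axis=1) + 1e-8)
        return np.mean((err < 0.05) | (rel < 0.05))
    ```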

    Quantum Annealing for Single Image Super-Resolution

    Full text link
    This paper proposes a quantum computing-based algorithm to solve the single image super-resolution (SISR) problem. One of the well-known classical approaches to SISR relies on the well-established patch-wise sparse modeling of the problem. Yet the current state of affairs in this field is that deep neural networks (DNNs) have demonstrated far superior results to traditional approaches. Nevertheless, quantum computing is expected to become increasingly prominent for machine learning problems soon. As a result, in this work we perform an early exploration of applying a quantum computing algorithm to this important image enhancement problem, i.e., SISR. Of the two paradigms of quantum computing, namely universal gate quantum computing and adiabatic quantum computing (AQC), the latter has been successfully applied to practical computer vision problems, in which quantum parallelism has been exploited to solve combinatorial optimization efficiently. This work demonstrates how to formulate quantum SISR as a sparse coding optimization problem, which is solved using quantum annealers accessed via the D-Wave Leap platform. The proposed AQC-based algorithm is demonstrated to achieve an improved speed-up over a classical analog while maintaining comparable SISR accuracy. Comment: Accepted to IEEE/CVF CVPR 2023, NTIRE Challenge and Workshop. Draft info: 10 pages, 6 Figures, 2 Table
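
    Sparse coding with a binary code vector maps directly onto the quadratic unconstrained binary optimization (QUBO) form that annealers accept: for min_a ||y - Da||^2 + lambda * sum(a) with binary a, the identity a_i^2 = a_i folds the linear terms into the QUBO diagonal. The sketch below builds such a matrix and solves a toy instance by brute force in place of an annealer; the dictionary and sizes are placeholders, not the paper's setup.

    ```python
    import numpy as np
    from itertools import product

    def sparse_coding_qubo(D, y, lam):
        """QUBO matrix Q such that a^T Q a equals, up to a constant,
        ||y - D a||^2 + lam * sum(a) for binary a (using a_i^2 = a_i)."""
        Q = D.T @ D
        Q[np.diag_indices_from(Q)] += lam - 2 * D.T @ y
        return Q

    def brute_force(Q):
        """Exhaustive stand-in for the annealer (fine for tiny codes)."""
        K = Q.shape[0]
        return min((np.array(a) for a in product([0, 1], repeat=K)),
                   key=lambda a: a @ Q @ a)

    rng = np.random.default_rng(0)
    D = rng.standard_normal((16, 8))     # toy dictionary: 16-dim patches, 8 atoms
    a_true = (rng.random(8) < 0.25).astype(float)
    y = D @ a_true                       # synthetic patch to encode
    print(brute_force(sparse_coding_qubo(D, y, lam=0.1)))
    ```

    On real hardware, the matrix Q would be submitted to the annealer instead of the brute-force search.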

    Point cloud registration: a mini-review of current state, challenging issues and future directions

    Get PDF
    A point cloud is a set of data points in space. Point cloud registration is the process of aligning two or more 3D point clouds collected from different locations of the same scene. Registration enables point cloud data to be transformed into a common coordinate system, forming an integrated dataset that represents the surveyed scene. Besides methods reliant on targets placed in the scene before data capture, various registration methods are available that use only the captured point cloud data. Until recently, cloud-to-cloud registration methods have generally been centered on a coarse-to-fine optimization strategy. The challenges and limitations inherent in this process have shaped the development of point cloud registration and the associated software tools over the past three decades. Following the success of deep learning methods applied to imagery data, attempts at applying these approaches to point cloud datasets have received much attention. This study reviews and comments on recent developments in target-free point cloud registration and explores remaining issues, on the basis of which recommendations for potential future studies on this topic are made.
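
    The coarse-to-fine strategy the review refers to can be illustrated with a short Open3D sketch: align heavily downsampled clouds first, then re-run the alignment at finer resolutions using the previous transform as initialization. This is a generic sketch, not a method from the review; the voxel sizes and file names are placeholders.

    ```python
    import numpy as np
    import open3d as o3d

    def coarse_to_fine_icp(source, target, voxels=(1.0, 0.25, 0.05)):
        """Run point-to-point ICP over a pyramid of voxel resolutions,
        feeding each level's transform into the next as initialization."""
        T = np.eye(4)
        for v in voxels:
            src = source.voxel_down_sample(v)
            tgt = target.voxel_down_sample(v)
            result = o3d.pipelines.registration.registration_icp(
                src, tgt, max_correspondence_distance=2 * v, init=T,
                estimation_method=o3d.pipelines.registration
                    .TransformationEstimationPointToPoint())
            T = result.transformation
        return T

    # Hypothetical usage:
    # src = o3d.io.read_point_cloud("scan_a.pcd")
    # tgt = o3d.io.read_point_cloud("scan_b.pcd")
    # T = coarse_to_fine_icp(src, tgt)
    ```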

    An Improved eXplainable Point Cloud Classifier (XPCC)

    Get PDF
    Classification of objects from 3D point clouds has become an increasingly relevant task across many computer vision applications. However, few studies have investigated explainable methods. In this paper, a new prototype-based and explainable classification method called the eXplainable Point Cloud Classifier (XPCC) is proposed. The XPCC method offers several advantages over previous explainable and non-explainable methods. First, the XPCC method uses local densities and global multivariate generative distributions; therefore, it provides comprehensive and interpretable object-based classification. Furthermore, the proposed method is built on recursive calculations and is thus computationally very efficient. Second, the model learns continuously without the need for complete re-training and is domain transferable. Third, the proposed XPCC expands on the underlying learning method, xDNN, and is specific to 3D. As such, three new layers are added to the original xDNN architecture: i) 3D point cloud feature extraction, ii) global compound prototype weighting, and iii) the SoftMax function. Experiments were performed on the ModelNet40 benchmark, which demonstrated that XPCC is the only explainable point cloud classifier to increase classification accuracy relative to the base algorithm when applied to the same problem. Additionally, this paper proposes a novel prototype-based visual representation that provides model- and object-based explanations. The prototype objects are superimposed to create a prototypical class representation of their data density within the feature space, called the Compound Prototype Cloud. This allows a user to visualize the explainable aspects of the model and identify object regions that contribute to the classification in a human-understandable way.
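
    The flavor of prototype-based, explainable classification is easy to convey: each class is represented by prototype feature vectors, a query is assigned to the class of its best-matching prototype, and the decision can be traced back to that concrete prototype. The sketch below is a simplified nearest-prototype classifier with a Cauchy similarity in the spirit of xDNN, not the full XPCC architecture.

    ```python
    import numpy as np

    def classify(x, prototypes):
        """prototypes: {class_name: (P, d) array of prototype features}.
        Returns the predicted class and the index of the prototype that
        explains the decision."""
        best = None
        for cls, P in prototypes.items():
            sim = 1.0 / (1.0 + np.sum((P - x) ** 2, axis=1))  # Cauchy kernel
            i = int(np.argmax(sim))
            if best is None or sim[i] > best[2]:
                best = (cls, i, sim[i])
        return best[0], best[1]
    ```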