13 research outputs found

    R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition

    Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community. MLP-like models achieve competitive performance on single-image 2D classification with less inductive bias and without hand-crafted convolution layers. In this work, we explore the effectiveness of an MLP-based architecture for the view-based 3D object recognition task. We present an MLP-based architecture termed Round-Roll MLP (R²-MLP). It extends the spatial-shift MLP backbone by considering communication between patches from different views. R²-MLP rolls part of the channels along the view dimension and promotes information exchange between neighboring views. We benchmark MLP results on the ModelNet10 and ModelNet40 datasets with ablations in various aspects. The experimental results show that, with a conceptually simple structure, our R²-MLP achieves competitive performance compared with existing state-of-the-art methods.
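
    The round-roll operation described in the abstract can be pictured as rolling a subset of channels along the view axis so that neighboring views exchange features. Below is a minimal PyTorch sketch of that idea; the (batch, views, patches, channels) layout, the four-way channel split, and the `round_roll` name are illustrative assumptions, not the paper's exact implementation.

    ```python
    # Sketch of a "round-roll" style mixing step: part of the channels is rolled
    # along the view dimension so each view sees features from its neighbors.
    # The tensor layout and channel grouping below are assumptions for illustration.
    import torch

    def round_roll(x: torch.Tensor) -> torch.Tensor:
        """x: (batch, views, patches, channels) multi-view patch features."""
        b, v, p, c = x.shape
        g = c // 4                          # split channels into four groups (assumed)
        out = x.clone()
        # roll one channel group toward the previous view, one toward the next view
        out[..., :g] = torch.roll(x[..., :g], shifts=1, dims=1)
        out[..., g:2 * g] = torch.roll(x[..., g:2 * g], shifts=-1, dims=1)
        # the remaining channels keep their own view's features unchanged
        return out

    # usage: 8 objects, 12 views, 49 patches, 256 channels
    feats = torch.randn(8, 12, 49, 256)
    mixed = round_roll(feats)               # same shape; neighboring views now share channels
    ```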

    XC: Exploring Quantitative Use Cases for Explanations in 3D Object Detection

    Explainable AI (XAI) methods are frequently applied to obtain qualitative insights about deep models' predictions. However, such insights need to be interpreted by a human observer to be useful. In this thesis, we aim to use explanations directly to make decisions, without human observers. We adopt two gradient-based explanation methods, Integrated Gradients (IG) and backprop, for the task of 3D object detection. We then propose a set of quantitative measures, named Explanation Concentration (XC) scores, that can be used for downstream tasks. These scores quantify the concentration of attributions within the boundaries of detected objects. We evaluate the effectiveness of XC scores on the task of distinguishing true positive (TP) and false positive (FP) detected objects in the KITTI and Waymo datasets. The results demonstrate an improvement of more than 100% on both datasets compared with other heuristics such as random guessing and the number of LiDAR points in the bounding box, raising confidence in XC's potential for application in more use cases. Our results also indicate that computationally expensive XAI methods like IG may not provide more value than simpler methods when used quantitatively. Moreover, we apply loss terms based on XC and the pixel attribution prior (PAP), another quantitative measure for attributions, to the training of a 3D object detection model. We show that a performance boost is possible as long as we select the right subset of predictions to which the attribution-based losses are applied.
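
    The XC score sketched in the abstract can be read as the fraction of total attribution mass that falls inside a detected object's boundary. The short NumPy sketch below illustrates that reading; the `xc_score` name, the 2D attribution map, the axis-aligned box, and the magnitude-based aggregation are simplifying assumptions for illustration, since the thesis operates on 3D detections.

    ```python
    # Sketch of an "Explanation Concentration" style score: attribution mass inside
    # the detected box divided by total attribution mass. Simplified to 2D for
    # illustration; the actual work uses 3D object detections.
    import numpy as np

    def xc_score(attr: np.ndarray, box: tuple) -> float:
        """attr: (H, W) attribution map; box: (x1, y1, x2, y2) in pixel coordinates."""
        x1, y1, x2, y2 = box
        mass = np.abs(attr)                      # attribution magnitude
        inside = mass[y1:y2, x1:x2].sum()        # mass inside the detected box
        total = mass.sum() + 1e-12               # avoid division by zero
        return float(inside / total)             # 1.0 = fully concentrated in the box

    # usage: a detection whose attributions lie entirely inside its box scores ~1.0
    attr = np.zeros((100, 100))
    attr[20:40, 30:60] = np.random.rand(20, 30)  # attributions only inside the box
    print(xc_score(attr, (30, 20, 60, 40)))
    ```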