93 research outputs found

    LO-Net: Deep Real-time Lidar Odometry

    We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation. Unlike most existing lidar odometry (LO) methods, which go through individually designed feature selection, feature matching, and pose estimation stages, LO-Net can be trained in an end-to-end manner. With a new mask-weighted geometric constraint loss, LO-Net can effectively learn feature representations for LO estimation and can implicitly exploit the sequential dependencies and dynamics in the data. We also design a scan-to-map module, which uses the geometric and semantic information learned in LO-Net, to improve the estimation accuracy. Experiments on benchmark datasets demonstrate that LO-Net outperforms existing learning-based approaches and achieves accuracy similar to that of the state-of-the-art geometry-based approach, LOAM.

    Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling

    Fully exploring correlation among points in point clouds is essential for their feature modeling. This paper presents a novel end-to-end graph model, named Point2Node, to represent a given point cloud. Point2Node can dynamically explore correlation among all graph nodes from different levels, and adaptively aggregate the learned features. Specifically, first, to fully explore the spatial correlation among points for enhanced feature description, we dynamically integrate each node's correlation with self, local, and non-local nodes in a high-dimensional node graph. Second, to integrate the learned features more effectively, we design a data-aware gate mechanism to self-adaptively aggregate features at the channel level. Extensive experiments on various point cloud benchmarks demonstrate that our method outperforms the state-of-the-art. Comment: AAAI 2020 (oral)

    Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis

    Histopathology image analysis is the gold standard of clinical diagnosis for cancers. In doctors' daily routine and in computer-aided diagnosis, the Whole Slide Image (WSI) of histopathology tissue is used for analysis. Because of the extremely high resolution, previous methods for building computer-aided diagnosis tools generally divide the WSI into a large number of patches, then aggregate all patches within a WSI by Multi-Instance Learning (MIL) to make the slide-level prediction. However, most previous WSI-MIL models use either global attention without pairwise interaction or positional information, or self-attention with absolute position embedding, and cannot handle large WSIs of varying shape well; e.g., testing WSIs after model deployment may be larger than training WSIs, since the model development set is always limited due to the difficulty of collecting histopathology WSIs. To deal with this problem, we propose to amend position embedding for shape-varying, long-contextual WSIs by introducing Linear Bias into Attention and adapting it from 1-d long sequences to 2-d long-contextual WSIs, which helps the model extrapolate position embedding to unseen or under-fitted positions. We further utilize the Flash-Attention module to tackle the computational complexity of the Transformer, which also keeps full self-attention performance compared to previous attention approximation work. Our method, Long-contextual MIL (Long-MIL), is evaluated in extensive experiments on 4 datasets, covering WSI classification and survival prediction tasks, to validate its superiority on shape-varying WSIs. The source code will be open-accessed soon.
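The idea of adding a linear bias to attention over a 2-d patch grid can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the single `slope` value, and the function names `linear_bias_2d` and `biased_attention` are assumptions chosen for illustration, and plain NumPy attention stands in for Flash-Attention.

```python
import numpy as np

def linear_bias_2d(coords, slope=0.5):
    """Return an (N, N) bias equal to -slope * distance between patch positions.

    Assumption: Euclidean distance on (row, col) grid coordinates; the paper
    may use a different distance or per-head slopes.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise offsets, shape (N, N, 2)
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # pairwise distances, shape (N, N)
    return -slope * dist

def biased_attention(q, k, v, coords, slope=0.5):
    """Single-head scaled dot-product attention with an additive distance bias."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + linear_bias_2d(coords, slope)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy slide: a 3 x 4 grid of patches with random 8-dimensional features.
rng = np.random.default_rng(0)
coords = [(r, c) for r in range(3) for c in range(4)]
n, d = len(coords), 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out, weights = biased_attention(q, k, v, coords)
```

Because the bias depends only on relative patch positions, the same function applies unchanged to a grid larger than any seen in training, which is the extrapolation property the abstract refers to.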

    RF-Net: An End-to-End Image Matching Network based on Receptive Field

    This paper proposes a new end-to-end trainable matching network based on receptive field, RF-Net, to compute sparse correspondence between images. Building an end-to-end trainable matching framework is desirable and challenging. The very recent approach, LF-Net, successfully embeds the entire feature extraction pipeline into a jointly trainable pipeline, and produces state-of-the-art matching results. This paper introduces two modifications to the structure of LF-Net. First, we propose to construct receptive feature maps, which lead to more effective keypoint detection. Second, we introduce a general loss function term, neighbor mask, to facilitate training patch selection. This results in improved stability in descriptor training. We trained RF-Net on the open dataset HPatches, and compared it with other methods on multiple benchmark datasets. Experiments show that RF-Net outperforms existing state-of-the-art methods. Comment: 9 pages, 6 figures

    Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference

    For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources. Comment: 6 pages, presented at NeurIPS'23 Workshop on Generative AI for Education (GAIED)
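The retrieve-then-prompt pattern described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the word-overlap retriever, the prompt wording, the helper names `tokenize`, `retrieve`, and `build_prompt`, and the three `TEXTBOOK` passages are all hypothetical and are not taken from the paper, and no LLM is actually called.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, k=1):
    """Rank passages by word overlap with the question (a stand-in retriever)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages, k=1):
    """Place the top-k retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer the student's question using the textbook excerpts below.\n"
        f"Textbook excerpts:\n{context}\n"
        f"Student question: {question}\n"
        "Answer:"
    )

# Hypothetical textbook passages, invented for this example.
TEXTBOOK = [
    "The slope of a line measures its steepness as rise over run.",
    "The area of a triangle is one half times base times height.",
    "A variable is a letter that stands for an unknown number.",
]

prompt = build_prompt("How do I find the slope of a line?", TEXTBOOK)
```

How strongly the instruction line ties the answer to the excerpts is the kind of design knob behind the groundedness versus preference trade-off that the abstract reports.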

    Semantic Labeling of Mobile LiDAR Point Clouds via Active Learning and Higher Order MRF

    Using mobile Light Detection and Ranging point clouds to accomplish road scene labeling tasks shows promise for a variety of applications. Most existing methods for semantic labeling of point clouds require a huge number of fully supervised point cloud scenes, where each point needs to be manually annotated with a specific category. Manually annotating each point in point cloud scenes is labor intensive and hinders practical usage of those methods. To alleviate this burden of manual annotation, we introduce an active learning method that avoids annotating whole point cloud scenes by iteratively annotating a small portion of unlabeled supervoxels and creating a minimal manually annotated training set. To avoid the biased sampling found in traditional active learning methods, a neighbor-consistency prior is exploited to select potentially misclassified samples into the training set and so improve the accuracy of the statistical model. Furthermore, many methods consider only short-range contextual information in semantic labeling tasks and ignore the long-range contexts among local variables. We use a higher order Markov random field model to take more context into account when refining the labeling results, despite lacking fully supervised scenes. Evaluations on three data sets show that our proposed framework achieves high accuracy in labeling point clouds although only a small portion of labels is provided. Moreover, comparative experiments demonstrate that our proposed framework is superior to traditional sampling methods and exhibits performance comparable to that of fully supervised models. Funding: 10.13039/501100001809-National Natural Science Foundation of China; Collaborative Innovation Center of Haixi Government Affairs Big Data Sharin

    Pairwise registration of TLS point clouds by deep multi-scale local features

    Because of the mechanism of the TLS system, noise, outliers, various occlusions, varying cloud densities, etc. inevitably exist in collected TLS point clouds. To achieve automatic TLS point cloud registration, many methods based on hand-crafted features of keypoints have been proposed. Despite significant progress, current methods still face great challenges in accomplishing TLS point cloud registration. In this paper, we propose a multi-scale neural network to learn local shape descriptors for establishing correspondences between pairwise TLS point clouds. To train our model, data augmentation, developed on pairwise semi-synthetic 3D local patches, is used to make our network robust to rotation. Then, based on varying local neighborhoods, multi-scale subnetworks are constructed and fused to learn robust local features. Experimental results demonstrate that our proposed method successfully registers two TLS point clouds and outperforms state-of-the-art methods. In addition, our learned descriptors are invariant to translation and tolerant to changes in rotation.