152 research outputs found

    DeepSLAM: A Robust Monocular SLAM System with Unsupervised Deep Learning

    In this paper, we propose DeepSLAM, a novel unsupervised deep learning-based visual Simultaneous Localization and Mapping (SLAM) system. Training DeepSLAM is fully unsupervised, since it requires only stereo imagery rather than annotated ground-truth poses. At test time it takes a monocular image sequence as input, so it is a monocular SLAM paradigm. DeepSLAM consists of several essential components, including Mapping-Net, Tracking-Net, Loop-Net and a graph optimization unit. Specifically, the Mapping-Net is an encoder-decoder architecture for describing the 3D structure of the environment, while the Tracking-Net is a Recurrent Convolutional Neural Network (RCNN) architecture for capturing the camera motion. The Loop-Net is a pre-trained binary classifier for detecting loop closures. DeepSLAM can simultaneously generate pose estimates, depth maps and outlier rejection masks. We evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes.

    Growing Business in Live Commerce: A Tripartite Perspective and Product Heterogeneity

    Live streaming has become an important channel helping organizations and individual sellers boost their sales. Our research takes an integrated perspective and examines the simultaneous influences of streamer-, consumer-, and product-related factors on sales volume in live commerce. We apply multiple linear regression to analyze a panel data set collected from Taobao Live during Double 11, 2020, containing 34,925 product sales records. We find that streamers’ social capital, consumers’ engagement, and products’ live demonstration all significantly contribute to product sales volume. In addition, product heterogeneity matters in live commerce: the effects of streamers’ social capital and products’ live demonstration on sales volume hold only for experience products (not for search products) and for products with less popular brands (not for products with popular brands). Our research offers comprehensive insights for both researchers and practitioners on how to grow business in live commerce.

    HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

    Machine Learning (ML) has been widely adopted in design exploration using high level synthesis (HLS) to give better and faster performance, resource and power estimation at very early stages of FPGA-based design. To perform prediction accurately, high-quality and large-volume datasets are required for training ML models. This paper presents a dataset for ML-assisted FPGA design using HLS, called HLSDataset. The dataset is generated from widely used HLS C benchmarks including Polybench, Machsuite, CHStone and Rosetta. The Verilog samples are generated with a variety of directives, including loop unroll, loop pipeline and array partition, to make sure optimized and realistic designs are covered. The total number of generated Verilog samples is nearly 9,000 per FPGA type. To demonstrate the effectiveness of our dataset, we undertake case studies to perform power estimation and resource usage estimation with ML models trained with our dataset. All the code and the dataset are publicly available in the GitHub repository. We believe that HLSDataset can save valuable time for researchers by avoiding the tedious process of running tools, scripting and parsing files to generate the dataset, and enable them to spend more time where it counts, that is, in training ML models. Comment: 8 pages, 5 figures

    Indoor Relocalization in Challenging Environments With Dual-Stream Convolutional Neural Networks

    This paper presents an indoor relocalization system using a dual-stream convolutional neural network (CNN) with both color images and depth images as the network inputs. Aiming at the pose regression problem, a deep neural network architecture for RGB-D images is introduced, a staged training method for the dual-stream CNN is presented, different depth image encoding methods are discussed, and a novel encoding method is proposed. By introducing the range information into the network through a dual-stream architecture, we not only improved the relocalization accuracy by about 20% compared with the state-of-the-art deep learning method for pose regression, but also greatly enhanced the system robustness in challenging scenes such as large-scale, dynamic, fast-movement, and night-time environments. To the best of our knowledge, this is the first work to solve the indoor relocalization problem based on deep CNNs with an RGB-D camera. The method is first evaluated on the Microsoft 7-Scenes data set to show its advantage in accuracy compared with other CNNs. Large-scale indoor relocalization is further presented using our method. The experimental results show that an accuracy of 0.3 m in position and 4° in orientation can be obtained. Finally, the method is evaluated on challenging indoor data sets collected with a motion capture system. The results show that the relocalization performance is hardly affected by dynamic objects, motion blur, or night-time environments.

    BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline

    3D lane detection, which plays a crucial role in vehicle routing, has recently been a rapidly developing topic in autonomous driving. Previous works struggle with practicality due to their complicated spatial transformations and inflexible representations of 3D lanes. To address these issues, our work proposes an efficient and robust monocular 3D lane detection method called BEV-LaneDet with three main contributions. First, we introduce the Virtual Camera, which unifies the in/extrinsic parameters of cameras mounted on different vehicles to guarantee the consistency of the spatial relationship among cameras. It can effectively promote the learning procedure due to the unified visual space. Second, we propose a simple but efficient 3D lane representation called Key-Points Representation. This module is more suitable for representing complicated and diverse 3D lane structures. Finally, we present a light-weight and chip-friendly spatial transformation module named Spatial Transformation Pyramid to transform multiscale front-view features into BEV features. Experimental results demonstrate that our work outperforms the state-of-the-art approaches in terms of F-Score, being 10.6% higher on the OpenLane dataset and 5.9% higher on the Apollo 3D synthetic dataset, with a speed of 185 FPS. The source code will be released at https://github.com/gigo-team/bev_lane_det. Comment: Accepted by CVPR202

    Anomalous Nernst effect in compensated ferrimagnetic CoxGd1-x films

    The anomalous Nernst effect (ANE) is one of the most intriguing thermoelectric phenomena and has attracted growing interest both for its underlying physics and for its potential applications. Typically, a large ANE response is observed in magnets with pronounced magnetization or nontrivial Berry curvature. Here, we report a significant ANE signal in compensated ferrimagnetic CoxGd1-x alloy films, which exhibit vanishingly small magnetization. In particular, we found that the polarity of the ANE signal is dominated by the magnetization orientation of the transition metal Co sublattices, rather than the net magnetization of the CoxGd1-x films. This observation is not expected from the conventional understanding of ANE but is analogous to the anomalous Hall effect in compensated ferrimagnets. We attribute the origin of ANE and its Co-dominant property to the Co-dominant Berry curvature. Our work could trigger a more comprehensive understanding of ANE and may be useful for building energy-harvesting devices by employing ANE in compensated ferrimagnets.

    Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling

    Quantization of transformer language models faces significant challenges due to the existence of detrimental outliers in activations. We observe that these outliers are asymmetric and concentrated in specific channels. To address this issue, we propose the Outlier Suppression+ framework. First, we introduce channel-wise shifting and scaling operations to eliminate asymmetric presentation and scale down problematic channels. We demonstrate that these operations can be seamlessly migrated into subsequent modules while maintaining equivalence. Second, we quantitatively analyze the optimal values for shifting and scaling, taking into account both the asymmetric property and quantization errors of weights in the next layer. Our lightweight framework incurs minimal performance degradation under static and standard post-training quantization settings. Comprehensive results across various tasks and models reveal that our approach achieves near-floating-point performance on both small models, such as BERT, and large language models (LLMs) including OPTs, BLOOM, and BLOOMZ at 8-bit and 6-bit settings. Furthermore, we establish a new state of the art for 4-bit BERT.
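The equivalence claim in the abstract above (channel-wise shifting and scaling can be migrated into the following module) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the hand-picked shift `z` and scale `s`, and the toy weights are all assumptions made for illustration, and the paper's procedure for choosing optimal values is not reproduced here. The sketch only checks that a linear layer with folded weights gives the same output on the shifted and scaled activation as the original layer gives on the original activation.

```python
# Illustrative sketch only; all names and values below are hypothetical.

def linear(x, W, b):
    """Plain linear layer for one activation vector: y_j = sum_i x_i * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def shift_scale(x, z, s):
    """Channel-wise shift then scale: x'_i = (x_i - z_i) / s_i."""
    return [(xi - zi) / si for xi, zi, si in zip(x, z, s)]

def fold_into_linear(W, b, z, s):
    """Absorb the inverse transform x_i = s_i * x'_i + z_i into the next layer.

    Returns (W_new, b_new) with W_new[i][j] = s_i * W[i][j] and
    b_new[j] = b[j] + sum_i z_i * W[i][j].
    """
    n_in, n_out = len(W), len(b)
    W_new = [[s[i] * W[i][j] for j in range(n_out)] for i in range(n_in)]
    b_new = [b[j] + sum(z[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]
    return W_new, b_new

# Toy activation: channel 0 is an asymmetric outlier channel (assumed values).
x = [60.0, -0.5, 0.25]
z = [58.0, 0.0, 0.0]   # assumed per-channel shift
s = [4.0, 1.0, 1.0]    # assumed per-channel scale
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5]

x_small = shift_scale(x, z, s)           # outlier channel is now in a small range
W_new, b_new = fold_into_linear(W, b, z, s)

y_ref = linear(x, W, b)                  # original layer on original activation
y_new = linear(x_small, W_new, b_new)    # folded layer on transformed activation
```

Here `x_small` has a much narrower range than `x`, which is what makes it easier to quantize, while `y_new` matches `y_ref`, so the network output is unchanged before any quantization is applied.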