
    Real-Time Fully Unsupervised Domain Adaptation for Lane Detection in Autonomous Driving

    While deep neural networks are used heavily for autonomous driving, they must be adapted to new, unseen environmental conditions for which they were not trained. We focus on the safety-critical application of lane detection and propose a lightweight, fully unsupervised, real-time adaptation approach that adapts only the batch-normalization parameters of the model. We demonstrate that our technique can perform inference, followed by on-device adaptation, under a tight constraint of 30 FPS on the Nvidia Jetson Orin. It achieves accuracy (92.19% on average) comparable to a state-of-the-art semi-supervised adaptation algorithm that does not support real-time adaptation.
    Comment: Accepted in 2023 Design, Automation & Test in Europe Conference (DATE 2023) - Late Breaking Result
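    To make the batch-normalization-only idea concrete, the sketch below adapts nothing but the BN statistics and affine parameters of a model at test time. The entropy-minimization objective, the toy network, and the learning rate are assumptions for illustration; the paper's exact update rule is not reproduced here.

```python
# Minimal PyTorch sketch of unsupervised, BN-only test-time adaptation.
# The entropy objective and toy network are assumptions; only the
# "adapt nothing but batch-norm" structure mirrors the abstract.
import torch
import torch.nn as nn

def configure_bn_only_adaptation(model: nn.Module):
    """Freeze all weights except batch-norm affine parameters and let BN
    layers recompute their statistics on unlabeled frames."""
    model.eval()  # keep dropout etc. in inference mode
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.train()                        # use batch statistics online
            m.requires_grad_(True)
            bn_params += [m.weight, m.bias]  # adapt only gamma and beta
        else:
            for p in m.parameters(recurse=False):
                p.requires_grad_(False)
    return bn_params

def adapt_step(model, frame, optimizer):
    """One inference + adaptation step on a single unlabeled frame."""
    logits = model(frame)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()   # gradients reach only the BN parameters
    optimizer.step()
    return logits.detach()

# Hypothetical stand-in for a lane-detection backbone.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8),
                      nn.ReLU(), nn.Conv2d(8, 2, 1))
bn_params = configure_bn_only_adaptation(model)
optimizer = torch.optim.SGD(bn_params, lr=1e-3)
logits = adapt_step(model, torch.randn(1, 3, 64, 64), optimizer)
```

    Because only a handful of parameters receive gradients, each adaptation step stays cheap enough to interleave with inference, which is what makes a 30 FPS budget plausible on embedded hardware.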

    RobotPerf: An Open-Source, Vendor-Agnostic, Benchmarking Suite for Evaluating Robotics Computing System Performance

    We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms, using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which measures performance by eliminating upper layers and replacing them with a test application, and grey-box testing, an application-specific measure that observes internal system states with minimal interference. Our benchmarking framework provides ready-to-use tools and is easily adaptable for the assessment of custom ROS 2 computational graphs. Drawing on the knowledge of leading robot architects and system architecture experts, RobotPerf establishes a standardized approach to robotics benchmarking. As an open-source initiative, RobotPerf remains committed to evolving with community input to advance the future of hardware-accelerated robotics.
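    The sketch below illustrates the black-box style described above, not RobotPerf's actual tooling: a probe node replaces the application layer, stimulates the graph under test, and times the responses. The topic names and the assumption that the graph propagates the Header's frame_id from /input to /output are hypothetical.

```python
# Hedged sketch of a black-box end-to-end latency probe for a ROS 2 graph.
# Topic names ("/input", "/output") and the requirement that the system
# under test echo the Header's frame_id are assumptions for illustration.
import time
import rclpy
from rclpy.node import Node
from std_msgs.msg import Header

class LatencyProbe(Node):
    def __init__(self):
        super().__init__('latency_probe')
        self.pub = self.create_publisher(Header, '/input', 10)
        self.sub = self.create_subscription(Header, '/output', self.on_output, 10)
        self.sent = {}                                       # frame_id -> send time
        self.seq = 0
        self.timer = self.create_timer(0.1, self.stimulate)  # 10 Hz stimulus

    def stimulate(self):
        msg = Header()
        msg.frame_id = str(self.seq)
        self.sent[msg.frame_id] = time.perf_counter()
        self.seq += 1
        self.pub.publish(msg)

    def on_output(self, msg):
        t0 = self.sent.pop(msg.frame_id, None)
        if t0 is not None:
            latency_ms = (time.perf_counter() - t0) * 1e3
            self.get_logger().info(f'end-to-end latency: {latency_ms:.2f} ms')

def main():
    rclpy.init()
    rclpy.spin(LatencyProbe())

if __name__ == '__main__':
    main()
```

    A grey-box measurement would instead instrument the nodes internally (for example with LTTng-based tracing), trading the probe's isolation for visibility into intermediate pipeline stages.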

    QuaRL: Quantization for Sustainable Reinforcement Learning

    Deep reinforcement learning has achieved significant milestones; however, the computational demands of reinforcement learning training and inference remain substantial. Quantization is an effective method for reducing the computational overhead of neural networks, though in the context of reinforcement learning it is unknown whether quantization's computational benefits outweigh the accuracy costs introduced by the corresponding quantization error. To quantify this tradeoff, we perform a broad study applying quantization to reinforcement learning. We apply standard quantization techniques such as post-training quantization (PTQ) and quantization-aware training (QAT) to a comprehensive set of reinforcement learning tasks (Atari, Gym), algorithms (A2C, DDPG, DQN, D4PG, PPO), and models (MLPs, CNNs), and show that policies may be quantized to 8 bits without degrading reward, enabling significant inference speedups on resource-constrained edge devices. Motivated by the effectiveness of standard quantization techniques on reinforcement learning policies, we introduce a novel quantization algorithm, ActorQ, for quantized actor-learner distributed reinforcement learning training. By leveraging full-precision optimization on the learner and quantized execution on the actors, ActorQ enables 8-bit inference while maintaining convergence. We develop a system for quantized reinforcement learning training around ActorQ and demonstrate end-to-end speedups of 1.5× to 2.5× over full-precision training on a range of tasks (DeepMind Control Suite). Finally, we break down the various runtime costs of distributed reinforcement learning training (such as communication time, inference time, and model load time) and evaluate the effects of quantization on these system attributes.
    Comment: Equal contribution from the first three authors. Updated with QuaRL results for sustainable (carbon emissions) RL.
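    As a concrete example of PTQ applied to a policy, the sketch below quantizes a toy MLP with PyTorch dynamic quantization. The network shape (8-dimensional observations, 4 discrete actions) is an assumption for the example; this is not the paper's code.

```python
# Illustrative post-training quantization (PTQ) of a toy MLP policy using
# PyTorch dynamic quantization. The architecture and sizes are made up.
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),               # action logits
)

# Convert Linear weights to int8; activations are quantized on the fly.
quantized_policy = torch.quantization.quantize_dynamic(
    policy, {nn.Linear}, dtype=torch.qint8
)

obs = torch.randn(1, 8)             # stand-in for a Gym observation
with torch.no_grad():
    action = quantized_policy(obs).argmax(dim=-1)
print(int(action))
```

    In an ActorQ-style setup, the learner would keep optimizing the full-precision policy and periodically ship a quantized copy like quantized_policy to the actors for rollout collection.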

    Improving compute in-memory ECC reliability with successive correction

    Compute in-memory (CIM) is an exciting technique that minimizes data transport, maximizes memory throughput, and performs computation on the bitlines of memory sub-arrays. This is especially interesting for machine learning applications, where increased memory bandwidth and analog-domain computation offer improved area and energy efficiency. Unfortunately, CIM faces new challenges that traditional CMOS architectures have avoided. In this work, we explore the impact of device variation (calibrated with measured data from foundry RRAM arrays) and propose a new class of error-correcting codes (ECC) for hard and soft errors in CIM. We demonstrate single, double, and triple error correction, offering over a 16,000× reduction in bit error rate relative to a design without ECC and over 427× relative to prior work, while incurring only 29.1% area and 26.3% power overhead.
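    To make the single-error-correction baseline concrete, the sketch below implements a textbook Hamming(7,4) code; the paper's CIM-specific successive-correction codes are not reproduced here.

```python
# Textbook Hamming(7,4) single-error correction, shown only as the standard
# SEC baseline that the paper's successive-correction codes build beyond.

def hamming74_encode(d):
    """d: 4 data bits -> 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-indexed error position, 0 = clean
    if syndrome:
        c = c.copy()
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1                       # inject a single bit error
assert hamming74_decode(codeword) == [1, 0, 1, 1]
```

    The paper's successive-correction approach extends beyond this baseline to cover the hard and soft error patterns produced by device variation on RRAM bitlines.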