
    SpaceCube: A NASA Family of Reconfigurable Hybrid On-Board Science Data Processors

    SpaceCube is a family of Field Programmable Gate Array (FPGA)-based on-board science-data processing systems developed at NASA Goddard Space Flight Center. This presentation provides an overview of the SpaceCube family for the Future In-Space Operations Telecon Working Group.

    NASA SpaceCube Edge TPU SmallSat Card for Autonomous Operations and Onboard Science-Data Analysis

    Using state-of-the-art artificial intelligence (AI) frameworks onboard spacecraft is challenging because common spacecraft processors cannot match the performance that server-grade CPUs and GPUs in data centers provide for terrestrial applications and advanced deep-learning networks. This limitation makes small, low-power AI microchip architectures, such as the Google Coral Edge Tensor Processing Unit (TPU), attractive for space missions, where the application-specific design enables both high-performance and power-efficient computing for AI applications. To address these challenging considerations for space deployment, this research introduces the design and capabilities of a CubeSat-sized Edge TPU-based co-processor card, known as the SpaceCube Low-power Edge Artificial Intelligence Resilient Node (SC-LEARN). This design conforms to NASA’s CubeSat Card Specification (CS2) for integration into next-generation SmallSat and CubeSat systems. This paper describes the overarching architecture and design of the SC-LEARN, as well as the supporting test card designed for rapid prototyping and evaluation. The SC-LEARN was developed with three operational modes: (1) a high-performance parallel-processing mode, (2) a fault-tolerant mode for onboard resilience, and (3) a power-saving mode with cold spares. Importantly, this research also elaborates on both the training and quantization of TensorFlow models for the SC-LEARN for use onboard, with representative, open-source datasets. Lastly, we describe future research plans, including radiation-beam testing and flight demonstration.
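    As an editor's illustration of the quantization step mentioned above, here is a minimal sketch of TensorFlow's post-training full-integer quantization path, which produces the int8 models the Edge TPU requires. The toy model, input shape, and random calibration data are placeholders, not the SC-LEARN pipeline.

```python
import numpy as np
import tensorflow as tf

# Placeholder classifier; a mission model would be trained beforehand.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def representative_dataset():
    # A few representative inputs let the converter pick integer
    # scales and zero-points for every activation tensor.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 ops: the Edge TPU only executes full-integer kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

    The resulting .tflite file would still need to be passed through Google's edgetpu_compiler before it can execute on the TPU itself.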

    NASA SpaceCube Intelligent Multi-Purpose System for Enabling Remote Sensing, Communication, and Navigation in Mission Architectures

    New, innovative CubeSat mission concepts demand modern capabilities such as artificial intelligence and autonomy, constellation coordination, fault mitigation, and robotic servicing, all of which require vastly more processing resources than legacy systems can provide. Enabling these domains within a scalable, configurable processing architecture is advantageous because it also allows the flexibility to address varying mission roles, such as a command-and-data-handling system, a high-performance application-processor extension, a guidance-and-navigation solution, or an instrument/sensor interface. This paper describes the NASA SpaceCube Intelligent Multi-Purpose System (IMPS), which allows mission developers to mix and match 1U (10 cm × 10 cm) CubeSat payloads configured for mission-specific needs. The central enabling component of the architecture is the SpaceCube v3.0 Mini Processor, a single-board computer featuring the 20 nm Xilinx Kintex UltraScale FPGA combined with a radiation-hardened FPGA monitor and extensive I/O to integrate and interconnect the varying cards within the system. To unify the reusable designs within this architecture, the CubeSat Card Standard was developed to guide the design of 1U cards. This standard defines the pinout, mechanical, and electrical specifications for 1U CubeSat cards, allowing the backplane and mechanical enclosure to be easily extended. NASA has developed several cards adhering to the standard (System-on-Chip, power card, etc.), providing the flexibility to configure a payload from a common catalog of cards.
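    To make the mix-and-match idea concrete, here is a hypothetical sketch of a payload-configuration check; the card names, fields, and budgets are invented for illustration and are not drawn from the CubeSat Card Standard itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CubeSatCard:
    name: str
    slots_1u: int            # occupied 1U slots (invented field)
    power_draw_w: float      # worst-case draw in watts (invented field)
    provides: frozenset      # services exposed on the backplane
    requires: frozenset      # services consumed from other cards

def validate_payload(cards, slots=4, power_budget_w=20.0):
    """Reject a card stack that exceeds slot or power budgets or
    leaves some card's required services unprovided."""
    if sum(c.slots_1u for c in cards) > slots:
        raise ValueError("card stack exceeds available slots")
    if sum(c.power_draw_w for c in cards) > power_budget_w:
        raise ValueError("card stack exceeds power budget")
    provided = set().union(*(c.provides for c in cards))
    for c in cards:
        missing = c.requires - provided
        if missing:
            raise ValueError(f"{c.name} is missing services: {missing}")

payload = [
    CubeSatCard("SpaceCube v3.0 Mini", 1, 10.0,
                frozenset({"processing"}), frozenset({"power"})),
    CubeSatCard("Power card", 1, 2.0,
                frozenset({"power"}), frozenset()),
]
validate_payload(payload)    # passes silently for a valid stack
```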

    Task-based acceleration of bidirectional recurrent neural networks on multi-core architectures

    This paper proposes B-Par (Bidirectional-Parallelization), a novel parallel execution model for Bidirectional Recurrent Neural Networks (BRNNs) that exploits the data and control dependencies of the forward and reverse input computations. B-Par divides BRNN workloads across different parallel tasks by defining input and output dependencies for each RNN cell in both the forward and reverse orders, and it requires no per-layer barriers to synchronize the parallel execution of BRNNs. We evaluate B-Par on the TIDIGITS speech database and the Wikipedia dataset. Our experiments indicate that B-Par outperforms the state-of-the-art deep-learning frameworks TensorFlow-Keras and PyTorch, achieving speed-ups of up to 2.34× and 9.16×, respectively, on modern multi-core CPU architectures while preserving accuracy. Moreover, we analyze aspects such as task granularity, locality, and parallel efficiency in detail to illustrate the benefits of B-Par.

    This work is partially supported by the Generalitat de Catalunya (contract 2017-SGR-1414) and the Spanish Ministry of Science and Technology through project PID2019-107255GB. Marc Casas has been supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2017-23269.
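    A minimal sketch of the barrier-free idea described above, using Python threads as stand-ins for the paper's runtime tasks; the toy cell and sizes are invented, and B-Par itself targets task-based runtimes on multi-core CPUs rather than Python.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

T, H = 8, 4                      # time steps, hidden size (toy values)
inputs = [np.random.rand(H) for _ in range(T)]

def rnn_cell(x, h):
    # Toy recurrent cell; a real BRNN would use LSTM/GRU arithmetic.
    return np.tanh(x + h)

def sweep(order):
    # One direction's chain of cell computations: step t depends only
    # on step t-1 of the *same* direction, never on the other sweep,
    # so the two directions need no per-layer barrier between them.
    h, states = np.zeros(H), {}
    for t in order:
        h = rnn_cell(inputs[t], h)
        states[t] = h
    return states

with ThreadPoolExecutor(max_workers=2) as pool:
    fwd = pool.submit(sweep, range(T))            # forward sweep task
    bwd = pool.submit(sweep, reversed(range(T)))  # reverse sweep task
    # Each per-time-step output joins the two directions only once
    # both halves for that step exist.
    outputs = [np.concatenate([fwd.result()[t], bwd.result()[t]])
               for t in range(T)]
```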

    Adaptive Intelligent Systems for Extreme Environments

    As embedded processors become more powerful, a growing number of embedded systems equipped with artificial intelligence (AI) algorithms are being used in radiation environments to perform routine tasks and reduce radiation risk for human workers. On the one hand, their low price is making commercial-off-the-shelf devices and components increasingly popular for such tasks. On the other hand, this trend presents new challenges: improving the radiation tolerance, the multi-task AI capability, and the power efficiency of embedded systems in harsh environments. Three pieces of research work are presented in this thesis: 1) a fast simulation method for analysing single event effects (SEEs) in integrated circuits, 2) a self-refresh scheme to detect and correct bit-flips in random access memory (RAM), and 3) a hardware AI system with dynamic hardware accelerators and AI models for increased flexibility and efficiency.

    The variance of physical parameters in practical implementations, such as the nature of the particle, the linear energy transfer, and the circuit characteristics, can have a large impact on simulation accuracy, which significantly increases the complexity and cost of transistor-level simulation workflows and makes SEE simulation of large-scale circuits difficult. The first research work therefore proposes a new SEE simulation scheme that offers a fast, cost-efficient way to evaluate and compare the performance of large-scale circuits subject to the effects of radiation particles. The advantages of transistor-level and hardware description language (HDL) simulation are combined to produce accurate digital SEE error models for rapid error analysis in large-scale circuits. Under the proposed scheme, time-consuming back-end steps are skipped, and the SEE analysis of a large-scale circuit can be completed in just a few hours.

    In high-radiation environments, bit-flips in RAMs not only occur but can also accumulate, while typical error-mitigation methods cannot handle high error rates at low hardware cost. The second work proposes an adaptive scheme that combines error-correcting codes with refreshing techniques to correct errors and mitigate error accumulation in extreme radiation environments: the data in RAM is continuously refreshed so that errors cannot accumulate. Because the proposed design shares its ports with the user module without changing the timing sequence, it can easily be applied to systems whose hardware modules are designed with fixed read and write latency.
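    As an illustrative sketch of the scrub-and-correct idea (not the thesis design itself): each word is stored with a Hamming(7,4) code, and a background pass periodically re-reads, corrects, and writes back every entry so single-bit upsets cannot accumulate. Real designs would use wider SECDED codes implemented in hardware.

```python
def encode(nibble):
    """Pack a 4-bit value into a Hamming(7,4) code word."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]          # covers positions 4,5,6,7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]   # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def scrub(word):
    """Re-read a code word, correcting a single bit-flip if present."""
    # Syndrome trick: XOR the 1-based positions of all set bits;
    # a valid code word yields 0, a single flip yields its position.
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos
    if syndrome:
        word ^= 1 << (syndrome - 1)  # flip the upset bit back
    return word

ram = [encode(n) for n in (0b1010, 0b0111)]
ram[0] ^= 1 << 3                     # inject an upset into one word
ram = [scrub(w) for w in ram]        # periodic refresh pass
assert ram[0] == encode(0b1010)      # the upset did not accumulate
```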
    Implementing intelligent systems with constrained hardware resources is a challenge. The third work therefore designs an adaptive hardware-resource management system for multiple AI tasks in harsh environments. Inspired by the refreshing concept of the second work, we utilise a key feature of FPGAs, partial reconfiguration, to improve the reliability and efficiency of the AI system; more importantly, this feature provides the capability to manage hardware resources for deep-learning acceleration. In the proposed design, the on-chip hardware resources are dynamically managed to improve the flexibility, performance, and power efficiency of deep-learning inference. The deep-learning units provided by Xilinx are used to perform multiple AI tasks simultaneously, and the experiments show significant improvements in power efficiency across a wide range of scenarios with different workloads. To further improve the performance of the system, the reconfiguration concept was extended into an adaptive deep-learning software framework that provides a significant level of adaptability for various deep-learning algorithms on an FPGA-based edge-computing platform. To meet the specific accuracy and latency requirements derived from the running applications and operating environments, the platform may dynamically update its hardware and software (e.g., processing pipelines) to achieve better cost, power, and processing efficiency than a static system.
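    A hypothetical sketch of the dynamic-management idea: a manager that swaps partial bitstreams to match the current workload and power budget. The bitstream names, thresholds, and load call are invented; on a real Xilinx device this would go through the FPGA manager or PYNQ Overlay APIs rather than this stand-in.

```python
PROFILES = {
    "high_throughput": "dpu_x4.bit",   # four accelerator cores, highest power
    "balanced":        "dpu_x2.bit",
    "low_power":       "dpu_x1.bit",   # single core, spares powered down
}

class AcceleratorManager:
    def __init__(self):
        self.active = None

    def load_bitstream(self, name):
        # Stand-in for a partial-reconfiguration call on real hardware.
        print(f"reconfiguring accelerator region with {name}")
        self.active = name

    def adapt(self, queued_jobs, power_budget_w):
        # Pick a profile from workload depth and available power
        # (thresholds are illustrative only).
        if power_budget_w < 5:
            want = PROFILES["low_power"]
        elif queued_jobs > 8:
            want = PROFILES["high_throughput"]
        else:
            want = PROFILES["balanced"]
        if want != self.active:        # reconfigure only on a change
            self.load_bitstream(want)

mgr = AcceleratorManager()
for jobs, budget in [(2, 10), (12, 10), (12, 3)]:
    mgr.adapt(jobs, budget)
```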