
    Agile SoC Development with Open ESP

    ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design and enables an automated flow from software and hardware development to full-system prototyping on FPGA. For application developers, ESP offers domain-specific automated solutions to synthesize new accelerators for their software and to map complex workloads onto the SoC architecture. For hardware engineers, ESP offers automated solutions to integrate their accelerator designs into the complete SoC. Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.
    Comment: Invited paper at the 2020 International Conference on Computer-Aided Design (ICCAD), Special Session on Open-Source Tools and Platforms for Agile Development of Specialized Architectures.

    IoT and Digitization Will Reconnect System Engineering and Science

    The fully connected world is quickly becoming a reality. Architects and developers of this new world must understand both the hardware and software basics of IoT and IIoT systems, as well as the proven way to deal with the complexities of integrating sensors, processors, wireless connectivity, edge-to-cloud networks, data partitioning and processing, AI, machine learning, digital threads and twins, and much more. Such complexity can only be handled with a systems-of-systems (SoS) engineering approach. But while systems engineering may hold many of the solutions to IoT challenges, systems engineering must evolve from its traditional role. Some have even suggested that the data requirements and digitization of the IoT and corresponding digital threads are putting the engineering back into systems engineering via model-based designs. This will also help reconnect systems engineering to systems science. This presentation will show how IoT hardware and software technologies are changing the traditional systems engineering approach. Further, professionals prepared with both the basics of IoT and systems engineering will stand a better chance of competing in the IoT space.

    NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

    With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de facto approach for decades, its limited expressiveness with Basic Block Vectors (BBVs) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63%, reducing the average error by 38%. Additionally, NPS demonstrates strong robustness with increased accuracy, reducing the expensive accuracy-tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
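    The SimPoint-style baseline that NPS improves upon can be illustrated with a small sketch (not the paper's code: the trace, interval size, and clustering details below are simplified for illustration). Intervals of an execution trace are summarized as Basic Block Vectors, the BBVs are clustered, and the interval nearest each cluster centroid becomes a simulation point.

```python
import random
from collections import Counter

def basic_block_vectors(trace, interval=4):
    """Split a basic-block execution trace into fixed-size intervals
    and count block frequencies per interval (the BBV of each interval)."""
    vectors = []
    for start in range(0, len(trace), interval):
        vectors.append(Counter(trace[start:start + interval]))
    return vectors

def distance(a, b):
    """Euclidean distance between two sparse BBVs."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys) ** 0.5

def pick_simulation_points(vectors, k=2, iters=10, seed=0):
    """SimPoint-style selection: k-means over BBVs, then return the
    interval index closest to each centroid as a simulation point."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            j = min(range(k), key=lambda c: distance(v, centroids[c]))
            clusters[j].append(i)
        for j, members in enumerate(clusters):
            if members:
                keys = set().union(*(vectors[i] for i in members))
                centroids[j] = {key: sum(vectors[i].get(key, 0) for i in members) / len(members)
                                for key in keys}
    points = []
    for j in range(k):
        if clusters[j]:
            points.append(min(clusters[j], key=lambda i: distance(vectors[i], centroids[j])))
    return sorted(points)

# A toy trace: an "A/B" loop phase followed by a "C/D" phase.
trace = ["A", "B", "A", "B"] * 4 + ["C", "D", "C", "D"] * 4
vecs = basic_block_vectors(trace)
print(pick_simulation_points(vecs, k=2))
```

NPS replaces the hand-crafted BBV features in this pipeline with learned execution embeddings, but the downstream clustering-and-selection step is conceptually the same.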

    Power Profiling Model for RISC-V Core

    The reduction of power consumption is considered a critical factor for the efficient computation of microprocessors. Therefore, it is necessary to implement a power management system that is aware of the computational load of the CPU cores. To enable such power management, this project aims to develop a power profiling model for a RISC-V core. The TheSyDeKick verification environment was used to develop the power profiling models. Additionally, Python-controlled mixed-mode simulations of C programs compiled for A-Core were conducted to obtain the data needed for power profiling of the digital circuitry. The proposed methodology produces a time-varying power consumption profile for the A-Core RISC-V microprocessor core that depends on the executed software, supply voltage, and clock frequency. The results of this project allow for the creation of parameterized power profiles for the A-Core, which can contribute to more efficient and sustainable computing.
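    What such a profile computes can be sketched with the standard activity-based dynamic-power relation P = α·C·V²·f (a minimal illustration; the activity factors, effective capacitance, and instruction classes below are hypothetical, not A-Core characterization data):

```python
# Hypothetical per-instruction-class switching-activity factors (illustrative only):
ACTIVITY = {"alu": 0.10, "mul": 0.25, "load": 0.18, "store": 0.15, "nop": 0.02}

def dynamic_power(instr_mix, vdd, freq_hz, c_eff=1.0e-9):
    """Activity-based dynamic power: P = (sum of alpha_i * share_i) * C * V^2 * f.
    instr_mix maps instruction class -> fraction of executed instructions."""
    alpha = sum(ACTIVITY[cls] * share for cls, share in instr_mix.items())
    return alpha * c_eff * vdd ** 2 * freq_hz

def power_profile(windows, vdd, freq_hz):
    """Time-varying profile: one power sample per instruction-mix window,
    as would be extracted from a mixed-mode simulation of a compiled C program."""
    return [dynamic_power(mix, vdd, freq_hz) for mix in windows]

windows = [
    {"alu": 0.7, "load": 0.2, "store": 0.1},  # compute-heavy phase
    {"load": 0.5, "store": 0.3, "nop": 0.2},  # memory-heavy phase
]
profile = power_profile(windows, vdd=0.8, freq_hz=100e6)
print([round(p, 6) for p in profile])
```

The same structure gives the software, voltage, and frequency dependence described above: changing the instruction-mix windows, `vdd`, or `freq_hz` reshapes the profile without re-deriving the model.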

    Experimental Evaluation of On-Board Contact-Graph Routing Solutions for Future Nano-Satellite Constellations

    Hardware processing performance and storage capability for nanosatellites have increased notably in recent years. Unfortunately, this progress is not observed at the same pace in transmission data rate, which is mostly limited by the available power in small and constrained platforms. Thus, space-to-ground data transfer becomes the operations bottleneck of most modern space applications. As channel rates approach the Shannon limit, alternative solutions to manage data transmission are in the spotlight. Among these, networked nano-satellite constellations can cooperatively offload data to neighboring nodes via frequent inter-satellite link (ISL) opportunities in order to augment the overall volume and reduce the end-to-end data delivery delay. Nevertheless, the computation of efficient multi-hop routes needs to consider not only the present satellite and ground segments as nodes, but also a non-trivial time-dynamic evolution of the system dictated by orbital dynamics. Moreover, the process should properly model and rely on a considerable amount of available information on node configuration and network status obtained from recent telemetry. Also, in most practical cases, the forwarding decision shall happen in orbit, where satellites can timely react to local or in-transit traffic demands. In this context, it is appealing to investigate the applicability of adequate algorithmic routing approaches running on state-of-the-art nanosatellite on-board computers. In this work, we present the first implementation of the Contact Graph Routing (CGR) algorithm developed by the Jet Propulsion Laboratory (JPL, NASA) for a nanosatellite on-board computer. We describe CGR, including the Dijkstra adaptation operating at its core as well as the protocol aspects specified in the CCSDS Schedule-Aware Bundle Routing (SABR) recommended standard. Based on JPL's Interplanetary Overlay Network (ION) software stack, we build a strong baseline to develop the first CGR implementation for nano-satellites.
    We make our code available to the public and adapt it to the GomSpace toolchain in order to compile it for the NanoMind A712C on-board flight hardware, based on a 32-bit ARM7 RISC CPU. Next, we evaluate its performance in terms of CPU execution time (tick counts) and memory resources for increasingly complex satellite networks. The obtained metrics serve as compelling evidence of the polynomial scalability of the approach, matching the predicted theoretical behavior. Furthermore, we determine that the evaluated hardware and implementation can cope with satellite networks of more than 120 nodes and 1200 contact opportunities.
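    The core idea of CGR can be sketched in a few lines (a toy model, not the ION/JPL code: contacts are reduced to (from, to, start, end, delay) tuples, and queueing, data rates, and SABR protocol details are omitted). Dijkstra's search runs over scheduled contacts instead of static links, tracking the earliest arrival time at each node:

```python
import heapq

# A contact is a scheduled transmission window: (frm, to, start, end, delay).
CONTACTS = [
    ("sat1", "sat2", 0, 10, 1),
    ("sat2", "ground", 12, 20, 1),
    ("sat1", "sat3", 0, 5, 1),
    ("sat3", "ground", 30, 40, 1),
]

def cgr_earliest_arrival(source, dest, t0=0):
    """Dijkstra over the contact graph: expand contacts in order of the
    earliest arrival time at their receiving node (the core of CGR)."""
    best = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == dest:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for frm, to, start, end, delay in CONTACTS:
            if frm != node:
                continue
            tx = max(t, start)       # wait for the contact window to open
            if tx > end:
                continue             # contact already closed
            arrival = tx + delay
            if arrival < best.get(to, float("inf")):
                best[to] = arrival
                heapq.heappush(heap, (arrival, to))
    return None                      # destination unreachable

print(cgr_earliest_arrival("sat1", "ground"))  # → 13
```

The time-dynamic evolution mentioned above enters through the contact windows: the route via sat2 wins here only because its window to ground opens earlier than sat3's, even though both first hops are reachable at t=0.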

    A Modular Platform for Adaptive Heterogeneous Many-Core Architectures

    Multi-/many-core heterogeneous architectures are shaping current and upcoming generations of compute-centric platforms, which are widely used from mobile and wearable devices to high-performance cloud computing servers. Heterogeneous many-core architectures seek to achieve an order of magnitude higher energy efficiency, as well as computing performance scaling, by replacing homogeneous and power-hungry general-purpose processors with multiple heterogeneous compute units supporting several core types and domain-specific accelerators. The drift from homogeneous architectures to complex heterogeneous systems has been embraced by chip designers and the silicon industry for more than a decade. Recent silicon chips are based on heterogeneous SoCs that combine a scalable number of heterogeneous processing units of different types (e.g. CPU, GPU, custom accelerator). This shift in the computing paradigm is associated with several system-level design challenges related to the integration of, and communication between, a highly scalable number of heterogeneous compute units as well as SoC peripherals and storage units. Moreover, the increasing design complexity makes the production of heterogeneous SoC chips a monopoly of big market players due to the increasing development and design costs. Accordingly, recent initiatives towards agile hardware development, open-source tools, and open microarchitectures aim to democratize silicon chip production for academic and commercial usage. Agile hardware development aims to reduce development costs by providing an ecosystem of open-source hardware microarchitectures and hardware design processes. Therefore, heterogeneous many-core development and customization become relatively less complex and less time-consuming than with conventional design methods.
    In order to provide a modular and agile many-core development approach, this dissertation proposes a development platform for heterogeneous and self-adaptive many-core architectures consisting of a scalable number of heterogeneous tiles that maintain design-regularity features while supporting heterogeneity. The proposed platform hides the integration complexities by supporting modular tile architectures for general-purpose processing cores supporting multiple instruction set architectures (multi-ISAs) and custom hardware accelerators. By leveraging field-programmable gate arrays (FPGAs), the self-adaptive feature of the many-core platform is achieved using dynamic and partial reconfiguration (DPR) techniques. This dissertation realizes the proposed modular and adaptive heterogeneous many-core platform through three main contributions. The first contribution proposes and realizes a many-core architecture for heterogeneous ISAs. It provides a modular and reusable tile-based architecture for several heterogeneous ISAs based on the open-source RISC-V ISA. The modular tile-based architecture features a configurable number of processing cores with different RISC-V ISAs and different memory hierarchies. To increase the level of heterogeneity and support the integration of custom hardware accelerators, a novel hybrid memory/accelerator tile architecture is developed and realized as the second contribution. The hybrid tile is a modular and reusable tile that can be configured at run-time to operate as a scratchpad memory shared between compute tiles or as an accelerator tile hosting local hardware accelerator logic. The hybrid tile is designed and implemented to be seamlessly integrated into the proposed tile-based platform. The third contribution addresses the self-adaptation features by providing a reconfiguration management approach that internally controls the DPR process through the (RISC-V-based) processing cores.
    The internal reconfiguration process relies on a novel DPR controller targeting the FPGA design flow for RISC-V-based SoCs to change the types and functionalities of compute tiles at run-time.
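    The platform's organization can be sketched at a high level (illustrative class and mode names, hypothetical rather than the dissertation's implementation): compute tiles carry possibly different RISC-V ISAs, while a hybrid tile switches between scratchpad and accelerator roles at run-time, with the mode change standing in for a DPR partial-bitstream swap.

```python
class HybridTile:
    """A hybrid memory/accelerator tile reconfigurable at run-time,
    modeling the DPR-based mode switch described above."""
    MODES = ("scratchpad", "accelerator")

    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.mode = "scratchpad"      # default role: shared scratchpad memory

    def reconfigure(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown tile mode: {mode}")
        self.mode = mode              # stands in for loading a partial bitstream

class ManyCorePlatform:
    """A set of compute tiles (RISC-V cores with possibly different ISAs)
    plus hybrid tiles managed by an internal DPR controller."""
    def __init__(self):
        self.compute = {}             # tile_id -> ISA string, e.g. "rv32imc"
        self.hybrid = {}              # tile_id -> HybridTile

    def add_compute_tile(self, tile_id, isa):
        self.compute[tile_id] = isa

    def add_hybrid_tile(self, tile_id):
        self.hybrid[tile_id] = HybridTile(tile_id)

    def reconfigure_tile(self, tile_id, mode):
        self.hybrid[tile_id].reconfigure(mode)

platform = ManyCorePlatform()
platform.add_compute_tile(0, "rv32imc")
platform.add_compute_tile(1, "rv64gc")    # heterogeneous ISAs side by side
platform.add_hybrid_tile(2)
platform.reconfigure_tile(2, "accelerator")
print(platform.hybrid[2].mode)            # → accelerator
```

The design-regularity point above is visible even in this toy model: every tile presents the same integration interface, so heterogeneity lives inside the tile rather than in the interconnect.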

    Extending a modern RISC-V vector accelerator with direct access to the memory hierarchy through AMBA 5 CHI.

    The BSC is developing a decoupled RISC-V-based vector accelerator. In the previous version of this project, the vector accelerator used the Open Vector Interface (OVI) to access the shared L2 cache through a scalar processor core; the processor core, in turn, accessed the shared L2 cache via the NoC. This two-level memory access mechanism introduces a significant latency overhead, and memory access time is critical to the performance of the accelerator.
    To address this problem, this project designs an AMBA 5 CHI interface IP that provides the accelerator with direct access to the NoC, thus reducing memory latency. The project aims to obtain a functional design with basic operations to facilitate the integration phase; therefore, the interface IP has no specific area or power constraints. The IP design covers different AMBA 5 CHI protocol aspects, assembles the architecture, and proposes a set of tests to verify the functionality of the designed module. Final results show that this IP can successfully provide the accelerator with direct access to the shared L2 cache, replacing the OVI interface and improving performance. We also give pointers on how to further improve the AMBA 5 CHI interface IP in terms of performance and functionality.
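    To illustrate the kind of transaction flow such an interface must handle, here is a heavily simplified sketch (not the BSC design: only a ReadNoSnp-style read is modeled, and CHI channels, flit formats, and response opcodes are reduced to plain Python objects). A requester issues one request on the REQ channel and accumulates data flits returned on the DAT channel:

```python
from enum import Enum, auto

class ReadState(Enum):
    IDLE = auto()
    REQ_SENT = auto()
    DONE = auto()

class ChiReadMaster:
    """Toy model of a CHI requester: one ReadNoSnp request out,
    CompData-style flits accumulated until the read size is covered."""
    def __init__(self):
        self.state = ReadState.IDLE
        self.data = b""

    def issue_read(self, addr, size):
        assert self.state is ReadState.IDLE
        self.addr, self.size = addr, size
        self.state = ReadState.REQ_SENT
        # In real CHI this would be a REQ-channel flit with many more fields.
        return {"opcode": "ReadNoSnp", "addr": addr, "size": size}

    def on_comp_data(self, flit):
        """Accumulate data flits until the full read size has arrived."""
        assert self.state is ReadState.REQ_SENT
        self.data += flit
        if len(self.data) >= self.size:
            self.state = ReadState.DONE

master = ChiReadMaster()
req = master.issue_read(addr=0x1000, size=64)
master.on_comp_data(b"\x00" * 32)   # first data flit
master.on_comp_data(b"\x00" * 32)   # second data flit completes the read
print(master.state)                  # → ReadState.DONE
```

The latency win described above comes from the accelerator driving this flow directly onto the NoC, rather than funneling each request through the scalar core first.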

    The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

    Computers continue to diversify with respect to system designs, emerging memory technologies, and application memory demands. Unfortunately, continually adapting the conventional virtual memory framework to each possible system configuration is challenging, and often results in performance loss or requires non-trivial workarounds. To address these challenges, we propose a new virtual memory framework, the Virtual Block Interface (VBI). We design VBI based on the key idea that delegating memory management duties to hardware can reduce the overheads and software complexity associated with virtual memory. VBI introduces a set of variable-sized virtual blocks (VBs) to applications. Each VB is a contiguous region of the globally-visible VBI address space, and an application can allocate each semantically meaningful unit of information (e.g., a data structure) in a separate VB. VBI decouples access protection from memory allocation and address translation. While the OS controls which programs have access to which VBs, dedicated hardware in the memory controller manages the physical memory allocation and address translation of the VBs. This approach enables several architectural optimizations to (1) efficiently and flexibly cater to different and increasingly diverse system configurations, and (2) eliminate key inefficiencies of conventional virtual memory. We demonstrate the benefits of VBI with two important use cases: (1) reducing the overheads of address translation (for both native execution and virtual machine environments), as VBI reduces the number of translation requests and associated memory accesses; and (2) two heterogeneous main memory architectures, where VBI increases the effectiveness of managing fast memory regions. For both cases, VBI significantly improves performance over conventional virtual memory.
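    The division of labor VBI proposes can be sketched as follows (an illustrative model with hypothetical names, not the paper's hardware design): the OS checks only protection, while a memory-controller stand-in performs lazy physical allocation and translation on first touch.

```python
class MemoryController:
    """Stand-in for the VBI hardware: allocates physical frames for a VB
    lazily and translates (vb_id, offset) -> physical address."""
    PAGE = 4096

    def __init__(self):
        self.next_frame = 0
        self.tables = {}              # vb_id -> {page_index: frame}

    def translate(self, vb_id, offset):
        pages = self.tables.setdefault(vb_id, {})
        page = offset // self.PAGE
        if page not in pages:         # allocate a frame on first touch
            pages[page] = self.next_frame
            self.next_frame += 1
        return pages[page] * self.PAGE + offset % self.PAGE

class OS:
    """The OS only manages protection: which process may access which VB."""
    def __init__(self, mc):
        self.mc = mc
        self.acl = {}                 # vb_id -> set of permitted process ids

    def grant(self, pid, vb_id):
        self.acl.setdefault(vb_id, set()).add(pid)

    def access(self, pid, vb_id, offset):
        if pid not in self.acl.get(vb_id, set()):
            raise PermissionError("process not allowed to access this VB")
        return self.mc.translate(vb_id, offset)   # hardware does the rest

os_ = OS(MemoryController())
os_.grant(pid=1, vb_id=7)
print(hex(os_.access(1, 7, 5000)))   # an offset on the second page of VB 7
```

Because allocation and translation never pass through OS data structures in this split, the translation path can be specialized per system configuration, which is the flexibility the abstract argues for.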

    Standardisation of Practices in Open Source Hardware

    Standardisation is an important component in the maturation of any field of technology. It contributes to the formation of a recognisable identity and enables interactions with a wider community. This article reviews past and current standardisation initiatives in the field of Open Source Hardware (OSH). While early initiatives focused on aspects such as licencing, intellectual property and documentation formats, recent efforts extend to ways for users to exercise their rights under open licences and to keep OSH projects discoverable and accessible online. We specifically introduce two standards that are currently being released and call for early users and contributors, the DIN SPEC 3105 and the Open Know-How Manifest Specification. Finally, we reflect on challenges around standardisation in the community and relevant areas for future development, such as an open tool chain, modularity and hardware-specific interface standards.
    Comment: 9 pages without abstract and references (13 with), no figures.