
    What is a CGRA?

    Get PDF

    How to train accurate BNNs for embedded systems?

    Full text link
    A key enabler for deploying convolutional neural networks on resource-constrained embedded systems is the binary neural network (BNN). BNNs save memory and simplify computation by binarizing both features and weights. Unfortunately, binarization is inevitably accompanied by a severe decrease in accuracy. To reduce the accuracy gap between binary and full-precision networks, many repair methods have been proposed in recent years, which we classify and organize into a single overview in this chapter. The repair methods are divided into two main branches, training techniques and network topology changes, which can further be split into smaller categories. The latter branch introduces additional cost (energy consumption or additional area) for an embedded system, while the former does not. From our overview, we observe that progress has been made in reducing the accuracy gap, but BNN papers are not aligned on which repair methods should be used to obtain highly accurate BNNs. Therefore, this chapter contains an empirical review that evaluates the benefits of many repair methods in isolation on the ResNet-20/CIFAR-10 and ResNet-18/CIFAR-100 benchmarks. We find three repair categories to be the most beneficial: feature binarizer, feature normalization, and double residual. Based on this review, we discuss future directions and research opportunities. We also sketch the benefits and costs associated with BNNs on embedded systems, because it remains to be seen whether BNNs can close the accuracy gap while staying highly energy-efficient on resource-constrained embedded systems.
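
    As an illustration of the binarization this chapter revolves around, the sketch below shows a sign binarizer with a straight-through estimator (STE) gradient in PyTorch; the class and function names are illustrative and not taken from the chapter.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarizer with a straight-through estimator (STE) gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map every value to {-1, +1}; zero maps to +1 here.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # STE: pass the gradient through, but zero it outside [-1, 1].
        return grad_out * (x.abs() <= 1).float()

def binary_linear(x, weight_fp):
    """Binarize activations and full-precision weights, then multiply."""
    xb = BinarizeSTE.apply(x)                # binary features
    wb = BinarizeSTE.apply(weight_fp)        # binary weights
    return xb @ wb.t()
```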

    How Flexible is Your Computing System?

    Get PDF
    In the literature, computer architectures are frequently claimed to be highly flexible, typically implying the existence of trade-offs between flexibility and performance or energy efficiency. Processor flexibility, however, is not sharply defined, and consequently these claims cannot be validated, nor can such hypothetical relations be fully understood and exploited in the design of computing systems. This paper is an attempt to introduce scientific rigour to the notion of flexibility in computing systems. A survey is conducted to provide an overview of references to flexibility in the literature, both in the computer architecture domain and in related fields. A classification is introduced to categorize different views on flexibility, which ultimately forms the foundation for a qualitative definition of flexibility. Departing from this qualitative definition, a generic quantifiable metric is proposed, enabling valid quantitative comparison of the flexibility of various architectures. To validate the proposed method, and to evaluate the relation between the proposed metric and the general notion of flexibility, the flexibility metric is measured for 25 computing systems, including CPUs, GPUs, DSPs, and FPGAs, and 40 ASIPs taken from the literature. The obtained results provide insights into some of the speculative trade-offs between flexibility and properties such as energy efficiency and area efficiency. Overall, the proposed quantitative flexibility metric proves to be commensurate with some generally accepted qualitative notions of flexibility collected in the survey, although some surprising discrepancies can also be observed. The proposed metric and the obtained results are placed into the context of the state of the art on compute flexibility, and an extensive reflection not only provides a complete overview of the field but also discusses possible alternative approaches and open issues. Note that this work does not aim to provide a final answer to the definition of flexibility, but rather provides a framework to initiate a broader discussion in the computer architecture community on defining, understanding, and ultimately taking advantage of flexibility.
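
    The abstract does not define the metric itself, so the toy sketch below is only an illustration of the kind of quantitative comparison the paper argues for: an architecture is scored by how evenly a hypothetical efficiency number is spread over a benchmark set. The scoring rule and the numbers are assumptions, not the paper's metric.

```python
from statistics import mean, pstdev

def toy_flexibility_score(efficiency_per_workload):
    """Toy score in (0, 1]: 1.0 when efficiency is identical on every
    workload, lower when an architecture only excels on a few of them.
    This is NOT the metric from the paper, just an illustration."""
    values = list(efficiency_per_workload.values())
    mu = mean(values)
    if mu == 0:
        return 0.0
    cv = pstdev(values) / mu          # coefficient of variation
    return 1.0 / (1.0 + cv)

# Hypothetical normalized efficiency numbers per workload.
cpu  = {"fft": 0.9, "fir": 0.8, "aes": 0.85, "sort": 0.9}
asic = {"fft": 1.0, "fir": 0.05, "aes": 0.05, "sort": 0.05}
print(toy_flexibility_score(cpu), toy_flexibility_score(asic))
```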

    CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Architectures

    Get PDF
    Reconfigurable architectures are quickly gaining popularity due to their flexibility and ability to provide high energy efficiency. However, reconfigurable systems allow for a huge design space. Iterative design space exploration (DSE) is often required to reach good Pareto points with respect to some combination of performance, area, and/or energy. DSE tools depend on information about the hardware's characteristics in these respects. These characteristics can be obtained from hardware synthesis and netlist simulation, but this is very time-consuming. Therefore, architecture models are commonly used instead. This work introduces CGRA-EAM (Coarse-Grained Reconfigurable Architecture - Energy & Area Model), an energy and area estimation framework for coarse-grained reconfigurable architectures. The model is evaluated for the Blocks CGRA. The results demonstrate that the mean absolute percentage error is 15.5% and 2.1% for energy and area, respectively, while the model achieves a speedup of close to three orders of magnitude compared to synthesis.
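
    A minimal sketch of the kind of compositional estimation such a model enables, assuming per-unit area and energy numbers are characterized once (e.g. from synthesis of individual functional units) and then combined with activity counts from a mapped application. The library values and structure below are illustrative assumptions, not the CGRA-EAM model itself.

```python
from dataclasses import dataclass

@dataclass
class UnitCost:
    area_um2: float          # area of one instance (um^2)
    energy_pj_per_op: float  # dynamic energy per operation (pJ)
    leakage_pw: float        # static power per instance (pW)

# Hypothetical per-unit characterization, obtained once from synthesis.
LIB = {
    "alu": UnitCost(950.0, 1.8, 40.0),
    "lsu": UnitCost(1400.0, 3.2, 55.0),
    "rf":  UnitCost(700.0, 0.9, 30.0),
}

def estimate(instances, op_counts, runtime_s):
    """Compose per-unit costs into fabric-level area (um^2) and energy (pJ)."""
    area = sum(LIB[u].area_um2 * n for u, n in instances.items())
    dyn = sum(LIB[u].energy_pj_per_op * ops for u, ops in op_counts.items())
    leak = sum(LIB[u].leakage_pw * n for u, n in instances.items()) * runtime_s
    return area, dyn + leak  # pW * s == pJ

area, energy = estimate({"alu": 16, "lsu": 4, "rf": 16},
                        {"alu": 1_000_000, "lsu": 250_000, "rf": 1_200_000},
                        runtime_s=0.001)
```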

    Delay Prediction for ASIC HLS: Comparing Graph-Based and Nongraph-Based Learning Models

    Get PDF
    While high-level synthesis (HLS) tools offer faster design of hardware accelerators with different area-versus-delay trade-offs, HLS-based delay estimates often deviate significantly from the results obtained from ASIC logic synthesis (LS) tools. Current HLS tools rely on simple additive delay models that fail to capture the downstream optimizations performed during LS and technology mapping. Inaccurate delay estimates prevent fast and accurate design-space exploration without performing time-consuming LS runs. In this work, we exploit different machine learning models that automatically learn to map the downstream optimizations onto the HLS critical paths. In particular, we compare graph-based and nongraph-based learning models to investigate their efficacy, and devise hybrid models to get the best of both worlds. To carry out our learning-assisted methodology, we create a dataset of different HLS benchmarks and develop an automated framework, which extends a commercial HLS toolchain, to extract essential information from the LS critical path and automatically match it to the corresponding HLS path. This is a nontrivial task to perform manually due to the difference in abstraction levels. Finally, we train the proposed hybrid models through inductive learning and integrate them into the commercial HLS toolchain to improve delay prediction accuracy. Experimental results demonstrate significant improvements in delay estimation accuracy across a wide variety of benchmark designs. We show that graph-based models can infer essential structural features from the input design, and that incorporating them into traditional nongraph-based models can significantly improve model accuracy. Such 'hybrid' models can improve delay prediction accuracy by 93% compared to simple additive models and provide a 175× speedup compared to LS. Furthermore, we discuss key insights from our experiments, identifying the influence of different HLS features on model performance.
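
    A hedged sketch of the hybrid idea, assuming cheap structural features of the dataflow graph are concatenated with conventional per-path features and fed to an off-the-shelf regressor trained against post-LS delays; the feature names and model choice are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def path_features(hls_path):
    """Conventional (non-graph) features of one HLS critical path."""
    return [hls_path["n_ops"], hls_path["n_mux"], hls_path["bitwidth"],
            hls_path["additive_delay_ns"]]

def graph_features(dfg):
    """Cheap structural features standing in for a learned graph embedding."""
    n, e = dfg["num_nodes"], dfg["num_edges"]
    return [n, e, e / max(n, 1), dfg["depth"]]

def hybrid_matrix(samples):
    # Concatenate non-graph and graph features for every (path, DFG) sample.
    return np.array([path_features(s["path"]) + graph_features(s["dfg"])
                     for s in samples])

def train(samples, ls_delays_ns):
    """Fit a regressor that predicts post-LS delay from hybrid features."""
    model = GradientBoostingRegressor()
    model.fit(hybrid_matrix(samples), ls_delays_ns)
    return model
```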

    Memory and Parallelism Analysis Using a Platform-Independent Approach

    Full text link
    Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend the state-of-the-art platform-independent software analysis tool with NMC-related metrics such as memory entropy, spatial locality, data-level parallelism, and basic-block-level parallelism. These metrics help identify the applications that are more suitable for NMC architectures.
    Comment: 22nd ACM International Workshop on Software and Compilers for Embedded Systems (SCOPES '19), May 2019
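
    One of the listed metrics, memory entropy, can be computed directly from an address trace. The sketch below assumes a plain Shannon entropy over accessed cache lines; the exact definition used by the tool may differ.

```python
import math
import random
from collections import Counter

def memory_entropy(addresses, line_bytes=64):
    """Shannon entropy (bits) of the distribution of accessed cache lines.
    Higher entropy suggests more irregular access patterns, which tend to
    favour near-memory computing over cache-based CPUs."""
    lines = Counter(addr // line_bytes for addr in addresses)
    total = sum(lines.values())
    return -sum((c / total) * math.log2(c / total) for c in lines.values())

seq = list(range(0, 4096, 8))                      # streaming over one page
rnd = [random.randrange(0, 1 << 30) for _ in seq]  # scattered accesses
print(memory_entropy(seq), memory_entropy(rnd))    # rnd entropy is higher
```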

    MTTR reduction of FPGA scrubbing: Exploring SEU sensitivity

    Get PDF
    SRAM-based FPGAs are widely used for developing many critical systems due to their huge capacity, flexibility, and high performance. However, their applicability in the presence of Single Event Upsets (SEUs) must be ensured using mitigation techniques for specific critical systems, e.g. space applications or the automotive industry. Among all existing mitigation techniques, scrubbing is considered the most reliable, in particular because it avoids SEU accumulation in the Configuration Memory (CM), the most SEU-vulnerable component of an SRAM-based FPGA. In spite of that, the error repair time, expressed as the Mean Time To Repair (MTTR), attained by scrubbing is a pressing concern for real-time systems. To reduce the MTTR, state-of-the-art scrubbing methods consider the impact of SEUs in CM bits on correct circuit operation. In this paper, we examine the MTTR reduction obtained with multiple proposed precision levels for identifying sensitive CM bits, i.e. bits that have an adverse impact on correct circuit operation when affected by an SEU. Two scrubbing methods are proposed based on these precision levels. Experimental results show an average MTTR reduction of 20%, 45%, and 46.5% when the proposed precision-aware scrubbing methods take into account sensitive bits recognized with low, medium, and high precision, respectively, compared to very low precision. The experiments also show that a higher MTTR reduction is achievable for a non-uniform structure like the FFT (about 68%). Thus, the cost of distinguishing sensitive bits with higher precision is only worthwhile when the circuit has a non-uniform structure in the CM.
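
    To illustrate why restricting scrubbing to sensitive bits reduces MTTR, the sketch below uses a simplified model in which an upset waits, on average, half a scrub cycle, so the mean repair time shrinks with the fraction of frames that actually need scrubbing. The frame count, timing, and the formula itself are illustrative assumptions, not the paper's MTTR model.

```python
def mttr_full_scrub(num_frames, t_frame_s):
    """Blind scrubbing: an upset waits, on average, half a full scrub cycle."""
    return 0.5 * num_frames * t_frame_s

def mttr_selective_scrub(num_frames, sensitive_fraction, t_frame_s):
    """Precision-aware scrubbing: only frames holding sensitive bits are
    scrubbed, so the cycle (and hence the mean wait) is proportionally shorter."""
    return 0.5 * num_frames * sensitive_fraction * t_frame_s

frames, t_frame = 30_000, 10e-6            # hypothetical CM size and frame time
base = mttr_full_scrub(frames, t_frame)
for frac in (0.5, 0.25, 0.1):              # hypothetical fractions of sensitive frames
    red = 1 - mttr_selective_scrub(frames, frac, t_frame) / base
    print(f"sensitive fraction {frac:.2f}: MTTR reduction {red:.0%}")
```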