Building Faithful High-level Models and Performance Evaluation of Manycore Embedded Systems
Performance and functional correctness are key to the successful design of modern embedded systems. Both aspects must be considered early in the design process to enable well-founded decisions towards the final implementation. Nonetheless, building abstract system-level models that faithfully capture performance information along with functional behavior is a challenging task. In contrast to functional aspects, performance details are rarely available during early design phases, and no clear method is known to characterize them. Moreover, once such system-level models are built, they are inherently complex, as they usually mix software models, hardware architecture constraints and environment abstractions. Their analysis using traditional performance evaluation methods is reaching its limits, and the need for more scalable and accurate techniques is becoming urgent. In this paper, we introduce a systematic method for building stochastic abstract performance models using statistical inference and model calibration, and we propose statistical model checking as a performance evaluation technique for the obtained models. We applied our method to a real-life case study. We were able to verify different timing properties
Energy Management via PI Control for Data Parallel Applications with Throughput Constraints
This paper presents a new proportional-integral (PI) controller that sets the operating point of computing tiles in a system on chip (SoC). We address data-parallel applications with throughput constraints. The controller settings are investigated for application configurations with different QoS levels and different buffer sizes. The control method is evaluated on a test chip with four tiles executing a realistic HMAX object recognition application. Experimental results suggest that the proposed controller outperforms the state-of-the-art results: it incurs, on average, 25% fewer frequency switches and achieves slightly higher energy savings. The reduction in the number of frequency switches is important because it decreases the associated overhead. In addition, the PI controller meets the throughput constraint in cases where other approaches fail
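As a rough illustration of the kind of control loop the abstract above describes, the following minimal sketch shows a generic discrete-time PI controller that maps a throughput error onto one of a few discrete frequency levels. It is not the controller from the paper: the class name `PIController`, the gains `kp` and `ki`, the frequency levels, and the rule of picking the lowest level that covers the demand are all assumptions made for illustration, and anti-windup handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Generic PI controller (hypothetical sketch, not the paper's design)."""
    kp: float             # proportional gain (assumed value chosen by the caller)
    ki: float             # integral gain (assumed value chosen by the caller)
    levels: tuple         # available frequency levels, sorted ascending
    integral: float = 0.0 # running sum of past errors

    def step(self, target: float, measured: float) -> float:
        """Return the frequency level to use for the next control period."""
        error = target - measured
        self.integral += error
        # PI law: proportional term plus accumulated (integral) term.
        demand = self.kp * error + self.ki * self.integral
        # Quantise: pick the lowest level that covers the demand,
        # saturating at the highest level if none does.
        for freq in self.levels:
            if freq >= demand:
                return freq
        return self.levels[-1]
```

Calling `step` once per control period with the required and observed throughput returns the level to apply next; because the output only changes when the demand crosses a level boundary, the quantisation step is one place where the number of frequency switches could be influenced.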
Composable local memory organisation for streaming applications on embedded MPSoCs
Multi-Processor Systems on a Chip (MPSoCs) are suitable platforms for the implementation of complex embedded applications. An MPSoC is composable if the functional and temporal behaviour of each application is independent of the absence or presence of other applications. Composability is required for application design and analysis in isolation, and integration with linear effort. In this paper we propose a composable organisation for the top level of a memory hierarchy. This organisation preserves the short (one cycle) access time desirable for a processor's frequent local accesses and enables the predictability demanded by real-time applications. We partition the local memory into two blocks, one private, for local tile data, and another shared, for inter-tile data communication. To avoid application interference, we instantiate one such shared local memory block and a Remote Direct Memory Access (RDMA) component for each application running on the processor. We implement this organisation on an MPSoC with two processors on an FPGA. On this platform we execute a composition of applications consisting of a JPEG decoder and a synthetic application. Our experiments indicate that an application's timing is not affected by the behaviour of another application, thus composability is achieved. Moreover, the utilisation of the RDMA component leads to a 45% performance increase on average for a number of workloads covering a large range of communication/computation ratios
A methodology for the design of dynamic accuracy operators by runtime back bias
Mobile and IoT applications must balance increasing processing demands with limited power and cost budgets. Approximate computing achieves this goal by leveraging the error tolerance common in many emerging applications to reduce power consumption. In particular, adequate (i.e., energy/quality-configurable) hardware operators are key components in an error-tolerant system. Existing implementations of these operators require significant architectural modifications; hence they are often design-specific and tend to have large overheads compared to accurate units. In this paper, we propose a methodology to design adequate data-path operators automatically, using threshold voltage scaling as a knob to dynamically control the power/accuracy tradeoff. The method overcomes the limitations of previous solutions based on supply voltage scaling, in that it introduces lower overheads and allows fine-grain regulation of this tradeoff. We demonstrate our approach on a state-of-the-art 28nm FDSOI technology, exploiting the strong effect of back biasing on threshold voltage. Results show a power consumption reduction of as much as 39% compared to solutions based only on supply voltage scaling, at iso-accuracy
Scaling-up Memristor Monte Carlo with magnetic domain-wall physics
By exploiting the intrinsic random nature of nanoscale devices, Memristor
Monte Carlo (MMC) is a promising enabler of edge learning systems. However, due
to multiple algorithmic and device-level limitations, existing demonstrations
have been restricted to very small neural network models and datasets. We
discuss these limitations, and describe how they can be overcome, by mapping
the stochastic gradient Langevin dynamics (SGLD) algorithm onto the physics of
magnetic domain-wall Memristors to scale-up MMC models by five orders of
magnitude. We propose the push-pull pulse programming method that realises SGLD
in-physics, and use it to train a domain-wall based ResNet18 on the CIFAR-10
dataset. On this task, we observe no performance degradation relative to a
floating point model down to an update precision of between 6 and 7-bits,
indicating we have made a step towards a large-scale edge learning system
leveraging noisy analogue devices.
Comment: Presented at the 1st workshop on Machine Learning with New Compute
Paradigms (MLNCP) at NeurIPS 2023 (New Orleans, USA)
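For readers unfamiliar with the algorithm named in the abstract above, the sketch below shows a plain software version of the standard stochastic gradient Langevin dynamics (SGLD) update: a half-step along a minibatch estimate of the log-posterior gradient, plus Gaussian noise whose variance equals the step size. It says nothing about the push-pull pulse programming method or the device physics. The toy problem (inferring the mean of unit-variance Gaussian data under a flat prior), the function name `sgld_posterior_mean`, and all parameter values are assumptions chosen only to keep the example small.

```python
import math
import random


def sgld_posterior_mean(data, step=1e-3, batch=10, iters=6000,
                        burn_in=1000, seed=0):
    """Estimate a posterior mean with SGLD on a toy 1-D problem.

    Assumed model (illustrative only): x_i ~ Normal(theta, 1), flat prior,
    so the gradient of the log-posterior is sum_i (x_i - theta).
    """
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    kept = []
    for t in range(iters):
        minibatch = rng.sample(data, batch)
        # Unbiased minibatch estimate of the full-data gradient.
        grad = (n_total / batch) * sum(x - theta for x in minibatch)
        # SGLD update: half-step along the gradient plus N(0, step) noise.
        theta += 0.5 * step * grad + rng.gauss(0.0, math.sqrt(step))
        if t >= burn_in:
            kept.append(theta)
    # Average of the post-burn-in samples approximates the posterior mean.
    return sum(kept) / len(kept)
```

On data drawn around a known mean, the returned estimate should land close to the sample mean of the data, which is the exact posterior mean under the assumed flat prior. The injected noise is what turns plain stochastic gradient descent into a sampler, and it is this noise term that the abstract proposes to obtain from the devices themselves.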
The transprecision computing paradigm: Concept, design, and applications
Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore’s law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet-of-Things to large HPC computing centers, sub-picoJoule/operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliWatt up to the MegaWatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, Big Data Analytics, Deep Learning, and HPC simulations. By combining transprecision advances in devices, circuits, software tools, and algorithms into a seamless design, we expect to achieve major energy efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims at demolishing the ultraconservative “precise” computing abstraction, replacing it with a more flexible and efficient one, namely transprecision computing
Model-implementation fidelity in cyber physical system design
This book focuses on various techniques for checking the modeling fidelity of Cyber Physical Systems (CPS) with respect to the physical world they represent. The authors present modeling and analysis techniques representing different communities, from very different angles, discuss their possible interactions, and discuss the commonalities and differences between their practices. Coverage includes model driven development, resource-driven development, statistical analysis, proofs of simulator implementation, compiler construction, power/temperature modeling of digital devices, high-level performance analysis, and code/device certification. Several industrial contexts are covered, including modeling of computing and communication, proof architectures models and statistical based validation techniques. Addresses CPS design problems such as cross-application interference, parsimonious modeling, and trustful code production. Describes solutions, such as simulation for extra-functional properties, extension of coding techniques, model-driven development, resource driven modeling, and quantitative and qualitative verification, based on statistics and formal proofs. Applies techniques to several CPS design challenges, such as mixed criticality, communication protocols, and computing platform simulation