
    Building Faithful High-level Models and Performance Evaluation of Manycore Embedded Systems

    Performance and functional correctness are key to the successful design of modern embedded systems. Both aspects must be considered early in the design process to enable well-founded decisions towards the final implementation. Nonetheless, building abstract system-level models that faithfully capture performance information alongside functional behavior is a challenging task. In contrast to functional aspects, performance details are rarely available during early design phases, and no clear method is known to characterize them. Moreover, once such system-level models are built, they are inherently complex, as they usually mix software models, hardware architecture constraints, and environment abstractions. Their analysis using traditional performance evaluation methods is reaching its limits, and the need for more scalable and accurate techniques is becoming urgent. In this paper, we introduce a systematic method for building stochastic abstract performance models using statistical inference and model calibration, and we propose statistical model checking as the performance evaluation technique for the obtained models. We applied our method to a real-life case study and were able to verify different timing properties.
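
    As an illustration of the statistical model checking step described above, here is a minimal Python sketch that estimates the probability of meeting a deadline by Monte Carlo simulation, with the run count taken from the Chernoff-Hoeffding bound. The stochastic latency model, its parameters, and all names are hypothetical stand-ins, not the paper's calibrated model.

```python
import math
import random

def simulate_latency(rng):
    """Hypothetical calibrated stochastic model: a computation time plus
    a bus-contention delay, both drawn from fitted distributions."""
    compute = rng.gauss(4.0, 0.5)       # ms, e.g. fitted from profiling
    contention = rng.expovariate(2.0)   # ms, e.g. fitted from profiling
    return compute + contention

def smc_estimate(deadline_ms, epsilon=0.01, delta=0.01, seed=42):
    """Estimate P(latency <= deadline) to within +/-epsilon with
    confidence 1 - delta; the Chernoff-Hoeffding bound gives the
    number of independent simulation runs required."""
    runs = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
    rng = random.Random(seed)
    hits = sum(simulate_latency(rng) <= deadline_ms for _ in range(runs))
    return hits / runs, runs

prob, runs = smc_estimate(deadline_ms=6.0)
print(f"P(latency <= 6 ms) is approximately {prob:.3f} ({runs} runs)")
```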

    Composable local memory organisation for streaming applications on embedded MPSoCs

    Multi-Processor Systems on a Chip (MPSoCs) are suitable platforms for the implementation of complex embedded applications. An MPSoC is composable if the functional and temporal behaviour of each application is independent of the absence or presence of other applications. Composability is required for designing and analysing applications in isolation, and for integrating them with linear effort. In this paper we propose a composable organisation for the top level of a memory hierarchy. This organisation preserves the short (one-cycle) access time desirable for a processor's frequent local accesses and enables the predictability demanded by real-time applications. We partition the local memory into two blocks: one private, for local tile data, and another shared, for inter-tile data communication. To avoid interference between applications, we instantiate one such shared local memory block and one Remote Direct Memory Access (RDMA) unit for each application running on the processor. We implement this organisation on an MPSoC with two processors on an FPGA. On this platform we execute a composition of applications consisting of a JPEG decoder and a synthetic application. Our experiments indicate that an application's timing is not affected by the behaviour of another application; thus, composability is achieved. Moreover, the utilisation of the RDMA component leads to a 45% performance increase on average for a number of workloads covering a large range of communication/computation ratios.
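
    The sketch below is a toy behavioural model of the proposed organisation: one private block per tile plus one shared block and one RDMA queue per application, so applications never contend with one another. All sizes, names, and the one-cycle access time are illustrative assumptions, not the paper's FPGA implementation.

```python
class LocalMemory:
    """Toy model of a composable local-memory organisation: a private
    block for tile-local data, and a separate shared block plus RDMA
    transfer queue *per application* to avoid inter-application
    interference. Sizes and timing are illustrative assumptions."""

    def __init__(self, apps, private_words=4096, shared_words=1024):
        self.private = [0] * private_words               # tile-local data
        # One shared block and one RDMA queue per application:
        self.shared = {a: [0] * shared_words for a in apps}
        self.rdma_queue = {a: [] for a in apps}

    def local_access(self, app, addr):
        # Private accesses always take one cycle, regardless of what
        # other applications are doing.
        return self.private[addr], 1

    def rdma_post(self, app, dst_tile, addr, data):
        # A transfer is posted to this application's own RDMA queue,
        # so another application's transfers cannot delay it.
        self.rdma_queue[app].append((dst_tile, addr, data))

mem = LocalMemory(apps=["jpeg_decoder", "synthetic"])
_, cycles = mem.local_access("jpeg_decoder", addr=0)
mem.rdma_post("jpeg_decoder", dst_tile=1, addr=16, data=0xAB)
print(cycles)  # 1, independent of the "synthetic" application
```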

    Energy Management via PI Control for Data Parallel Applications with Throughput Constraints

    This paper presents a new proportional-integral (PI) controller that sets the operating point of computing tiles in a system on chip (SoC). We address data-parallel applications with throughput constraints. The controller settings are investigated for application configurations with different QoS levels and different buffer sizes. The control method is evaluated on a test chip with four tiles executing a realistic HMAX object recognition application. Experimental results suggest that the proposed controller outperforms the state of the art: it attains, on average, 25% fewer frequency switches and slightly higher energy savings. The reduction in the number of frequency switches is important because it reduces the associated overhead. In addition, the PI controller meets the throughput constraint in cases where other approaches fail.
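
    To make the control idea concrete, here is a minimal Python sketch of a discrete PI controller that maps a measured buffer fill level to a tile frequency. The gains, set-point, and frequency grid are illustrative assumptions, not the values tuned in the paper.

```python
class PIFrequencyController:
    """Toy discrete PI controller: picks a tile frequency from the
    output-buffer fill level so that throughput stays near the
    constraint without needless frequency switches. Gains, set-point,
    and frequency levels are illustrative assumptions."""

    def __init__(self, kp=0.8, ki=0.2, setpoint=0.5,
                 freqs=(200, 400, 600, 800)):      # MHz levels
        self.kp, self.ki = kp, ki
        self.setpoint = setpoint                   # target buffer fill
        self.freqs = freqs
        self.integral = 0.0

    def step(self, buffer_fill):
        # A fuller-than-target output buffer means the tile runs
        # faster than the consumer needs, so slow it down.
        error = self.setpoint - buffer_fill
        self.integral += error
        u = self.kp * error + self.ki * self.integral
        # Map the control signal (roughly -1..1) onto the discrete grid.
        idx = round((u + 1) / 2 * (len(self.freqs) - 1))
        idx = max(0, min(len(self.freqs) - 1, idx))
        return self.freqs[idx]

ctrl = PIFrequencyController()
for fill in (0.9, 0.7, 0.5, 0.4):   # fill level measured each frame
    print(ctrl.step(fill))          # chosen frequency in MHz
```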

    A methodology for the design of dynamic accuracy operators by runtime back bias

    Mobile and IoT applications must balance increasing processing demands with limited power and cost budgets. Approximate computing achieves this goal by leveraging the error-tolerance features common to many emerging applications to reduce power consumption. In particular, adequate (i.e., energy/quality-configurable) hardware operators are key components of an error-tolerant system. Existing implementations of these operators require significant architectural modifications; hence, they are often design-specific and tend to have large overheads compared to accurate units. In this paper, we propose a methodology to automatically design adequate data-path operators, using threshold-voltage scaling as a knob to dynamically control the power/accuracy tradeoff. The method overcomes the limitations of previous solutions based on supply-voltage scaling, in that it introduces lower overheads and allows fine-grained regulation of this tradeoff. We demonstrate our approach on a state-of-the-art 28-nm FDSOI technology, exploiting the strong effect of back biasing on threshold voltage. Results show a power consumption reduction of as much as 39% compared to solutions based only on supply-voltage scaling, at iso-accuracy.
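
    As a software-level intuition for a dynamic-accuracy operator, here is a hedged Python sketch of a behavioural adder model: lowering the (modelled) accuracy knob truncates the slow low-order carry bits, trading error for energy. The mapping from back-bias voltage to usable bits is purely an assumption for illustration; the paper's operators are hardware circuits.

```python
def approx_add(a, b, accuracy_bits, width=16):
    """Behavioural model of a dynamic-accuracy adder: with aggressive
    back bias the slow carry chain may not settle in time, modelled
    here by zeroing the low `width - accuracy_bits` input bits. The
    bias-to-bits mapping is an illustrative assumption."""
    cut = width - accuracy_bits
    mask = ((1 << width) - 1) ^ ((1 << cut) - 1)   # keep high bits only
    return ((a & mask) + (b & mask)) & ((1 << width) - 1)

# Trading accuracy for (modelled) energy at run time:
exact = 12345 + 6789
for bits in (16, 12, 8):
    approx = approx_add(12345, 6789, accuracy_bits=bits)
    print(bits, approx, abs(approx - exact))   # error grows as bits drop
```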

    The transprecision computing paradigm: Concept, design, and applications

    Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore’s law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny Internet-of-Things nodes to large HPC computing centers, sub-picojoule-per-operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliwatt up to the megawatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, Big Data analytics, Deep Learning, and HPC simulations. By combining transprecision advances in devices, circuits, software tools, and algorithms into a seamless design, we expect to achieve major energy efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims to demolish the ultraconservative “precise” computing abstraction, replacing it with a more flexible and efficient one, namely transprecision computing.
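
    To illustrate the transprecision idea in software, the following Python sketch emulates a reduced-precision arithmetic unit by rounding values to a chosen number of mantissa bits and measures the end-to-end error of a dot product at each precision. It is a minimal stand-in for real variable-precision hardware; the function names and bit widths are assumptions.

```python
import math

def round_to_bits(x, mantissa_bits):
    """Round x to a float with `mantissa_bits` of mantissa, emulating
    a reduced-precision transprecision unit in software."""
    if x == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (mantissa_bits - exp)
    return round(x * scale) / scale

# A dot product computed at decreasing precision: the application can
# pick the cheapest precision whose end-to-end error is acceptable.
a = [0.1 * i for i in range(100)]
b = [math.sin(i) for i in range(100)]
exact = sum(x * y for x, y in zip(a, b))
for bits in (23, 10, 5):
    approx = sum(
        round_to_bits(round_to_bits(x, bits) * round_to_bits(y, bits), bits)
        for x, y in zip(a, b)
    )
    print(bits, abs(approx - exact) / abs(exact))   # relative error
```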

    Model-implementation fidelity in cyber physical system design

    This book puts in focus various techniques for checking the modeling fidelity of Cyber-Physical Systems (CPS) with respect to the physical world they represent. The authors present modeling and analysis techniques from different communities and from very different angles, discuss their possible interactions, and examine the commonalities and differences between their practices. Coverage includes model-driven development, resource-driven development, statistical analysis, proofs of simulator implementation, compiler construction, power/temperature modeling of digital devices, high-level performance analysis, and code/device certification. Several industrial contexts are covered, including modeling of computing and communication, proof-architecture models, and statistics-based validation techniques. The book addresses CPS design problems such as cross-application interference, parsimonious modeling, and trustful code production; describes solutions such as simulation for extra-functional properties, extension of coding techniques, model-driven development, resource-driven modeling, and quantitative and qualitative verification based on statistics and formal proofs; and applies these techniques to several CPS design challenges, such as mixed criticality, communication protocols, and computing platform simulation.

    Conservative Dynamic Energy Management for Real-Time Dataflow Applications Mapped on Multiple Processors

    Voltage-frequency scaling (VFS) trades a linear processor slowdown for a potentially quadratic reduction in energy consumption. Complex dependencies may exist between the different tasks of an application, so the impact of VFS on end-to-end application performance is difficult to predict, especially when these tasks are mapped on multiple processors that are scaled independently. This is a problem for real-time (RT) applications that require guaranteed end-to-end performance. In this paper we first classify the slack existing in RT applications consisting of multiple dependent tasks mapped on multiple processors that use VFS independently, resulting in static, work, and share slack. We then concentrate on work and share slack, as they can only be detected at run time, which makes their conservative use challenging. We propose SlackOS, a dynamic, dependency-aware task scheduler that conservatively scales the voltage and frequency of each processor to respect RT deadlines. When applied to an H.264 application, our method delivers 22% to 33% energy reduction compared to dynamic RT scheduling that is not energy aware.
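
    The core of conservative slack reclamation can be sketched in a few lines of Python: when a task finishes early, pick the lowest frequency at which the remaining worst-case work still meets the deadline. This is a minimal sketch under assumed cycle counts and frequency levels, not the SlackOS algorithm itself.

```python
def pick_frequency(remaining_wcet_cycles, time_to_deadline_s, freqs_mhz):
    """Conservative slack reclamation: choose the lowest frequency that
    still finishes the remaining worst-case work before the deadline.
    The frequency grid and cycle counts are illustrative assumptions."""
    for f in sorted(freqs_mhz):
        if remaining_wcet_cycles / (f * 1e6) <= time_to_deadline_s:
            return f
    return max(freqs_mhz)   # no feasible choice: run at full speed

# A task finished early, leaving work slack for its successors:
deadline_s = 0.010               # end-to-end deadline
elapsed_s = 0.002                # actual time used so far (< WCET)
wcet_cycles_left = 3_000_000     # worst case for the remaining tasks
f = pick_frequency(wcet_cycles_left, deadline_s - elapsed_s,
                   (200, 400, 600, 800))
print(f, "MHz")                  # lowest frequency that is still safe
```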