Search CORE

145 research outputs found

An Instruction Scratchpad Memory Allocation for the Precision Timed Architecture

Author: Prakash Aayush
Publication venue: 'University of Waterloo'
Publication date: 11/12/2012
Field of study

This work presents a static instruction allocation scheme for the precision timed architecture’s (PRET) scratchpad memory. Since PRET provides timing instructions to control the temporal execution of programs, the objective of the allocation scheme is to ensure that the explicitly specified temporal requirements are met. Furthermore, this allocation incorporates instructions from multiple hardware threads of the PRET architecture. We formulate the allocation as an integer-linear programming problem, and we implement a tool that takes binaries, constructs a control-flow graph, performs the allocation, rewrites the binary with the new allocation, and generates an output binary for the PRET architecture. We carry out experiments on a modified version of the Malardalen benchmarks to illustrate that commonly known ACET and WCET based approaches cannot be directly applied to meet explicit timing requirements. We also show the advantage of performing the allocation across multiple threads. We present a real time benchmark controlling an Unmanned Air Vehicle as the case study

University of Waterloo's Institutional Repository

Time-predictable Chip-Multiprocessor Design

Author: Schoeberl Martin
Publication venue
Publication date: 01/01/2010
Field of study

Abstract—Real-time systems need time-predictable platforms to enable static worst-case execution time (WCET) analysis. Improving the processor performance with superscalar techniques makes static WCET analysis practically impossible. However, most real-time systems are multi-threaded applications and performance can be improved by using several processor cores on a single chip. In this paper we present a time-predictable chipmultiprocessor system that aims to improve system performance while still enabling WCET analysis. The proposed chip-multiprocessor (CMP) uses a shared memory with a time-division multiple access (TDMA) based memory access scheduling. The static TDMA schedule can be integrated into the WCET analysis. Experiments with a JOP based CMP showed that the memory access starts to dominate the execution time when using more than 4 processor cores. To provide a better scalability, more local memories have to be used. We add a processor local scratchpad memory and split data caches, which are still time-predictable, to the processor cores. I

CiteSeerX

Crossref

Online Research Database In Technology

Is Time Predictability Quantifiable?

Author: Schoeberl Martin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Abstract—Computer architects and researchers in the realtime domain start to investigate processors and architectures optimized for real-time systems. Optimized for real-time systems means time predictable, i.e., architectures where it is possible to statically derive a tight bound of the worst-case execution time. To compare different approaches we would like to quantify time predictability. That means we need to measure time predictability. In this paper we discuss the different approaches for these measurements and conclude that time predictability is practically not quantifiable. We can only compare the worst-case execution time bounds of different architectures. I

CiteSeerX

Crossref

Online Research Database In Technology

An extensible framework for multicore response time analysis

Author: B Bhat
C Ferdinand
C Ferdinand
C Lee
CG Lee
Claire Maiza
D Dasari
E Bini
G Yao
H Kim
Jan Reineke
K Lampka
Leandro S. Indrusiak
M Joseph
M Paolieri
N Audsley
N Binkert
P Radojković
RI Davis
Robert I. Davis
S Altmeyer
Sebastian Altmeyer
T Kelter
TP Baker
Vincent Nelis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

In this paper, we introduce a multicore response time analysis (MRTA) framework, which decouples response time analysis from a reliance on context independent WCET values. Instead, the analysis formulates response times directly from the demands placed on different hardware resources. The MRTA framework is extensible to different multicore architectures, with a variety of arbitration policies for the common interconnects, and different types and arrangements of local memory. We instantiate the framework for single level local data and instruction memories (cache or scratchpads), for a variety of memory bus arbitration policies, including: Round-Robin, FIFO, Fixed-Priority, Processor-Priority, and TDMA, and account for DRAM refreshes. The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare the guaranteed levels of real-time performance that can be obtained with different hardware configurations. We use the framework in this way to evaluate the performance of multicore systems with a variety of different architectural components and policies. These results are then used to compose a predictable architecture, which is compared against a reference architecture designed for good average-case behaviour. This comparison shows that the predictable architecture has substantially better guaranteed real-time performance, with the precision of the analysis verified using cycle-accurate simulation

OPUS Augsburg

Repositório Científico do Instituto Politécnico do Porto

Crossref

White Rose Research Online

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Towards a Time-predictable Dual-Issue Microprocessor: The Patmos Approach

Author: Christian W. Probst
Ens De Lyon
Florian Br
Martin Schoeberl
Pascal Schleuniger
Sven Karlsson
Tommy Thorn
Wolfgang Puffitsch
Publication venue: OASICS
Publication date: 01/01/2011
Field of study

Current processors are optimized for average case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase the average case performance are hard to be modeled for the WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average case performance. Patmos is a dual-issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average case performance

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Online Research Database In Technology

Hal-Diderot

An extensible framework for multicore response time analysis

Author: Altmeyer S.
Davis R.I.
Indrusiak L.S.
Maiza C.
Nelis V.
Reineke J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018
Field of study

International Migration, Integration and Social Cohesion online publications

Doctor of Philosophy

Author: Redd Bennion
Publication venue: University of Utah
Publication date: 01/01/2015
Field of study

dissertationAdvancements in process technology and circuit techniques have enabled the creation of small chemical microsystems for use in a wide variety of biomedical and sensing applications. For applications requiring a small microsystem, many components can be integrated onto a single chip. This dissertation presents many low-power circuits, digital and analog, integrated onto a single chip called the Utah Microcontroller. To guide the design decisions for each of these components, two specific microsystems have been selected as target applications: a Smart Intravaginal Ring (S-IVR) and an NO releasing catheter. Both of these applications share the challenging requirements of integrating a large variety of low-power mixed-signal circuitry onto a single chip. These applications represent the requirements of a broad variety of small low-power sensing systems. In the course of the development of the Utah Microcontroller, several unique and significant contributions were made. A central component of the Utah Microcontroller is the WIMS Microprocessor, which incorporates a low-power feature called a scratchpad memory. For the first time, an analysis of scaling trends projected that scratchpad memories will continue to save power for the foreseeable future. This conclusion was bolstered by measured data from a fabricated microcontroller. In a 32 nm version of the WIMS Microprocessor, the scratchpad memory is projected to save ~10-30% of memory access energy depending upon the characteristics of the embedded program. Close examination of application requirements informed the design of an analog-to-digital converter, and a unique single-opamp buffered charge scaling DAC was developed to minimize power consumption. The opamp was designed to simultaneously meet the varied demands of many chip components to maximize circuit reuse. Each of these components are functional, have been integrated, fabricated, and tested. This dissertation successfully demonstrates that the needs of emerging small low-power microsystems can be met in advanced process nodes with the incorporation of low-power circuit techniques and design choices driven by application requirements

The University of Utah: J. Willard Marriott Digital Library

Data cache organization for accurate timing analysis

Author: Huber Benedikt
Puffitsch Wolfgang
Schoeberl Martin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Online Research Database In Technology

A memory-centric approach to enable timing-predictability within embedded many-core accelerators

Author: Bertogna Marko
Burgio Paolo
Marongiu Andrea
Valente Paolo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

There is an increasing interest among real-time systems architects for multi- and many-core accelerated platforms. The main obstacle towards the adoption of such devices within industrial settings is related to the difficulties in tightly estimating the multiple interferences that may arise among the parallel components of the system. This in particular concerns concurrent accesses to shared memory and communication resources. Existing worst-case execution time analyses are extremely pessimistic, especially when adopted for systems composed of hundreds-tothousands of cores. This significantly limits the potential for the adoption of these platforms in real-time systems. In this paper, we study how the predictable execution model (PREM), a memory-aware approach to enable timing-predictability in realtime systems, can be successfully adopted on multi- and manycore heterogeneous platforms. Using a state-of-the-art multi-core platform as a testbed, we validate that it is possible to obtain an order-of-magnitude improvement in the WCET bounds of parallel applications, if data movements are adequately orchestrated in accordance with PREM. We identify which system parameters mostly affect the tremendous performance opportunities offered by this approach, both on average and in the worst case, moving the first step towards predictable many-core systems

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

A Dynamic Scratchpad Memory Unit for Predictable Real-Time Embedded Systems

Author: Wasly Saud
Publication venue: 'University of Waterloo'
Publication date: 21/12/2012
Field of study

Scratch-pad memory is a popular alternative to caches in real-time embedded systems due to its advantages in terms of timing predictability and power consumption. However, dynamic management of scratch-pad content is challenging in multitasking environments. To address this issue, this thesis proposes the design of a novel Real-Time Scratchpad Memory Unit (RSMU). The RSMU can be integrated into existing systems with minimal architectural modi cations. Furthermore, scratchpad management is performed at the OS level, requiring no application changes. In conjunction with a two-level scheduling scheme, the RSMU provides strong timing guarantees to critical tasks. Demonstration and evaluation of the system design is provided on an embedded FPGA platform

University of Waterloo's Institutional Repository