An energy-efficient memory unit for clustered microarchitectures

Author: Bieschewski Stefan
González Colás Antonio María
Parcerisa Bundó Joan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems, such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint, and consumes 36 percent less energy and performs 7 percent better on SPECfp.

Peer Reviewed. Postprint (author's final draft).
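With physically unordered queues, an entry's slot no longer encodes its age. The minimal Python sketch below (not the authors' design) shows one way such a queue can still answer a store-to-load forwarding query: each entry carries a program-order sequence number, and a load picks the youngest older store to the same address. The class `UnorderedStoreQueue`, its methods and the capacity check are hypothetical; real hardware would perform the age comparison with associative logic rather than a list scan, and would also have to handle stores whose addresses are not yet known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    seq: int    # program-order sequence number (age tag)
    addr: int
    value: int


class UnorderedStoreQueue:
    """Hypothetical store queue whose slots are filled in arbitrary order.

    Entries are allocated late (once the store's address is known), so the
    slot position says nothing about age; only `seq` does.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[StoreEntry] = []

    def allocate(self, seq: int, addr: int, value: int) -> bool:
        # Late allocation can fail when the queue is full (overflow);
        # the caller must decide how to recover.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(StoreEntry(seq, addr, value))
        return True

    def forward(self, load_seq: int, load_addr: int) -> Optional[int]:
        # Forward from the youngest store that is older than the load and
        # writes the same address; age is recovered from sequence numbers.
        older = [e for e in self.entries if e.seq < load_seq and e.addr == load_addr]
        if not older:
            return None
        return max(older, key=lambda e: e.seq).value

    def commit(self, seq: int) -> None:
        # Release the entry when its store retires.
        self.entries = [e for e in self.entries if e.seq != seq]
```

In the sketch, allocation simply fails when the queue is full; how to recover from such overflows without deadlock is the kind of problem the article addresses, so no particular recovery policy is assumed here.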

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Empowering a helper cluster through data-width aware instruction selection policies

Author: Ergin Oguz
González Colás Antonio María
Unsal Osman Sabri
Vera Rivera Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Narrow values that can be represented with fewer bits than the full machine width occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. These attributes can be combined synergistically to design special clusters that operate on narrow values (a.k.a. helper clusters), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, as our main focus, we propose various ideas for selecting suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering metric and introduce new data-width based selection algorithms that also consider dependencies, inter-cluster communication and load imbalance. Using these techniques, the performance of a wide range of workloads is substantially increased: the helper cluster achieves an average speedup of 11% across 412 applications. For integer applications alone, the average speedup is as high as 22%.

Peer Reviewed. Postprint (published version).
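The idea of steering narrow-operand instructions to an 8-bit helper cluster can be illustrated with a small Python sketch. It assumes a 32-bit two's-complement machine and treats a value as narrow when it sign-extends from 8 bits. The function names, the `max_imbalance` threshold and the "all sources narrow" rule are hypothetical and are not the selection algorithms proposed in the paper, which also weigh dependencies and inter-cluster communication.

```python
from typing import Iterable


def is_narrow(value: int, width: int = 8) -> bool:
    """True if a 32-bit two's-complement value fits in `width` bits (sign-extended)."""
    value &= 0xFFFFFFFF
    signed = value - (1 << 32) if value & 0x80000000 else value
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return lo <= signed <= hi


def steer(src_values: Iterable[int], helper_load: int, main_load: int,
          max_imbalance: int = 8) -> str:
    """Hypothetical rule: use the helper cluster only if every source operand
    is narrow and the helper is not already much busier than the main cluster."""
    all_narrow = all(is_narrow(v) for v in src_values)
    if all_narrow and helper_load - main_load < max_imbalance:
        return "helper"
    return "main"
```

A practical scheme would also need to ensure that the result fits in 8 bits (or detect and recover when it does not); this sketch only inspects source operands.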

UPCommons. Portal del coneixement obert de la UPC

Execution time distributions in embedded safety-critical systems using extreme value theory

Author: Abella Ferrer Jaume
Cazorla Francisco J.
del Castillo Joan
Padilla Maria
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2017

Several techniques have been proposed to upper-bound the worst-case execution time behaviour of programs in the domain of critical real-time embedded systems. These systems require strong guarantees that the longest execution time a program can take is bounded. Some of these techniques use extreme value theory (EVT) as their main prediction method. In this paper, EVT is used to estimate a high quantile for different types of execution time distributions observed for a set of representative programs for the analysis of automotive applications. A major challenge arises when the dataset appears to be heavy tailed, because this contradicts the assumption previously made for embedded safety-critical systems. A methodology based on the coefficient of variation is introduced for a threshold selection algorithm that determines the point above which the distribution can be considered a generalised Pareto distribution. This methodology also provides estimates of the extreme value index and of high quantiles. We have applied these methods to execution time observations collected from 16 representative automotive benchmarks to predict an upper bound on the maximum execution time of each program. Several comparisons with alternative approaches are discussed.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (grant agreement 611085). This study was also partially supported by the Spanish Ministry of Science and Innovation under grants MTM2012-31118 (2013-2015) and TIN2015-65316-P. Jaume Abella is partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.

Peer Reviewed. Postprint (author's final draft).
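The link between the coefficient of variation (CV) and the generalised Pareto distribution (GPD) can be made concrete. For GPD excesses with shape ξ < 1/2, CV^2 = 1/(1 - 2ξ), so CV = 1 corresponds to an exponential tail (ξ = 0) and CV > 1 to a heavy tail (ξ > 0). The Python sketch below fits ξ and the scale σ from this moment relation above a given threshold `u` and then extrapolates a high quantile. It is an illustration under stated assumptions, not the paper's threshold-selection algorithm: the threshold is passed in rather than selected, the fit is a plain method-of-moments fit, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def excesses_over(data: List[float], u: float) -> List[float]:
    """Excesses x - u of the observations that exceed threshold u."""
    return [x - u for x in data if x > u]


def gpd_moments_fit(excesses: List[float]) -> Tuple[float, float]:
    """Method-of-moments GPD fit driven by the coefficient of variation.

    For GPD excesses with shape xi < 1/2: CV^2 = 1 / (1 - 2*xi) and
    mean = sigma / (1 - xi), hence xi = (1 - 1/CV^2) / 2, sigma = mean * (1 - xi).
    """
    mean = statistics.fmean(excesses)
    cv = statistics.stdev(excesses) / mean
    xi = (1.0 - 1.0 / (cv * cv)) / 2.0
    sigma = mean * (1.0 - xi)
    return xi, sigma


def gpd_quantile(u: float, xi: float, sigma: float, k: int, n: int, q: float) -> float:
    """Level exceeded with probability q, when k of n observations exceed u.

    Solves (k/n) * (1 + xi*(x - u)/sigma)^(-1/xi) = q for x
    (exponential limit when xi == 0).
    """
    ratio = k / (n * q)
    if abs(xi) < 1e-12:
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)
```

With exponential data the excesses over any threshold are again exponential, so the fitted ξ should come out close to 0. Applied to execution-time samples, `gpd_quantile` would give a candidate upper bound at exceedance probability `q`, only as trustworthy as the GPD fit above `u`.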

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

An integrated search-based approach for automatic testing from extended finite state machine (EFSM) models

Author: Abdul Salam Kalaji
Robert Mark Hierons
Stephen Swift
Publication venue: 'Elsevier BV'
Publication date: 01/12/2011

This is the post-print version of the article. Copyright © 2011 Elsevier.

The extended finite state machine (EFSM) is a modelling approach that has been used to represent a wide range of systems. When testing from an EFSM, it is common to use a test criterion such as transition coverage. Such test criteria are often expressed in terms of transition paths (TPs) through an EFSM. Despite the popularity of EFSMs, testing from an EFSM is difficult for two main reasons: path feasibility and path input sequence generation. The path feasibility problem concerns generating paths that are feasible, whereas the path input sequence generation problem is to find an input sequence that can traverse a feasible path. While search-based approaches have been used in test automation, there has been relatively little work that uses them when testing from an EFSM. In this paper, we propose an integrated search-based approach to automate testing from an EFSM. The approach has two phases: the first phase aims to produce a feasible TP (FTP), while the second phase searches for an input sequence to trigger this TP. The first phase uses a genetic algorithm whose fitness function is a TP feasibility metric based on dataflow dependence. The second phase uses a genetic algorithm whose fitness function is based on a combination of a branch distance function and approach level. Experiments using five EFSMs found the first phase to be effective in generating FTPs, with a success rate of approximately 96.6%. Furthermore, the proposed input sequence generator could trigger all the generated feasible TPs (success rate = 100%). These results demonstrate that the proposed approach is effective in automating testing from an EFSM.
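The second-phase fitness function combines approach level and branch distance. A common formulation in search-based testing adds the approach level (how far from the target the execution diverged) to a normalised branch distance (how close the diverging predicate came to the desired outcome). The Python sketch below follows that general formulation; the operator table, the constant `k` and the normalisation d/(d + 1) are assumptions for illustration and may differ from the paper's exact definitions, and for an EFSM the approach level could plausibly be read as how many transitions of the target TP remain untriggered.

```python
def branch_distance(op: str, a: float, b: float, k: float = 1.0) -> float:
    """Distance to making the predicate `a op b` true: 0 when it already holds,
    otherwise a positive value that shrinks as the predicate gets closer to true."""
    if op == "==":
        return abs(a - b)
    if op == "!=":
        return 0.0 if a != b else k
    if op == "<":
        return 0.0 if a < b else (a - b) + k
    if op == "<=":
        return 0.0 if a <= b else (a - b) + k
    if op == ">":
        return 0.0 if a > b else (b - a) + k
    if op == ">=":
        return 0.0 if a >= b else (b - a) + k
    raise ValueError(f"unsupported operator: {op}")


def normalise(d: float) -> float:
    # Maps [0, inf) into [0, 1), so branch distance never outweighs one approach level.
    return d / (d + 1.0)


def fitness(approach_level: int, distance: float) -> float:
    """Lower is better; 0 means the target was reached with the predicate satisfied."""
    return approach_level + normalise(distance)
```

Because the normalised distance always lies in [0, 1), a candidate that gets one step closer to the target always scores better than any candidate that diverges earlier, regardless of branch distance.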

Crossref

Brunel University Research Archive

Fast, Interactive Worst-Case Execution Time Analysis With Back-Annotation

Author: Harmon Trevor
Kim Kwang H.
Kirner Raimund
Klefstad Raymond
Lowry Michael R.
Schoeberl Martin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

For hard real-time systems, static code analysis is needed to derive a safe bound on the worst-case execution time (WCET). Virtually all prior work has focused on the accuracy of WCET analysis without regard to the speed of analysis. The resulting algorithms are often too slow to be integrated into the development cycle, so WCET analysis has to be postponed until a final verification phase. In this paper, we propose interactive WCET analysis as a new method that provides near-instantaneous WCET feedback to the developer during programming. We show that interactive WCET analysis is feasible using tree-based WCET calculation. The feedback is realized with a plugin for the Java editor jEdit, where the WCET values are back-annotated to the Java source at the statement level. A comparison of this tree-based approach with the implicit path enumeration technique (IPET) shows that tree-based analysis scales better with respect to program size and gives similar WCET values.

Index Terms: real-time systems, performance analysis, software performance, software reliability, software algorithms, safety
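Tree-based WCET calculation works bottom-up over the program's syntax tree: sequences add, conditionals take the costlier branch, and loops multiply by their bound. The Python sketch below shows that timing schema under simplifying assumptions (fixed per-block cycle counts, no cache or pipeline effects, a known loop bound, and a loop header evaluated once more than the body). The node types and cost model are hypothetical and are not the tool described in the paper.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    cycles: int          # WCET of a straight-line block


@dataclass
class Seq:
    parts: List["Node"]


@dataclass
class If:
    cond: "Node"
    then: "Node"
    orelse: "Node"


@dataclass
class Loop:
    header: "Node"       # evaluated bound + 1 times (assumption)
    body: "Node"         # executed at most `bound` times
    bound: int


Node = Union[Basic, Seq, If, Loop]


def wcet(node: Node) -> int:
    """Bottom-up timing schema over the syntax tree."""
    if isinstance(node, Basic):
        return node.cycles
    if isinstance(node, Seq):
        return sum(wcet(p) for p in node.parts)
    if isinstance(node, If):
        return wcet(node.cond) + max(wcet(node.then), wcet(node.orelse))
    if isinstance(node, Loop):
        return (node.bound + 1) * wcet(node.header) + node.bound * wcet(node.body)
    raise TypeError(f"unknown node: {node!r}")


prog = Seq([Basic(5), Loop(Basic(2), If(Basic(1), Basic(10), Basic(3)), 4), Basic(7)])
print(wcet(prog))  # prints 66
```

Since each node's bound depends only on its children, editing one statement only invalidates the nodes on its path to the root, which plausibly helps explain why a tree-based calculation suits interactive feedback; the per-node values are also natural candidates for back-annotation to source statements.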

CiteSeerX

Online Research Database In Technology

University of Hertfordshire Research Archive

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

Author: Carroll Chester C.
Owen Jeffrey E.

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented that overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows simulations to be mapped onto a digital computer in the same inherently parallel manner in which they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code, since the need for a high-level language compiler is eliminated. Resolution is greatly increased over that available with an analog computer, without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is illustrated by comparing the derived execution times of two ACSL programs with the execution times of the same programs on a similar sequential architecture.
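The parallelism that an analog computer exploits, and that the proposed architecture aims to reproduce, comes from every integrator updating simultaneously from the current state. The Python sketch below illustrates this with a synchronous forward-Euler step in which all derivative functions read the same state snapshot and are therefore independent. The function names, the Euler method and the thread pool are assumptions for illustration only, not taken from the report; in CPython the thread pool shows the independent mapping rather than delivering real speedup for pure-Python derivatives.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

State = List[float]
Deriv = Callable[[State], float]


def euler_step(state: State, derivs: List[Deriv], h: float,
               pool: ThreadPoolExecutor) -> State:
    # Every derivative reads the same snapshot of `state`, so the calls are
    # independent and can be evaluated concurrently, like analog integrators.
    rates = list(pool.map(lambda f: f(state), derivs))
    # Update all state variables at once from the old snapshot.
    return [x + h * r for x, r in zip(state, rates)]


def simulate(state: State, derivs: List[Deriv], h: float, steps: int) -> State:
    with ThreadPoolExecutor() as pool:
        for _ in range(steps):
            state = euler_step(state, derivs, h, pool)
    return state


# Two decoupled first-order systems: dx/dt = -x and dy/dt = -2y.
print(simulate([1.0, 1.0], [lambda s: -s[0], lambda s: -2.0 * s[1]], 0.1, 10))
```

Because the update reads only the previous snapshot, the result does not depend on the order in which the derivative evaluations finish.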

NASA Technical Reports Server