14,699 research outputs found

    An energy-efficient memory unit for clustered microarchitectures

    Get PDF
    Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4,7 percent better on SPECint and consumes 36 percent less energy and performs 7 percent better for SPECfp.Peer ReviewedPostprint (author's final draft

    Empowering a helper cluster through data-width aware instruction selection policies

    Get PDF
    Narrow values that can be represented by less number of bits than the full machine width occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. Those attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. helper cluster), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, in our main focus, we propose various ideas to select suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering decision metric and introduce new data-width based selection algorithms which also consider dependency, inter-cluster communication and load imbalance. Utilizing those techniques, the performance of a wide range of workloads are substantially increased; helper cluster achieves an average speedup of 11% for a wide range of 412 apps. When focusing on integer applications, the speedup can be as high as 22% on averagePeer ReviewedPostprint (published version

    Execution time distributions in embedded safety-critical systems using extreme value theory

    Get PDF
    Several techniques have been proposed to upper-bound the worst-case execution time behaviour of programs in the domain of critical real-time embedded systems. These computing systems have strong requirements regarding the guarantees that the longest execution time a program can take is bounded. Some of those techniques use extreme value theory (EVT) as their main prediction method. In this paper, EVT is used to estimate a high quantile for different types of execution time distributions observed for a set of representative programs for the analysis of automotive applications. A major challenge appears when the dataset seems to be heavy tailed, because this contradicts the previous assumption of embedded safety-critical systems. A methodology based on the coefficient of variation is introduced for a threshold selection algorithm to determine the point above which the distribution can be considered generalised Pareto distribution. This methodology also provides an estimation of the extreme value index and high quantile estimates. We have applied these methods to execution time observations collected from the execution of 16 representative automotive benchmarks to predict an upper-bound to the maximum execution time of this program. Several comparisons with alternative approaches are discussed.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (grant agreement 611085). This study was also partially supported by the Spanish Ministry of Science and Innovation under grants MTM2012-31118 (2013-2015) and TIN2015-65316-P. Jaume Abella is partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013- 14717.Peer ReviewedPostprint (author's final draft

    An integrated search-based approach for automatic testing from extended finite state machine (EFSM) models

    Get PDF
    This is the post-print version of the Article - Copyright @ 2011 ElsevierThe extended finite state machine (EFSM) is a modelling approach that has been used to represent a wide range of systems. When testing from an EFSM, it is normal to use a test criterion such as transition coverage. Such test criteria are often expressed in terms of transition paths (TPs) through an EFSM. Despite the popularity of EFSMs, testing from an EFSM is difficult for two main reasons: path feasibility and path input sequence generation. The path feasibility problem concerns generating paths that are feasible whereas the path input sequence generation problem is to find an input sequence that can traverse a feasible path. While search-based approaches have been used in test automation, there has been relatively little work that uses them when testing from an EFSM. In this paper, we propose an integrated search-based approach to automate testing from an EFSM. The approach has two phases, the aim of the first phase being to produce a feasible TP (FTP) while the second phase searches for an input sequence to trigger this TP. The first phase uses a Genetic Algorithm whose fitness function is a TP feasibility metric based on dataflow dependence. The second phase uses a Genetic Algorithm whose fitness function is based on a combination of a branch distance function and approach level. Experimental results using five EFSMs found the first phase to be effective in generating FTPs with a success rate of approximately 96.6%. Furthermore, the proposed input sequence generator could trigger all the generated feasible TPs (success rate = 100%). The results derived from the experiment demonstrate that the proposed approach is effective in automating testing from an EFSM

    Fast, Interactive Worst-Case Execution Time Analysis With Back-Annotation

    Get PDF
    Abstract—For hard real-time systems, static code analysis is needed to derive a safe bound on the worst-case execution time (WCET). Virtually all prior work has focused on the accuracy of WCET analysis without regard to the speed of analysis. The resulting algorithms are often too slow to be integrated into the development cycle, requiring WCET analysis to be postponed until a final verification phase. In this paper we propose interactive WCET analysis as a new method to provide near-instantaneous WCET feedback to the developer during software programming. We show that interactive WCET analysis is feasible using tree-based WCET calculation. The feedback is realized with a plugin for the Java editor jEdit, where the WCET values are back-annotated to the Java source at the statement level. Comparison of this treebased approach with the implicit path enumeration technique (IPET) shows that tree-based analysis scales better with respect to program size and gives similar WCET values. Index Terms—Real time systems, performance analysis, software performance, software reliability, software algorithms, safety I

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    Get PDF
    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture
    • …
    corecore