
    Experiments with Subsetting Benchmark Suites

    Benchmarks are among the most popular tools for comparing the performance of computing systems. Benchmark suites typically contain multiple benchmark programs with more or less the same properties. The suite therefore contains redundancy, which increases the cost of executing or simulating it without adding value. To limit simulation time, researchers frequently subset benchmark suites. However, correctly identifying a representative subset is of paramount importance for a trustworthy evaluation. This paper shows that subsetting a benchmark suite so that the representativeness of the suite is maintained is non-trivial. We show that a small randomly selected subset is not representative of the full benchmark suite. We discuss algorithms to subset the SPEC CPU 2000 benchmark suite and show that they provide more representative subsets than random selection. However, the algorithms evaluated in this paper do not always compute representative subsets: they produce bad results for some subset sizes. In this sense, these algorithms are unreliable, as it remains necessary to validate the benchmark suite subset. We find one subsetting algorithm that is reliable. It is, however, uncertain whether this algorithm is also reliable under other circumstances.
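The validation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that representativeness is measured as the relative error between the mean score of a subset and the mean score of the full suite; the abstract does not state which metric the paper uses. The benchmark names and scores are hypothetical, as are the function names `subset_error` and `error_range`.

```python
from itertools import combinations
from statistics import mean


def subset_error(scores, subset):
    """Relative error of the subset's mean score against the full-suite mean.

    The metric is an assumption for this sketch, not the paper's definition.
    """
    full = mean(scores.values())
    part = mean(scores[name] for name in subset)
    return abs(part - full) / full


def error_range(scores, k):
    """Smallest and largest relative error over all subsets of size k."""
    errors = [subset_error(scores, s) for s in combinations(sorted(scores), k)]
    return min(errors), max(errors)


# Hypothetical per-benchmark scores (not SPEC CPU 2000 data).
scores = {"a": 1.0, "b": 1.1, "c": 2.0, "d": 2.1, "e": 4.0, "f": 4.2}

best, worst = error_range(scores, 2)
print(f"best subset error: {best:.3f}, worst subset error: {worst:.3f}")
```

The gap between the best and the worst subset of the same size shows why a randomly drawn subset can mislead, and why a subset produced by any selection algorithm still has to be checked against the full suite.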

    A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks

    One of the problems in future processors will be the resource conflicts caused by several load/store units competing to access the same cache bank. The traditional approach to handling this case is to introduce buffers combined with a crossbar. This approach suffers from (i) the nondeterministic latency of a load/store and (ii) the extra latency caused by the crossbar and the buffer management. A deterministic latency is of the utmost importance for the forwarding mechanism of out-of-order processors because it enables back-to-back execution of instructions. We propose a technique that eliminates the buffers and crossbars from the critical path of load/store execution. This results in a latency that is both low and deterministic. Our solution consists of predicting which bank is to be accessed. A penalty is incurred only in the case of a wrong prediction.
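The idea of predicting the bank and paying a penalty only on a wrong prediction can be sketched as follows. The abstract does not describe the predictor, so this sketch assumes a simple last-bank predictor indexed by the program counter of the load/store; the line size, the bank count, the address-to-bank mapping, and all names are hypothetical.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_BANKS = 4   # number of cache banks (assumed)


def bank_of(address):
    """Map an address to a bank by interleaving cache lines across banks."""
    return (address // LINE_SIZE) % NUM_BANKS


class LastBankPredictor:
    """Predicts that an instruction accesses the same bank as last time.

    This is an illustrative stand-in, not the predictor from the paper.
    """

    def __init__(self):
        self.table = {}  # program counter -> last observed bank

    def predict(self, pc):
        # Default to bank 0 for instructions not seen before.
        return self.table.get(pc, 0)

    def update(self, pc, bank):
        self.table[pc] = bank


def count_mispredictions(accesses):
    """Count accesses whose predicted bank differs from the actual bank."""
    predictor = LastBankPredictor()
    mispredictions = 0
    for pc, address in accesses:
        actual = bank_of(address)
        if predictor.predict(pc) != actual:
            mispredictions += 1  # only these accesses would pay a penalty
        predictor.update(pc, actual)
    return mispredictions


# One load at pc 0x10 walking through memory in 8-byte steps, then crossing a line.
trace = [(0x10, 0), (0x10, 8), (0x10, 64), (0x10, 72)]
print(count_mispredictions(trace))
```

In this trace only the access that crosses into a new cache line changes bank, so it is the only one that the assumed predictor gets wrong.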