125 research outputs found
Optimization of Discrete-parameter Multiprocessor Systems using a Novel Ergodic Interpolation Technique
Modern multi-core systems have a large number of design parameters, most of
which are discrete-valued, and this number is likely to keep increasing as chip
complexity rises. Further, the accurate evaluation of a potential design choice
is computationally expensive because it requires detailed cycle-accurate system
simulation. If the discrete parameter space can be embedded into a larger
continuous parameter space, then continuous space techniques can, in principle,
be applied to the system optimization problem. Such continuous space techniques
often scale well with the number of parameters.
We propose a novel technique for embedding the discrete parameter space into
an extended continuous space so that continuous space techniques can be applied
to the embedded problem using cycle-accurate simulation for evaluating the
objective function. This embedding is implemented using simulation-based
ergodic interpolation, which, unlike spatial interpolation, produces the
interpolated value within a single simulation run irrespective of the number of
parameters. We have implemented this interpolation scheme in a cycle-based
system simulator. In a characterization study, we observe that the interpolated
performance curves are continuous, piece-wise smooth, and have low statistical
error. We use the ergodic interpolation-based approach to solve a large
multi-core design optimization problem with 31 design parameters. Our results
indicate that continuous space optimization using ergodic interpolation-based
embedding can be a viable approach for large multi-core design optimization
problems.
Comment: A short version of this paper will be published in the proceedings of
the IEEE MASCOTS 2015 conference.
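
The abstract does not spell out the interpolation mechanism, so the following is only a minimal sketch of one way a single simulation run could yield a value at a non-integer parameter setting. It assumes that the discrete parameter is re-drawn at random during the run so that its average equals the requested continuous value. The cost model `toy_latency`, the function names, and the interval count are all hypothetical and are not taken from the paper.

```python
import math
import random


def ergodic_value(x, rng):
    """Return floor(x) or floor(x) + 1 so that the expected value equals x."""
    lo = math.floor(x)
    frac = x - lo
    # With probability frac pick the upper neighbour, otherwise the lower one.
    return lo + 1 if rng.random() < frac else lo


def toy_latency(buffer_size):
    """Hypothetical per-interval cost for a discrete buffer size (assumption)."""
    return 100.0 / buffer_size


def simulate(x, intervals=20000, seed=0):
    """One run: re-draw the discrete parameter each interval, average the cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(intervals):
        total += toy_latency(ergodic_value(x, rng))
    return total / intervals


if __name__ == "__main__":
    for x in (2.0, 2.25, 2.5, 2.75, 3.0):
        print(x, simulate(x))
```

In this memoryless toy model the averaged cost at a fractional setting falls between the costs at the two neighbouring integer settings, and the run count stays at one no matter how many parameters are randomized in the same way. A real cycle-accurate simulator carries state between intervals, so its interpolated curve need not be linear between integer points.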
An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache
We investigate the utility of augmenting a microprocessor with a single
execution pipeline by adding a second copy of the execution pipeline in
parallel with the existing one. The resulting dual-hardware-threaded
microprocessor has two identical, independent, single-issue in-order execution
pipelines (hardware threads) which share a common memory sub-system (consisting
of instruction and data caches together with a memory management unit). From a
design perspective, the assembly and verification of the dual threaded
processor is simplified by the use of existing verified implementations of the
execution pipeline and a memory unit. Because the memory unit is shared by the
two hardware threads, the relative area overhead of adding the second hardware
thread is 25% of the area of the existing single threaded processor. Using an
FPGA implementation we evaluate the performance of the dual threaded processor
relative to the single threaded one. On applications which can be parallelized,
we observe speedups of 1.6X to 1.88X. For applications that are not
parallelizable, the speedup is more modest. We also observe that the dual
threaded processor performance is degraded on applications which generate large
numbers of cache misses.
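
To put the reported figures side by side, the short calculation below divides the observed speedup by the relative area of the dual threaded design (1 plus the 25% overhead). The performance-per-area metric and the function name are illustrative assumptions; the abstract reports only the speedups and the area overhead.

```python
def perf_per_area(speedup, area_overhead):
    """Speedup normalized by relative area (1.0 = single threaded baseline)."""
    return speedup / (1.0 + area_overhead)


if __name__ == "__main__":
    # Figures quoted in the abstract: 25% area overhead, 1.6X to 1.88X speedup.
    for s in (1.6, 1.88):
        print(f"speedup {s}X -> {perf_per_area(s, 0.25):.3f} per unit area")
```

Under this assumed metric, the parallelizable applications gain roughly 1.28 to 1.5 times the baseline performance per unit area.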