How Multithreading Addresses the Memory Wall
The memory wall is the predicted situation where improvements to processor speed will be masked by the much slower improvement in dynamic random access memory (DRAM) speed. Since the prediction was made in 1995, considerable progress has been made in addressing the memory wall. There have been advances in DRAM organization, improved approaches to the memory hierarchy have been proposed, integrating DRAM onto the processor chip has been investigated, and alternative approaches to organizing the instruction stream have been researched. All of these approaches contribute to reducing the predicted memory wall effect; some can potentially be combined. This paper reviews several approaches with a view to assessing the most promising option. Given the growing CPU-DRAM speed gap, any strategy which finds alternative work while waiting for DRAM is likely to be a win.
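The closing claim, that finding other work during a DRAM access is likely to pay off, can be illustrated with a textbook latency-hiding model. This is a minimal sketch, not the model used in the paper: the function name and the parameters `run_cycles`, `miss_latency`, and `switch_cost` are assumptions introduced here, and every thread is assumed to run for a fixed number of cycles between misses.

```python
def processor_utilization(threads: int, run_cycles: float,
                          miss_latency: float, switch_cost: float = 0.0) -> float:
    """Fraction of cycles spent on useful work in a simple multithreading model.

    Assumptions (illustrative, not taken from the abstract): each thread
    computes for `run_cycles`, then stalls on DRAM for `miss_latency` cycles;
    switching to another ready thread costs `switch_cost` cycles.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if run_cycles <= 0 or miss_latency < 0 or switch_cost < 0:
        raise ValueError("run_cycles must be positive; other costs non-negative")

    # Number of threads at which the miss latency is fully overlapped.
    saturation = 1.0 + miss_latency / (run_cycles + switch_cost)
    if threads >= saturation:
        # Latency is hidden; only the switch overhead is lost.
        return run_cycles / (run_cycles + switch_cost)
    # Below saturation, utilization grows linearly with the thread count.
    return threads * run_cycles / (run_cycles + switch_cost + miss_latency)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, processor_utilization(n, run_cycles=50, miss_latency=200))
```

Under these assumptions a single thread that runs 50 cycles per 200-cycle miss keeps the processor busy a fifth of the time, and additional threads raise that share until the latency is fully overlapped.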
Kilo-instruction processors: overcoming the memory wall
Historically, advances in integrated circuit technology have driven improvements in processor microarchitecture and led to today's microprocessors with sophisticated pipelines operating at very high clock frequencies. However, performance improvements achievable by high-frequency microprocessors have become seriously limited by main-memory access latencies because main-memory speeds have improved at a much slower pace than microprocessor speeds. It's crucial to deal with this performance disparity, commonly known as the memory wall, to enable future high-frequency microprocessors to achieve their performance potential. To overcome the memory wall, we propose kilo-instruction processors: superscalar processors that can maintain a thousand or more simultaneous in-flight instructions. Doing so means designing key hardware structures so that the processor can satisfy the high resource requirements without significantly decreasing processor efficiency or increasing energy consumption. Peer Reviewed. Postprint (published version)
When parallel speedups hit the memory wall
After Amdahl's trailblazing work, many other authors proposed analytical
speedup models but none have considered the limiting effect of the memory wall.
These models exploited aspects such as problem-size variation, memory size,
communication overhead, and synchronization overhead, but data-access delays
are assumed to be constant. Nevertheless, such delays can vary, for example,
according to the number of cores used and the ratio between processor and
memory frequencies. Given the large number of possible configurations of
operating frequency and number of cores that current architectures can offer,
suitable speedup models to describe such variations among these configurations
are quite desirable for off-line or on-line scheduling decisions. This work
proposes new parallel speedup models that account for variations of the average
data-access delay to describe the limiting effect of the memory wall on
parallel speedups. Analytical results indicate that the proposed models
capture the desired behavior, and experimental hardware results validate
them. Additionally, we show that when accounting for parameters that reflect
the intrinsic characteristics of the applications, such as degree of
parallelism and susceptibility to the memory wall, our proposal has significant
advantages over machine-learning-based modeling. Moreover, conventional
machine-learning modeling is not only a black box: our experiments show that it
needs about one order of magnitude more measurements to reach the same
level of accuracy achieved by our modeling.
Comment: 24 pages
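The abstract does not give the proposed models, so the following is only an illustrative sketch of the idea: an Amdahl-style speedup in which the average data-access delay grows with the number of cores. The linear contention term and the parameter names `mem_frac` and `contention` are assumptions made here, not the authors' formulation.

```python
def memory_wall_speedup(cores: int, serial_frac: float,
                        mem_frac: float = 0.0, contention: float = 0.0) -> float:
    """Amdahl-style speedup with a core-dependent data-access delay.

    Assumptions (hypothetical, for illustration only):
      - single-core run time is normalized to 1;
      - `serial_frac` of it cannot be parallelized;
      - `mem_frac` of the parallel part is time spent waiting on memory;
      - each access gets slower by a factor 1 + contention * (cores - 1).
    With contention = 0 this reduces to classic Amdahl's law.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not (0.0 <= serial_frac <= 1.0 and 0.0 <= mem_frac <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if contention < 0.0:
        raise ValueError("contention must be non-negative")

    parallel_frac = 1.0 - serial_frac
    delay_factor = 1.0 + contention * (cores - 1)
    compute_time = parallel_frac * (1.0 - mem_frac) / cores
    memory_time = parallel_frac * mem_frac * delay_factor / cores
    return 1.0 / (serial_frac + compute_time + memory_time)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n,
              round(memory_wall_speedup(n, 0.1), 3),
              round(memory_wall_speedup(n, 0.1, mem_frac=0.5, contention=0.2), 3))
```

Comparing the two printed columns shows how a contention term flattens the speedup curve well below the Amdahl bound as cores are added.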
Magnetoelectric domain wall dynamics and its implications for magnetoelectric memory
Domain wall dynamics in a magnetoelectric antiferromagnet is analyzed, and
its implications for magnetoelectric memory applications are discussed.
Cr₂O₃ is used in the estimates of the materials parameters. It is found
that the domain wall mobility has a maximum as a function of the electric field
due to the gyrotropic coupling induced by it. In Cr₂O₃ the maximal
mobility of 0.1 m/(s·Oe) is reached at V/nm. Fields of
this order may be too weak to overcome the intrinsic depinning field, which is
estimated for B-doped Cr₂O₃. These major drawbacks for device
implementation can be overcome by applying a small in-plane shear strain, which
blocks the domain wall precession. Domain wall mobility of about 0.7
m/(s·Oe) can then be achieved at V/nm. A split-gate scheme is
proposed for the domain-wall controlled bit element; its extension to
multiple-gate linear arrays can offer advantages in memory density,
programmability, and logic functionality.
Comment: 5 pages, 2 figures, revised and corrected version, accepted in
Applied Physics Letters