Search CORE

2,266 research outputs found

Spaceborne memory organization Interim report

Author: Gunderson D. C.
Hastings C. W.
Prom G. J.
Publication venue
Publication date
Field of study

Associative memory applications in unmanned space vehicle

NASA Technical Reports Server

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Author: Arif Arslan
Publication venue: Politecnico di Torino
Publication date
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A Symbolic Transformation Language and its Application to a Multiscale Method

Author: Belkhir Walid
Giorgetti Alain
Lenczner Michel
Publication venue
Publication date: 10/12/2013
Field of study

The context of this work is the design of a software, called MEMSALab, dedicated to the automatic derivation of multiscale models of arrays of micro- and nanosystems. In this domain a model is a partial differential equation. Multiscale methods approximate it by another partial differential equation which can be numerically simulated in a reasonable time. The challenge consists in taking into account a wide range of geometries combining thin and periodic structures with the possibility of multiple nested scales. In this paper we present a transformation language that will make the development of MEMSALab more feasible. It is proposed as a Maple package for rule-based programming, rewriting strategies and their combination with standard Maple code. We illustrate the practical interest of this language by using it to encode two examples of multiscale derivations, namely the two-scale limit of the derivative operator and the two-scale model of the stationary heat equation.Comment: 36 page

arXiv.org e-Print Archive

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

On a New Notion of Partial Refinement

Author: Sekerinski Emil
Zhang Tian
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2013
Field of study

Formal specification techniques allow expressing idealized specifications, which abstract from restrictions that may arise in implementations. However, partial implementations are universal in software development due to practical limitations. Our goal is to contribute to a method of program refinement that allows for partial implementations. For programs with a normal and an exceptional exit, we propose a new notion of partial refinement which allows an implementation to terminate exceptionally if the desired results cannot be achieved, provided the initial state is maintained. Partial refinement leads to a systematic method of developing programs with exception handling.Comment: In Proceedings Refine 2013, arXiv:1305.563

arXiv.org e-Print Archive

Directory of Open Access Journals

Software and hardware methods for memory access latency reduction on ILP processors

Author: Zhang Zhao
Publication venue: W&M ScholarWorks
Publication date: 01/01/2002
Field of study

While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. to address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be accessed in DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit the data locality and to significantly reduce memory access latency.;We first presented a case study at the application level that reconstructs memory-intensive programs by utilizing program-specific knowledge. The problem of bit-reversals, a set of data reordering operations extensively used in scientific computing program such as FFT, and an application with a special data access pattern that can cause severe cache conflicts, is identified in this study. We have proposed several software methods, including padding and blocking, to restructure the program to reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems.;The access latency to DRAM core has become increasingly long relative to CPU speed, causing memory accesses to be an execution bottleneck. In order to reduce the frequency of DRAM core accesses to effectively shorten the overall memory access latency, we have conducted three studies at this level of memory hierarchy. First, motivated by our evaluation of DRAM row buffer\u27s performance roles and our findings of the reasons of its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives associated with ILP processors. We then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-level L3 cache for memory-intensive applications.;Memory access latency has become a major performance bottleneck for memory-intensive applications. as long as DRAM technology remains its most cost-effective position for making main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation attempt to address this important issue. Our proposed software and hardware schemes are effective and applicable, which can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies

College of William & Mary: W&M Publish

Power considerations for memory-related microarchitecture designs

Author: Zhu Zhichun
Publication venue: W&M ScholarWorks
Publication date: 01/01/2003
Field of study

The fast performance improvement of computer systems in the last decade comes with the consistent increase on power consumption. In recent years, power dissipation is becoming a design constraint even for high-performance systems. Higher power dissipation means higher packaging and cooling cost, and lower reliability. This Ph.D. dissertation will investigate several memory-related design and optimization issues of general-purpose computer microarchitectures, aiming at reducing the power consumption without sacrificing the performance. The memory system consumes a large percentage of the system\u27s power. In addition, its behavior affects the processor power consumption significantly. In this dissertation, we propose two schemes to address the power-aware architecture issues related to memory: (1) We develop and evaluate low-power techniques for high-associativity caches. By dynamically applying different access modes for cache hits and misses, our proposed cache structure can achieve nearly lowest power consumption with minimal performance penalty. (2) We propose and evaluate look-ahead architectural adaptation techniques to reduce power consumption in processor pipelines based on the memory access information. The scheme can significantly reduce the power consumption of memory-intensive applications. Combined with other adaptation techniques, our schemes can effectively reduce the power consumption for both computer- and memory-intensive applications. The significance, potential impacts, and contributions of this dissertation are: (1) Academia and industry R & D has solely targeted the objective of high performance in both hardware and software designs since the beginning stage of building computer systems. However, the pursuit of high performance without considering energy consumption will inevitably lead to increased power dissipation and thus will eventually limit the development and progress of increasingly demanded mobile, portable, and high-performance computing systems. (2) Since our proposed method adaptively combines the merits of existing low-power cache designs, it approaches the optimum in terms of both retaining performance and saving energy. This low power solution for highly associative caches can be easily deployed with a low cost. (3) Using a cache miss , a common program execution event, as a triggering signal to slow down the processor issue rate, our scheme can effectively reduce processor power consumption. This design can be easily and practically deployed in many processor architectures with a low cost

College of William & Mary: W&M Publish

Using the High Productivity Language Chapel to Target GPGPU Architectures

Author: Chamberlain Bradford L.
Garzaran Maria J.
Padua David
Sidelnik Albert
Publication venue
Publication date: 25/04/2011
Field of study

It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA.NSF CCF 0702260Cray Inc. Cray-SRA-2010-016962010-2011 Nvidia Research Fellowshipunpublishednot peer reviewe

Illinois Digital Environment for Access to Learning and Scholarship Repository