
11 research outputs found

Code layout optimizations for transaction processing workloads

Author: A. Ramirez, J. Larriba-Pey, K. Gharachorloo, L.A. Barroso, M. Valero, P.G. Lowney, R. Cohn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

A detailed comparison of two transaction processing workloads

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

Data Replication Strategies for Fault Tolerance and Availability on Commodity Clusters

Author: Amza C., Cox A.L., Zwaenepoel W.
Publication date: 19/10/2005

Recent work has shown the advantages of using persistent memory for transaction processing. In particular, the Vista transaction system uses recoverable memory to avoid disk I/O, thus improving performance by several orders of magnitude. In such a system, however, the data is safe when a node fails, but unavailable until it recovers, because the data is kept in only one memory. In contrast, our work uses data replication to provide both reliability and data availability while still maintaining very high transaction throughput. We investigate four possible designs for a primary-backup system, using a cluster of commodity servers connected by a write-through-capable system area network (SAN). We show that logging approaches outperform mirroring approaches, even when communicating more data, because of their better locality. Finally, we show that the best logging approach also scales well to small shared-memory multiprocessors.

Portable high-performance programs

Author: Frigo Matteo, 1968-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 159-169). By Matteo Frigo. Ph.D.

CiteSeerX

Software and hardware methods for memory access latency reduction on ILP processors

Author: Zhang Zhao
Publication venue: W&M ScholarWorks
Publication date: 01/01/2002

While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. To address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be accessed in the DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit data locality and to significantly reduce memory access latency. We first present a case study at the application level that reconstructs memory-intensive programs by utilizing program-specific knowledge. The problem of bit-reversals, a set of data reordering operations extensively used in scientific computing programs such as FFT, and an application with a special data access pattern that can cause severe cache conflicts, is identified in this study. We propose several software methods, including padding and blocking, to restructure the program to reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems. The access latency to the DRAM core has become increasingly long relative to CPU speed, causing memory accesses to be an execution bottleneck. In order to reduce the frequency of DRAM core accesses and thereby shorten the overall memory access latency, we have conducted three studies at this level of the memory hierarchy. First, motivated by our evaluation of the DRAM row buffer's performance role and our findings on the causes of its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives associated with ILP processors. We then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-level L3 cache for memory-intensive applications. Memory access latency has become a major performance bottleneck for memory-intensive applications. As long as DRAM technology remains the most cost-effective choice for main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation attempt to address this important issue. Our proposed software and hardware schemes are effective and applicable, and can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies.
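
For readers unfamiliar with the bit-reversal reordering named in the abstract above, the following is a minimal Python sketch of the plain, unoptimized permutation. It is not the dissertation's padding or blocking method; the function names `bit_reverse` and `bit_reversal_permute` are hypothetical and chosen only for this illustration.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        # Shift the lowest bit of index into the low end of result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permute(data: list) -> list:
    """Return a copy of data with element i moved to position bit_reverse(i).

    Illustrative baseline only: no padding or blocking is applied.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1  # log2(n) for a power of two
    out = [None] * n
    for i, value in enumerate(data):
        out[bit_reverse(i, bits)] = value
    return out


if __name__ == "__main__":
    print(bit_reversal_permute(list(range(8))))
```

In this sketch, consecutive source indices land at destinations that are far apart (for example, indices 0 and 1 map to 0 and n/2). Large power-of-two strides of this kind are a plausible reason for the cache conflicts the abstract describes, though the abstract itself does not spell out the mechanism.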

College of William & Mary: W&M Publish

Exploiting cache locality at run-time

Author: Yan Yong
Publication venue: W&M ScholarWorks
Publication date: 01/01/1998

With the increasing gap between the speeds of the processor and memory system, memory access has become a major performance bottleneck in modern computer systems. Recently, Symmetric Multi-Processor (SMP) systems have emerged as a major class of high-performance platforms. Improving the memory performance of parallel applications with dynamic memory-access patterns on SMP systems is a hard problem. The solution to this problem is critical to the successful use of SMP systems because dynamic memory-access patterns occur in many real-world applications. This dissertation is aimed at solving this problem. Based on a rigorous analysis of cache-locality optimization, we propose a memory-layout oriented run-time technique to exploit the cache locality of parallel loops. Our technique has been implemented in a run-time system. Using simulation and measurement, we have shown that our run-time approach can achieve performance comparable to compiler optimizations for regular applications, whose load balance and cache locality can be well optimized by tiling and other program transformations. Moreover, our approach was shown to significantly improve the memory performance of applications with dynamic memory-access patterns. Such applications are usually hard to optimize with static compiler optimizations. Several contributions are made in this dissertation. We present models to characterize the complexity and present a solution framework for optimizing cache locality. We present an effective estimation technique for memory-access patterns to support efficient locality optimizations and information integration. We present a memory-layout oriented run-time technique for locality optimization. We present efficient scheduling algorithms to trade off locality and load imbalance. We provide a detailed performance evaluation of the run-time technique.

College of William & Mary: W&M Publish

Distribuert fellesminne i software over SCI [Distributed shared memory in software over SCI]

Author: Spjelkavik Henning
Publication date: 01/01/1999

This thesis discusses the use of SCI in a system for distributed shared memory implemented in software. Besides using SCI as a medium for fast message exchange, the possibility of writing directly to the memory of another machine via the SCI card is also tested. The conclusion is that better performance can be achieved by exploiting SCI and this extra functionality. However, this requires that the programmer be familiar with the program's data structures and its use of data in order to exploit it effectively.


Physics Division annual report, April 1, 1995--March 31, 1996

Author: Thayer K. J.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/11/1996

The past year has seen several major advances in the Division's research programs. In heavy-ion physics these include experiments with radioactive beams of interest to nuclear astrophysics, a first exploration of the structure of nuclei situated beyond the proton drip line, the discovery of new proton emitters (the heaviest known), the first unambiguous detection of discrete linking transitions between superdeformed and normal deformed states, and the impact of the APEX results, which were the first to report, conclusively, no sign of the previously reported sharp electron-positron sum lines. The medium-energy nuclear physics program of the Division has led the first round of experiments at the CEBAF accelerator at the Thomas Jefferson National Accelerator Facility and the study of color transparency in rho meson propagation at the HERMES experiment at DESY, and it has established nuclear polarization in a laser-driven polarized hydrogen target. In atomic physics, the non-dipolar contribution to photoionization has been quantitatively established for the first time, and the atomic physics beamline at the Argonne 7 GeV Advanced Photon Source was constructed and, by now, first experiments have been successfully performed. The theory program has pushed exact many-body calculations with fully realistic interactions (the Argonne v18 potential) to the seven-nucleon system, and interesting results have been obtained for the structure of deformed nuclei through mean-field calculations and for the structure of baryons with QCD calculations based on the Dyson-Schwinger approach. Brief summaries are given of the individual research programs.

UNT Digital Library