Data Replication Strategies for Fault Tolerance and Availability on Commodity Clusters
Recent work has shown the advantages of using persistent memory for transaction processing. In particular, the Vista transaction system uses recoverable memory to avoid disk I/O, thus improving performance by several orders of magnitude. In such a system, however, the data is safe when a node fails but unavailable until the node recovers, because the data is kept in only one memory. In contrast, our work uses data replication to provide both reliability and data availability while still maintaining very high transaction throughput. We investigate four possible designs for a primary-backup system, using a cluster of commodity servers connected by a write-through-capable system area network (SAN). We show that logging approaches outperform mirroring approaches, even when communicating more data, because of their better locality. Finally, we show that the best logging approach also scales well to small shared-memory multiprocessors.
Portable high-performance programs
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 159-169). By Matteo Frigo. Ph.D.
Software and hardware methods for memory access latency reduction on ILP processors
While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. To address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be accessed in the DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit data locality and to significantly reduce memory access latency.
We first present a case study at the application level that restructures memory-intensive programs by utilizing program-specific knowledge. This study identifies the problem of bit-reversals, a set of data reordering operations used extensively in scientific computing programs such as FFT, whose special data access pattern can cause severe cache conflicts. We propose several software methods, including padding and blocking, to restructure the program to reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems.
The access latency to the DRAM core has become increasingly long relative to CPU speed, making memory accesses an execution bottleneck. To reduce the frequency of DRAM core accesses and thereby shorten the overall memory access latency, we have conducted three studies at this level of the memory hierarchy. First, motivated by our evaluation of the DRAM row buffer's performance role and our findings on the causes of its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives associated with ILP processors. We then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-level L3 cache for memory-intensive applications.
Memory access latency has become a major performance bottleneck for memory-intensive applications. As long as DRAM technology retains its position as the most cost-effective choice for main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation attempt to address this important issue. Our proposed software and hardware schemes are effective and applicable, and can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies.
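For readers unfamiliar with the bit-reversal reordering mentioned in the abstract above, the following is a minimal sketch of the plain, unoptimized operation. It is not the dissertation's padding or blocking method, which the abstract does not spell out; the function names `bit_reverse` and `bit_reversal_permute` are hypothetical and chosen only for illustration. Consecutive source elements land at destinations separated by large power-of-two strides (n/2, n/4, ...), which is the kind of access pattern that tends to map many elements to the same cache sets.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Return i with its lowest `bits` bits in reversed order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bit_reversal_permute(data: list) -> list:
    """Naive bit-reversal reordering; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = [None] * n
    for i in range(n):
        # Element i moves to the index whose binary digits are reversed.
        out[bit_reverse(i, bits)] = data[i]
    return out


if __name__ == "__main__":
    print(bit_reversal_permute(list(range(8))))
```

For an input of eight elements, index 1 (binary 001) maps to index 4 (binary 100), index 3 (011) maps to index 6 (110), and so on.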
Exploiting cache locality at run-time
With the increasing gap between the speeds of the processor and memory system, memory access has become a major performance bottleneck in modern computer systems. Recently, Symmetric Multi-Processor (SMP) systems have emerged as a major class of high-performance platforms. Improving the memory performance of parallel applications with dynamic memory-access patterns on SMPs is a hard problem. The solution to this problem is critical to the successful use of SMP systems because dynamic memory-access patterns occur in many real-world applications. This dissertation is aimed at solving this problem.
Based on a rigorous analysis of cache-locality optimization, we propose a memory-layout oriented run-time technique to exploit the cache locality of parallel loops. Our technique has been implemented in a run-time system. Using simulation and measurement, we have shown that our run-time approach can achieve performance comparable to compiler optimizations for regular applications, whose load balance and cache locality can be well optimized by tiling and other program transformations. For applications with dynamic memory-access patterns, however, our approach was shown to improve memory performance significantly. Such applications are usually hard to optimize with static compiler optimizations.
Several contributions are made in this dissertation. We present models to characterize the complexity and present a solution framework for optimizing cache locality. We present an effective estimation technique for memory-access patterns to support efficient locality optimizations and information integration. We present a memory-layout oriented run-time technique for locality optimization. We present efficient scheduling algorithms to trade off locality and load imbalance. We provide a detailed performance evaluation of the run-time technique.
Distributed shared memory in software over SCI (Distribuert fellesminne i software over SCI)
This thesis discusses the use of SCI in a system for distributed
shared memory implemented in software. Besides using SCI as a medium
for fast message exchange, the ability to write directly to the
memory of another machine via the SCI card is also tested. The
conclusion is that better performance can be achieved by exploiting
SCI and this extra functionality. This does, however, require that
the programmer knows the program's data structures and how the data
are used in order to exploit them effectively.
Physics Division annual report, April 1, 1995--March 31, 1996
The past year has seen several major advances in the Division's research programs. In heavy-ion physics these include experiments with radioactive beams of interest to nuclear astrophysics, a first exploration of the structure of nuclei situated beyond the proton drip line, the discovery of new proton emitters (the heaviest known), the first unambiguous detection of discrete linking transitions between superdeformed and normal deformed states, and the impact of the APEX results, which were the first to report, conclusively, no sign of the previously reported sharp electron-positron sum lines. The medium energy nuclear physics program of the Division has led the first round of experiments at the CEBAF accelerator at the Thomas Jefferson National Accelerator Facility and the study of color transparency in rho meson propagation at the HERMES experiment at DESY, and it has established nuclear polarization in a laser-driven polarized hydrogen target. In atomic physics, the non-dipolar contribution to photoionization has been quantitatively established for the first time, the atomic physics beamline at the Argonne 7 GeV Advanced Photon Source was constructed and, by now, first experiments have been successfully performed. The theory program has pushed exact many-body calculations with fully realistic interactions (the Argonne v18 potential) to the seven-nucleon system, and interesting results have been obtained for the structure of deformed nuclei through mean-field calculations and for the structure of baryons with QCD calculations based on the Dyson-Schwinger approach. Brief summaries are given of the individual research programs.