A. Description of the Scientific Research Goals
The purpose of this research was to develop and implement compiler-assisted strategies for recovery through multiple-instruction re-execution (rollback) in highly parallel computer architectures that use hierarchical shared memories. The goal was to enable very rapid recovery from high rates of transient and intermittent failures in SDI environments. We worked to achieve this goal with minimal impact on system performance and little hardware overhead by exploiting hardware features already present in recently developed high-performance processor architectures. Our objective was to demonstrate that, with appropriate compilation techniques, these hardware features can be used to perform rapid recovery without significant architectural redesign. Our research concentrated on multiprocessor machines with hierarchical memory structures for two reasons: the architectural trend toward hierarchical-memory, shared-variable multiprocessors, and the current lack of understanding of how rapid recovery can be accomplished in this class of machines.
B. Summary of Significant Results
Our research results include techniques for rapid recovery in multiprocessor systems, based on compiler strategies for inserting and maintaining checkpoints, as well as new memory management protocols for rapid recovery from transient failures. Additional results include a memory management protocol for rapid recovery in shared virtual memory environments and distributed shared memory architectures. We integrated the checkpointing process with shared virtual memory protocols and developed a twin-page storage management protocol for rapid recovery from failures. This strategy allows recovery without explicit undo operations or propagated rollbacks. Our compiler-based checkpointing results focus on techniques for maintaining desired checkpoint intervals and on live variable analysis for minimizing the size of checkpoints.
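As an illustration of how live variable analysis can shrink a checkpoint, the following is a minimal sketch, not the implementation used in this project. It runs the standard backward dataflow iteration over a small control-flow graph and treats the variables live on entry to a block as the only state a checkpoint placed there would need to save. The block names, the use/def sets, and the function names (live_variables, checkpoint_set) are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Set, Tuple

# Each basic block is described by (use, defs):
#   use  = variables read before any write inside the block
#   defs = variables written inside the block
Block = Tuple[Set[str], Set[str]]


def live_variables(blocks: Dict[str, Block],
                   successors: Dict[str, List[str]]):
    """Standard backward liveness analysis, iterated to a fixed point."""
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for name, (use, defs) in blocks.items():
            # live_out(B) = union of live_in(S) over all successors S of B
            out: Set[str] = set()
            for succ in successors.get(name, []):
                out |= live_in[succ]
            # live_in(B) = use(B) | (live_out(B) - defs(B))
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


def checkpoint_set(block: str, live_in: Dict[str, Set[str]]) -> List[str]:
    """Variables to save for a checkpoint at the top of `block` (assumed placement)."""
    return sorted(live_in[block])


# Hypothetical program: entry sets a, b, n; the loop computes t = a * 2 and
# a = t + n; exit reads a. Here b is never read again and t is a loop-local temporary.
blocks: Dict[str, Block] = {
    "entry": (set(), {"a", "b", "n"}),
    "loop": ({"a", "n"}, {"a", "t"}),
    "exit": ({"a"}, set()),
}
successors = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live_in, live_out = live_variables(blocks, successors)
loop_checkpoint = checkpoint_set("loop", live_in)
```

In this example a checkpoint at the top of the loop would save only a and n. The dead variable b and the temporary t are excluded, which is the kind of size reduction that liveness information makes possible.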
We developed a method of applying optimizing compiler techniques to signature monitoring to reduce performance overhead and simplify monitor hardware. We showed that some previous signature insertion approaches have exponential algorithm complexity and we developed an algorithm with O(N^2) complexity. We have implemented this technique in the GNU optimizing compiler and evaluated the effectiveness of this approach with large production application programs. In addition, we developed approaches for bounding the error detection latency (for fast recovery) and evaluated the impact on performance and memory overheads.
Dear Dr. Bromley,
Enclosed please find the final report for grant N00014-88-K-0656 by Kent Fuchs and myself. In the last three years, this grant has resulted in several significant research results published in major conferences and journals. We have summarized these results in the report. Please feel free to contact us if there is any other information that we should provide regarding the project.
C. List of Publications
Our business office manager has just informed us that he has yet to receive the final allocation of $8,000 from ONR to balance the account. Please transfer the funds at your earliest convenience.
All in all, we would like to thank you again for your support in the last three years. We look forward to discussing other research opportunities with you in the future.
Sincerely,
Wen-mei Hwu
Enclosure.
