Designing an Adaptive Application-Level Checkpoint Management System for Malleable MPI Applications
Dynamic resource management opens up numerous opportunities in High
Performance Computing. It improves system-level services as well as
application performance. Checkpointing can also be regarded as a system-level
service and can reap the benefits offered by dynamism. A checkpointing system
can have better resource availability by integrating with a malleable resource
management system. In addition to fault tolerance, the checkpointing system can
cater to the data redistribution demands of malleable applications during
resource changes. Therefore, we propose iCheck, an adaptive application-level
checkpoint management system that can efficiently utilize system- and
application-level dynamism to provide better checkpointing and data
redistribution services to applications.
Comment: Third International Symposium on Checkpointing for Supercomputing
(SuperCheck-SC22