55,883 research outputs found
Recommended from our members
Constant-time cost evaluation for behavioral partitioning
Given a system behavioral specification, partitioning can be used to distribute among chips the processes, procedures, and storage elements that comprise the specification. We introduce a technique for constant-time recomputation of pin, area, and execution-time estimates for a behavioral partitioning move. The technique permits fast, accurate estimations of a large number of partitionings, thus enabling better results than approaches which attain tractable computation time by using gross estimates or less thorough partitioning algorithms. The key to our technique is the isolation and extraction before partitioning of the basic design attributes needed for estimation, and the updating of this information in constant-time for each move. The estimation models are almost as detailed as those presented in previous estimation approaches not intended for constant-time update. The results we provide indicate the speed and practicality of our estimation approach in conjunction with sophisticated partitioning algorithms
Bounding inconsistency using a novel threshold metric for dead reckoning update packet generation
Human-to-human interaction across distributed applications requires that sufficient consistency be maintained among participants in the face of network characteristics such as latency and limited bandwidth. The level of inconsistency arising from the network is proportional to the network delay, and thus a function of bandwidth consumption. Distributed simulation has often used a bandwidth reduction technique known as dead reckoning that combines approximation and estimation in the communication of entity movement to reduce network traffic, and thus improve consistency. However, unless carefully tuned to application and network characteristics, such an approach can introduce more inconsistency than it avoids. The key tuning metric is the distance threshold. This paper questions the suitability of the standard distance threshold as a metric for use in the dead reckoning scheme. Using a model relating entity path curvature and inconsistency, a major performance related limitation of the distance threshold technique is highlighted. We then propose an alternative time—space threshold criterion. The time—space threshold is demonstrated, through simulation, to perform better for low curvature movement. However, it too has a limitation. Based on this, we further propose a novel hybrid scheme. Through simulation and live trials, this scheme is shown to perform well across a range of curvature values, and places bounds on both the spatial and absolute inconsistency arising from dead reckoning
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
In order to efficiently use the future generations of supercomputers, fault
tolerance and power consumption are two of the prime challenges anticipated by
the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has
been and still is the most widely used technique to deal with hard failures.
Application-level CR is the most effective CR technique in terms of overhead
efficiency but it takes a lot of implementation effort. This work presents the
implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic
Fault Tolerance), which serves two purposes. First, it provides an extendable
library that significantly eases the implementation of application-level
checkpointing. The most basic and frequently used checkpoint data types are
already part of CRAFT and can be directly used out of the box. The library can
be easily extended to add more data types. As means of overhead reduction, the
library offers a build-in asynchronous checkpointing mechanism and also
supports the Scalable Checkpoint/Restart (SCR) library for node level
checkpointing. Second, CRAFT provides an easier interface for User-Level
Failure Mitigation (ULFM) based dynamic process recovery, which significantly
reduces the complexity and effort of failure detection and communication
recovery mechanism. By utilizing both functionalities together, applications
can write application-level checkpoints and recover dynamically from process
failures with very limited programming effort. This work presents the design
and use of our library in detail. The associated overheads are thoroughly
analyzed using several benchmarks
- …