Scientific codes that use iterative methods are often difficult to
parallelize well. Such codes usually contain \code{while} loops that
iterate until they converge on a solution. Problems arise because
the number of iterations cannot be determined at compile time, and
tests for termination usually require a global reduction and an
associated barrier. We present a method that allows us to avoid
performing global barriers and to exploit pipelined parallelism when
processors can detect non-convergence from local information.
(Also cross-referenced as UMIACS-TR-96-31.1)
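
To make the loop pattern described above concrete, the following is a minimal sequential sketch in Python, not the method of the report. It assumes a 1-D Jacobi relaxation with fixed end values; the function name `jacobi_blocks`, the block decomposition, and the tolerance are hypothetical choices made for illustration, with each block standing in for one processor. The termination test is a maximum over per-block residuals, which on a parallel machine would be a global reduction with an implied barrier.

```python
def jacobi_blocks(u, num_blocks, tol=1e-6, max_iters=10_000):
    """Jacobi relaxation of a 1-D Laplace problem with fixed end values.

    Illustrative only: each block plays the role of one processor.
    Returns (solution, iterations).
    """
    n = len(u)
    # Hypothetical decomposition: split the interior points into blocks.
    interior = list(range(1, n - 1))
    size = max(1, -(-len(interior) // num_blocks))  # ceiling division
    blocks = [interior[i:i + size] for i in range(0, len(interior), size)]

    iterations = 0
    converged = False
    # The trip count of this while loop is unknown before the run.
    while not converged and iterations < max_iters:
        new = u[:]
        local_residuals = []
        for block in blocks:
            r = 0.0
            for i in block:
                new[i] = 0.5 * (u[i - 1] + u[i + 1])
                r = max(r, abs(new[i] - u[i]))
            # A block with r >= tol already knows, from local data alone,
            # that the global test below must fail in this iteration.
            local_residuals.append(r)
        u = new
        iterations += 1
        # Global reduction: in parallel, every block must wait here.
        converged = max(local_residuals, default=0.0) < tol
    return u, iterations


if __name__ == "__main__":
    start = [0.0] * 9 + [1.0]
    solution, count = jacobi_blocks(start, num_blocks=3, tol=1e-10)
    print(count, [round(x, 4) for x in solution])
```

In this sketch the reduction runs on every iteration even though any single block whose residual is still above the tolerance could rule out convergence without consulting the others; that is the kind of local information the abstract refers to.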