Abstract. In this work we discuss a class of defect correction methods which is easily adapted to create parallel time integrators for multi-core architectures and is ideally suited for developing methods which can be order adaptive in time. The method is based on Integral Deferred Correction (IDC), which was itself motivated by Spectral Deferred Correction by Dutt, Greengard and Rokhlin (BIT-2000).
1. Introduction. In this paper, we construct and analyze a class of novel time integrators for initial value problems (IVP), known as Revisionist Integral Deferred Correction methods (RIDC), which can be efficiently implemented with multi-core architectures. We adopt the "revisionist" terminology to highlight that (1) this is a revision of the standard IDC formulation, and (2) successive corrections, running in parallel but lagging in time, revise and improve the approximation to the solution. Although the RIDC schemes are presented in the integral deferred correction framework, this new method can equivalently be viewed as a sequence of time steps with an occasional "reset".
Integral deferred correction (IDC) methods, also known in some instances as spectral deferred correction (SDC), were first introduced by Dutt, Greengard and Rokhlin [9] , and were further developed in [1, 22, 21, 18, 19, 7, 6, 29, 23, 17 ]. An iterative correction procedure is used to correct an estimate of the solution by solving an integral formulation of the error equation. While it has been demonstrated that IDC methods are competitive with popular high-order Runge-Kutta (RK) integrators [6] , there is no systematic mechanism for spreading the workload on multi-core architectures for either IDC or popular RK methods. The issue is that both IDC and RK methods are formulated in a sequential framework, in that a high-order approximation to the solution at time t + ∆t can not be computed until a high-order approximation to the solution at time t is known. This drawback of a sequential integrators is an important area to address as computing technologies are moving in the direction of multi-core architectures, and many existing algorithms will need to be modified to effectively make use of these advances.
Several approaches have been proposed to address the limitations of sequential integrators. In [31] , Nievergelt proposed a framework for decomposing the time integration interval into sub-intervals, solving the initial value problem on each sub-interval concurrently, and then enforcing continuity of the solution. In [30] , Miranker and Liniger discuss parallel integrators based on predictor/corrector ideas. This approach leads them to a family of integrators, which are essentially second-and third-order multi-step Adams schemes where certain steps can be computed in parallel, as well as second-and third-order RK integrators, where certain stages can be computed in parallel. A more contemporary approach are the "parareal" algorithms, where the time integration interval is once again divided into sub-intervals and a serial prediction computation is performed, followed by a parallel corrective iteration and then a serial corrective iteration. Discussion and analysis of parareal algorithms can be found in [11, 12, 24, 27, 13] . There has also been recent work in [28] , using SDC methods to take successive parallel and serial corrective iterations. We note that parareal algorithms appear particularly promising for solving IVPs when a very large number of computing cores (e.g., > 32) are available.
Our proposed family of time integrators take a different approach in that the prediction and correction loops are all performed in parallel. For example, in roughly the same wall-clock time it would take to compute a first-order forward Euler approximation, our RIDC algorithm can compute a fourth-order approximation to the solution of an IVP, provided four computing cores are available for the computation. If an even higher-order integrator is required, the RIDC algorithm can compute an eighth-order approximation to the solution in roughly the same amount of wall-clock time as a forward Euler integrator, provided eight computing cores are available for the computation. Alternatively, on four cores, RIDC can achieve eighth order in the same wall clock time as a second order RK integrator, provided that second order RK integrator is embedded in IDC. A non-exhaustive list of applications where our RIDC algorithms will be advantageous includes classic N -body simulations involving charged particle interactions in plasmas [34, 8, 4] , biological systems [2, 25] , and gravitational bodies [3, 16] . Additionally, one can imagine applying RIDC algorithms to explicit/implicit solutions of PDEs with non-local operators such as Landau-Fokker-Planck [10] or image processing applications [14] . This paper is divided into five main sections. In §2, IDC methods are reviewed and RIDC methods are introduced. In §3, the implementation of RIDC methods in a multi-core environment is discussed. Convergence and stability of RIDC methods are presented in §4. Then, numerical benchmarks comparing IDC, RIDC, and popular Runge-Kutta methods are given in §5, followed by conclusions in §6.
2. IDC and RIDC methods. In this section, we review IDC methods from [9] , and introduce our new RIDC methods. All of these methods compute an approximate solution to an IVP consisting of a system of ODES and initial conditions,
In this paper, we focus the bulk of our discussions on IDC and RIDC methods constructed using first-order forward Euler integrators, and note that IDC and RIDC methods can easily be constructed using higher-order implicit and explicit RK integrators if the time domain is discretized uniformly [7, 6] . IDC and RIDC methods are predictor-corrector schemes. Given an approximate solution η(t) to the exact solution y(t), the error of the approximate solution is
If we define the residual as (t) = η (t) − f (t, η(t)), then the derivative of the error (2.2) satisfies
The integral form of the error equation can then be obtained,
Substituting the definition for the residual gives
3)
The basic idea of this paper is that while a single computing core is computing a forward Euler predicted solution to IVP (2.1), other computing cores can simultaneously compute increasingly accurate corrections using equation (2.3) . Each correction revises the approximate solution, improving the accuracy by, for example, one order of accuracy. Because each computation occurs simultaneously, high-order accurate results are obtained in just slightly more wall-clock time than a forward Euler computation on a single processor computer. We first describe standard IDC methods, and then introduce the revisions that result in a scheme that can be computed in parallel. I j = {t j,0 , t j,1 , . . . , t j,M }, j = 0, . . . , J − 1, contains p = M + 1 nodes. This is illustrated in Figure 2 .1 IDC methods are predictor-corrector schemes that iterate completely on each group of intervals I j sequentially, starting with j = 0. The idea is as follows. 1. Suppose that the solution is known at t j,0 . This is true for j = 0, where η(t 0,0 ) = α. 2. Using an integrator of choice, solve the IVP η = f (t, η) for a provisional solution at all nodes contained in I j . Denote this provisional solution as η j,M , j ← j + 1 and return to step 1. It was shown in [9] that if forward Euler integrators are used to solve for the provisional solution in step 2, and M forward Euler corrections are applied in steps 3 and 4, then the resulting algorithm is (M + 1)
st -order accurate. The IDC algorithm using forward Euler integrators is written out explicitly in Algorithm 1, where line 19 shows a forward Euler discretization of equation (2.3), manipulated to give an update formula to the provisional solution η
where S mi are quadrature weights, compactly expressed in an integration matrix whose elements are defined as integrals of Lagrange interpolating polynomials,
2.1.1. IDC with Runge-Kutta integrators. It was shown in [6] that if an r th -order Runge-Kutta integrator is used in the prediction and correction loops (instead of forward Euler), then the order of accuracy of the IDC method increases to order r × (M + 1). For example, if a trapezoidal RK-2 discretization of (2.3) is performed instead, an update formula for the provisional solution, after algebraic manipulations, is
where
It was also shown in [6] that using higher-order integrators to solve for the provisional and corrected solution results in IDC schemes with superior stability properties.
Input: endpoints {a, b}; initial condition α; number of intervals N ; order of method p (Note that we require N to be divisible by M = p − 1, so that there are J groups of M intervals) (Initialize and pre-compute integration matrix)
Algorithm 1: IDCp-FE algorithm -a p th -order IDC method constructed using forward Euler integrators for the prediction and correction loops.
Revisionist IDC methods.
The proposed RIDC algorithm relaxes the requirement of iterating M correction loops completely in each group of M intervals, I j = {t j,0 , t j,1 , . . . , t j,M }. Instead, we allow each group to contain K intervals, where K M , but still iterate only M times, using a stencil size of (M + 1) to approximate the quadrature of the residual. For example, if K = N 2 , then there are J = 2 groups and this modified grouping is illustrated in Figure 2 .2.
Enumerating nodes such that each grouping contains K + 1 nodes.
The RIDC algorithm for a p th -order method (using M = p − 1 Euler corrections) is described in Algorithm 2. Note that if K = M , then the RIDC method is exactly the previously formulated p th -order IDC method. We note that this algorithm could also be considered as a sequence of K time steps after which a "reset" is performed, where a reset takes the most accurate solution available at t = t (namely η
[M ] (t )), and uses that solution as the "initial conditions" to restart the algorithm. The total number of resets performed is J = N K . At first glance, RIDC appears to offer little benefit over IDC. However, if there are multiple computing cores available to compute the portions of the algorithm simultaneously, then the result can potentially be computed faster. The implementation and benefit of this RIDC generalization for multi-core architectures is described in §3, and the analysis justifying this formulation is given in §4.
Lines 20 and 24 of Algorithm 2 use quadrature to approximate tm+1 tm f (t, η(t)) dt. The choice of stencils for this quadrature is not unique. In line 20, we have chosen stencils that correspond to IDC methods if K = M ; specifically for m < M , we use function values at the nodes {t 0 , t 1 , . . . , t M }. The quadrature weights are consequently elements of the integration matrix S mi , introduced earlier in equation (2.5). In line 24 (for m ≥ M ), we have chosen the stencil which uses the nodes {t m−M , t m−(M −1) , . . . , t m }. Since we are working with nodes that are uniformly spaced, the previously computed quadrature weights can be used to approximate the integral at each time step,
2.3. RIDC methods with reduced stencils and order adaptivity. In both Algorithm 1 and Algorithm 2 for p th -order IDC and RIDC methods respectively, the integral of the residual is computed by fitting an M th -degree Lagrange polynomial through a data set at every correction loop, and then performing quadrature on that interpolating polynomial. However, it is not necessary to find the M th -degree Lagrange interpolating polynomial at every correction loop. It suffices to find the l thdegree Lagrange interpolating polynomial at the l th correction loop. In §3.1.1, we will see that the reduced stencils corresponding to lower-degree polynomial interpolants Input: endpoints {a, b}; initial condition α; number of intervals N ; order of method p; number of correction loops M = p − 1; K intervals per group (note that we require N to be divisible by K so that there are J groups of K intervals) (Initialize and pre-compute integration matrix)
th -order RIDC method constructed using K subintervals, and forward Euler integrators for the prediction and M = p − 1 correction loops. lead to faster start up times. Also lower-degree interpolating polynomials may be less oscillatory [32] . Algorithm 3 describes the RIDC method with reduced stencils. Note that the integration matrices can still be precomputed.
If one is willing to store the solution for an entire group of intervals, η [l] j,m 0 ≤ m ≤ K, the error at t j,M can be estimated to determine if an additional correction loop is required to guarantee that the error is below some tolerance. Note that this form of p-adaptivity (i.e., adaptivity in terms of order) was not previously obvious for p th -order IDC methods where there was little motivation for taking fewer than M correction loops, and no obvious way to make use of previously computed values to generate an (p + 1) st -order method.
Input: endpoints {a, b}; initial condition α; number of intervals N ; order of method p; number of correction loops M = p − 1; K intervals per group (note that we require N to be divisible by K so that there are J groups of K intervals) (Initialize and pre-compute integration matrices)
Algorithm 3: RIDC(p, K)-FE algorithm with reduced stencils -a p th -order RIDC method constructed using K subintervals, forward Euler integrators for the prediction and M = p − 1 correction loops, and reduced stencils.
3. Multi-core / Multi-cpu implementation. RIDC methods can be efficiently computed in a multi-core, multi-cpu environment (e.g., an shared memory machine), or graphics processing unit (GPU) environment, where information can be conveyed rapidly from one computing core to another. In this section, we outline the strategy for using multiple computing cores, and discuss implementations for minimizing memory usage. The multi-core implementation of RIDC methods excels when the function f (t, y) in equation (2.1) is expensive to compute. This is a realistic scenario for many problems of interest.
3.1. Using multiple computing cores. We outline the strategy for using p computing cores to solve an p th -order RIDC method given in Algorithm 2. These ideas also extend to RIDC methods with reduced stencils given in Algorithm 3.
For each group of intervals I j = {t j,0 , t j,1 , . . . , t j,K }, the strategy is as follows. We drop the subscript j, with the understanding that this RIDC method is described for one group of intervals, i.e., we set t m := t j,m and η m = η j,m .
• Start one computing core on the prediction loop to compute η m , m = 1, . . . , K using the forward Euler scheme.
• Use a second core to compute the first correction loop, i.e., η [1] m , m = 1, . . . , K, once sufficient information is available from the prediction loop. Specifically, the second core computes η
z+M have been computed by the first computing core, where z = max(m − M, 0).
• Use a third core to compute the second correction loop, i.e., η [2] m , m = 1, . . . , K, once sufficient information is available from the first correction loop.
• Use a fourth core to compute the third correction loop, once sufficient information is available from the second correction loop, and so on. st processor. This can be easily implemented using the multiprocessing module in Python [33] . For optimal use of memory, the l th processor should not get too far ahead of (l + 1) st processor: that way, older values of η [l] and f (η [l] ) which are no longer needed can be discarded. m is shown in the dashed region for a fourth-order RIDC method. Additionally, this figure shows the minimum memory "foot print" if special care is taken to minimize the amount of data that has to be stored. Specifically, if the processors are not allowed to get too far ahead of each other, then the minimum data values that have to be stored to compute the open circles are shown as black dots. Central to our algorithm, is the realization that the prediction and each of the three corrections can be computed simultaneously and independently. After all four computations have completed, the oldest value at each level can be discarded and the process repeated. a p-core implementation of a p th order RIDC method constructed using J groups of K intervals and forward Euler integrators in the prediction and correction loops (denoted RIDC(p, K)) can be computed in N + JM 2 steps. This can be compared to the forward Euler method running on a single processor which takes N steps. We can consider the ratio as γ =
For example, consider the scenario where one discretizes a time domain into 60 intervals, and applies a forward Euler time integrator. If that simulation takes one minute on a single processor, then an RIDC(4,60) in theory will generate a fourthorder solution in 1 minute and 9 seconds if four computing cores are available for the computation. Here, γ = 1.15 and one would expect the RIDC simulation to take 15% longer than that of forward Euler. The ratio γ decreases if K M . For example, a four core implementation of RIDC(4,600) in theory has a 1.5% increase in clock time over a single core forward Euler time integrator with 600 intervals. Practical implementation details, such as communication delays, shared memory overhead or other interprocess communication issues, would be expected to increase the wall clock time for the RIDC method. Actual timing tests are provided in §5 for a system of interacting particles.
For the RIDC method with the reduced stencils, the start up delay for each group of intervals is roughly halved: the ratio γ is reduced to γ = 1 +
2 . An obvious question to pose is how to choose K. As K increases, the ratio between the computational and idle processor time increases. However, this apparent gain in efficiency may be offset by an accumulation of error. A careful numerical study is performed in Example 5.2.
3.2. Other RIDC algorithms. As discussed in §2.3, for truly p-adaptive (i.e., variable order) schemes, the error e m of the solution after the l th correction could be estimated before deciding whether to perform an additional correction. In terms of implementation, this may require that the data for the entire l th correction level be stored. The practicality and performance of p-adaptive schemes is not explored in this paper.
We have described RIDC using forward Euler integrators. However, if the number of computing cores available is less than the order of the method desired, one could consider embedding higher-order integrators in the prediction and/or correction loops, as was done for IDC in [7, 6] and described in §2.1.1.
4. Convergence and stability of RIDC methods. We first discuss convergence of RIDC methods, followed by the stability regions of RIDC methods, and the impact of the number of intervals on the error.
Convergence of RIDC methods.
The analysis in [7] (and summarized in [6] ), proving convergence under mild conditions for IDC methods, extends simply to these RIDC methods.
Theorem 4.1. Let f (t, y) and y(t) in IVP (2.1) be sufficiently smooth. Then, the local truncation error for an RIDC method constructed using K > p uniformly distributed nodes {t k = kh, k = 0, . . . , K}, an Euler prediction and M = p − 1 Euler correction loops is O(h p+1 ). Proof. The proof follows from the analysis in [7] . We simply replace {M ← K, k l ← M } in Theorem 4.1 of [7] , and the proof follows. We refer the reader to [7] for the actual proofs.
Theorem 4.2. Let f (t, y) and y(t) in IVP (2.1) be sufficiently smooth. Then, the local truncation error for an RIDC method constructed using K > M + 1 uniformly distributed nodes {t k = kh, k = 0, . . . , K}, an r 0 th -order Runge-Kutta method in the prediction loop, and (r 1 , r 2 , · · · , r M ) th -order RK methods in the correction loops, is
The proof follows from the analysis in [7] . We simply replace {M ← K, k l ← M } in Theorem 5.3 of [7] , and the proof follows. We refer the reader to [7] for the actual proofs. after a time step of size ∆t for λ ∈ C, i.e., Am(λ∆t) = y(∆t). Definition 4.4. The stability region, S, for a numerical method, is the subset of the complex plane C, consisting of all λ∆t such that Am(λ∆t) ≤ 1,
In Figure 4 .1, we plot the stability regions for RIDC (4, 4) and RIDC(4,40) after the prediction and correction loops to show how the corrective loops alter the stability regions. Note that to generate this plot, we take N = K = 4 and N = K = 40, respectively, in Algorithm 2, and scale the resulting stability regions by the number of intervals in the group, K. Correspondingly, one observes a circle of radius one for the stability region after the predictive loop, which is consistent with the forward Euler integrator used. Interestingly, RIDC(4,4) (which corresponds to the standard fourth-order IDC method) encompasses an increasing amount of the imaginary axis as the number of correction loops is increased, whereas the imaginary inclusion of the RIDC(4,40) scheme is reduced after the second corrective loop.
The change in the shape of the stability region after the third correction loop for RIDC (4, 4) and RIDC(4,40) is not particularly surprising, since one might expect the (4,40) are shown for the prediction (simply that of FE) and each of the three corrections. The third correction shows the actual stability of the method. While the radius of the largest disk that can be contained in RIDC(4,40) stability region after third corrective loop is larger than that of RIDC (4,4), the loss of the imaginary inclusion is disappointing. stability region to approach that of forward Euler as K increases. Correspondingly, we plot the stability regions for several fourth-order RIDC schemes, i.e., the stability regions for RIDC(4,K) for various K in Figure 4 .2. The main observations that one should make are, for a fixed order (i) the amount of imaginary inclusion decreases as the number of intervals K increases, and (ii) the stability region approaches that of the forward Euler method as K increases. (4,40) with reduced stencils. No significant differences are observed between the stability regions for RIDC methods and RIDC methods using reduced stencils.
4.3.
Choice of K, the number of intervals in each group. As mentioned in §3, maximizing the ratio K M for multi-core systems minimizes the amount of idle processor time. However as the ratio K M is increased, the overall accuracy of the solution is expected to decrease. In §5 example 5.2, we explore the impact of the number of intervals K for a test equation, noting that the optimal choice of K is likely to be problem dependent.
5. Numerical benchmarks. Example 5.1. This first example validates the order of accuracy of the constructed RIDC schemes. We consider the initial value problem
which has the analytic solution y(t) = (1 + t 2 ) 2 . Table 5 .1 shows the error and rate of convergence for the RIDC schemes, where RIDC(p, 40) denotes the p th -order RIDC scheme described in Algorithm 2, constructed using K = 40 intervals per group. The numerical solution was compared to the analytic solution. In Table 5 .2, a convergence test is shown for the RIDC schemes using reduced stencils.
Example 5.2. In this example, we consider the nonlinear ODE system presented in [1] ,
which has the analytic solution y(t) = (cos t, sin t) T . We use this example to study the choice of K, the number of intervals in each grouping. This particular problem was chosen because additional properties of the system (e.g., the squared amplitude error, y , and the corresponding phase error) can be studied in addition to the error of the actual components.
In this numerical study, we fix the number of intervals at N = 1000, and show the effect of the interval groupings in Figure 5 .1 for RIDC(4,K). Three different measures of error are shown and in all three, it appears advantageous to select K larger than M = 4. There is a minimum in error for K between 40 and 100, after which the error increases again. We expect this minimum to be problem dependent. 
. Then, the equations of motion are given by
where d > 0 is a regularization constant. The initial conditions chosen are
The time to evaluate the right-hand-ride scales like the square of the number of particles.
We implement the RIDC algorithm on a multi-core architecture, and demonstrate the speedup obtained with such an implementation. Our implementation uses the multiprocessing module in Python [33] . Computations and timings are performed on a 16-core machine: a Dell Poweredge R900 with four quad-core Xeon X7350 CPUs running at 2.93 GHz sharing 32 GiB of memory. Table 5 .3: Wall-clock timings on the particle problem. Fourth-and eighth-order RIDC schemes utilize four and eight cores respectively, FE and RK schemes use only one core. The "theoretical time" is based on comparing to the actual time of the forward Euler method. For the Runge-Kutta schemes it is the number of stages (here 4 and 13 respectively) times the forward Euler time. For the RIDC schemes (with reduced stencils), it is based on the analysis in §3.1.1.
Our simulations use 400 particles (N + = N − = 200) and d = 0.05. Tables 5.3  and 5 .4 compare the forward Euler integrator, fourth-and eighth-order RIDC (using four and eight cores respectively), and two Runge-Kutta schemes (using one core only). Here RK4 is the popular fourth-order Runge-Kutta scheme and RK8 is an eighth-order, 13-stage Runge-Kutta [26] . For a fixed number of intervals N = 320, Table 5 .3 shows that the Runge-Kutta schemes take significantly longer than forward Euler, whereas the RIDC methods take approximately the same wall-clock time as forward Euler. We note the theoretical wall-clock time based on the analysis in §3.1.1 does not include overhead due to inter-process communication, latency or other concerns. The actual observed ratios γ are close to, although larger than, their theoretical values. It may be possible to improve our Python implementation. For fourth-order RIDC, we observe the expected increase in γ for smaller values of K (fewer intervals in each group). We observe a dependence of error on K which is consistent with the results of Example 5.2.
The errors shown in Table 5 .3 are potentially misleading in the sense that the errors for RK4 and RK8 are superior to those of the RIDC methods. However, the actual computation time is also much larger. A direct comparison between the methods is in Table 5 .4 where N is chosen such that each method computes the solution at T = 10 in roughly the same fixed amount of wall-clock time. Here it is clear that RIDC8 computed with eight cores is vastly superior to the other schemes and in particular the eighth-order Runge-Kutta method. Figure 5 .2 shows the rate of convergence of second-, third-and fourth-order RIDC methods for the error as a function of the number of intervals. We note that RIDC4 has a larger error per step than the popular RK4 Runge-Kutta scheme by about one order of magnitude. However, the right plot of Figure 5 .2 shows the error as a function of the wall-clock time, the appropriate measure of effectiveness for these multi-core simulations. Using four cores, we observe a roughly twofold speedup in RIDC4 over the popular RK4 Table 5 .4: For a fixed amount of wall-clock time, the error at T = 10 resulting from the various integrators is shown. These results are for the particle problem Example 5.3 and the error is the relative error in electron position. Note RIDC8 has lower error than the other schemes.
integrator. 3) using RIDC2, RIDC3 and RIDC4 (using two, three, and four cores respectively) with K = min(N, 160) and reduced stencils. A convergence study (left) shows the methods achieve their design orders. In particular, note that RK4 has significantly better error per step. However, examining the wall-clock time (right), reveals that RIDC4 can be more efficient than RK4. Figure 5 .3 (left), we note that the higher-order RIDC5, RIDC6, RIDC7 and RIDC8 schemes also achieve their design orders. In the wall-clock timings in Figure 5.3 (right) , we see that RIDC8 running on eight cores significantly outperforms RK4 and RK8, an eighth-order Runge-Kutta scheme [26] . In particular, RIDC8 can produce an error of 10 −8 roughly 10 times faster than RK4 and roughly four times faster than RK8. Put another way, given the same fixed amount of wall-clock time, RIDC8 can produce an error roughly four orders of magnitude smaller than the eighth-order Runge-Kutta scheme.
6. Conclusion. In this work we discussed a class of defect correction methods which is easily adapted to create parallel time integrators for multi-core architectures. The class is also well-suited for developing methods which can be order-adaptive in 2) using high-order RIDC5, RIDC6, RIDC7 and RIDC8 (using five, six, seven, and eight cores respectively) with K = N and reduced stencils. A convergence study (left) shows the methods achieve their design orders up to order 8. The wall-clock time (right) shows that the RIDC significantly outperform high-order Runge-Kutta schemes.
time. The methods presented are a revised formulation of explicit integral deferred correction, dubbed revisionist IDC, which can achieve p th -order accuracy in wall-clock time comparable to a single forward Euler integration.
The key idea is to re-write the defect correction framework so that, after initial startup costs, each correction loop can be lagged behind the previous correction loop in a manner that facilitates running the M = p − 1 correctors and predictor in parallel on an interval which has K steps, where K p. In addition, we proved that given an r th -order Runge-Kutta method in both the prediction and correction loops of RIDC, then the method is order r × (M + 1) after M correction loops. Numerical experiments were presented to validate our parallel integrators. A four-core implementation of fourth-order RIDC4 gives a speedup of roughly two over the popular fourth-order Runge-Kutta integrator for an N -body problem. On the same problem, eighth-order RIDC8 running on eight cores produces an error roughly four orders of magnitude smaller than an eighth-order Runge-Kutta method in the same wall-clock time.
The ideas behind RIDC extend to implicit and semi-implicit IDC [5] and have high potential in this area. They also extend to other defect correction methods (e.g., [1] ). The authors are presently exploring strong stability preserving [20, 15] and other nonlinear stability properties of these RIDC schemes.
An interesting topic yet to be fully explored is the issue of adaptivity. With the advent of modern day computing architectures such as the Intel Larrabee chip and the Nvidia Tesla chip, it is not far fetched to imagine that a user might have access to a large number of computing cores. The question arises thus, how do order-adaptive RIDC methods (where, if the residual after the p th correction loop is too large, an additional correction iteration is locally performed) compare with the usual paradigm of locally refining the time step? In the order adaptive paradigm, efficient integrators (such as RK4) can be embedded within the correction framework whereas in the usual paradigm of locally refining the time step, the resulting non-uniform stencil may constrain the RIDC method to incorporating only first order correctors. The authors are researching adaptive RIDC schemes.
