Abstract-The recent push for post-Moore computer architectures has introduced a wide variety of application-specific accelerators. One particular accelerator, the resistance network analogue, has been well received due to its ability to efficiently solve partial differential equations, while eliminating the iterative stages required by today's numerical solvers. However, in the age of programmable integrated circuits, the static nature of the resistance network analogue, and other analog mesh computers like it, has relegated it to an academic curiosity. Recent developments in materials, such as the memristor, have made the resistance network analogue viable for inclusion in future heterogeneous computer architectures. However, selection of an appropriately sized mesh to be incorporated into a computer system requires that energy-quality trade-offs be made regarding the problem size and required resolution of the solution. This paper provides an in-depth study of the scaling of analog mesh computer hardware, from the perspective of energy per bit and required resolution, introduces a metric to aid in quantifying analog mesh computers with different parameters, and introduces a method of virtualization which enables an analog mesh computer of a fixed size to approximate the calculations of a larger-sized mesh.
I. INTRODUCTION
THE slowing of Dennard scaling has ushered in a new era of computer design. No longer can the underlying computer architecture remain stagnant, and rely on advances in transistor manufacturing to realize major gains in performance, as called for by recent programs advancing computer architectures [1]. Several competing computation models and architectures have been introduced in response to the need for innovation in this area [3], [22], [23], [44]. Computation models, such as approximate computing, recognize that precision is wasted in some calculations and gain efficiencies by trading off computation quality with effort expended [3]. Alternative computer architectures, such as non-Von Neumann, gain efficiencies by limiting unnecessary calculations or data movements.
Heterogeneous computing architectures achieve gains in efficiency through the combination of general-purpose processors with application-specific computation engines [2] . Efficiencies are realized through the use of coprocessors designed to accelerate specific computations in an efficient manner. approximately 30% [4] , while application-specific computations have been shown to have a greater than 5x increase in performance per watt [5] .
Partial Differential Equations (PDEs) are a common mathematical tool in scientific computing and engineering, and are used to model physical phenomena such as fluid dynamics [35] , electricity [36] , magnetism [37] , mechanics [38] , optics [39] , and heat flow [6] . Due to the difficulty in solving such equations, multiple techniques have been invented to simplify their calculation, such as transformations into ordinary differential equations (ODEs) and numerical methods executed by computers. However, the numerical PDE solvers include a large iterative component, and much attention has been paid to creating efficient implementations through a reduction in iterations [7] , [8] . As PDEs form the basis for many applications in scientific computing, efficiencies gained in this domain would be of great benefit to the scientific community.
One specific implementation of a PDE solver, the resistive network analogue, uses a network of resistors to solve PDEs. In fact, this architecture was originally developed to provide efficient computation of heat transfer [9] and oscillatory flow problems in aeronautical engineering [10] . However, due to the dynamic nature of the physics problems this computer was intended to solve, its integration into a static very-large-scale integration (VLSI) architecture proved difficult, relegating the resistance network analogue, and other analog mesh computers like it, to an academic curiosity [11] . This shortcoming was improved upon by Ramirez-Angulo and DeYong in [11] with a VLSI-friendly implementation of an analog mesh computer using Complementary Metal Oxide Semiconductor (CMOS) transistors operated in the subthreshold regime. However, modern digital VLSI designs prefer the use of minimum-size devices, which is at odds with subthreshold CMOS designs, which require larger devices to ensure proper matching [12] . The recent introduction of new, programmable VLSI devices, such as the memristor, has created opportunities for innovative architectures that can take advantage of these new devices [21] , [22] . This leaves the analog mesh computer well positioned for inclusion in future heterogeneous architectures and promises a resurgence of interest in mesh-based accelerators and their integration into scientific computing systems.
In order to effectively integrate analog mesh computers into energy-quality (EQ) scalable systems, which require the ability to explicitly trade off energy and quality at different levels of abstraction [47], a significant amount of information must be known about the capabilities of the particular physical mesh configuration and the problem size. The heat transfer problem, for example, has many correct solutions, each one providing a solution with a different resolution. The resolution provided by an analog mesh computer is directly linked to its physical configuration, as it is influenced by the total number of elements and their interconnection [9], [13].
This paper aims to provide a comprehensive study of EQ scaling in analog mesh computers, and is organized as follows: Section II covers background material relevant to numerical solutions to PDEs and resistive mesh computers. This is followed by Section III, which covers hardware and software scaling in mesh computers. Section IV provides a discussion of results and future work, and is followed by concluding remarks in Section V.
II. BACKGROUND
Recent innovations in science and engineering can be traced back to advances in modeling and simulation. From the design of composite materials used on aircraft to the design and simulation of advanced microprocessors, each has been enabled by innovations in techniques used to efficiently model complex physical phenomena. However, the additional complexity required in the development of modern systems is at odds with expected productivity in the development of such systems. To continue the expected pace of innovation, techniques must be developed to eliminate existing bottlenecks in physical system modeling. Prior to elaborating on specific techniques, it is beneficial to understand which facets of physical modeling can benefit most from optimization.
A. Partial Differential Equations and Their Solutions
Many important physical systems can be modeled using PDEs [6] . PDEs are equations that involve rates of change with respect to continuous variables, and usually take the form
The difficulties in solving PDEs, when compared to ordinary differential equations, arise from a PDE's use of an infinite-dimensional configuration space. Infinite dimensionality, while difficult to solve for, is necessary to adequately model complex systems, such as those commonly found in fluid dynamics. Additionally, it is important to note that a solution of a PDE is generally not unique; additional conditions must generally be specified on the boundary of the region where the solution is defined [45].
Due to the breadth of physical phenomena that PDEs can model, much attention has been invested in the development of efficient PDE solvers, from different numerical methods [15] - [17] , to analytical models such as the electrolytic tank [14] . Each has its own advantages and drawbacks; the numerical methods often require lengthy and tedious work if a high degree of accuracy is desired, whereas the electrolytic tank is seriously limited in its accuracy [9] .
As they can be tuned for predetermined levels of accuracy and executed on general-purpose computers, numerical methods are a common technique for evaluating PDEs. One particular mechanization of numerical methods, the difference equation, is encountered frequently when solving PDEs [18]. A difference equation equates 0 to a polynomial that is linear in the various iterates of a variable [46]. Usually the context is the evolution of some variable over time, with the current time period or discrete moment in time denoted as $t$, one period earlier denoted as $t - 1$, one period later as $t + 1$, etc. For instance, an $n$-th order linear difference equation is one that can be written in terms of parameters $a_1, \dots, a_n$ and $b$ as
$$y_t = a_1 y_{t-1} + \cdots + a_n y_{t-n} + b. \quad (2)$$
Difference equations used to obtain solutions of partial differential equations should approximate the PDE in a well-defined way [18]. Mattheij, in [18], describes an approximation technique using schemes, or a discrete analog of the continuous PDE, which are the result of replacing differentials by finite differences. For instance, the one-dimensional diffusion equation, shown in equation (3), can be discretized as in equation (4) and characterized using the stencil shown in Figure 1.
If, in a system of time-dependent PDEs, all spatial derivatives are replaced by suitable difference approximations, we obtain a system of ODEs in time [18]. This system of ODEs can, in turn, be solved to accuracy ε with a complexity no worse than exponential in ln ε for each ODE [19].
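To make the replacement of differentials by finite differences concrete, the following is a minimal sketch of one common scheme for the one-dimensional diffusion equation $u_t = D\,u_{xx}$: the explicit forward-time, centered-space update. It is an illustration under stated assumptions and is not necessarily the scheme of equation (4) or of [18]; the function name `diffuse_1d`, its parameters, and the fixed-boundary treatment are hypothetical choices made for this example.

```python
def diffuse_1d(u0, diffusivity, dx, dt, steps):
    """Explicit forward-time, centered-space update for u_t = D * u_xx.

    The first and last entries of u0 are held fixed (Dirichlet boundaries).
    This is an illustrative scheme, assumed for this sketch.
    """
    r = diffusivity * dt / dx ** 2
    if r > 0.5:
        # Standard stability limit of this explicit scheme.
        raise ValueError("explicit scheme is unstable for D*dt/dx^2 > 0.5")
    u = list(u0)
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            # Three-point stencil: left, center and right neighbors.
            nxt[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = nxt
    return u


# Rod held at 1.0 on the left and 0.0 on the right, initially 0.0 inside.
profile = diffuse_1d([1.0, 0.0, 0.0, 0.0, 0.0],
                     diffusivity=1.0, dx=1.0, dt=0.25, steps=400)
```

After enough steps the interior values settle onto the straight line between the two boundary values, which is the steady-state solution that a resistive mesh reaches physically rather than by iteration.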
B. Analog Mesh Computers in Solving Difference Equations
Redshaw and Liebmann designed an apparatus to solve PDEs describing oscillatory flow [10] and heat transfer problems [9] using a resistive mesh, also called a resistance network analogue, comprised of resistors connected in a two-dimensional mesh configuration. Finite difference mesh points, as shown in Figure 2, characterize the stencil for a specific PDE, and are mapped to a resistive mesh for calculation. Solutions are read at the intersection of resistor terminals, or nodes. Due to its structure, the time required for a resistive mesh to execute a calculation is equal to the time required for the mesh to reach a steady state, shown in equation 3,
where diameter is the longest path through the mesh, and RC is the product of the lumped resistance and capacitance of each section of the mesh in between two nodes. Another advantage of this architecture is that the complexity of such a mesh is linear with respect to growth in mesh size. Monitoring of point P0 provides a solution to f) through the analogous equations, and accuracy analysis by De Packh suggests that accuracy of a resistive mesh might approach the tolerance limit of the individual resistance units, e.g. = 1% [9] . For advanced VLSI processes, tolerances such as these are to be expected [20] , enabling the integration of mesh computers into larger systems.
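The correspondence between a resistive mesh and a difference equation can be sketched numerically. With equal resistors, Kirchhoff's current law makes every free node voltage the average of its neighbors, which is the discrete Laplace equation. The sketch below is a digital, iterative stand-in (Gauss-Seidel relaxation) for what the analog mesh settles to physically; it is not the SPICE setup used in this study, and the function name `solve_mesh`, the corner placement of the source and sink, and the sweep count are assumptions made for illustration.

```python
def solve_mesh(n, v_source=1.0, v_sink=0.0, sweeps=2000):
    """Steady-state node voltages of an n x n mesh of equal resistors.

    Assumed for this sketch: the source is pinned at node (0, 0) and the
    sink at node (n-1, n-1). Every other node relaxes to the average of
    its in-mesh neighbors (Kirchhoff's current law with equal resistors).
    """
    fixed = {(0, 0): v_source, (n - 1, n - 1): v_sink}
    v = [[0.5 * (v_source + v_sink)] * n for _ in range(n)]
    for (r, c), value in fixed.items():
        v[r][c] = value
    for _ in range(sweeps):
        for r in range(n):
            for c in range(n):
                if (r, c) in fixed:
                    continue
                neighbors = [
                    v[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n and 0 <= cc < n
                ]
                v[r][c] = sum(neighbors) / len(neighbors)
    return v


mesh = solve_mesh(4)
```

The digital version needs many sweeps over all nodes, whereas the physical mesh produces the same node values once it settles, which is the motivation for the analog approach.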
III. SCALING IN ANALOG MESH COMPUTERS
Prior to integrating an analog mesh computer into a larger system, design decisions must be made regarding the size of the mesh in the X and Y dimensions. Mesh size dictates the size and resolution of the problem being solved, as seen in Figure 3, while also dictating energy consumption of the system, increasing the importance of an informed size selection.
The mesh size-selection problem is exacerbated by the necessity for programmability in scientific computing; for problems such as heat transfer, the size of the problem will vary based on the size of the heat flow path, while the mesh selected for use in a system will always be of a fixed size. Mismatch between problem size and mesh size manifests itself as mismatches in amplitude due to a reduction in resolution of the solution.
Error caused by a lack of resolution introduces two problems, as shown in Figure 4 : an error in the amplitude of the solution, and a loss of information pertaining to the solution. The error in amplitude is well understood by the data converter community and is known as gain and offset error. These errors, once known, can be predicted with some accuracy and several methods exist for their mitigation [24] . The other error type, loss of information during the mapping of a small mesh to a larger problem (grid), is a more difficult problem to solve, as new and accurate information must be generated using only a known subset of all information.
Interpolation is a fundamental method used to estimate unknown values which reside between a set of known values, and has recently been improved upon for specific applications [25] . However, it is generally acknowledged that there is an inherent error introduced with the estimation of unknown values. In fact, for problems related to natural phenomena, including the heat transfer problem, errors introduced by the use of linear and quadratic interpolation cannot be ignored [26] , [30] .
The regions bounded by the broken lines in Figure 4 show the error introduced by linear interpolation. The dashed lines represent the best-guess mapping of a 4x4 mesh onto a 32x32 solution-space using linear interpolation as a data generator. Due to offset errors introduced by the 4x4 mesh, the average error introduced by linear interpolation is 7%, as seen in Figure 5. However, this only tells part of the story, as the error seen in the curvilinear region of the diagonal in Figure 4 is much higher, and could be as high as 20%. After applying correction for offset errors, the average error introduced by linear interpolation is reduced to 4%, but error in the curvilinear region is still higher than 10%, and most of the characteristics of the curve are lost.
A simple solution to mitigate the information loss due to interpolation is to increase the size of the mesh. In this study, we only consider square meshes, so any scaling in the mesh size implies equal scaling in the X and Y dimensions. However, Liebmann's follow-on work to his initial study in Resistance Network Analogues has shown that the same principles apply to rectangular meshes as well [13]. Figure 4 shows an approximately 15% reduction in error for each doubling of the dimensions of a coarse mesh. While this does, indeed, increase the resolution of the solution, thus decreasing error, the doubling of the size increases the area of the mesh, along with the number of nodes within it, by a factor of 4. This causes a 4x increase in the energy consumption of the mesh.
When designing a mesh computer, there are multiple hardware and software tradeoffs to consider. Manufacturability and energy consumption of mesh hardware, coupled with the energy consumption of multiple mesh calculations, in the case of large problems, must be traded against the required resolution of the solution to come to an optimal architecture.
A. Hardware Scaling in Analog Mesh Computers
When choosing dimensions for a mesh, many factors must be considered. Obviously, silicon real-estate is not to be wasted in the twilight of Dennard scaling, making the size of the physical mesh implementation an important factor. While a mesh of any size is capable of solving a PDE, choosing a mesh too small increases the error in the solution due to the reduced resolution of the mesh. This constraint, and the fixed-size nature of the mesh itself, makes consideration of error tolerance an important factor in the design space exploration for mesh computers.
1) Design Space Exploration: Consider mesh computers of dimension 4x4, 8x8, 16x16 and 32x32, each simulated using SPICE with a single source and single sink (see Figure 4) . Energy consumption for each mesh was calculated using SPICE, and support circuitry energy consumption was estimated using datasheets for the individual components.
Energy consumption of the mesh is the sum of energy required for mesh control circuitry, comprising input bias circuitry and readout circuitry, and the mesh components, as shown in Figure 6 . As the solution of the PDE is the sum of currents taken at each node, a brute-force approach requires an operational transimpedance amplifier at the output of each node, followed by an analog-to-digital converter (ADC). However, when considering the energy consumption of the individual readout circuitry components [27] , [29] , it becomes obvious that this approach will not support scaling, and that other architectures should be considered.
As the mesh computer reaches a steady state upon completion of a calculation, the mesh can be considered to have a memory, thus node currents are not required to be read simultaneously. Relief of the simultaneous readout requirement allows for serial readout, reducing the number of ADCs required by the mesh, with an expected reduction in performance (Figure 7), described in equation 7. Readout circuitry and its effect on mesh performance were studied in detail by Liebmann in [42].
Additional energy savings can be gained by relaxing the requirement for an amplifier at every current-summing node, as seen in Figure 7. The sharing of operational transimpedance amplifiers by multiple nodes was proposed in [21], and reduces energy consumption further through a reduction in amplifiers and their replacement with comparatively low-power analog multiplexors [27], [28]. This architecture has the advantage of slowing the growth of the high-power components, allowing the low-power components of the mesh to grow faster and eventually dominate energy consumption, as seen in Figure 6. Upon minimization of the readout circuitry, the physical structure of a mesh makes scaling straightforward. As stated before, a doubling of the mesh size increases the area of the mesh, along with the number of nodes within it, by a factor of 4. This rapid growth in node count places great importance on the selection of low-power components whenever a node-for-node match in hardware is required. The energy consumed by a mesh of a given size is estimated by equations 8, 9 and 10,
where m is the number of nodes in the mesh, defined in terms of x and y dimensions as x * y for a rectangular mesh, and l is the number of nodes required to be read for a solution with a specific resolution.
As can be seen from equation 9, a square mesh having its length doubled results in a node increase described by equation
where n is the number of length doubles required to reach a specific resolution.
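The node growth described above can be checked with a short calculation. The sketch below assumes only what the text states, namely that each doubling of the side length of a square mesh multiplies the node count by 4, so that a mesh of side $s$ has $(2^n s)^2 = 4^n s^2$ nodes after $n$ doublings. The function names are hypothetical, and the energy ratio additionally assumes that energy grows in proportion to node count, as in the 4x figure quoted earlier.

```python
def nodes_after_doublings(side, doublings):
    """Node count of a square mesh after its side length is doubled n times."""
    return (side * 2 ** doublings) ** 2


def relative_energy(doublings):
    """Energy relative to the starting mesh, assuming energy scales with
    the number of nodes (a simplifying assumption for this sketch)."""
    return 4 ** doublings


for n in range(4):
    # Starting from a 4x4 mesh: 16, 64, 256 and 1024 nodes.
    print(n, nodes_after_doublings(4, n), relative_energy(n))
```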
2) Analysis of Results: Consider mesh computers of dimension 4x4, 8x8, 16x16, 32x32 and 64x64, each simulated using SPICE with a single heat source and sink, as shown in Figure 9 . Upon reaching a steady state, the nodes within each mesh nearest to the source and sink were recorded and compared. As the 64x64 mesh computer has the highest resolution of the meshes tested, it is regarded as the ground truth for all comparisons.
The error of corresponding mesh points is shown in Figure 10. It was calculated as the difference between the read out value from a mesh of a specific dimension and the ground truth. Note that, generally, the trend from Figure 5 continues and the average error continues to decrease linearly as the mesh area increases.
Fig. 9. Analog mesh computer simulations run on SPICE for this study (left). Meshes of dimension 4x4, 8x8, 16x16, etc. up to 64x64 were simulated, and nodes within the mesh closest to the source and sink were recorded for comparison. Results from simulation (right).
Fig. 10. Error in calculations executed during simulations described in previous figure. 64x64 mesh was considered ground truth to which all other meshes were compared. Note that the error for each node is inversely proportional to the mesh size.
The average error of each mesh is shown in Figure 11. It was calculated as the average of errors between corresponding nodes in each mesh. As expected, the average error produced by each mesh decreases as the mesh size increases. It is also instructive to analyze the slope of the average error, which identifies any trends in the scaling. From this, we see that the inflection point appears at 32x32 to 64x64 scaling, which shows a definite roll-off for meshes with a dimension larger than 32x32. This shows that the reward-to-cost ratio has been reduced, and suggests that further increases in mesh size may not be worthwhile, except for situations where the tolerance for error is very low.
Fig. 11. Average error in meshes of different sizes when compared to a 128x128 ground truth (left). Maximum error (in blue) scales with average error (in orange), and is inversely proportional to mesh size. Slope of error when comparing one mesh to a mesh of the next-larger size (right). As the mesh resolution increases, the slope in error between adjacent mesh sizes decreases, suggesting a convergence for large meshes. Note the similarity in the error slopes between the medium-sized (between 8x8 and 32x32) meshes.
Fig. 12. Absolute error in meshes of different sizes when compared to a 128x128 ground truth. 64x64 mesh (left) and 4x4 mesh (right) show a steep roll-off in error from the source and sink towards the middle of the mesh, followed by a gradual reduction in error until the middle of the mesh is reached.
Fig. 13. Average error increase as mesh size increases. Finest and coarsest meshes have the smallest error, making them appropriate for substitution during EQ scaling.
The average error for each mesh only tells part of the story, however, as the majority of low-error points reside in the region farthest from the source and sink. Figure 12 shows the distribution of errors throughout the mesh. The majority of errors are seen closest to the source and sink, and the initial drop-off in errors is followed by a gradual reduction until the middle of the mesh is reached, at which point the inverse occurs. Maximum error, also shown in Figure 11, is more than double the average error for each mesh. Figure 13 shows the average offset error for corresponding nodes in a mesh when compared to its next-larger sibling. This comparison is meant to identify mesh sizes which may be compatible for wholesale substitution. From Figure 13, we see that the coarsest and finest meshes are most suitable for substitution, as they introduce only a 2.5% offset error.
This enables an EQ trade to be made by designers, where the energy required for PDE computation can be reduced by 75% with only a small error introduced into the results.
More generally, a metric can be introduced which supports the EQ trade, enabling designers to quantify the superiority of one mesh over another. This metric, which we call the Precision-to-Energy Ratio (PER), is defined simply as equation 12,
$$\mathrm{PER} = \frac{\text{precision}}{\text{energy}} \quad (12)$$
where precision is defined as 100% − error percentage. This ratio works by assigning importance to the precision of the solution computed by the mesh, while comparing it to the energy required by the mesh for the computation. Readers should note that since the size of the mesh dictates its energy consumption, this ratio also serves, indirectly, to compare precision with mesh size. Intuitively, as the precision increases or energy decreases, the PER gets larger.
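As a small worked example, the PER can be read as precision divided by energy, which is how its numerator and denominator are used in the discussion section. The sketch below applies that reading to the substitution case quoted above, a 2.5% offset error in exchange for a 75% energy reduction. The function name and the normalization of energy to the larger mesh are assumptions made for illustration.

```python
def precision_to_energy_ratio(error_fraction, energy):
    """PER = precision / energy, with precision = 1 - error (as a fraction).

    `energy` is assumed to be normalized to a reference mesh.
    """
    if energy <= 0:
        raise ValueError("energy must be positive")
    return (1.0 - error_fraction) / energy


# Reference mesh: no error relative to itself, normalized energy of 1.
fine = precision_to_energy_ratio(0.0, 1.0)
# Substituted coarser mesh: 2.5% offset error, 75% less energy.
coarse = precision_to_energy_ratio(0.025, 0.25)
```

Under these assumptions the coarser mesh scores a higher PER than the reference, because the small loss in precision is outweighed by the fourfold drop in energy.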
Hardware scaling, while effective in controlling energy per calculation, is most useful when the size of the problem is known beforehand and promises to remain static. While application-specific computers have been created for very complex applications [31] , and applications which require very high speed or efficiency [32] , a computer must be able to calculate for a variety of problems, or problem sizes, to be truly useful. This problem was recognized by Liebmann, and he proposed a method to scale problems onto submeshes of different sizes [13] . While his method enables equivalent meshes to be created using meshes of different sizes, it is not easily implemented using VLSI technology, where hardware parameters must remain static after fabrication. A more software-like approach must be adopted to enable the appropriate level of mesh computer scaling.
B. Software Scaling in Analog Mesh Computers
Parallelization of a large problem in high performance computing involves decomposing a problem into pieces, and then working on each piece individually [34] . A subset of parallelization techniques, called divide-and-conquer, recursively decomposes a large problem into smaller subproblems, enabling each subproblem to be mapped to specific hardware resources [33] . Virtualization of a computing environment can enable divide-and-conquer of calculations on a single computer, as it enables time-sharing of hardware resources among different pieces of the problem. This allows a computer with fixed hardware resources to compute a solution to a problem of a larger size than the computer was originally designed to handle.
To this end, we propose a method to virtualize a mesh computer called Recursive Mesh Refinement (RMR), which enables a coarse-grained analog mesh computer to approximate the solution of a much finer-grained analog mesh computer. This method of virtualization enables an analog mesh computer to fulfill the programmability requirement of scientific computing tasks. Thus, RMR enables EQ scaling by allowing hardware designers to select the coarsest mesh dimension allowable by the common case, and then using the coarse mesh with RMR to solve a PDE when higher resolution is needed.
1) Design Space Exploration:
RMR is a method of virtualization which exploits the natural behaviors of a resistive mesh to enable time-sharing of the mesh among individual pieces of a decomposed problem. Shown in Figure  14 , RMR is called recursive because, much like functional recursion in computing, the entirety of the problem is fed to the mesh, where it is decomposed into quadrants, each of which is then fed back into the mesh, and so on until the solution reaches the required resolution.
This type of recursion is made possible with the addition of biasing circuitry around the perimeter of the mesh. When the initial problem is fed to the mesh, the centermost nodes in the horizontal and vertical axes are read out. This effectively creates a "system snapshot" of the mesh separated into quadrants, which is then stored in memory. The bias information for each quadrant is then used to bias the periphery of the mesh. Shown in Figure 15, this effectively spreads a single quadrant over the entirety of the mesh hardware, giving the appearance of increased resolution, which we call effective resolution.
Fig. 14. Recursive Mesh Refinement enables time-sharing of mesh resources, which allows a problem to be decomposed into multiple quadrants, each of which is then fed into the mesh, thereby increasing its effective resolution.
Fig. 15. Recursive Mesh Refinement acts in a similar manner to a "zoom in" imaging function, fitting a continually reduced-size system snapshot into a fixed-size mesh. This gives the appearance of a mesh with increased resolution.
There exists a natural mismatch in that the number of biases comprising the perimeter of a quadrant is always one half of the biases required to completely set the mesh's perimeter biasing circuitry. Linear interpolation can be accurately used to estimate the missing biases, considering that the missing nodes are, in essence, the center of a resistive divider made up of resistors of R/2 resistance [43] . This technique enables the forced matching of node voltages, thereby reducing the effects of resistor mismatch.
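The interpolation step can be sketched as follows. Since a missing perimeter bias sits at the center of a divider formed by two equal resistors, its value is the mean of the two known biases on either side. The function name `interpolate_biases` and the representation of one edge of the perimeter as a list of voltages are assumptions made for this example, not details of the biasing circuitry.

```python
def interpolate_biases(known):
    """Insert the midpoint between each pair of adjacent known biases.

    Each inserted value models the center tap of a divider built from two
    equal resistors, so it is the mean of its two neighbors.
    """
    if len(known) < 2:
        return list(known)
    out = []
    for a, b in zip(known, known[1:]):
        out.append(a)
        out.append(0.5 * (a + b))
    out.append(known[-1])
    return out


# Three biases read from one edge of a quadrant, expanded to five.
edge = interpolate_biases([1.0, 0.5, 0.0])
```

In the usage above, `edge` alternates between the values that were read out and the interpolated midpoints, roughly doubling the number of biases available to set the perimeter.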
Due to RMR requiring multiple mesh calculations to arrive at a specific effective resolution, it requires more energy and time to execute a calculation for a given effective resolution. The energy and time required for an RMR calculation are related to the number of calculations required for an effective resolution by equation 13,
where n is the number of recursive levels needed for a specific effective resolution, calculated in equation 14.
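One way to count the mesh calculations involved is sketched below. It rests on two assumptions that are not taken from equations 13 and 14: that each recursive level doubles the effective side length, and that every quadrant is refined at every level, so that level $k$ contributes $4^k$ calculations. The second assumption agrees with the 5x energy figure quoted later for a 4x4 mesh emulating an 8x8 mesh, but it should be treated as an upper bound rather than the authors' exact expression. Both function names are hypothetical.

```python
def rmr_levels(base_side, effective_side):
    """Recursive levels needed, assuming each level doubles the side length."""
    levels = 0
    while base_side * 2 ** levels < effective_side:
        levels += 1
    return levels


def rmr_mesh_runs(levels):
    """Total mesh calculations if every quadrant is refined at every level:
    1 root calculation, then 4, 16, ... for each further level."""
    return sum(4 ** k for k in range(levels + 1))


# A 4x4 mesh emulating an 8x8 mesh: one level, five calculations in total.
runs_8x8 = rmr_mesh_runs(rmr_levels(4, 8))
```

Because time and energy both scale with the number of calculations under these assumptions, the count grows quickly with the number of levels, which is why refining only selected quadrants is attractive.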
RMR can also be used to spread a calculation over multiple mesh computers. An initial calculation could be executed by the root mesh computer, resulting in the readout of boundary conditions for each quadrant. These values could then be sent to other computers, each of which can initialize its internal mesh with the boundary conditions received. Results of calculations from each computer can then be sent back to the root mesh computer, which concatenates the results.
Fig. 16. Error when emulating an 8x8 mesh (left) and 16x16 mesh (right) with 4x4 mesh hardware. Note that for the 8x8 mesh emulation, the smallest errors were seen at the diagonal boundaries from the source towards the sink, which is expected, since all simulations show an area of low error in the center of the mesh. For the 16x16 mesh emulation, the smallest error is seen nearest the source, with the error reaching a point of inflection and decreasing towards the sink. This decrease in error coincides with the boundary of the linear region of the mesh.
2) Analysis of Results: RMR was simulated using SPICE. The simulation environment, shown in Figure 15, consisted of a 4x4 mesh virtualized to emulate larger meshes. Results were then compared to corresponding areas in the larger meshes. Figures 16 and 17 show the results of RMR compared to the actual results from larger meshes being emulated. The error introduced by RMR has a maximum of 6% when emulating a mesh with size doubled once. Larger doublings of size resulted in larger errors, as expected, with the maximum error seen being 12%. The authors suspect that the error introduced by RMR is a function of the offset error introduced by the mesh hardware, and the integral nonlinearity (INL) and differential nonlinearity (DNL) of both the analog-to-digital conversion during readout of boundary conditions and the digital-to-analog conversion of the biasing circuitry. Figure 18 shows the average error introduced by RMR. As expected, the lowest error was introduced when a 4x4 mesh emulated an 8x8 mesh. A 4x4 mesh emulating larger meshes resulted in considerably larger errors. Analysis of the slope of errors shows a knee in the curve for emulations larger than one doubling of the mesh size. However, a gradual roll-off of errors introduced by RMR is encountered for emulation of meshes with size much greater than the original mesh. This suggests the existence of an upper bound of RMR error, which can be taken advantage of by applications with a larger error tolerance, such that small mesh hardware can be used to successfully emulate larger meshes.
Once again, the energy and precision calculations described above can be ratioed as described by the PER metric, enabling it to be used for system quantification. Take, for instance, a 4x4 mesh required to emulate an 8x8 mesh and compare it to an 8x8 mesh. The energy required by the 8x8 mesh is 4x the energy required by a 4x4 mesh. However, RMR requires 5x the energy of a 4x4 mesh, due to the number of calculations required for resolution increase. This increases the denominator of the PER for the 4x4 mesh, making its PER initially lower than that of an 8x8 mesh. Additionally, the numerator of the 4x4 mesh's PER decreases due to its higher offset error, and the addition of INL and DNL errors. Altogether, this makes its PER lower than that of the 8x8 mesh hardware, which matches intuition.
Fig. 17. Error when emulating a 32x32 mesh (left) and 64x64 mesh (right) with 4x4 mesh hardware. Note that the smallest error is seen nearest the source, with error increasing towards the sink. Note that the 32x32 emulated mesh results reach an expected point of inflection that corresponds with the center of the 16x16 emulated mesh results.
Fig. 18. Average RMR virtualization error of a 4x4 analog mesh computer emulating larger meshes (left). Note that while the error does increase as larger meshes are emulated, there is a roll-off, suggesting an upper bound for error introduced by RMR. The slope of the error during RMR virtualization of the next-larger mesh (right) suggests a minimum-sized mesh required to minimize data loss and a lower bound for emulation error as the meshes become larger.
IV. DISCUSSION
Due to the importance of EQ scaling in future computer architectures, it is worthwhile to discuss combinations of the aforementioned scaling techniques that, when used properly, can increase the PER of a mesh computer. Ideally, a combination of these techniques can be found which creates a mesh computer with optimal parameters for a given computing environment.
Take, for instance, a mesh computer that is required to calculate solutions to PDEs with the precision of a 32x32 mesh 80% of the time, and with the precision of a 128x128 mesh for the remaining 20%. The 80% case, also called the common case, dominates execution time, leaving a computer architect to decide whether the optimal solution is a 128x128 mesh, a 32x32 mesh, or something in between.
The brute-force approach requires a 128x128 mesh and, for the common case, simply discards three of every four samples along each dimension of the solution. The use of a 128x128 mesh in place of a 32x32 mesh would gain 2.5% in precision, according to Figure 11, thereby increasing the numerator in the PER to 1.025. However, this gain would be offset by an increase in energy, which increases the denominator in the PER to 16. For the 20% case, where a 128x128 mesh is required for calculations, the PER is 1/1. Weighting each case by its share of execution time (0.8 x 1.025/16 + 0.2 x 1/1), the PER for the 128x128 mesh is 0.25.
Another option uses a 32x32 mesh for all calculations, requiring RMR for emulation of the 128x128 mesh. The PER for the 32x32 mesh is 0.98 (calculated using 1/1 for the common case and 0.94/1.06 for the remainder of the time). Using a similar calculation, the PER for the 64x64 mesh is 0.36. Assuming that 94% precision in the 128x128 calculation falls within the error tolerance, the 32x32 mesh has the highest PER, making it the optimal mesh for the required calculations.
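The weighted PER values quoted above can be reproduced with a short calculation. The sketch below assumes that the overall PER is the time-weighted sum of per-case precision/energy ratios; this form is inferred from the numbers in the text rather than stated as a formula, and the function name `weighted_per` is hypothetical. The 64x64 case is omitted because its per-case precision and energy values are not given.

```python
def weighted_per(cases):
    """Time-weighted PER: sum of weight * (precision / energy).

    Each case is (time_fraction, relative_precision, relative_energy).
    """
    total_weight = sum(weight for weight, _, _ in cases)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(weight * precision / energy
               for weight, precision, energy in cases)

# 128x128 hardware: over-provisioned 80% of the time, exact fit 20%.
per_128 = weighted_per([(0.8, 1.025, 16.0), (0.2, 1.0, 1.0)])

# 32x32 hardware: exact fit 80% of the time, RMR emulation 20%.
per_32 = weighted_per([(0.8, 1.0, 1.0), (0.2, 0.94, 1.06)])

print(round(per_128, 2), round(per_32, 2))  # prints: 0.25 0.98
```

Under this assumed weighting, the common case dominates the result: the 128x128 mesh is penalized by its 16x energy cost for 80% of the workload, while the 32x32 mesh pays only a small RMR penalty for the remaining 20%.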
V. FUTURE WORK
Observing the plot from Figure 2 and contrasting it with Equations 13 and 14, it becomes apparent that RMR, while useful in enabling mesh emulation, is not the most efficient method to gain resolution with a mesh computer. This is due to the structure of the expected results, where a large, linear region exists between two curvilinear regions. This linear region, which is much easier to model with high fidelity, does not need as high a resolution mesh as the curvilinear regions to fall within required error tolerances. Adaptive Mesh Refinement (AMR) is a technique which exploits this characteristic to simultaneously raise precision and lower energy, thus increasing the PER of mesh emulation. This technique builds on RMR, which serves as an enabling function that allows fine-grained mesh emulation using coarser mesh hardware.
AMR is a fundamental technique in Green Computing initiatives and allows a computer to selectively increase resolution where it is required, allowing it to adapt to temporally or spatially localized features [40]. In calculating three-dimensional datasets using a two-dimensional array, a computer using AMR can decompose the array into regions that require different precisions. The low-precision regions are solved using coarse-grained boundary conditions, and the higher-precision regions are solved using various sets of finer-grained boundary conditions. This allows a computer using AMR to execute a reduced number of calculations, thereby expending less energy, when computing a solution. In fact, early AMR-enabled systems have been shown to increase efficiency by 80% [41]. AMR can similarly increase the efficiency of a mesh computer by reducing the number of RMR calculations required to compute a solution within error tolerances.
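As a rough illustration of how such a decomposition might be driven, the sketch below flags the samples of a coarse one-dimensional cut through a solution whose discrete second difference exceeds a tolerance. This curvature-based criterion is one common refinement heuristic, chosen here as an assumption and not a method specified in this paper; the function name, the tolerance, and the sample data are all hypothetical.

```python
def flag_for_refinement(profile, tol):
    """Return indices of interior samples whose discrete second
    difference exceeds tol (a simple curvature-based criterion)."""
    flagged = []
    for i in range(1, len(profile) - 1):
        curvature = abs(profile[i - 1] - 2.0 * profile[i] + profile[i + 1])
        if curvature > tol:
            flagged.append(i)
    return flagged

# Hypothetical coarse cut: curved near both ends, linear in the middle.
coarse = [1.00, 0.80, 0.68, 0.60, 0.52, 0.44, 0.36, 0.24, 0.00]

# Only the flagged samples would be re-solved at finer resolution
# (for example with RMR); the linear middle region stays coarse.
print(flag_for_refinement(coarse, tol=0.02))  # prints: [1, 2, 6, 7]
```

In this toy example only four of the seven interior samples are marked for refinement, which is the source of the energy saving: the linear region is never re-solved at the finer resolution.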
Finally, a combination of AMR and linear interpolation can be used to increase the PER further. This combination would restrict linear interpolation to linear regions, where it is most appropriate, and use RMR in regions where spatially localized features are present. This combination is best suited for non-real-time systems, as linear interpolation is a time- and energy-consuming post-processing step. Using linear interpolation therefore poses its own design challenges in terms of finding the trade-off between precision, energy, and execution time.
VI. CONCLUSION
Future high-performance computing systems will likely consist of heterogeneous components designed to accelerate complex calculations. This call for heterogeneous accelerators, along with the recent introduction of new materials used in VLSI manufacturing, has led researchers to reevaluate existing accelerator architectures which were once deemed unusable. Among them, the analog mesh computer appears to be primed for a comeback due to its accuracy and performance. However, an in-depth study of the EQ scaling of analog mesh-based architectures has yet to be done.
A comprehensive study of hardware scaling of an analog mesh computer has been presented. It was shown that meshes with increased resolution provide solutions with lower error, not only due to errors introduced by estimation of data by reduced-size meshes, but also due to offset errors introduced by these smaller meshes. Additionally, a metric, PER, was introduced to aid in the quantification of mesh performance during EQ trade studies.
A software scaling mechanism, RMR, was also introduced; it uses virtualization to enable a mesh computer to emulate a mesh computer of a larger dimension. RMR was shown to allow a mesh of a particular dimension to successfully emulate a mesh of twice that dimension. It was also shown that an upper bound exists on the errors introduced by RMR, opening the possibility for error-tolerant architectures to use a small-dimensioned mesh to emulate a mesh of a much larger dimension (doubled three to four times).
