509 research outputs found

    An efficient null space inexact Newton method for hydraulic simulation of water distribution networks

    Full text link
    Null space Newton algorithms are efficient in solving the nonlinear equations arising in hydraulic analysis of water distribution networks. In this article, we propose and evaluate an inexact Newton method that relies on partial updates of the network pipes' frictional headloss computations to solve the linear systems more efficiently and with numerical reliability. The update set parameters are studied to propose appropriate values. Different null space basis generation schemes are analysed to choose methods for sparse and well-conditioned null space bases resulting in a smaller update set. The Newton steps are computed in the null space by solving sparse, symmetric positive definite systems with sparse Cholesky factorizations. By using the constant structure of the null space system matrices, a single symbolic factorization in the Cholesky decomposition is used multiple times, reducing the computational cost of linear solves. The algorithms and analyses are validated using medium to large-scale water network models.Comment: 15 pages, 9 figures, Preprint extension of Abraham and Stoianov, 2015 (https://dx.doi.org/10.1061/(ASCE)HY.1943-7900.0001089), September 2015. Includes extended exposition, additional case studies and new simulations and analysi

    Accelerating SPICE Model-Evaluation using FPGAs

    Get PDF
    Single-FPGA spatial implementations can provide an order of magnitude speedup over sequential microprocessor implementations for data-parallel, floating-point computation in SPICE model-evaluation. Model-evaluation is a key component of the SPICE circuit simulator and it is characterized by large irregular floating-point compute graphs. We show how to exploit the parallelism available in these graphs on single-FPGA designs with a low-overhead VLIW-scheduled architecture. Our architecture uses spatial floating-point operators coupled to local high-bandwidth memories and interconnected by a time-shared network. We retime operation inputs in the model-evaluation to allow independent scheduling of computation and communication. With this approach, we demonstrate speedups of 2–18× over a dual-core 3GHz Intel Xeon 5160 when using a Xilinx Virtex 5 LX330T for a variety of SPICE device models

    A Lossy, Synchronization-Free, Race-Full, But Still Acceptably Accurate Parallel Space-Subdivision Tree Construction Algorithm

    Get PDF
    We present a new synchronization-free space-subdivision tree construction algorithm. Despite data races, this algorithm produces trees that are consistent enough for the client Barnes-Hut center of mass and force computation phases to use successfully. Our performance results show that eliminating synchronization improves the performance of the parallel algorithm by approximately 20%. End-to-end accuracy results show that the resulting partial data structure corruption has a neglible effect on the overall accuracy of the Barnes-Hut N-body simulation. We note that many data structure manipulation algorithms use many of the same basic operations (linked data structure updates and array insertions) as our tree construction algorithm. We therefore anticipate that the basic principles the we develop in this paper may effectively guide future efforts in this area

    A New Method for Efficient Parallel Solution of Large Linear Systems on a SIMD Processor.

    Get PDF
    This dissertation proposes a new technique for efficient parallel solution of very large linear systems of equations on a SIMD processor. The model problem used to investigate both the efficiency and applicability of the technique was of a regular structure with semi-bandwidth β,\beta, and resulted from approximation of a second order, two-dimensional elliptic equation on a regular domain under the Dirichlet and periodic boundary conditions. With only slight modifications, chiefly to properly account for the mathematical effects of varying bandwidths, the technique can be extended to encompass solution of any regular, banded systems. The computational model used was the MasPar MP-X (model 1208B), a massively parallel processor hostnamed hurricane and housed in the Concurrent Computing Laboratory of the Physics/Astronomy department, Louisiana State University. The maximum bandwidth which caused the problem\u27s size to fit the nyproc ×\times nxproc machine array exactly, was determined. This as well as smaller sizes were used in four experiments to evaluate the efficiency of the new technique. Four benchmark algorithms, two direct--Gauss elimination (GE), Orthogonal factorization--and two iterative--symmetric over-relaxation (SOR) (ω\omega = 2), the conjugate gradient method (CG)--were used to test the efficiency of the new approach based upon three evaluation metrics--deviations of results of computations, measured as average absolute errors, from the exact solution, the cpu times, and the mega flop rates of executions. All the benchmarks, except the GE, were implemented in parallel. In all evaluation categories, the new approach outperformed the benchmarks and very much so when N ≫\gg p, p being the number of processors and N the problem size. At the maximum system\u27s size, the new method was about 2.19 more accurate, and about 1.7 times faster than the benchmarks. But when the system size was a lot smaller than the machine\u27s size, the new approach\u27s performance deteriorated precipitously, and, in fact, in this circumstance, its performance was worse than that of GE, the serial code. Hence, this technique is recommended for solution of linear systems with regular structures on array processors when the problem\u27s size is large in relation to the processor\u27s size
    • …
    corecore