388,928 research outputs found

    Resilience for Asynchronous Iterative Methods for Sparse Linear Systems

    Get PDF
    Large scale simulations are used in a variety of application areas in science and engineering to help forward the progress of innovation. Many spend the vast majority of their computational time attempting to solve large systems of linear equations; typically arising from discretizations of partial differential equations that are used to mathematically model various phenomena. The algorithms used to solve these problems are typically iterative in nature, and making efficient use of computational time on High Performance Computing (HPC) clusters involves constantly improving these iterative algorithms. Future HPC platforms are expected to encounter three main problem areas: scalability of code, reliability of hardware, and energy efficiency of the platform. The HPC resources that are expected to run the large programs are planned to consist of billions of processing units that come from more traditional multicore processors as well as a variety of different hardware accelerators. This growth in parallelism leads to the presence of all three problems. Previously, work on algorithm development has focused primarily on creating fault tolerance mechanisms for traditional iterative solvers. Recent work has begun to revisit using asynchronous methods for solving large scale applications, and this dissertation presents research into fault tolerance for fine-grained methods that are asynchronous in nature. Classical convergence results for asynchronous methods are revisited and modified to account for the possible occurrence of a fault, and a variety of techniques for recovery from the effects of a fault are proposed. Examples of how these techniques can be used are shown for various algorithms, including an analysis of a fine-grained algorithm for computing incomplete factorizations. Lastly, numerous modeling and simulation tools for the further construction of iterative algorithms for HPC applications are developed, including numerical models for simulating faults and a simulation framework that can be used to extrapolate the performance of algorithms towards future HPC systems

    High-performance computing and communication models for solving the complex interdisciplinary problems on DPCS

    Get PDF
    The paper presents some advanced high performance (HPC) and parallel computing (PC) methodologies for solving a large space complex problem involving the integrated difference research areas. About eight interdisciplinary problems will be accurately solved on multiple computers communicating over the local area network. The mathematical modeling and a large sparse simulation of the interdisciplinary effort involve the area of science, engineering, biomedical, nanotechnology, software engineering, agriculture, image processing and urban planning. The specific methodologies of PC software under consideration include PVM, MPI, LUNA, MDC, OpenMP, CUDA and LINDA integrated with COMSOL and C++/C. There are different communication models of parallel programming, thus some definitions of parallel processing, distributed processing and memory types are explained for understanding the main contribution of this paper. The matching between the methodology of PC and the large sparse application depends on the domain of solution, the dimension of the targeted area, computational and communication pattern, the architecture of distributed parallel computing systems (DPCS), the structure of computational complexity and communication cost. The originality of this paper lies in obtaining the complex numerical model dealing with a large scale partial differential equation (PDE), discretization of finite difference (FDM) or finite element (FEM) methods, numerical simulation, high-performance simulation and performance measurement. The simulation of PDE will perform by sequential and parallel algorithms to visualize the complex model in high-resolution quality. In the context of a mathematical model, various independent and dependent parameters present the complex and real phenomena of the interdisciplinary application. As a model executes, these parameters can be manipulated and changed. As an impact, some chemical or mechanical properties can be predicted based on the observation of parameter changes. The methodologies of parallel programs build on the client-server model, slave-master model and fragmented model. HPC of the communication model for solving the interdisciplinary problems above will be analyzed using a flow of the algorithm, numerical analysis and the comparison of parallel performance evaluations. In conclusion, the integration of HPC, communication model, PC software, performance and numerical analysis happens to be an important approach to fulfill the matching requirement and optimize the solution of complex interdisciplinary problems

    A distributed-memory parallel algorithm for discretized integral equations using Julia

    Full text link
    Boundary value problems involving elliptic PDEs such as the Laplace and the Helmholtz equations are ubiquitous in physics and engineering. Many such problems have alternative formulations as integral equations that are mathematically more tractable than their PDE counterparts. However, the integral equation formulation poses a challenge in solving the dense linear systems that arise upon discretization. In cases where iterative methods converge rapidly, existing methods that draw on fast summation schemes such as the Fast Multipole Method are highly efficient and well established. More recently, linear complexity direct solvers that sidestep convergence issues by directly computing an invertible factorization have been developed. However, storage and compute costs are high, which limits their ability to solve large-scale problems in practice. In this work, we introduce a distributed-memory parallel algorithm based on an existing direct solver named ``strong recursive skeletonization factorization.'' The analysis of its parallel scalability applies generally to a class of existing methods that exploit the so-called strong admissibility. Specifically, we apply low-rank compression to certain off-diagonal matrix blocks in a way that minimizes data movement. Given a compression tolerance, our method constructs an approximate factorization of a discretized integral operator (dense matrix), which can be used to solve linear systems efficiently in parallel. Compared to iterative algorithms, our method is particularly suitable for problems involving ill-conditioned matrices or multiple right-hand sides. Large-scale numerical experiments are presented to demonstrate the performance of our implementation using the Julia language

    A parallel metaheuristic for large mixed-integer dynamic optimization problems, with applications in computational biology

    Get PDF
    [Abstract] Background: We consider a general class of global optimization problems dealing with nonlinear dynamic models. Although this class is relevant to many areas of science and engineering, here we are interested in applying this framework to the reverse engineering problem in computational systems biology, which yields very large mixed-integer dynamic optimization (MIDO) problems. In particular, we consider the framework of logic-based ordinary differential equations (ODEs). Methods: We present saCeSS2, a parallel method for the solution of this class of problems. This method is based on an parallel cooperative scatter search metaheuristic, with new mechanisms of self-adaptation and specific extensions to handle large mixed-integer problems. We have paid special attention to the avoidance of convergence stagnation using adaptive cooperation strategies tailored to this class of problems. Results: We illustrate its performance with a set of three very challenging case studies from the domain of dynamic modelling of cell signaling. The simpler case study considers a synthetic signaling pathway and has 84 continuous and 34 binary decision variables. A second case study considers the dynamic modeling of signaling in liver cancer using high-throughput data, and has 135 continuous and 109 binaries decision variables. The third case study is an extremely difficult problem related with breast cancer, involving 690 continuous and 138 binary decision variables. We report computational results obtained in different infrastructures, including a local cluster, a large supercomputer and a public cloud platform. Interestingly, the results show how the cooperation of individual parallel searches modifies the systemic properties of the sequential algorithm, achieving superlinear speedups compared to an individual search (e.g. speedups of 15 with 10 cores), and significantly improving (above a 60%) the performance with respect to a non-cooperative parallel scheme. The scalability of the method is also good (tests were performed using up to 300 cores). Conclusions: These results demonstrate that saCeSS2 can be used to successfully reverse engineer large dynamic models of complex biological pathways. Further, these results open up new possibilities for other MIDO-based large-scale applications in the life sciences such as metabolic engineering, synthetic biology, drug scheduling.Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-RMinisterio de Economía y Competitividad; TIN2016-75845-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2016/045Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/05

    Efficient and Flexible First-Order Optimization Algorithms

    Get PDF
    Optimization problems occur in many areas in science and engineering. When the optimization problem at hand is of large-scale, the computational cost of the optimization algorithm is a main concern. First-order optimization algorithms—in which updates are performed using only gradient or subgradient of the objective function—have low per-iteration computational cost, which make them suitable for tackling large-scale optimization problems. Even though the per-iteration computational cost of these methods is reasonably low, the number of iterations needed for finding a solution—especially if medium or high accuracy is needed—can in practice be very high; as a result, the overall computational cost of using these methods would still be high. This thesis focuses on one of the most widely used first-order optimization algorithms, namely, the forward–backward splitting algorithm, and attempts to improve its performance. To that end, this thesis proposes novel first-order optimization algorithms which all are built upon the forward–backward method. An important feature of the proposed methods is their flexibility. Using the flexibility of the proposed algorithms along with the safeguarding notion, this thesis provides a framework through which many new and efficient optimization algorithms can be developed. To improve efficiency of the forward–backward algorithm, two main approaches are taken in this thesis. In the first one, a technique is proposed to adjust the point at which the forward–backward operator is evaluated. This is done through including additive terms—which are called deviations—in the input argument of the forward– backward operator. The deviations then, in order to have a convergent algorithm, have to satisfy a safeguard condition at each iteration. Incorporating deviations provides great flexibility to the algorithm and paves the way for designing new and improved forward–backward-based methods. A few instances of employing this flexibility to derive new algorithms are presented in the thesis.In the second proposed approach, a globally (and potentially slow) convergent algorithm can be combined with a fast and locally convergent one to form an efficient optimization scheme. The role of the globally convergent method is to ensure convergence of the overall scheme. The fast local algorithm’s role is to speed up the convergence; this is done by switching from the globally convergent algorithm to the local one whenever it is safe, i.e., when a safeguard condition is satisfied. This approach, which allows for combining different global and local algorithms within its framework, can result in fast and globally convergent optimization schemes
    corecore