
    Failure analysis and reliability-aware resource allocation of parallel applications in High Performance Computing systems

    The demand for more computational power to solve complex scientific problems has been driving the physical size of High Performance Computing (HPC) systems to hundreds and thousands of nodes. Uninterrupted execution of large-scale parallel applications naturally becomes a major challenge because a single node failure interrupts the entire application, and the reliability of job completion decreases as the number of nodes increases. Accurate reliability knowledge of an HPC system enables runtime systems, such as resource management, and applications to minimize performance loss due to random failures while also providing better Quality of Service (QoS) for computational users. This dissertation makes three major contributions to reliability evaluation and resource management in HPC systems. First, we study the failure properties of HPC systems and observe that the Times To Failure (TTFs) of individual compute nodes follow a time-varying failure rate distribution such as the Weibull distribution. We then propose a model for the TTF distribution of a system of k independent nodes when individual nodes exhibit time-varying failure rates. Based on the reliability given by the proposed TTF model, we develop reliability-aware resource allocation algorithms and evaluate them on actual parallel workloads and failure data of an HPC system. Our observations indicate that applying a time-varying failure-rate-based reliability function combined with some heuristics reduces the performance loss due to unexpected failures by as much as 30 to 53 percent. Finally, we study the effect of reliability with respect to the number of nodes and propose a reliability-aware optimal k-node allocation algorithm for large-scale parallel applications. Our simulation results for the optimal k-node algorithm indicate that choosing the number of nodes for large-scale parallel applications based on the reliability of compute nodes can reduce the overall completion time and wasted time, since the chosen k may be smaller than the total number of nodes in the system.
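    As a rough illustration of the kind of model described above (a hypothetical sketch with made-up parameters, not the dissertation's actual model or data), the reliability of a job running on k independent nodes whose times to failure follow a Weibull distribution is the product of the per-node reliabilities, and can be compared across candidate node counts:

```python
import numpy as np

def node_reliability(t, shape, scale):
    """Weibull reliability R(t) = exp(-(t/scale)**shape) for a single node."""
    return np.exp(-(t / scale) ** shape)

def system_reliability(t, k, shape, scale):
    """Reliability of k independent, identical nodes: all k must survive until t."""
    return node_reliability(t, shape, scale) ** k

# Illustrative parameters only: shape < 1 gives a decreasing failure rate
shape, scale = 0.7, 2000.0   # scale in hours
runtime = 24.0               # expected job runtime in hours

for k in (64, 128, 256, 512):
    print(f"k={k:4d}  R(runtime)={system_reliability(runtime, k, shape, scale):.4f}")
```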

    A parallel algorithm of ICSYM for complex symmetric linear systems in quantum chemistry

    Computational effort is a common issue in solving large-scale complex symmetric linear systems, particularly in quantum chemistry applications. To alleviate this problem, we propose a parallel algorithm of an improved conjugate gradient-type iterative method (ICSYM). It uses a three-term recurrence relation and the orthogonality properties of the residual vectors to replace the tridiagonalization process of the classical CSYM, which decreases the reduce operations from two communications to one at each iteration and reduces the number of vector updates and vector multiplications. Several numerical examples show that the proposed improved version achieves high performance in both convergence rate and parallel efficiency.
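    The abstract does not give the ICSYM recurrences themselves, so the following is only a minimal sketch of the general idea of a short-recurrence, CG-type iteration for complex symmetric (non-Hermitian) systems, shown here with the related COCG method, which replaces the Hermitian inner product with the unconjugated bilinear form; it is not the paper's algorithm:

```python
import numpy as np

def cocg(A, b, tol=1e-10, maxiter=500):
    """Conjugate Orthogonal CG for complex symmetric A (A == A.T, not A.conj().T).

    Uses the unconjugated bilinear form r.T @ r, which keeps the short
    recurrence valid for complex symmetric matrices.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r.T @ r                              # unconjugated inner product
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rr / (p.T @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        rr_new = r.T @ r
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x

# Small complex symmetric (not Hermitian), diagonally dominant test system
n = 50
M = np.random.rand(n, n) + 1j * np.random.rand(n, n)
A = M + M.T + n * np.eye(n)
b = np.random.rand(n) + 1j * np.random.rand(n)
x = cocg(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```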

    High-performance computing and communication models for solving the complex interdisciplinary problems on DPCS

    The paper presents advanced high-performance computing (HPC) and parallel computing (PC) methodologies for solving large, complex problems that span several integrated research areas. About eight interdisciplinary problems are solved accurately on multiple computers communicating over a local area network. The mathematical modeling and large sparse simulations of this interdisciplinary effort involve the areas of science, engineering, biomedicine, nanotechnology, software engineering, agriculture, image processing and urban planning. The specific PC software methodologies under consideration include PVM, MPI, LUNA, MDC, OpenMP, CUDA and LINDA, integrated with COMSOL and C++/C. Since there are different communication models of parallel programming, some definitions of parallel processing, distributed processing and memory types are explained to clarify the main contribution of this paper. The matching between a PC methodology and a large sparse application depends on the domain of the solution, the dimension of the targeted area, the computational and communication patterns, the architecture of the distributed parallel computing system (DPCS), the structure of the computational complexity and the communication cost. The originality of this paper lies in building complex numerical models that involve large-scale partial differential equations (PDEs), discretization by finite difference (FDM) or finite element (FEM) methods, numerical simulation, high-performance simulation and performance measurement. The PDE simulations are performed by sequential and parallel algorithms to visualize the complex models in high resolution. In the mathematical models, various independent and dependent parameters represent the complex, real phenomena of the interdisciplinary applications; as a model executes, these parameters can be manipulated and changed, and chemical or mechanical properties can be predicted by observing the parameter changes. The parallel programs are built on the client-server, master-slave and fragmented models. The HPC and communication models for solving the interdisciplinary problems above are analyzed using algorithm flows, numerical analysis and comparisons of parallel performance evaluations. In conclusion, the integration of HPC, communication models, PC software, performance and numerical analysis is an important approach for satisfying the matching requirement and optimizing the solution of complex interdisciplinary problems.
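    As one concrete instance of the PDE/FDM simulation pipeline sketched above (a minimal, hypothetical example rather than any of the paper's eight applications), a 2-D Laplace problem can be discretized with the 5-point finite-difference stencil and relaxed with Jacobi iteration; the same stencil update is what a PVM/MPI or OpenMP version would partition across processes:

```python
import numpy as np

def jacobi_laplace(n=64, iters=2000):
    """Relax Laplace's equation on an n x n grid with fixed boundary values
    using the 5-point finite-difference stencil and Jacobi iteration."""
    u = np.zeros((n, n))
    u[0, :] = 100.0  # hot top boundary (illustrative boundary condition)
    for _ in range(iters):
        # each interior point becomes the average of its four neighbours
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
    return u

field = jacobi_laplace()
print("value at grid centre:", field[32, 32])
```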

    Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

    The complexity of Machine Learning (ML) systems increases each year, with current implementations of large language models or text-to-image generators having billions of parameters and requiring billions of arithmetic operations. As these systems are widely used, ensuring their reliable operation becomes a design requirement. Traditional error detection mechanisms introduce circuit or time redundancy that significantly impacts system performance. An alternative is the use of Concurrent Error Detection (CED) schemes that operate in parallel with the system and exploit its properties to detect errors. CED is attractive for large ML systems because it can potentially reduce the cost of error detection. In this paper, we introduce Concurrent Classifier Error Detection (CCED), a scheme that implements CED in ML systems using a concurrent ML classifier to detect errors. CCED identifies a set of check signals in the main ML system and feeds them to a concurrent ML classifier that is trained to detect errors. The proposed CCED scheme has been implemented and evaluated on two widely used large-scale ML models: Contrastive Language-Image Pretraining (CLIP), used for image classification, and Bidirectional Encoder Representations from Transformers (BERT), used for natural language applications. The results show that more than 95 percent of errors are detected using a simple Random Forest classifier that is orders of magnitude simpler than CLIP or BERT. These results illustrate the potential of CCED to implement error detection in large-scale ML models.
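    The abstract describes the CCED flow only at a high level, so the following is a hypothetical sketch of the idea: a few check signals are extracted from the main model's outputs (here, simple logit statistics stand in for whatever signals the paper actually uses) and a lightweight Random Forest is trained to flag runs corrupted by injected errors:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def check_signals(logits):
    """Toy check signals derived from a model's output logits
    (placeholders for the paper's actual check signals)."""
    return np.stack([logits.max(axis=1),
                     logits.mean(axis=1),
                     logits.std(axis=1)], axis=1)

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 10))            # fault-free logits
faulty = clean + rng.normal(0.0, 3.0, size=clean.shape)  # logits corrupted by injected errors

X = np.vstack([check_signals(clean), check_signals(faulty)])
y = np.concatenate([np.zeros(len(clean)), np.ones(len(faulty))])

detector = RandomForestClassifier(n_estimators=50, random_state=0)
detector.fit(X, y)
print("training accuracy:", detector.score(X, y))
```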

    A parallel self-organizing community detection algorithm based on swarm intelligence for large scale complex networks

    Community detection is a critical task in complex network analysis. It helps us understand the properties of the system that a complex network represents and is significant for a wide range of applications. Current challenges for community detection algorithms include detecting overlapping community structures, analyzing large-scale networks, and handling dynamically changing network topologies. In this paper, a self-organizing community detection algorithm based on the idea of swarm intelligence is proposed, and a parallel version is designed on Giraph++, a semi-asynchronous parallel graph computation framework running in a distributed environment. In the algorithm, a large network is first divided into a number of small sub-networks. Each sub-network is then modeled as a self-evolving swarm intelligence sub-system, in which each vertex iteratively joins or leaves communities according to a set of predefined vertex action rules. Meanwhile, the local communities of a sub-network are sent to other sub-networks so that their members have a chance to join them, connecting these self-evolving swarm intelligence sub-systems into a single large, evolving system. The vertex actions during the evolution of a sub-network are sent as well to keep the multiple community replicas consistent, so network communication efficiency has a great impact on the algorithm's performance. When no vertex changes its community membership anymore, an optimal community structure of the whole network has emerged. In the algorithm it is natural for a vertex to join multiple communities simultaneously, so the algorithm can be used for overlapping community detection. It handles vertex and edge additions or deletions in the same way while it is running, and therefore inherently supports dynamic network analysis. With its parallel version running in a distributed environment, the algorithm can be used to analyze large-scale networks. A variety of experiments on synthesized networks show that the proposed algorithm can effectively detect community structures and that its performance is much better than that of several popular community detection algorithms.
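    The abstract does not spell out the vertex action rules, so the following is only a loose, hypothetical illustration of the general pattern it describes: each vertex iteratively decides which community to join based on its neighbours, shown here with a simple label-propagation rule on a small NetworkX graph rather than the paper's swarm-intelligence rules or its Giraph++ parallelization:

```python
from collections import Counter
import networkx as nx

def label_propagation(G, iters=20):
    """Each vertex repeatedly adopts the most common community label among
    its neighbours (a stand-in for richer vertex action rules)."""
    labels = {v: v for v in G}  # start with one community per vertex
    for _ in range(iters):
        changed = False
        for v in G:
            counts = Counter(labels[u] for u in G[v])
            if not counts:
                continue
            best = counts.most_common(1)[0][0]
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:  # no vertex changes community: a structure has emerged
            break
    return labels

G = nx.karate_club_graph()
labels = label_propagation(G)
print("communities found:", len(set(labels.values())))
```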

    Evaluating the communications capabilities of the generalized hypercube interconnection network

    This thesis presents results of evaluating the communications capabilities of the generalized hypercube interconnection network. The generalized hypercube has outstanding topological properties, but it has not been implemented at a large scale because of its very high wiring complexity. For this reason, the network has not been studied extensively in the past. However, recent and expected technological advances will soon render it viable for massively parallel systems. We first present implementations of randomized many-to-all broadcasting and multicasting on generalized hypercubes, using as the basis the one-to-all broadcast algorithm presented in [3]. We test the proposed implementations under realistic communication traffic patterns and message generation rates, for the all-port model of communication. Our results show that the size of the intermediate message buffers has a significant effect on the total communication time, and this effect becomes dramatic for large systems with large numbers of dimensions. We also propose a modification of the multicast algorithm that applies congestion control to improve its performance. The results show a significant improvement in the total execution time and a reduction in the number of message contentions, and demonstrate that the generalized hypercube is a very versatile interconnection network.
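    For readers unfamiliar with the topology, the generalized hypercube addresses each node by a mixed-radix tuple and connects two nodes whenever their addresses differ in exactly one digit; the small sketch below (illustrative only, independent of the thesis's broadcast algorithms) makes both the addressing and the high per-node degree explicit:

```python
from itertools import product

def gh_nodes(radices):
    """All node addresses of a generalized hypercube with the given per-dimension radices."""
    return list(product(*[range(m) for m in radices]))

def gh_neighbors(addr, radices):
    """Neighbours are the addresses that differ from addr in exactly one digit position."""
    nbrs = []
    for dim, m in enumerate(radices):
        for digit in range(m):
            if digit != addr[dim]:
                nbrs.append(addr[:dim] + (digit,) + addr[dim + 1:])
    return nbrs

radices = (4, 3, 2)                # a small GH with 4 * 3 * 2 = 24 nodes
nodes = gh_nodes(radices)
degree = sum(m - 1 for m in radices)  # each node has sum(m_i - 1) links
print(len(nodes), "nodes, degree", degree)
print("neighbours of (0, 0, 0):", gh_neighbors((0, 0, 0), radices))
```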