    Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs

    Virtual screening is an early stage in the drug discovery process that selects the most promising candidates. In the urgent computing scenario, finding a solution in the shortest possible time is critical. Any improvement in the performance of a virtual screening application translates into an increase in the number of candidates evaluated, thereby raising the probability of finding a drug. In this paper, we show how to improve application throughput using out-of-kernel optimizations. These optimizations use input features, kernel requirements, and architectural features to rearrange the kernel inputs and execute them out of order, improving computational efficiency. They are implemented in LiGen, an extreme-scale virtual screening application that relies on CUDA and SYCL kernels to carry out the computation on modern supercomputer nodes. Although they are tailored to a single application, they may also be of interest for other applications that share a similar design pattern. The experimental results show that these optimizations can roughly double kernel performance: up to 2.2X in CUDA and up to 1.9X in SYCL. Moreover, the data we collected and report in this manuscript show that this speedup is achieved with the best of the proposed parameterizations.
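    To make the idea of rearranging kernel inputs concrete, here is a minimal sketch, assuming that each input's cost is proportional to a single size feature (for example, atom count) and that every item in a batch is processed at the batch's largest size. Sorting by that feature before batching reduces this padded work, and results are scattered back to the original order afterwards. The helpers `reorder_and_batch`, `padded_work`, and `run_out_of_order`, and the `kernel` callable standing in for a GPU launch, are hypothetical illustrations, not LiGen's API or the authors' exact heuristic.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def reorder_and_batch(sizes: Sequence[int], batch_size: int) -> List[List[int]]:
    """Hypothetical helper: sort input indices by size, then cut them into batches."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    return [order[s:s + batch_size] for s in range(0, len(order), batch_size)]


def padded_work(sizes: Sequence[int], batches: List[List[int]]) -> int:
    """Work if every item in a batch costs as much as the batch's largest item."""
    return sum(max(sizes[i] for i in b) * len(b) for b in batches)


def run_out_of_order(
    inputs: Sequence[T],
    sizes: Sequence[int],
    batch_size: int,
    kernel: Callable[[List[T]], List[R]],
) -> List[R]:
    """Launch the (hypothetical) kernel on reordered batches; return results in input order."""
    results: List[R] = [None] * len(inputs)  # type: ignore[list-item]
    for batch in reorder_and_batch(sizes, batch_size):
        outputs = kernel([inputs[i] for i in batch])
        # Scatter each result back to the position of its original input.
        for i, out in zip(batch, outputs):
            results[i] = out
    return results
```

    With sizes `[30, 5, 28, 6, 31, 4]` and batches of two, the in-order batches cost 178 padded units, while the size-sorted batches cost 128.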

    A fixed-point based distributed method for energy flow calculation in multi-energy systems

    Multi-energy flow calculation (M-EFC) is an essential tool for the coordinated analysis of strongly coupled electricity-gas-heating systems. However, the separate management of these subsystems poses a considerable challenge for designing a fast and reliable M-EFC method. In this paper, a fixed-point based distributed method is proposed for the M-EFC problem. The proposed method preserves the autonomy of the subsystems because only limited information is exchanged during the solution process. Moreover, fast and reliable convergence is ensured by the proposed sufficient conditions derived from fixed-point theory. In addition, the proposed method is applicable to multi-energy systems (MES) with various coupling relationships and different information-exchange structures. Simulations on an MES demonstrate that the proposed method is markedly superior to the unified Newton-Raphson method in computation time, accuracy, and robustness against data loss.
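    The distributed fixed-point idea can be illustrated with a toy two-subsystem sketch, assuming each subsystem solves its own equations given only the boundary variable received from the other. The subsystems alternate Gauss-Seidel style until the exchanged values stop changing. The maps `solve_subsystem_a` and `solve_subsystem_b` are hypothetical scalar stand-ins, not real electricity or gas flow equations. Their Lipschitz constants (at most 0.5 and exactly 0.3) multiply to less than 1, so the composite map is a contraction and the Banach fixed-point theorem guarantees convergence; this is a simplified analogue of, not a substitute for, the paper's sufficient conditions.

```python
import math
from typing import Tuple


def solve_subsystem_a(y_boundary: float) -> float:
    """Hypothetical subsystem A: local solve given B's boundary value (Lipschitz <= 0.5)."""
    return 0.5 * math.cos(y_boundary) + 1.0


def solve_subsystem_b(x_boundary: float) -> float:
    """Hypothetical subsystem B: local solve given A's boundary value (Lipschitz = 0.3)."""
    return 0.3 * x_boundary + 0.2


def distributed_fixed_point(tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, float, int]:
    """Alternate local solves, exchanging only one boundary scalar per direction."""
    x, y = 0.0, 0.0
    for k in range(1, max_iter + 1):
        x_new = solve_subsystem_a(y)      # A receives y from B
        y_new = solve_subsystem_b(x_new)  # B receives the updated x from A
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new, k
        x, y = x_new, y_new
    raise RuntimeError("fixed-point iteration did not converge")
```

    Only the boundary values cross the subsystem interface, which mirrors the limited information exchange that lets each operator keep its internal model private.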

    GPU-Accelerated Batch-ACPF Solution for N-1 Static Security Analysis

    Graphics processing units (GPUs) have been applied successfully in many scientific computing realms because of their superior floating-point performance and memory bandwidth, and they have great potential in power system applications. N-1 static security analysis (SSA) is a natural candidate application, since it requires solving a massive number of alternating current power flow (ACPF) problems. However, when existing GPU-accelerated algorithms are applied to the N-1 SSA problem, the degree of parallelism is limited, because existing research has focused on accelerating the solution of a single ACPF. This paper therefore proposes a GPU-accelerated solution that creates an additional layer of parallelism among batch ACPFs and consequently achieves a much higher level of overall parallelism. First, this paper establishes two basic principles for determining whether a GPU algorithm is well designed, and uses them to demonstrate the limitations of the GPU-accelerated sequential-ACPF solution. Next, this paper proposes a novel GPU-accelerated batch-QR solver, the first of its kind, which packages a massive number of QR tasks into a new, larger-scale problem and thereby achieves a higher level of parallelism and better-coalesced memory accesses. To further improve the efficiency of solving SSA, GPU-accelerated batch Jacobian-matrix generation and contingency screening are developed and carefully optimized. Lastly, the complete process of the proposed GPU-accelerated batch-ACPF solution for SSA is presented. Case studies on an 8503-bus system show a dramatic reduction in computation time compared with all previously reported GPU-accelerated methods. Compared with a single-CPU counterpart based on the UMFPACK library running on an Intel Xeon E5-2620, the proposed GPU-accelerated SSA framework running on an NVIDIA K20C achieves a speedup of up to 57.6 times. It even achieves a fourfold speedup over one of the fastest multi-core CPU parallel computing solutions, which uses the KLU library.
    The proposed batch-solving method is highly promising in practice and lays a critical foundation for many other power system applications that must handle massive numbers of subtasks, such as Monte Carlo simulation and probabilistic power flow.
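    The batch-QR idea can be sketched on the CPU with NumPy, assuming a batch of small dense systems of equal size stored with the batch index as the last (contiguous) axis, `A[row, col, k]`. Each Householder step is applied to every system at once, and neighbouring batch entries sit at neighbouring addresses, which is the layout that lets consecutive GPU threads, one per system, make coalesced reads. This is an illustrative dense sketch with a hypothetical function name, `batched_qr_solve`; it is not the authors' sparse GPU solver.

```python
import numpy as np


def batched_qr_solve(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve A[:, :, k] @ x[:, k] = b[:, k] for every k via Householder QR.

    A has shape (n, n, batch) and b has shape (n, batch); the batch axis is
    last so that the same matrix entry of consecutive systems is contiguous.
    """
    R = np.array(A, dtype=float, copy=True)
    y = np.array(b, dtype=float, copy=True)
    n = R.shape[0]
    for j in range(n):
        # Householder vector for column j (rows j..n-1), for all systems at once.
        col = R[j:, j, :]                           # shape (n-j, batch)
        norm = np.sqrt(np.sum(col * col, axis=0))   # shape (batch,)
        sign = np.where(col[0] >= 0, 1.0, -1.0)
        v = col.copy()
        v[0] += sign * norm
        vnorm2 = np.sum(v * v, axis=0)
        # beta = 2 / (v^T v), or 0 when the column is already zero.
        safe = np.where(vnorm2 > 0, vnorm2, 1.0)
        beta = np.where(vnorm2 > 0, 2.0 / safe, 0.0)
        # Apply H = I - beta v v^T to the trailing block of R and to y.
        w = np.einsum("ik,ijk->jk", v, R[j:, j:, :])  # w = v^T R, per system
        R[j:, j:, :] -= beta * v[:, None, :] * w[None, :, :]
        wy = np.sum(v * y[j:, :], axis=0)
        y[j:, :] -= beta * v * wy
    # Back substitution on the upper-triangular R, vectorized over the batch.
    x = np.zeros_like(y)
    for i in range(n - 1, -1, -1):
        s = y[i] - np.sum(R[i, i + 1:, :] * x[i + 1:, :], axis=0)
        x[i] = s / R[i, i, :]
    return x
```

    Every loop runs over matrix dimensions only; the batch dimension is handled by array-wide operations, analogous to launching one thread per system.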
