83 research outputs found

    Hardware Accelerated Molecular Docking: A Survey

    Get PDF

    GPU acceleration of a production molecular docking code

    Full text link
    Abstract: Modeling the interactions of biological molecules, or docking, is critical to both understand-ing basic life processes and to designing new drugs. Here we describe the GPU-based acceleration of a recently developed, complex, production docking code. We show how the various functions can be mapped to the GPU and present numerous optimizations. We find which parts of the problem domain are best suited to the different correlation methods. The GPU-accelerated system achieves a speedup of at least 16x for all likely problems sizes. This makes it competitive with FPGA-based systems for small molecule docking, and superior for protein-protein docking.

    GPU optimizations for a production molecular docking code

    Full text link
    Thesis (M.Sc.Eng.) -- Boston UniversityScientists have always felt the desire to perform computationally intensive tasks that surpass the capabilities of conventional single core computers. As a result of this trend, Graphics Processing Units (GPUs) have come to be increasingly used for general computation in scientific research. This field of GPU acceleration is now a vast and mature discipline. Molecular docking, the modeling of the interactions between two molecules, is a particularly computationally intensive task that has been the subject of research for many years. It is a critical simulation tool used for the screening of protein compounds for drug design and in research of the nature of life itself. The PIPER molecular docking program was previously accelerated using GPUs, achieving a notable speedup over conventional single core implementation. Since its original release the development of the CPU based PIPER has not ceased, and it is now a mature and fast parallel code. The GPU version, however, still contains many potential points for optimization. In the current work, we present a new version of GPU PIPER that attains a 3.3x speedup over a parallel MPI version of PIPER running on an 8 core machine and using the optimized Intel Math Kernel Library. We achieve this speedup by optimizing existing kernels for modern GPU architectures and migrating critical code segments to the GPU. In particular, we both improve the runtime of the filtering and scoring stages by more than an order of magnitude, and move all molecular data permanently to the GPU to improve data locality. This new speedup is obtained while retaining a computational accuracy virtually identical to the CPU based version. We also demonstrate that, due to the algorithmic dependencies of the PIPER algorithm on the 3D Fast Fourier Transform, our GPU PIPER will likely remain proportionally faster than equivalent CPU based implementations, and with little room for further optimizations. This new GPU accelerated version of PIPER is integrated as part of the ClusPro molecular docking and analysis server at Boston University. ClusPro has over 4000 registered users and more than 50000 jobs run over the past 4 years

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    Mapping applications onto FPGA-centric clusters

    Full text link
    High Performance Computing (HPC) is becoming increasingly important throughout science and engineering as ever more complex problems must be solved through computational simulations. In these large computational applications, the latency of communication between processing nodes is often the key factor that limits performance. An emerging alternative computer architecture that addresses the latency problem is the FPGA-centric cluster (FCC); in these systems, the devices (FPGAs) are directly interconnected and thus many layers of hardware and software are avoided. The result can be scalability not currently achievable with other technologies. In FCCs, FPGAs serve multiple functions: accelerator, network interface card (NIC), and router. Moreover, because FPGAs are configurable, there is substantial opportunity to tailor the router hardware to the application; previous work has demonstrated that such application-aware configuration can effect a substantial improvement in hardware efficiency. One constraint of FCCs is that it is convenient for their interconnect to be static, direct, and have a two or three dimensional mesh topology. Thus, applications that are naturally of a different dimensionality (have a different logical topology) from that of the FCC must be remapped to obtain optimal performance. In this thesis we study various aspects of the mapping problem for FCCs. There are two major research thrusts. The first is finding the optimal mapping of logical to physical topology. This problem has received substantial attention by both the theory community, where topology mapping is referred to as graph embedding, and by the High Performance Computing (HPC) community, where it is a question of process placement. We explore the implications of the different mapping strategies on communication behavior in FCCs, especially on resulting load imbalance. The second major research thrust is built around the hypothesis that applications that need to be remapped (due to differing logical and physical topologies) will have different optimal router configurations from those applications that do not. For example, due to remapping, some virtual or physical communication links may have little occupancy; therefore fewer resources should be allocated to them. Critical here is the creation of a new set of parameterized hardware features that can be configured to best handle load imbalances caused by remapping. These two thrusts form a codesign loop: certain mapping algorithms may be differentially optimal due to application-aware router reconfiguration that accounts for this mapping. This thesis has four parts. The first part introduces the background and previous work related to communication in general and, in particular, how it is implemented in FCCs. We build on previous work on application-aware router configuration. The second part introduces topology mapping mechanisms including those derived from graph embeddings and a greedy algorithm commonly used in HPC. In the third part, topology mappings are evaluated for performance and imbalance; we note that different mapping strategies lead to different imbalances both in the overall network and in each node. The final part introduces reconfigure router design that allocates resources based on different imbalance situations caused by different mapping behaviors

    High performance <i>in silico</i> virtual drug screening on many-core processors

    Get PDF
    Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets

    Cooperative high-performance computing with FPGAs - matrix multiply case-study

    Get PDF
    In high-performance computing, there is great opportunity for systems that use FPGAs to handle communication while also performing computation on data in transit in an ``altruistic'' manner--that is, using resources for computation that might otherwise be used for communication, and in a way that improves overall system performance and efficiency. We provide a specific definition of \textbf{Computing in the Network} that captures this opportunity. We then outline some overall requirements and guidelines for cooperative computing that include this ability, and make suggestions for specific computing capabilities to be added to the networking hardware in a system. We then explore some algorithms running on a network so equipped for a few specific computing tasks: dense matrix multiplication, sparse matrix transposition and sparse matrix multiplication. In the first instance we give limits of problem size and estimates of performance that should be attainable with present-day FPGA hardware

    2020 NASA Technology Taxonomy

    Get PDF
    This document is an update (new photos used) of the PDF version of the 2020 NASA Technology Taxonomy that will be available to download on the OCT Public Website. The updated 2020 NASA Technology Taxonomy, or "technology dictionary", uses a technology discipline based approach that realigns like-technologies independent of their application within the NASA mission portfolio. This tool is meant to serve as a common technology discipline-based communication tool across the agency and with its partners in other government agencies, academia, industry, and across the world
    • …
    corecore