Accelerating the pace of protein functional annotation with Intel Xeon Phi coprocessors

Author: Brylinski Michal
Feinstein Wei P.
Jarrell Mark
Moreno Juana
Publication venue: LSU Digital Commons
Publication date: 01/06/2015

© 2002-2011 IEEE. Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite
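As a quick sanity check on the figures quoted in this abstract, the wall-clock times for the 501-protein benchmark can be turned into speedups (serial time divided by parallel time). The sketch below uses only the three timings stated above; the function and constant names are illustrative and do not come from the paper. The ratios land close to the reported 11.8 and 17.6, with the small gap plausibly due to the hours being rounded to one decimal place.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Return the speedup factor: how many times faster the parallel run is."""
    if parallel_hours <= 0:
        raise ValueError("parallel_hours must be positive")
    return serial_hours / parallel_hours


# Timings quoted in the abstract for 501 benchmarking proteins (hours).
SERIAL_HOURS = 36.8        # serial version
CPU_ONLY_HOURS = 3.1       # parallel code on the node without the accelerator
CPU_PLUS_PHI_HOURS = 2.1   # parallel code using CPU and Xeon Phi together

cpu_speedup = speedup(SERIAL_HOURS, CPU_ONLY_HOURS)
hybrid_speedup = speedup(SERIAL_HOURS, CPU_PLUS_PHI_HOURS)

print(f"CPU only:       {cpu_speedup:.1f}x")
print(f"CPU + Xeon Phi: {hybrid_speedup:.1f}x")
```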

Louisiana State University

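Aceleración de algoritmos de procesamiento de imágenes para el análisis de partículas individuales con microscopia electrónica [Acceleration of image processing algorithms for single particle analysis with electron microscopy]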

Author: Strelák David
Publication date: 24/10/2022

Unpublished doctoral thesis jointly supervised by Masaryk University (Czech Republic) and the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 24-10-2022. Cryogenic Electron Microscopy (Cryo-EM) is a vital field in current structural biology. Unlike X-ray crystallography and Nuclear Magnetic Resonance, it can be used to analyze membrane proteins and other samples with overlapping spectral peaks. However, one of the significant limitations of Cryo-EM is the computational complexity. Modern electron microscopes can produce terabytes of data per single session, from which hundreds of thousands of particles must be extracted and processed to obtain a near-atomic resolution of the original sample. Many existing software solutions use High-Performance Computing (HPC) techniques to bring these computations to the realm of practical usability. The common approach to acceleration is parallelization of the processing, but in practice, we face many complications, such as problem decomposition, data distribution, load scheduling, balancing, and synchronization. Utilization of various accelerators further complicates the situation, as heterogeneous hardware brings additional caveats, for example, limited portability, under-utilization due to synchronization, and sub-optimal code performance due to missing specialization. This dissertation, structured as a compendium of articles, aims to improve the algorithms used in Cryo-EM, especially in Single Particle Analysis (SPA). We focus on single-node performance optimizations, using techniques either available or developed in the HPC field, such as heterogeneous computing or autotuning, which may require the formulation of novel algorithms. The secondary goal of the dissertation is to identify the limitations of state-of-the-art HPC techniques.
Since the Cryo-EM pipeline consists of multiple distinct steps targeting different types of data, there is no single bottleneck to be solved. As such, the presented articles show a holistic approach to performance optimization. First, we give details on the GPU acceleration of specific programs. The achieved speedup is due to the higher performance of the GPU, adjustments of the original algorithm to it, and the application of novel algorithms. More specifically, we provide implementation details of programs for movie alignment, 2D classification, and 3D reconstruction that have been sped up by an order of magnitude compared to their original multi-CPU implementation, or sufficiently to be used on the fly. In addition to these three programs, multiple other programs from an actively used, open-source software package, XMIPP, have been accelerated and improved. Second, we discuss our contribution to HPC in the form of autotuning. Autotuning is the ability of software to adapt to a changing environment, i.e., input or executing hardware. Towards that goal, we present cuFFTAdvisor, a tool that proposes and, through autotuning, finds the best configuration of the cuFFT library for given constraints of input size and plan settings. We also introduce a benchmark set of ten autotunable kernels for important computational problems implemented in OpenCL or CUDA, together with the introduction of complex dynamic autotuning to the KTT tool. Third, we propose an image processing framework, Umpalumpa, which combines a task-based runtime system, data-centric architecture, and dynamic autotuning. The proposed framework allows for writing complex workflows that automatically use available hardware resources and adjust to different hardware and data, but at the same time are easy to maintain. The project that gave rise to these results received the support of a fellowship from the “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/DI18/11660021.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 71367
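To make the idea of "proposing" an FFT configuration under an input-size constraint more concrete, here is a minimal sketch of one heuristic such a tool might use as a first step: padding a requested length up to the next size whose only prime factors are 2, 3, 5, and 7, which the cuFFT documentation describes as the sizes its algorithms are most highly optimized for. This is an assumption-laden illustration, not cuFFTAdvisor's actual algorithm; the function names are hypothetical, and a real autotuner would go on to benchmark the candidate sizes on the target GPU rather than rely on factorization alone.

```python
def is_smooth(n: int, primes=(2, 3, 5, 7)) -> bool:
    """Return True if n factors completely into the given small primes."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_smooth_size(n: int, primes=(2, 3, 5, 7)) -> int:
    """Return the smallest size >= n whose prime factors all lie in `primes`.

    Hypothetical helper: a candidate padded FFT length that a tuner could
    then time against other candidates before choosing one.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    candidate = n
    while not is_smooth(candidate, primes):
        candidate += 1
    return candidate


if __name__ == "__main__":
    for requested in (11, 97, 1000):
        print(requested, "->", next_smooth_size(requested))
```

Padding trades extra data for a friendlier factorization, so whether it pays off depends on the hardware and on memory limits; that dependence is the reason measured autotuning is useful on top of a rule like this.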

Biblos-e Archivo

Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core Architectures

Author: Noormofidi Seyed Vahid
Publication venue: UNM Digital Repository
Publication date: 09/06/2016

In this work, an astrophysical simulation code, XFLAT, is developed to study neutrino oscillations in supernovae. XFLAT is a hybrid modular code designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both the CPU and the Xeon Phi co-processor, the latter of which is based on the Intel Many Integrated Core Architecture (MIC). The performance of XFLAT on various system configurations and physics scenarios has been analyzed. In addition, the impact of I/O and of the multi-node configuration on Xeon Phi-equipped heterogeneous supercomputers, such as Stampede at the Texas Advanced Computing Center (TACC), was investigated.

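Execução eficiente do padrão de propagação de ondas irregulares na arquitetura Many Integrated Core [Efficient execution of the irregular wavefront propagation pattern on the Many Integrated Core architecture]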

Author: Gomes Jeremias Moreira
Publication venue: Biblioteca Central da UNB
Publication date: 29/01/2016

Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Programa de Pós-Graduação em Informática, 2016. The efficient execution of image processing algorithms is an active area of Bioinformatics. In image processing, one class of algorithms, or computing pattern, that works with irregular data structures is the Irregular Wavefront Propagation Pattern (IWPP). In this class, elements propagate information to their neighbors in the form of propagating waves. This propagation results in irregular data accesses and expansions. Because of this irregularity, current parallel implementations of this class of algorithms require atomic operations, which are very costly and also prevent implementations from using Single Instruction, Multiple Data (SIMD) instructions on the Many Integrated Core (MIC) architecture, which are critical to attaining high performance on this processor. The objective of this study is to redesign the Irregular Wavefront Propagation Pattern algorithm to enable its efficient execution on processors with the Many Integrated Core architecture using SIMD instructions. In this work, using the Intel® Xeon Phi™ coprocessor, we have implemented a vectorized version of IWPP with gains of up to 5.63× over the non-vectorized version; a parallel version using a First In, First Out (FIFO) queue that attained speedups of up to 55 compared to a single core of the coprocessor; a version using a priority queue that was 1.62× faster than the fastest GPU-based implementation available in the literature; and a cooperative version across heterogeneous processors that allows processing images larger than the Intel® Xeon Phi™ memory and also provides a way to utilize multiple devices in the computation.
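The queue-driven propagation described in this abstract can be illustrated with a small sequential sketch. It assumes grayscale morphological reconstruction by dilation on a 4-connected grid as the example IWPP computation; that choice, the function name, and the data layout are illustrative assumptions, and this is not the vectorized or parallel Xeon Phi implementation from the dissertation. Each pixel taken from the FIFO queue tries to raise its neighbors, and any neighbor that changes is queued in turn, which is what produces the irregular, data-dependent wavefront.

```python
from collections import deque


def wavefront_reconstruct(marker, mask):
    """Grayscale reconstruction by dilation using a FIFO wavefront.

    marker, mask: equally sized 2D lists of numbers. The result is the
    largest image that is <= mask everywhere and reachable by spreading
    marker values between 4-connected neighbors.
    """
    rows, cols = len(mask), len(mask[0])
    # Clamp the marker under the mask so the invariant out <= mask holds.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]
    # Seed the wavefront with every pixel; only pixels that can still
    # raise a neighbor will generate further work.
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # A neighbor may rise to this pixel's value, capped by its mask.
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    queue.append((nr, nc))  # the wave moves on from here
    return out


if __name__ == "__main__":
    mask = [[5, 5, 0, 5],
            [5, 5, 0, 5]]
    marker = [[5, 0, 0, 0],
              [0, 0, 0, 3]]
    for row in wavefront_reconstruct(marker, mask):
        print(row)
```

In a parallel version, several threads could try to update the same neighbor at once, which is where the atomic operations mentioned in the abstract come from; the sequential sketch sidesteps that issue entirely.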

Repositório Institucional da Universidade de Brasília

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Proceedings, MSVSCC 2015

Author: Old Dominion University Department of Modeling, Simulation & Visualization Engineering
Old Dominion University Virginia Modeling, Analysis & Simulation Center
Publication venue: ODU Digital Commons
Publication date: 16/04/2015

The Virginia Modeling, Analysis and Simulation Center (VMASC) of Old Dominion University hosted the 2015 Modeling, Simulation, & Visualization Student Capstone Conference on April 16th. The Capstone Conference features students from Modeling and Simulation undergraduate and graduate degree programs and related fields at many colleges and universities. Students present their research to an audience of fellow students, faculty, judges, and other distinguished guests. For the students, these presentations afford the opportunity to share their innovative research with members of the M&S community from academic, industry, and government backgrounds. Also participating in the conference are faculty and judges who have volunteered their time to directly support their students’ research, facilitate the various conference tracks, serve as judges for each of the tracks, and provide overall assistance to the conference. 2015 marks the ninth year of the VMASC Capstone Conference for Modeling, Simulation and Visualization. This year our conference attracted a number of fine student-written papers and presentations, resulting in a total of 51 research works presented. This year’s conference had record attendance thanks to the support of various departments at Old Dominion University, other local universities, and the United States Military Academy at West Point. We greatly appreciate all of the work and energy that has gone into this year’s conference; it truly was a highly collaborative effort that resulted in a very successful symposium for the M&S community and all of those involved. Below you will find a brief summary of the best papers and best presentations, with some simple statistics on the overall conference contribution. Following that is a table of contents, broken down by conference track category, with a copy of each included work.
Thank you again for your time and your contribution, as this conference is designed to continuously evolve and adapt to better suit the authors and M&S supporters. Dr. Yuzhong Shen, Graduate Program Director, MSVE, Capstone Conference Chair; John Shull, Graduate Student, MSVE, Capstone Conference Student Chai

Old Dominion University