
A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

Author: Helfer Brian
Kepner Jeremy
Reuther Albert
Ricke Darrell O.
Samsi Siddharth
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/07/2017

Analysis of DNA samples is an important step in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. Single nucleotide polymorphisms (SNPs) are useful for the analysis of complex DNA mixtures. However, using tens of thousands of SNP loci poses significant computational challenges, because the cost of forensic analysis scales with the product of the number of loci and the number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by recasting it in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for Dense Generalized Matrix-Multiply (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using an NVIDIA K80 GPU.
Comment: Accepted for publication at the 2017 IEEE High Performance Extreme Computing Conference.
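To illustrate what an "overloaded" matrix multiplication can mean, here is a minimal pure-Python sketch in which the scalar multiply and add of an ordinary matrix product are replaced by caller-supplied operations. The allele encoding (1 = minor allele present at a SNP locus), the toy data, and the function names `overloaded_matmul` and `allele_absent` are illustrative assumptions, not the authors' method: each output cell counts how many alleles a known reference carries that a mixture lacks, so a zero means the reference is not excluded as a contributor.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def overloaded_matmul(
    A: List[List[int]],
    B: List[List[int]],
    mul: Callable[[int, int], T],
    add_op: Callable[[T, T], T],
    zero: T,
) -> List[List[T]]:
    """C[i][j] = zero (+) mul(A[i][0], B[0][j]) (+) ... with user-supplied (+) and mul."""
    inner = len(B)
    n_cols = len(B[0]) if B else 0
    result = []
    for row in A:
        if len(row) != inner:
            raise ValueError("inner dimensions do not match")
        out = []
        for j in range(n_cols):
            acc = zero
            for k in range(inner):
                acc = add_op(acc, mul(row[k], B[k][j]))
            out.append(acc)
        result.append(out)
    return result


def allele_absent(ref_allele: int, mix_allele: int) -> int:
    """1 if the reference carries an allele that the mixture lacks, else 0."""
    return 1 if ref_allele and not mix_allele else 0


# Hypothetical toy data: rows = known references, columns = SNP loci.
known = [
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
# Loci x mixtures: each column is one unknown mixture profile.
mixtures_t = [
    [1, 1],
    [0, 1],
    [1, 1],
    [1, 0],
]

mismatches = overloaded_matmul(known, mixtures_t, allele_absent, add, 0)
print(mismatches)  # [[0, 1], [1, 0], [0, 0]]
```

Because `allele_absent(r, m)` equals `r * (1 - m)` for 0/1 inputs, the same counts come from an ordinary product of the reference matrix with the complemented mixture matrix, which is why a dense GEMM routine on a GPU can do this work.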

arXiv.org e-Print Archive

Crossref

GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

Author: BJ Keating
H Zhou
HJ Cordell
J He
J Marchini
JE Stone
Kai Wang
L Dematte
MC Schatz
Mingyao Li
NA Davis
S Purcell
Satish Chikkagoudar
T Schupbach
VW Lee
Publication venue: BioMed Central
Publication date: 01/05/2011

Background: Gene-gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, while Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to implement faster scientific software. However, there are currently no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits.
Findings: Here we present a novel software package, GENIE, which uses multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes in parallel: 1) the interactions of SNPs within the fragment, and 2) the interactions between the SNPs of the current fragment and those of other fragments. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a 27-fold speedup over its single-core CPU mode.
Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
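The fragment-based decomposition described above can be sketched as follows: split the SNPs into non-overlapping fragments, create one task for the pairs inside each fragment and one task for each pair of fragments, and hand the tasks to a worker pool. This is a minimal sketch under stated assumptions; the fragment size, the thread pool, and `analyze_pairs` (a hypothetical placeholder that returns its pairs instead of running a statistical interaction test) are not GENIE's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product
from typing import List, Tuple

Pair = Tuple[int, int]


def make_fragments(n_snps: int, fragment_size: int) -> List[List[int]]:
    """Split SNP indices 0..n_snps-1 into consecutive, non-overlapping fragments."""
    return [list(range(s, min(s + fragment_size, n_snps)))
            for s in range(0, n_snps, fragment_size)]


def build_tasks(fragments: List[List[int]]) -> List[List[Pair]]:
    tasks = []
    # 1) all SNP pairs inside a single fragment
    for frag in fragments:
        tasks.append(list(combinations(frag, 2)))
    # 2) SNP pairs spanning two different fragments (each fragment pair once)
    for frag_a, frag_b in combinations(fragments, 2):
        tasks.append(list(product(frag_a, frag_b)))
    return tasks


def analyze_pairs(pairs: List[Pair]) -> List[Pair]:
    """Hypothetical placeholder for a per-pair interaction test; returns the pairs it saw."""
    return pairs


fragments = make_fragments(n_snps=10, fragment_size=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze_pairs, build_tasks(fragments)))
tested = [pair for chunk in results for pair in chunk]
print(len(tested))  # 45, i.e. 10 * 9 / 2
```

The point of the design is that the within-fragment and between-fragment tasks together cover every SNP pair exactly once, so tasks can run independently on separate cores without coordination.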

Crossref

Directory of Open Access Journals

PubMed Central

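Easily Parallelizable Statistical Computing Methodology and Its Application to Modern High-Performance Computing Environments (병렬화 용이한 통계계산 방법론과 현대 고성능 컴퓨팅 환경에의 적용)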

Author: Seyoon Ko (고세윤)
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Statistics, College of Natural Sciences, August 2020. Advisor: Joong-Ho Won (원중호).
Technological advances in the past decade, in hardware and software alike, have made access to high-performance computing (HPC) easier than ever. In this dissertation, easily parallelizable, inversion-free, and variable-separated algorithms and their implementation in statistical computing are discussed. The first part considers statistical estimation problems under structured sparsity posed as minimization of a sum of two or three convex functions, one of which is a composition of non-smooth and linear functions. Examples include graph-guided sparse fused lasso and overlapping group lasso. Two classes of inversion-free primal-dual algorithms are considered and unified from the perspective of monotone operator theory. From this unification, a continuum of preconditioned forward-backward operator splitting algorithms amenable to parallel and distributed computing is proposed. The unification is further exploited to introduce a continuum of accelerated algorithms for which the theoretically optimal asymptotic rate of convergence is obtained. The second part presents easy-to-use distributed matrix data structures in PyTorch and Julia. They enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in the cloud. With these data structures, various parallelizable statistical applications are demonstrated, including nonnegative matrix factorization, positron emission tomography, multidimensional scaling, and ℓ1-regularized Cox regression. The examples scale up to an 8-GPU workstation and a 720-CPU-core cluster in the cloud. As a case in point, the onset of type 2 diabetes in the UK Biobank, with 400,000 subjects and about 500,000 single nucleotide polymorphisms, is analyzed using the HPC ℓ1-regularized Cox regression. Fitting a half-million-variate model took about 50 minutes and reconfirmed known associations.
To my knowledge, this is the first demonstration that a joint genome-wide association analysis of survival outcomes is feasible at this scale.
Contents
Chapter 1 Prologue
  1.1 Introduction
  1.2 Accessible High-Performance Computing Systems
    1.2.1 Preliminaries
    1.2.2 Multiple CPU nodes: clusters, supercomputers, and clouds
    1.2.3 Multi-GPU node
  1.3 Highly Parallelizable Algorithms
    1.3.1 MM algorithms
    1.3.2 Proximal gradient descent
    1.3.3 Proximal distance algorithm
    1.3.4 Primal-dual methods
Chapter 2 Easily Parallelizable and Distributable Class of Algorithms for Structured Sparsity, with Optimal Acceleration
  2.1 Introduction
  2.2 Unification of Algorithms LV and CV (g ≡ 0)
    2.2.1 Relation between Algorithms LV and CV
    2.2.2 Unified algorithm class
    2.2.3 Convergence analysis
  2.3 Optimal acceleration
    2.3.1 Algorithms
    2.3.2 Convergence analysis
  2.4 Stochastic optimal acceleration
    2.4.1 Algorithm
    2.4.2 Convergence analysis
  2.5 Numerical experiments
    2.5.1 Model problems
    2.5.2 Convergence behavior
    2.5.3 Scalability
  2.6 Discussion
Chapter 3 Towards Unified Programming for High-Performance Statistical Computing Environments
  3.1 Introduction
  3.2 Related Software
    3.2.1 Message-passing interface and distributed array interfaces
    3.2.2 Unified array interfaces for CPU and GPU
  3.3 Easy-to-use Software Libraries for HPC
    3.3.1 Deep learning libraries and HPC
    3.3.2 Case study: PyTorch versus TensorFlow
    3.3.3 A brief introduction to PyTorch
    3.3.4 A brief introduction to Julia
    3.3.5 Methods and multiple dispatch
    3.3.6 Multidimensional arrays
    3.3.7 Matrix multiplication
    3.3.8 Dot syntax for vectorization
  3.4 Distributed matrix data structure
    3.4.1 Distributed matrices in PyTorch: distmat
    3.4.2 Distributed arrays in Julia: MPIArray
  3.5 Examples
    3.5.1 Nonnegative matrix factorization
    3.5.2 Positron emission tomography
    3.5.3 Multidimensional scaling
    3.5.4 L1-regularized Cox regression
    3.5.5 Genome-wide survival analysis of the UK Biobank dataset
  3.6 Discussion
Chapter 4 Conclusion
Appendix A Monotone Operator Theory
Appendix B Proofs for Chapter II
  B.1 Preconditioned forward-backward splitting
  B.2 Optimal acceleration
  B.3 Optimal stochastic acceleration
Appendix C AWS EC2 and ParallelCluster
  C.1 Overview
  C.2 Glossary
  C.3 Prerequisites
  C.4 Installation
  C.5 Configuration
  C.6 Creating, accessing, and destroying the cluster
  C.7 Installation of libraries
  C.8 Running a job
  C.9 Miscellaneous
Appendix D Code for memory-efficient L1-regularized Cox proportional hazards model
Appendix E Details of SNPs selected in L1-regularized Cox regression
Bibliography
Abstract in Korean
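The forward-backward splitting mentioned in the abstract alternates an explicit gradient ("forward") step on the smooth term with a proximal ("backward") step on the non-smooth term, using only products with A and its transpose and never a matrix inverse. The sketch below is the plain, unpreconditioned, unaccelerated special case for the ordinary lasso (often called ISTA), written under the assumption of a tiny dense problem; it is not one of the dissertation's preconditioned or accelerated algorithms, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| (the backward step)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_forward_backward(A: Matrix, b: Vector, lam: float, n_iter: int = 500) -> Vector:
    """Minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds ||A||_2^2, the gradient's Lipschitz constant.
    lipschitz = sum(a * a for row in A for a in row)
    step = 1.0 / lipschitz
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # forward (gradient) step, then backward (proximal) step
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


x_hat = lasso_forward_backward([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
print([round(v, 6) for v in x_hat])  # [2.0, 0.0]
```

With A equal to the identity, the lasso solution is the soft-thresholded b, so the example should recover [2.0, 0.0]. Every step is a matrix-vector product or an elementwise operation, which is what makes this family of methods straightforward to distribute.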

SNU Open Repository and Archive

Ant Colony Optimization

Publication venue: IntechOpen
Publication date: 20/04/2021

Ant Colony Optimization (ACO) is a prime example of how studies aimed at understanding and modeling the behavior of ants and other social insects can inspire computational algorithms for solving difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, ACO has grown tremendously and today stands as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, which presents recent contributions of ACO to diverse fields such as traffic congestion and control, structural optimization, manufacturing, and genomics.
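As a concrete picture of the idea behind ACO on the travelling salesman problem, the sketch below follows the classic Ant System scheme: each ant builds a tour by choosing the next city with probability proportional to pheromone^alpha times (1/distance)^beta, pheromone then evaporates, and each ant deposits pheromone inversely proportional to its tour length. The parameter values, the fixed seed, and the four-city square instance are illustrative assumptions, not settings taken from the book.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: List[int], dist: List[List[float]]) -> float:
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_system_tsp(coords: Sequence[Point], n_ants: int = 10, n_iters: int = 30,
                   alpha: float = 1.0, beta: float = 2.0, rho: float = 0.5,
                   q: float = 1.0, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour: List[int] = []
    best_len = math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # selection weight ~ pheromone^alpha * (1 / distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporation on every edge
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # each ant deposits q / length on the edges of its tour
        for ant_tour, length in tours:
            for k in range(n):
                a, b = ant_tour[k], ant_tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour, length = ant_system_tsp(square)
print(tour, round(length, 6))
```

On the unit square the optimal tour is the perimeter (length 4). Because each ant's tour is built independently, the inner loop over ants is the natural place for the parallel implementations the book covers.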

Directory of Open Access Books (DOAB)

Manycore Algorithms for Genetic Linkage Analysis

Author: Medlar AJ
Publication venue: UCL (University College London)
Publication date: 28/10/2012

Exact algorithms for linkage analysis scale exponentially with the size of the input. Beyond a critical point, the amount of work required exceeds both the available time and memory. In these circumstances, we are forced either to abbreviate the input in some manner or to use an approximation. Approximate methods such as Markov chain Monte Carlo (MCMC) make the problem tractable but can take an immense amount of time to converge. This long convergence time is compounded by software that is single-threaded and therefore cannot take advantage of the available processing power, even as processors are manufactured with increasing numbers of physical cores. In this thesis, we describe our program SwiftLink, which embodies our work adapting existing Gibbs samplers to modern processor architectures. The architectures we target are multicore processors, which currently feature 4–8 processor cores, and graphics cards (GPUs), which already feature hundreds of processor cores. We implemented parallel versions of the meiosis sampler, which mixes well with tightly linked markers but suffers from irreducibility issues, and the locus sampler, which is guaranteed to be irreducible but mixes slowly with tightly linked markers. We evaluate SwiftLink's performance on real-world datasets of large consanguineous families. We demonstrate that using four processor cores for a single analysis is 3–3.2x faster than the single-threaded implementation of SwiftLink. Compared with existing MCMC-based programs, it achieves a 6.6–8.7x speedup over Morgan and a 66.4–72.3x speedup over Simwalk. Utilising both a multicore processor and a GPU is 7–7.9x faster than the single-threaded implementation, a 17.6–19x speedup over Morgan, and a 145.5–192.3x speedup over Simwalk.
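The slow mixing that a single-site Gibbs sampler suffers when variables are tightly coupled can be seen in a toy model. The sketch below is an analogy only: it samples a standard bivariate normal with correlation rho by alternately drawing each coordinate from its exact conditional, which is not SwiftLink's meiosis or locus sampler, and high rho merely stands in for "tightly linked markers". For this sampler the x-chain is autoregressive with lag-1 autocorrelation rho squared, so the estimates should come out near 0.25 for rho = 0.5 and near 0.98 for rho = 0.99.

```python
import random
from typing import List, Tuple


def gibbs_bivariate_normal(rho: float, n_samples: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Single-site Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # draw y | x ~ N(rho * x, 1 - rho^2)
        samples.append((x, y))
    return samples


def lag1_autocorrelation(values: List[float]) -> float:
    """Sample lag-1 autocorrelation; values near 1 indicate slow mixing."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


for rho in (0.5, 0.99):
    xs = [x for x, _ in gibbs_bivariate_normal(rho, 20000)]
    print(rho, round(lag1_autocorrelation(xs), 3))
```

Note that each Gibbs update depends on the previous one, so a single chain is inherently sequential; speedups like SwiftLink's have to come from parallelizing work inside an update or across independent computations, not from the chain's steps themselves.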

UCL Discovery

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Publication venue: Sabancı University
Publication date: 01/01/2010

Sabanci University Research Database

A Hybrid-parallel Architecture for Applications in Bioinformatics

Author: Kässens Jan Christian
Publication venue: Universitätsbibliothek Kiel
Publication date: 01/01/2017

Since the advent of Next Generation Sequencing (NGS) technology, the amount of data from whole genome sequencing has been rising fast. In turn, the availability of these resources has opened up whole new research fields in molecular and cellular biology, producing even more data. On the other hand, the available computational power is only increasing linearly. In recent years, though, special-purpose high-performance devices have become prevalent in today's scientific data centers, namely graphics processing units (GPUs) and, to a lesser extent, field-programmable gate arrays (FPGAs). Driven by the need for performance, developers started porting regular applications to GPU frameworks and FPGA configurations to exploit the special operations that only these devices can perform in a timely manner. However, applications using both accelerator technologies are still rare. Major challenges in joint GPU/FPGA application development include the deep knowledge of the associated programming paradigms that it requires and efficient communication between both types of devices. In this work, two algorithms from bioinformatics are implemented on a custom hybrid-parallel hardware architecture and a highly concurrent software platform. It is shown that such a solution is not only feasible to develop but can also outperform implementations on similarly sized GPU or FPGA clusters in terms of both performance and energy consumption. Both algorithms analyze case/control data from genome-wide association studies to find interactions between two or three genes with different methods. Especially in the latter case, the newly available computational power and method enable analyses of large data sets for the first time without occupying whole data centers for weeks. The success of the proposed hybrid-parallel architecture led to the development of a high-end array of FPGA/GPU accelerator pairs that provides even better runtimes and more possibilities.
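A quick count shows why the three-gene case in particular needs this much computational power. An exhaustive scan over n SNPs tests every unordered pair or triple, so the work grows quadratically and cubically in n; the value n = 10^5 below is a hypothetical illustration, not a dataset size reported in the thesis, and the ratio assumes equal cost per tested combination.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{n}{3} = \frac{n(n-1)(n-2)}{6}, \qquad
\frac{\binom{n}{3}}{\binom{n}{2}} = \frac{n-2}{3}

n = 10^5:\quad
\binom{n}{2} = 4\,999\,950\,000 \approx 5.0 \times 10^{9}, \qquad
\binom{n}{3} = 166\,661\,666\,700\,000 \approx 1.67 \times 10^{14}
```

At this size, moving from pairs to triples multiplies the number of combinations by roughly 3.3 × 10^4.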

MACAU: Open Access Repository of Kiel University

Transcript assembly and abundance estimation with high-throughput RNA sequencing

Author: Trapnell Bruce Colston
Publication date: 01/01/2010

We present algorithms and statistical methods for the reconstruction and abundance estimation of transcript sequences from high-throughput RNA sequencing ("RNA-Seq"). We evaluate these approaches through large-scale experiments on a well-studied model of muscle development. We begin with an overview of sequencing assays and outline why the short read alignment problem is fundamental to the analysis of these assays. We then describe two approaches to the contiguous alignment problem, one of which uses massively parallel graphics hardware to accelerate alignment, and one of which exploits an indexing scheme based on the Burrows-Wheeler transform. We then turn to the spliced alignment problem, which is fundamental to RNA-Seq, and present an algorithm, TopHat. TopHat is the first algorithm that can align the reads from an entire RNA-Seq experiment to a large genome without the aid of reference gene models. In the second part of the thesis, we present the first comparative RNA-Seq assembly algorithm, Cufflinks, which is adapted from a constructive proof of Dilworth's Theorem, a classic result in combinatorics. We evaluate Cufflinks by assembling the transcriptome from a time course RNA-Seq experiment of developing skeletal muscle cells. The assembly contains 13,689 known transcripts and 3,724 novel ones. Of the novel transcripts, 62% were strongly supported by earlier sequencing experiments or by homologous transcripts in other organisms. We further validated interesting genes with isoform-specific RT-PCR. We then present a statistical model for RNA-Seq, included in Cufflinks, with which we estimate the abundances of transcripts from RNA-Seq data. Simulation studies demonstrate that the model is highly accurate. We apply this model to the muscle data and track the abundances of individual isoforms over development.
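The Burrows-Wheeler-based indexing mentioned above lets a read be matched by "backward search": the pattern is processed from its last character to its first while a range of sorted rotations is narrowed. The sketch below counts exact occurrences only, builds the transform naively from sorted rotations, and scans for character counts instead of using precomputed checkpoints; it is a teaching sketch under those assumptions, not the thesis's aligner, which also has to handle mismatches and base qualities.

```python
from typing import Dict


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; '$' is a unique, smallest sentinel."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of exact occurrences of pattern in the text."""
    counts: Dict[str, int] = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c] = number of characters in the text that sort strictly before c
    first: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # occurrences of ch in bwt_str[:i]; real indexes precompute this
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("GATTACAGATTACA")
print(count_occurrences(index, "ATTA"))  # 2
```

The sentinel trick relies on `$` sorting before the DNA letters A, C, G, and T, and the text must not itself contain `$`.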
Finally, we present significance tests for changes in relative and absolute abundances between time points, which we employ to uncover differential expression and differential regulation. By testing for relative abundance changes within and between transcripts sharing a transcription start site, we find significant shifts in the rates of alternative splicing and promoter preference in hundreds of genes, including those believed to regulate muscle development.
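The Cufflinks assembly step, which the abstract describes as adapted from a constructive proof of Dilworth's theorem, rests on a classic reduction: the fewest chains covering a partial order equals the number of elements minus a maximum bipartite matching between "earlier" and "later" copies of the elements. The sketch below shows only that reduction on an abstract, already transitive order, using a simple augmenting-path matching; how fragments are turned into such an order is not modeled, and the function name and toy relation are hypothetical.

```python
from typing import Dict, List, Set


def min_chain_cover(n: int, less_than: Dict[int, List[int]]) -> List[List[int]]:
    """Partition elements 0..n-1 of a partial order into the fewest chains.

    less_than[u] lists every v with u < v (the relation must already be transitive).
    Number of chains = n - size of a maximum matching from each u to a successor v.
    """
    match_right: Dict[int, int] = {}  # v -> u, meaning v directly follows u in a chain

    def try_augment(u: int, seen: Set[int]) -> bool:
        for v in less_than.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            # take a free successor, or re-route the element currently matched to v
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    successor = {u: v for v, u in match_right.items()}
    chains = []
    for start in range(n):
        if start in match_right:  # has a predecessor, so it is not a chain head
            continue
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        chains.append(chain)
    return chains


order = {0: [1, 2, 3, 4], 1: [3], 2: [4]}
print(min_chain_cover(5, order))  # [[0, 1, 3], [2, 4]]
```

In this toy order the largest set of mutually incomparable elements has size 2 (for example {1, 2}), so by Dilworth's theorem two chains are the minimum, which matches the output.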

Digital Repository at the University of Maryland

FPGAs in Bioinformatics: Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic

Author: Wienbrandt Lars
Publication date: 01/01/2016

Life. Much effort is devoted to granting humanity a little insight into this fascinating, complex, and fundamental topic. To understand these relationships and derive consequences, humans have begun to sequence their genomes, i.e., to determine their DNA sequences in order to infer information, e.g., related to genetic diseases. The process of DNA sequencing, as well as the subsequent analysis, presents a computational challenge for current computing systems, if only because of the large amounts of data. Runtimes of more than one day for the analysis of simple datasets are common, even when the process already runs on a CPU cluster. This thesis shows how this general problem in bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute-intensive problems are highlighted: sequence alignment, SNP interaction analysis, and genotype imputation. For sequence alignment, the BLASTp software for protein database searches is presented as an example, implemented, and evaluated. SNP interaction analysis is presented with three applications that perform an exhaustive search for interactions, including the corresponding statistical tests: BOOST, iLOCi, and the mutual information measurement. All applications are implemented in FPGA hardware and evaluated, yielding speedups of more than three orders of magnitude compared to standard computers. The last topic, genotype imputation, is a two-step process composed of a phasing step and the actual imputation step. The focus lies on the phasing step, which is targeted by the SHAPEIT2 application. SHAPEIT2 and its underlying mathematical methods are discussed in detail, and it is finally implemented and evaluated. A remarkable speedup of 46 is reached here as well.
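Exhaustive SNP-pair searches such as BOOST map well onto FPGAs and GPUs partly because the per-pair contingency table can be built from bitwise operations: each SNP genotype (0, 1, 2) becomes a bitmask over samples, and each cell count is a population count of ANDed masks. The sketch below uses Python integers as bitsets to illustrate that general counting idea; it is not the thesis's FPGA design, and the toy genotypes and case labels are hypothetical.

```python
from typing import Dict, List, Tuple


def genotype_bitsets(genotypes: List[int]) -> Dict[int, int]:
    """Encode one SNP's genotypes (0, 1, or 2 per sample) as three sample bitmasks."""
    masks = {0: 0, 1: 0, 2: 0}
    for sample, g in enumerate(genotypes):
        masks[g] |= 1 << sample
    return masks


def popcount(x: int) -> int:
    """Number of set bits (samples) in a mask."""
    return bin(x).count("1")


def contingency_table(snp_a: List[int], snp_b: List[int],
                      is_case: List[bool]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """3x3 genotype-pair table holding (case count, control count) per cell."""
    case_mask = 0
    for sample, case in enumerate(is_case):
        if case:
            case_mask |= 1 << sample
    control_mask = ((1 << len(is_case)) - 1) & ~case_mask
    a, b = genotype_bitsets(snp_a), genotype_bitsets(snp_b)
    table = {}
    for ga in range(3):
        for gb in range(3):
            both = a[ga] & b[gb]  # samples with genotype ga at SNP A and gb at SNP B
            table[(ga, gb)] = (popcount(both & case_mask), popcount(both & control_mask))
    return table


snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 0, 1, 1, 2, 2]
is_case = [True, True, True, False, False, False]
table = contingency_table(snp_a, snp_b, is_case)
print(table[(0, 0)], table[(1, 1)])  # (1, 0) (0, 1)
```

Hardware implementations apply the same AND-and-count pattern to fixed-width words, which is why the operation parallelizes so well across SNP pairs.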

MACAU: Open Access Repository of Kiel University