
    Fast network centrality analysis using GPUs

    Background: With the exploding volume of data generated by continuously evolving high-throughput technologies, biological network analysis problems are growing in scale and demand ever more computational power. General-purpose computation on graphics processing units (GPGPU) provides a cost-effective technology for studying large-scale biological networks. Designing algorithms that maximize data parallelism is the key to leveraging the power of GPUs.
    Results: We propose an efficient data-parallel formulation of the All-Pairs Shortest Path problem, the key component of shortest-path-based centrality computation. A betweenness centrality algorithm built on this formulation was developed and benchmarked against the most recent GPU-based algorithm, showing speedups of 11-19% on various simulated scale-free networks. We further designed three algorithms based on this core component to compute closeness centrality, eccentricity centrality, and stress centrality. To make these algorithms available to the research community, we developed the software package gpu-fan (GPU-based Fast Analysis of Networks) for CUDA-enabled GPUs. Speedups of 10-50x over CPU implementations were observed on simulated scale-free networks and real-world biological networks.
    Conclusions: gpu-fan provides a significant performance improvement for centrality computation in large-scale networks. Source code is available under the GNU General Public License (GPL) at http://bioinfo.vanderbilt.edu/gpu-fan/.
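    The abstract does not reproduce gpu-fan's CUDA kernels. As a plain-Python reference for what a shortest-path-based betweenness computation does, the sketch below implements Brandes' algorithm for unweighted graphs; the adjacency-dict representation and node labels are illustrative, not taken from the package.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted graph.

    adj: dict mapping each node to a list of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording distances and shortest-path counts (sigma)
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        dist[s], sigma[s] = 0, 1
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
        # accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):
            for w in adj[v]:
                if dist[w] == dist[v] + 1:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if v != s:
                bc[v] += delta[v]
    return bc
```

    Each source vertex's BFS and dependency accumulation is independent of all the others, which is exactly the data parallelism a GPU formulation exploits by distributing sources (or frontier vertices) across threads.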

    GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

    Computational detection of transcription factor (TF) binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing these data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting computational challenges. In our previous work, we developed a novel algorithm, the Hybrid Motif Sampler (HMS), that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming because matching probabilities must be calculated position by position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We propose a "fragmentation" technique to hide data transfer time between memories. Performance comparisons showed that commonly used model-based motif scans and de novo motif finding procedures such as HMS can be dramatically accelerated by running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis with GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif
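    The position-by-position matching probabilities the abstract refers to can be illustrated with a minimal log-likelihood-ratio scan of a position weight matrix (PWM) over a sequence. The PWM values and background model below are invented for illustration and are not GPUmotif's.

```python
import math

# Toy 3-column position weight matrix over A, C, G, T (illustrative values)
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def scan(sequence, pwm=PWM, bg=BACKGROUND):
    """Log-likelihood-ratio score of the PWM at every sequence position."""
    w = len(pwm)
    scores = []
    for i in range(len(sequence) - w + 1):
        # Score one window: sum of per-column log odds vs. background
        s = sum(math.log(pwm[j][sequence[i + j]] / bg[sequence[i + j]])
                for j in range(w))
        scores.append(s)
    return scores
```

    Because every window is scored independently, a GPU implementation can assign one thread per position, which is the parallelism a motif-scan accelerator exploits.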

    gpusvcalibration: An R Package for Fast Stochastic Volatility Model Calibration Using GPUs

    In this paper we describe the gpusvcalibration R package for accelerating stochastic volatility model calibration on GPUs. The package is designed for use with existing CRAN optimization packages such as DEoptim and nloptr. Stochastic volatility models are used extensively across the capital markets for pricing and risk management of exchange-traded financial options. However, calibration poses many challenges, including comparative assessment of the robustness of different models and optimization routines. For example, we observe that when fitted to sub-minute-level mid-market quotes, models require recalibration every few minutes, and the quality of the fit is sensitive to the optimization routine. The R statistical software environment is popular with quantitative analysts in the financial industry partly because it facilitates application design space exploration. However, a typical R-based implementation of stochastic volatility model calibration on a CPU does not meet the performance requirements of sub-minute-level trading, i.e., mid- to high-frequency trading. We identified the most computationally intensive part of the calibration process in R and off-loaded it to the GPU. We created a map-reduce interface to the computationally intensive kernel so that it can be easily integrated into a variety of R-based calibration codes using our package. We demonstrate that the new R-based implementation using our package is comparable in performance to a C/C++ GPU-based calibration code.
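    The map-reduce structure of the off-loaded kernel can be sketched with a deliberately simplified example: a one-parameter Black-Scholes pricer standing in for the package's stochastic volatility models, a "map" that prices each quoted option, and a "reduce" that sums squared pricing errors. All function names and parameter values here are illustrative assumptions, not gpusvcalibration's API.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, tau):
    """Black-Scholes call price (a toy stand-in for a stochastic
    volatility pricer)."""
    sq = vol * math.sqrt(tau)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * tau) / sq
    d2 = d1 - sq
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)

def objective(vol, quotes, spot=100.0, rate=0.01):
    # "map": price each quoted (strike, maturity, price) option;
    # "reduce": sum of squared errors. This is the kernel a GPU
    # package would off-load across quotes.
    return sum((bs_call(spot, k, rate, vol, t) - p) ** 2 for k, t, p in quotes)

def calibrate(quotes, lo=0.01, hi=1.0, steps=200):
    """Brute-force one-parameter calibration over a volatility grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda v: objective(v, quotes))
```

    In practice the grid search would be replaced by an optimizer such as DEoptim or nloptr calling the GPU-evaluated objective; the point of the sketch is only the per-quote independence inside `objective`.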

    Easily Parallelizable Statistical Computing Methods and Their Application to Modern High-Performance Computing Environments

    Doctoral dissertation, Department of Statistics, College of Natural Sciences, Seoul National University, August 2020. Advisor: Joong-Ho Won. Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. In this dissertation, easily parallelizable, inversion-free, and variable-separated algorithms and their implementation in statistical computing are discussed. The first part considers statistical estimation problems under structured sparsity, posed as minimization of a sum of two or three convex functions, one of which is a composition of non-smooth and linear functions. Examples include the graph-guided sparse fused lasso and the overlapping group lasso. Two classes of inversion-free primal-dual algorithms are considered and unified from the perspective of monotone operator theory. From this unification, a continuum of preconditioned forward-backward operator splitting algorithms amenable to parallel and distributed computing is proposed. The unification is further exploited to introduce a continuum of accelerated algorithms that attain the theoretically optimal asymptotic rate of convergence. For the second part, easy-to-use distributed matrix data structures in PyTorch and Julia are presented. They enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. With these data structures, various parallelizable statistical applications, including nonnegative matrix factorization, positron emission tomography, multidimensional scaling, and ℓ1-regularized Cox regression, are demonstrated. The examples scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, the onset of type-2 diabetes in the UK Biobank, with 400,000 subjects and about 500,000 single nucleotide polymorphisms, is analyzed using the HPC ℓ1-regularized Cox regression. Fitting a half-million-variate model took about 50 minutes and reconfirmed known associations. To my knowledge, this is the first demonstration of the feasibility of a joint genome-wide association analysis of survival outcomes at this scale.
    Contents: Chapter 1, Prologue; Chapter 2, Easily Parallelizable and Distributable Class of Algorithms for Structured Sparsity, with Optimal Acceleration; Chapter 3, Towards Unified Programming for High-Performance Statistical Computing Environments; Chapter 4, Conclusion; Appendix A, Monotone Operator Theory; Appendix B, Proofs for Chapter II; Appendix C, AWS EC2 and ParallelCluster; Appendix D, Code for Memory-Efficient L1-Regularized Cox Proportional Hazards Model; Appendix E, Details of SNPs Selected in L1-Regularized Cox Regression.
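    Of the applications the dissertation lists, nonnegative matrix factorization has the most compact serial sketch. The multiplicative-update rules below are the standard Lee-Seung updates, written in dependency-free Python; the dissertation's versions run on distributed GPU matrices, so treat this only as a reference for the arithmetic.

```python
import random

def nmf(X, r, iters=200, seed=0):
    """Multiplicative-update NMF: X (m x n) ~ W (m x r) @ H (r x n).

    Every entry of W and H is updated independently from matrix
    products, which is what makes the method easy to distribute.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(row) for row in zip(*A)]

    eps = 1e-9
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        WH = matmul(W, H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        WH = matmul(W, H)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H
```

    Because both updates consist only of matrix multiplications and elementwise operations, they map directly onto the distributed matrix data structures the dissertation provides.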

    High-Performance Statistical Computing in the Computing Environments of the 2020s

    Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere -- from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale. Comment: Accepted for publication in Statistical Science
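    The ℓ1-penalized fits the review highlights rest on proximal gradient descent with soft-thresholding. As a minimal serial sketch of that optimization pattern, the code below solves a dense lasso problem; the review's Cox example swaps the least-squares gradient for the Cox partial-likelihood gradient, which is not reproduced here.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x| (elementwise soft-thresholding)."""
    return max(x - t, 0.0) - max(-x - t, 0.0)

def lasso_pg(X, y, lam, step, iters=500):
    """Proximal gradient descent for 0.5 * ||X b - y||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        # gradient of the smooth least-squares part: X^T (X b - y)
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # gradient step followed by the ell_1 proximal step
        b = [soft_threshold(b[j] - step * g[j], step * lam) for j in range(p)]
    return b
```

    Both the gradient and the proximal step are coordinatewise, so the update distributes naturally over the columns of X, which is what lets the half-million-variate Cox fit run on a multi-GPU workstation.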

    Graph Reconstruction from Dependent Time Series in Biological Networks

    Biology is concerned with the structure and organization of living organisms. In both respects, phenomena at various levels of abstraction can be interpreted as networks. A macroscopic example is predator-prey relationships (e.g., the size of a fox population as a function of its prey animals such as rabbits and chickens). The population sizes evidently depend on one another and reflect a mutual dynamic. At the molecular level there are likewise interactions that can be described by a dynamic network, for instance in cellular processes. One example is the catalysis of a chemical reaction by an enzyme: the concentrations of the enzyme and of the substances involved determine the rate at which the metabolic process proceeds. This thesis is concerned with this (macro)molecular level. How important a functioning network is becomes apparent when one considers a disturbed system, for example when introduced species throw an ecosystem out of balance. A current example is the American calico crayfish (Orconectes immunis), which is spreading rapidly in Europe because it lacks natural enemies there, while its resource consumption threatens species such as dragonflies, amphibians, and native crayfish. At the cellular level, a disturbance of the networks of DNA repair and cell-cycle control can lead to the development of cancer. DNA repair is a complex system of various proteins and DNA, and the failure of one component of this system can have devastating consequences for the repair process. Understanding the dynamics of these systems is therefore essential for analyzing and predicting their state.
    In the two examples above, this understanding can help to better predict the development of cancer or to protect endangered animal and plant species. Using networks that represent the interaction of proteins, DNA, and RNA, the goal of this thesis is to detect the measurable information flow between the elements involved and to use it to reconstruct the structure of the network. To this end, the time series of the individual nodes are related to one another using various statistical and information-theoretic measures. In selecting these measures, I draw on both classical statistical measures (e.g., correlation coefficients) and information-theoretic methods (based on Shannon entropy) that have become more popular in biology in recent years. The methods are compared on several example systems, which I divide into three categories. Common to all examples is a simulation over time, in order to represent a dynamic, changing system. By measuring the association of the individual nodes over time, the topology of the underlying network is then to be inferred in reverse. The first category is a simple system of differential equations coupling two feedback loops; the parameterization of the network produces a stable oscillation of the two loops around their respective means. In the second category, two different types of random graphs are generated. The first is produced by an algorithm of my own design that creates a given number of nodes connected with a specified number of incoming and outgoing edges. The second type is a so-called scale-free network, a topology found in many systems,
    including both biological networks and digital social networks. In the last category, I apply these methods to various examples from the BioModels Database, which offers extensive data sets and contains many biochemical networks, e.g., protein-protein interactions and protein-RNA interactions. Finally, I discuss the results and give an outlook on how these approaches could be pursued further and extended. In the course of this work, I also developed various software tools, or supervised student projects developing them, that were important for carrying out the analyses shown here; these are discussed in a separate section.
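    The two families of association measures the thesis compares can be illustrated on a pair of node time series: a classical correlation coefficient and a Shannon-entropy-based mutual information estimate. The equal-width binning below is a deliberately crude estimator chosen for brevity, not the thesis's actual methodology.

```python
import math
from collections import Counter

def pearson(x, y):
    """Classical linear association between two node time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def mutual_information(x, y, bins=4):
    """Shannon mutual information (in bits) from equal-width binning."""
    def binned(v):
        lo, hi = min(v), max(v)
        return [min(int((a - lo) / (hi - lo + 1e-12) * bins), bins - 1)
                for a in v]
    bx, by = binned(x), binned(y)
    n = len(x)
    pxy = Counter(zip(bx, by))          # joint bin counts
    px, py = Counter(bx), Counter(by)   # marginal bin counts
    mi = 0.0
    for (i, j), c in pxy.items():
        p = c / n
        mi += p * math.log2(p * n * n / (px[i] * py[j]))
    return mi
```

    Correlation captures only linear dependence, while mutual information also registers nonlinear coupling, which is why the thesis compares both on the same simulated networks.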

    The gputools package enables GPU computing in R


    Germline Mutation Detection in Next Generation Sequencing Data and TP53 Mutation Carrier Probability Estimation for Li-Fraumeni Syndrome

    Next generation sequencing technology has been widely used in genomic analysis, but its application has been compromised by missed true variants, especially when those variants are rare. We propose FamSeq, a family-based variant calling method that integrates Mendelian transmission information with de novo mutation rates and sequencing data to improve variant calling accuracy. We investigated the factors affecting the improvement from family-based variant calling in simulated data and validated the method on real sequencing data. In both simulated and real data, FamSeq outperforms the single-individual method. FamSeq implements four different methods for the Mendelian genetic model to accommodate variations in data complexity. We parallelized the Bayesian network algorithm on an NVIDIA graphics processing unit, making it 10 times faster for relatively large families. Our simulations show that the Elston-Stewart algorithm performs best when there is no loop in the pedigree; if there are loops, we recommend the Bayesian network method, which provides exact answers. Next generation sequencing technology has been under development for over ten years, and many different sequencing platforms have been created to generate sequencing data. Although all of these platforms have their own strengths and weaknesses, users usually focus on the single latest platform. Here we propose a method based on a Bayesian hierarchical model to combine sequencing data from multiple platforms. Applied to both simulated and real data, our method reduced the variant calling error rate compared with the single-platform method. Beyond sequencing data analysis, we also use Mendelian transmission to estimate the TP53 mutation carrier probability for Li-Fraumeni syndrome (LFS), an autosomal dominant hereditary disorder. People with LFS have a high risk of developing early-onset cancers.
    We propose LFSpro, which is built on a Mendelian model and estimates the TP53 mutation probability, incorporating de novo mutation rates. With independent validation data from 765 families, we compared estimates from LFSpro with the classic LFS and Chompret clinical criteria. LFSpro outperformed both the Chompret and classic criteria in the pediatric sarcoma cohort and was comparable to the Chompret criteria in the adult sarcoma cohort.
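    The Mendelian transmission information at the core of both FamSeq and LFSpro can be sketched for a single parent-child trio: a transmission prior P(child genotype | parent genotypes) combined with per-genotype sequencing likelihoods. The genotype coding (count of alternate alleles) and the crude uniform mixture used for de novo mutation below are illustrative simplifications, not the papers' actual models.

```python
def allele_probs(genotype):
    """Probability that a parent transmits the alternate allele,
    with genotypes coded as alternate-allele counts 0, 1, or 2."""
    return {0: 0.0, 1: 0.5, 2: 1.0}[genotype]

def transmission(father, mother):
    """Mendelian distribution over child genotypes given the parents."""
    pf, pm = allele_probs(father), allele_probs(mother)
    return {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }

def child_posterior(father, mother, likelihoods, mutation_rate=0.0):
    """Combine the Mendelian prior with per-genotype sequencing
    likelihoods for one child (a one-trio sketch; de novo mutation
    is handled here as a crude uniform mixture)."""
    prior = transmission(father, mother)
    uniform = {g: 1.0 / 3.0 for g in (0, 1, 2)}
    post = {g: ((1 - mutation_rate) * prior[g] + mutation_rate * uniform[g])
               * likelihoods[g] for g in (0, 1, 2)}
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}
```

    A full pedigree caller chains such conditional probabilities over every parent-offspring link, which is the computation the Elston-Stewart and Bayesian network algorithms organize and the GPU parallelizes across genotype configurations.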