Accelerated Iterative Algorithms with Asynchronous Accumulative Updates on a Heterogeneous Cluster
In recent years, with the exponential growth of web-based applications, the amount of data generated has increased tremendously. Quick and accurate analysis of this 'big data' is indispensable for making better business decisions and reducing operational cost. The challenges faced by modern data centers in processing big data are manifold: keeping pace with increasing data volume and velocity, scaling the system, and reducing energy costs. Today's data centers employ a variety of distributed computing frameworks running on clusters of commodity hardware built from general-purpose processors. Although these frameworks have improved big data processing speed, there is still room to increase it further. FPGAs, which are designed for computationally intensive tasks, are promising processing elements for doing so. In this thesis, we discuss how FPGAs can be integrated into a cluster of general-purpose processors running iterative algorithms to obtain high performance.
We designed a heterogeneous cluster composed of FPGAs and CPUs and ran benchmarks such as PageRank, Katz, and Connected Components to measure its performance. Improvement in execution time was evaluated against a homogeneous cluster of general-purpose processors and a homogeneous cluster of FPGAs. We built multiple four-node heterogeneous clusters with different configurations by varying the number of CPUs and FPGAs.
We studied the effects of load balancing between CPUs and FPGAs. On a 2 CPU + 2 FPGA cluster configuration with an unbalanced load ratio, we obtained speedups of 20X, 11.5X, and 2X for the PageRank, Katz, and Connected Components benchmarks against a 4-node homogeneous CPU cluster. We also studied the effect of input graph partitioning, and showed that a Multilevel-KL partitioned input graph yields improvements of 11%, 26%, and 9% over a randomly partitioned graph for the Katz, PageRank, and Connected Components benchmarks on a 2 CPU + 2 FPGA cluster.
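The thesis runs iterative graph algorithms with asynchronous accumulative updates on the cluster. As a rough single-machine illustration of that update style (the function name, data layout, and convergence threshold below are illustrative assumptions, not the thesis's FPGA implementation), a delta-based PageRank can be sketched as follows:

    def accumulative_pagerank(graph, damping=0.85, tol=1e-6):
        """Delta-based (accumulative) PageRank sketch.

        graph maps every vertex to its list of out-neighbours. Each vertex
        keeps an accumulated rank and a pending delta; processing a vertex
        folds its delta into the rank and forwards scaled deltas to its
        out-neighbours. The updates commute, which is what makes the scheme
        amenable to asynchronous execution.
        """
        rank = {v: 0.0 for v in graph}
        delta = {v: 1.0 - damping for v in graph}   # initial mass at every vertex
        active = set(graph)
        while active:
            v = active.pop()                         # any order works
            d = delta[v]
            if d < tol:
                continue
            rank[v] += d                             # accumulate pending mass
            delta[v] = 0.0
            out = graph[v]
            if not out:
                continue
            share = damping * d / len(out)
            for u in out:                            # propagate delta downstream
                delta[u] += share
                if delta[u] >= tol:
                    active.add(u)
        return rank

    # toy usage on a 4-vertex graph
    g = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    print(accumulative_pagerank(g))

Because each step only adds a pending delta into one vertex and forwards scaled deltas to its neighbours, vertices can be processed in any order, which is what lets CPU and FPGA partitions proceed asynchronously without global synchronization barriers.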
GPU-Accelerated BWT Construction for Large Collection of Short Reads
Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, use the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (a typical example being the NCBI human genome). Recently, the BWT has also found use in string-graph assembly, for indexing the reads themselves (i.e., the raw data from DNA sequencers). In a typical data set, the volume of reads is tens of times that of the sequenced genome and can be up to 100 Gigabases. Note that a reference genome is relatively stable and computing its index is not a frequent task; for reads, the index has to be computed from scratch for each input, so efficient BWT construction becomes a much bigger concern than before. In this paper, we present a practical method called CX1 for constructing the BWT of very large string collections. CX1 is the first tool that can exploit the parallelism of a graphics processing unit (GPU, a relatively cheap device providing a thousand or more primitive cores) while simultaneously using the parallelism of a multi-core CPU and, more interestingly, of a cluster of GPU-enabled nodes. Using CX1, the BWT of a short-read collection of up to 100 Gigabases can be constructed in less than 2 hours on a machine equipped with a quad-core CPU and a GPU, or in about 43 minutes on a cluster of 4 such machines (the speedup is almost linear after excluding the first 16 minutes spent loading the reads from disk). The previously fastest tool, BRC, was measured to take 12 hours to process 100 Gigabases on one machine, and it is non-trivial to parallelize BRC across a cluster of machines, let alone GPUs.
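As background on what is actually being computed (a deliberately naive reference sketch, not CX1's GPU-scale construction algorithm; the shared "$" sentinel and the function name are simplifying assumptions), the BWT of a read collection can be written as:

    def bwt_of_collection(reads, sentinel="$"):
        """Naive BWT of a collection of short reads.

        Concatenate the reads with per-read sentinels, sort all suffix start
        positions by the suffix they begin, and emit the character preceding
        each suffix (wrapping around at position 0). Real tools use a single
        distinct sentinel per read and scalable suffix-ranking algorithms;
        this quadratic version only illustrates the output.
        """
        text = "".join(r + sentinel for r in reads)
        order = sorted(range(len(text)), key=lambda i: text[i:])
        return "".join(text[i - 1] for i in order)

    print(bwt_of_collection(["ACGT", "ACGA", "TTGA"]))

The quadratic sort above is the reason dedicated construction algorithms matter: at 100 Gigabases, neither materializing all suffixes nor comparing them pairwise is feasible, which is the gap CX1-style GPU and cluster parallelism targets.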
The volume and Chern-Simons invariant of a Dehn-filled manifold
Thesis (Ph.D.) -- Graduate School of Seoul National University, Department of Mathematical Sciences, February 2019.
Based on the work of Neumann, Zickert gave a simplicial formula for computing the volume and Chern-Simons invariant of a boundary-parabolic PSL(2,C)-representation of a compact 3-manifold with non-empty boundary. The main aim of this thesis is to introduce the notion of deformed Ptolemy assignments (or varieties) and to generalize Zickert's formula to a representation of a Dehn-filled manifold. We also generalize the potential function of Cho and Murakami by applying our formula to an octahedral decomposition of a link complement in the 3-sphere. In addition, motivated by the work of Hikami and Inoue, we clarify the relation between Ptolemy assignments and cluster variables when a link is given in a braid position. The last part is joint work with Jinseok Cho and Christian Zickert.
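For readers less familiar with the objects involved, a brief reminder of the undeformed setting that the thesis generalizes (standard notation, assumed here rather than taken from the abstract): on an ideal simplex with vertices 0, 1, 2, 3 and a Ptolemy coordinate c_{ij} assigned to the edge ij, the Ptolemy relation is the Plücker identity

    \[
      c_{02}\,c_{13} \;=\; c_{01}\,c_{23} \;+\; c_{03}\,c_{12},
    \]

and, roughly speaking, a solution of these relations over a triangulation (a Ptolemy assignment) determines a representation together with flattening data from which the complex volume \mathrm{Vol}(\rho) + i\,\mathrm{CS}(\rho) is computed.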
Contents:
1 Introduction
  1.1 Deformed Ptolemy assignments
    1.1.1 Overview
  1.2 Potential functions
    1.2.1 Overview
  1.3 Cluster variables
    1.3.1 Overview
2 Preliminaries
  2.1 Cocycles
  2.2 Obstruction classes
3 Ptolemy varieties
  3.1 Formulas of Neumann
  3.2 Deformed Ptolemy varieties
    3.2.1 Isomorphisms
    3.2.2 Pseudo-developing maps
  3.3 Flattenings
    3.3.1 Main theorem
4 Potential functions
  4.1 Generalized potential functions
    4.1.1 Proof of Theorem 4.1.1
  4.2 Relation with a Ptolemy assignment
    4.2.1 Proof of Theorem 4.2.1
  4.3 Complex volume formula
    4.3.1 Proof of Theorem 4.3.1
5 Cluster variables
  5.1 The Hikami-Inoue cluster variables
    5.1.1 The octahedral decomposition
    5.1.2 The Hikami-Inoue cluster variables
    5.1.3 The obstruction cocycle
    5.1.4 Proof of Theorem 1.3.2
  5.2 The existence of a non-degenerate solution
    5.2.1 Proof of Proposition 5.2.1
    5.2.2 Explicit computation from a representation
Realfast: Real-Time, Commensal Fast Transient Surveys with the Very Large Array
Radio interferometers have the ability to precisely localize and better characterize the properties of sources. This ability is having a powerful impact on the study of fast radio transients, where a few milliseconds of data are enough to pinpoint a source at cosmological distances. However, recording interferometric data at millisecond cadence produces a terabyte-per-hour data stream that strains networks, computing systems, and archives. This challenge mirrors that of other domains of science, where the science scope is limited by the computational architecture as much as by the physical processes at play. Here, we present a solution to this problem in the context of radio transients: realfast, a commensal, fast transient search system at the Jansky Very Large Array. Realfast uses a novel architecture to distribute fast-sampled interferometric data to a 32-node, 64-GPU cluster for real-time imaging and transient detection. By detecting transients in situ, we can trigger the recording of data for those rare, brief instants when an event occurs and reduce the recorded data volume by a factor of 1000. This makes it possible to commensally search a data stream that would otherwise be impossible to record. The system will search for millisecond transients in more than 1000 hours of data per year, potentially localizing several Fast Radio Bursts, pulsars, and other sources of impulsive radio emission. We describe the science scope for realfast, the system design, expected outcomes, and ways real-time analysis can help in other fields of astrophysics.
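A back-of-the-envelope sketch, using only the figures quoted above (a terabyte-per-hour stream, a factor-of-1000 reduction from triggered recording, and roughly 1000 hours of data per year), of what in-situ detection implies for archived volume; the constants are illustrative rounded values, not official realfast numbers:

    # Rough data-volume estimate for a commensal, triggered transient search.
    RAW_RATE_TB_PER_HR = 1.0      # ~terabyte-per-hour visibility stream
    TRIGGER_REDUCTION = 1000      # factor-of-1000 reduction from triggering
    HOURS_PER_YEAR = 1000         # >1000 hours searched commensally per year

    raw_per_year_tb = RAW_RATE_TB_PER_HR * HOURS_PER_YEAR        # ~1 PB/yr if fully recorded
    recorded_per_year_tb = raw_per_year_tb / TRIGGER_REDUCTION   # ~1 TB/yr after triggering

    print(f"Full recording:      {raw_per_year_tb:.0f} TB/yr")
    print(f"Triggered recording: {recorded_per_year_tb:.1f} TB/yr")

In other words, at these rates full recording would approach a petabyte per year, while triggering on in-situ detections brings the archived volume down to roughly a terabyte per year.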
GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids
GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelised with both OpenMP and MPI and contains a python library for analysis and visualisation. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of Smoothed Particle Hydrodynamics, Meshless Finite-Volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open-source project on GitHub at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.
This research was supported by the DFG cluster of excellence "Origin and Structure of the Universe", DFG Projects 841797-4, 841798-2 (DAH, GPR), and the DISCSIM project, grant agreement 341137 funded by the European Research Council under ERC-2013-ADG (GPR, RAB). Some development of the code and simulations have been carried out on the computing facilities of the Computational centre for Particle and Astrophysics (C2PAP) and on the DiRAC Data Analytic system at the University of Cambridge, operated by the University of Cambridge High Performance Computing Service on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk); the equipment was funded by BIS National E-infrastructure capital grant (ST/K001590/1), STFC capital grants ST/H008861/1 and ST/H00887X/1, and STFC DiRAC Operations grant ST/K00333X/1.
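GANDALF's own schemes are implemented in C++ with OpenMP/MPI. Purely to illustrate the simplest ingredient named above, a collisional N-body scheme, here is a minimal kick-drift-kick leapfrog sketch in Python (an independent toy with assumed parameter names, not GANDALF's API or integrator):

    import numpy as np

    def leapfrog_nbody(pos, vel, mass, dt, steps, G=1.0, eps=1e-3):
        """Kick-drift-kick leapfrog for a small softened N-body system.
        pos, vel: (N, 3) arrays; mass: (N,); eps is a softening length."""
        def accel(p):
            d = p[None, :, :] - p[:, None, :]             # pairwise separations
            r2 = (d ** 2).sum(-1) + eps ** 2
            np.fill_diagonal(r2, np.inf)                  # suppress self-force
            return G * (d * (mass[None, :, None] / r2[..., None] ** 1.5)).sum(axis=1)
        a = accel(pos)
        for _ in range(steps):
            vel += 0.5 * dt * a                           # kick
            pos += dt * vel                               # drift
            a = accel(pos)
            vel += 0.5 * dt * a                           # kick
        return pos, vel

    # two-body example: a light particle on a near-circular orbit around a heavy one
    pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    vel = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    mass = np.array([1.0, 1e-3])
    pos, vel = leapfrog_nbody(pos, vel, mass, dt=0.01, steps=1000)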
- …