
10 research outputs found

ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Author: Gamaarachchi Hasindu
Gong Jing
Hu Xiaobo Sharon
Javaid Haris
Parameswaran Sri
Saadat Hassaan
Publication date: 23/09/2022

Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library in order to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. The original TensorFlow, which is based on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, is only 8x faster than ApproxTrain.

Comment: 14 pages, 12 figures

arXiv.org e-Print Archive

Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing

Author: Chintalaphani Sanjog R
Cortese Andrea
Davis Mark R
Deveson Ira W
Dobson-Stone Carol
Fellner Avi
Ferguson James M
Fitzpatrick Lauren
Fung Victor
Gamaarachchi Hasindu
Halliday Glenda
Houlden Henry
Kennerson Marina
Kumar Kishore R
Laing Nigel G
Ng Karl
Pineda Sandy S
Ravenscroft Gianina
Scriba Carolin K
Stevanovski Igor
Tchan Michel
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 04/03/2022

More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37), including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing, and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.

UCL Discovery

PubMed Central

Cache Friendly Optimisation of de Bruijn Graph based Local Re-assembly in Variant Calling

Author: Arash Bayat
Bruno Gaeta
Hasindu Gamaarachchi
Sri Parameswaran
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Crossref

Computer Architecture-Aware Optimisation of DNA Analysis Systems

Author: Gamaarachchi Hasindu, Computer Science & Engineering, Faculty of Engineering, UNSW
Publication venue: University of New South Wales. Computer Science & Engineering
Publication date: 01/01/2020

DNA sequencing, the process that converts chemically encoded data in DNA molecules into a computer-readable form, is revolutionising the field of medicine. DNA sequencers, the machines which perform DNA sequencing, have evolved from the size of a fridge to that of a mobile phone over the last two decades. The cost of sequencing a human genome has also fallen from billions of dollars to hundreds of dollars. Despite these improvements, DNA sequencers output hundreds or thousands of gigabytes of data that must be analysed on computers to discover meaningful information with biological implications. Unfortunately, the analysis techniques have not kept pace with rapidly improving sequencing technologies. Consequently, even today, the process of DNA analysis is performed on high-performance computers, just as it was a couple of decades ago. Such high-performance computers are not portable. As a result, the full utility of an ultra-portable sequencer for sequencing in-the-field or at the point-of-care is limited by the lack of portable lightweight analytic techniques.
This thesis proposes computer architecture-aware optimisation of DNA analysis software. DNA analysis software is inevitably convoluted due to the complexity associated with biological data. Modern computer architectures are also complex. Performing architecture-aware optimisations requires the synergistic use of knowledge from both domains (i.e., DNA sequence analysis and computer architecture). This thesis aims to draw the two domains together. In this thesis, gold-standard DNA sequence analysis workflows (a workflow is a few software tools executed sequentially, where each software tool is a complex system of dozens of algorithms) are systematically examined for algorithmic components that cause performance bottlenecks. Identified bottlenecks are resolved through architecture-aware optimisations at different levels, i.e., memory, cache, register and processor.
The optimised software tools are used in complete end-to-end analysis workflows and their efficacy is demonstrated by running on prototypical embedded systems. The embedded systems are not only fully functional, but their performance is also comparable to an unoptimised workflow on a high-performance computer. Such low-cost, energy-efficient, sufficiently fast and portable embedded systems enable complete DNA analysis at the point-of-care or in-the-field.

arXiv.org e-Print Archive

UNSWorks

hasindu2008/minimap2-arm: long read alignment using partitioned reference indexes

Author: Allison Penner Regier
Carlos de Lannoy
cjw85
Hasindu Gamaarachchi
Heng Li
Ilya Kolpakov
Jiading Guo
Marius van den Beek
martinghunt
Riku Walve
Shane McCarthy
Simon Harris
Stefan W. von Deylen

An extended version of minimap2 with better support for partitioned reference indexes. Used for citation in the paper "long read alignment using partitioned reference indexes". Includes: the command line option --multi-prefix; idxtools for efficient partition index construction; a readme for idxtools; evaluation scripts and program.

ZENODO

GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis

Author: Gamaarachchi Hasindu
Jayatilaka Gihan
Lam Chun W
Parameswaran Sri
Samarakoon Hiruna
Simpson Jared T
Smith Martin A
Publication venue: University of Toronto
Publication date: 09/08/2020

Background: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures.
Results: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs.
Conclusions: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c

University of Toronto Research Repository

Extensive DNA methylome rearrangement during early lamprey embryogenesis

Author: Allegra Angeloni
Deniz Kaya
Hasindu Gamaarachchi
Ira W. Deveson
Jillian M. Hammond
Ozren Bogdanovic
Robert J. Klose
Skye Fissette
Weiming Li
Xiaotian Zhang
Publication venue: Nature Portfolio
Publication date: 01/03/2024

DNA methylation (5mC) is a repressive gene regulatory mark widespread in vertebrate genomes, yet the developmental dynamics in which 5mC patterns are established vary across species. While mammals undergo two rounds of global 5mC erasure, teleosts, for example, exhibit localized maternal-to-paternal 5mC remodeling. Here, we studied 5mC dynamics during the embryonic development of sea lamprey, a jawless vertebrate which occupies a critical phylogenetic position as the sister group of the jawed vertebrates. We employed 5mC quantification in lamprey embryos and tissues, and discovered large-scale maternal-to-paternal epigenome remodeling that affects ~30% of the embryonic genome and is predominantly associated with partially methylated domains. We further demonstrate that sequences eliminated during programmed genome rearrangement (PGR) are hypermethylated in sperm prior to the onset of PGR. Our study thus unveils important insights into the evolutionary origins of vertebrate 5mC reprogramming, and how this process might participate in diverse developmental strategies.

Directory of Open Access Journals

Oxford University Research Archive

GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis

Author: A Bird
Chun Wai Lam
FJ Rang
Gihan Jayatilaka
H Gamaarachchi
H Li
H Lu
H Suzuki
Hasindu Gamaarachchi
Hiruna Samarakoon
Jared T. Simpson
JT Simpson
K-M Chao
M Jain
Martin A. Smith
NJ Loman
RR Wick
SA Manavski
Sri Parameswaran
Y Liu
Y Liu
Z Feng
Publication venue: 'Springer Science and Business Media LLC'

Crossref