
    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly, for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to some established lists, arguing that at least two key patterns, sorting and hashing, are missing.
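
    The hashing motif the authors single out shows up concretely in tasks such as k-mer counting. The sketch below is a hypothetical illustration, not code from the paper; the k-mer length K and the sample reads are arbitrary. Serially it is just a hash-table update loop; in a distributed run the table becomes the kind of shared data structure that receives the asynchronous updates mentioned above.

```python
# Hypothetical illustration of the hashing motif (not code from the paper):
# k-mer counting is a hash-table update loop. K and the reads are arbitrary.
from collections import Counter

K = 5  # k-mer length; chosen only for this toy example

def kmer_counts(reads):
    """Count every K-length substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - K + 1):
            counts[read[i:i + K]] += 1
    return counts

reads = ["ACGTACGTACGT", "CGTACGTTTACG"]
print(kmer_counts(reads).most_common(3))
```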

    Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

    The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information retrieval, and many other areas. We design and implement SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. Our algorithm provides an efficient encoding of this problem into a multiplication of sparse matrices. Both the encoding and the sparse matrix product are performed in a way that minimizes data movement in terms of communication and synchronization costs. We apply our algorithm to obtain the similarity among all pairs of a set of large samples of genomes. This task is a key part of modern metagenomics analysis and an ever-growing need due to the increasing availability of high-throughput DNA sequencing data. The resulting scheme is the first to enable accurate Jaccard distance derivations for massive datasets, using large-scale distributed-memory systems. We package our routines in a tool, called GenomeAtScale, that combines the proposed algorithm with tools for processing input sequences. Our evaluation on real data illustrates that one can use GenomeAtScale to effectively employ tens of thousands of processors to reach new frontiers in large-scale genomic and metagenomic analysis. While GenomeAtScale can be used to foster DNA research, the more general underlying SimilarityAtScale algorithm may be used for high-performance distributed similarity computations in other data analytics application domains.
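
    To make the matrix encoding concrete, here is a minimal, hedged sketch (not the authors' SimilarityAtScale code): rows of a sparse 0/1 matrix represent sets, for example the k-mers present in each genome sample. A single sparse product M·Mᵀ yields all pairwise intersection sizes, and union sizes follow from |A ∪ B| = |A| + |B| − |A ∩ B|.

```python
# Toy illustration of the sparse-matrix encoding of pairwise Jaccard
# similarity (not the distributed SimilarityAtScale implementation).
import numpy as np
from scipy.sparse import csr_matrix

def pairwise_jaccard(M):
    """M: sparse (num_sets x universe) 0/1 matrix; returns dense Jaccard matrix."""
    inter = (M @ M.T).toarray()                # |A intersect B| for every pair
    sizes = np.asarray(M.sum(axis=1)).ravel()  # |A| for each set
    union = sizes[:, None] + sizes[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)

M = csr_matrix(np.array([[1, 1, 0, 1],
                         [1, 0, 1, 1],
                         [0, 0, 1, 0]]))
print(pairwise_jaccard(M))  # e.g. J(row0, row1) = 2/4 = 0.5
```

    The paper's contribution is performing this product with minimal communication and synchronization on distributed memory; the dense toy above only shows the encoding itself.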

    Low-Impact Profiling of Streaming, Heterogeneous Applications

    Computer engineers are continually faced with the task of translating improvements in fabrication process technology (i.e., Moore's Law) into architectures that allow computer scientists to accelerate application performance. As feature size continues to shrink, architects of commodity processors are designing increasingly more cores on a chip. While additional cores can operate independently on some tasks (e.g., the OS and user tasks), many applications see little to no improvement from adding more processor cores alone. For many applications, heterogeneous systems offer a path toward higher performance. Significant performance and power gains have been realized by combining specialized processors (e.g., Field-Programmable Gate Arrays, Graphics Processing Units) with general-purpose multi-core processors. Heterogeneous applications need to be programmed differently than traditional software. One approach, stream processing, fits these systems particularly well because of the segmented memories and explicit expression of parallelism. Unfortunately, debugging and performance tools that support streaming, heterogeneous applications do not exist. This dissertation presents TimeTrial, a performance measurement system that enables performance optimization of streaming applications by profiling the application deployed on a heterogeneous system. TimeTrial performs low-impact measurements by dedicating computing resources to monitoring and by aggressively compressing performance traces into statistical summaries guided by user specification of the performance queries of interest.
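
    As a loose illustration of the trace-compression idea (this is not TimeTrial itself), a monitor can replace a full event trace with a constant-size running summary. Welford's online algorithm, sketched below, keeps the mean and variance of, say, observed latencies in O(1) memory per monitored link; the measurement values are invented for the example.

```python
# Sketch of compressing a performance trace into a statistical summary
# (illustrative only, not the TimeTrial implementation).
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Welford's online update: O(1) memory, one pass over the stream."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for latency_us in (12.0, 15.5, 11.2, 40.3):  # hypothetical measurements
    stats.update(latency_us)
print(stats.n, stats.mean, stats.variance)
```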

    Fast sampling from Wiener posteriors for image data with dataflow engines

    We use Dataflow Engines (DFE) to construct an efficient Wiener filter of noisy and incomplete image data, and to quickly draw probabilistic samples of the compatible true underlying images from the Wiener posterior. Dataflow computing is a powerful approach using reconfigurable hardware, which can be deeply pipelined and is intrinsically parallel. The unique Wiener-filtered image is the minimum-variance linear estimate of the true image (if the signal and noise covariances are known) and the most probable true image (if the signal and noise are Gaussian distributed). However, many images are compatible with the data with different probabilities, given by the analytic posterior probability distribution referred to as the Wiener posterior. The DFE code also draws large numbers of samples of true images from this posterior, which allows for further statistical analysis. Naive computation of the Wiener-filtered image is impractical for large datasets, as it scales as O(n^3), where n is the number of pixels. We use a messenger field algorithm, which is well suited to a DFE implementation, to draw samples from the Wiener posterior; that is, we draw, with the correct probability, samples of noiseless images that are compatible with the observed noisy image. The Wiener-filtered image can be obtained by a trivial modification of the algorithm. We demonstrate a lower bound on the speed-up, from drawing [Formula presented] samples of a [Formula presented] image, of 11.3 ± 0.8 with 8 DFEs in a 1U MPC-X box when compared with a 1U server presenting 32 CPU threads. We also discuss a potential application in astronomy, to provide better dark matter maps and improved determination of the parameters of the Universe.
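
    The statistical core of the method is easiest to see in a toy model. The sketch below is not the messenger-field DFE code: it assumes signal and noise covariances that are diagonal in the chosen basis, in which case the Wiener filter is the per-mode ratio S/(S+N) and posterior samples are the filtered data plus Gaussian fluctuations with variance SN/(S+N). The spectra and sizes are arbitrary.

```python
# Toy Wiener filter and Wiener-posterior sampling for diagonal covariances
# (illustrative; the paper's messenger-field DFE algorithm handles the
# general case where no single basis diagonalizes both covariances).
import numpy as np

rng = np.random.default_rng(0)
n_modes = 257
k = np.arange(n_modes)
S = 1.0 / (1.0 + (k / 10.0) ** 2)   # assumed per-mode signal variance
N = np.full(n_modes, 0.05)          # assumed per-mode noise variance

s = rng.normal(0.0, np.sqrt(S))      # a "true" signal, mode by mode
d = s + rng.normal(0.0, np.sqrt(N))  # noisy data

s_wf = S / (S + N) * d               # Wiener-filtered estimate (posterior mean)
# one sample from the Wiener posterior: mean plus fluctuation with
# posterior variance S*N/(S+N)
sample = s_wf + rng.normal(0.0, np.sqrt(S * N / (S + N)))

print(np.mean((s - s_wf) ** 2), np.mean((s - d) ** 2))  # filter reduces error
```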

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    The processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A more effective way to speed up the algorithm is a hardware assist, since parallel kernels can be partitioned and run concurrently in hardware, as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard reconfigurable computing system. The hardware implementation is shown to achieve speedups of about 500x over conventional implementations on a general-purpose processor. A scalability analysis provides a future direction for taking the current three-class implementation and scaling it to N classes.
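
    As a rough sketch of where the parallelism lives (this is not the Pilchard design), the inner kernel of k-means evaluates every point-centroid distance independently, which is exactly the loop a hardware implementation can unroll into concurrent units. All names and sizes below are illustrative.

```python
# Reference k-means sketch highlighting the parallel distance kernel
# (illustrative software version, not the FPGA implementation).
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance kernel: every (point, centroid) pair is independent,
        # so these comparisons can all run concurrently in hardware.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):  # update step: recompute each class centroid
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (0.0, 3.0, 6.0)])
print(kmeans(pts, 3)[0])  # recovered cluster centers
```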

    Reconfiguration of field programmable logic in embedded systems


    Advanced photonic and electronic systems WILGA 2018

    The WILGA annual symposium on advanced photonic and electronic systems has been organized by young scientists for young scientists for two decades. It traditionally gathers around 400 young researchers and their tutors. Ph.D. students and graduates present their recent achievements during well-attended oral sessions. Wilga is a very good digest of Ph.D. work carried out at technical universities in electronics and photonics, as well as information sciences, throughout Poland and some neighboring countries. Publishing patronage over Wilga is held by the Elektronika technical journal (SEP), IJET, and Proceedings of SPIE; the latter editorial series annually publishes more than 200 papers from Wilga. Wilga 2018 was the XLII edition of this meeting. The following topical tracks were distinguished: photonics, electronics, information technologies, and system research. This article is a digest of selected works presented during the Wilga 2018 symposium. WILGA 2017 works were published in Proc. SPIE vol. 10445; WILGA 2018 works were published in Proc. SPIE vol. 10808.

    Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor

    The FM-index is a compact data structure suitable for fast matching of short reads to large reference genomes. The matching algorithm using this index exhibits irregular memory access patterns that cause frequent cache misses, resulting in a memory-bound problem. This paper analyzes different FM-index versions presented in the literature, focusing on those computing aspects related to data access. As a result of the analysis, we propose a new organization of the FM-index that minimizes the demand for memory bandwidth, allowing a great improvement of performance on processors with high-bandwidth memory, such as the second-generation Intel Xeon Phi (Knights Landing, or KNL), which integrates ultra-high-bandwidth stacked memory technology. As the roofline model shows, our implementation reaches 95% of the peak random-access bandwidth limit when executed on the KNL and almost all the available bandwidth when executed on other Intel Xeon architectures with conventional DDR memory. In addition, the throughput obtained on the KNL is much higher than the results reported for GPUs in the literature.
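
    For readers unfamiliar with the data structure, a didactic backward-search sketch follows (it reflects the textbook FM-index, not the paper's KNL-tuned layout). Each pattern character costs two rank (Occ) lookups, and it is precisely these scattered table accesses that make the algorithm memory-bound and motivate the proposed reorganization.

```python
# Didactic FM-index backward search (textbook construction, not the
# paper's bandwidth-optimized layout).
def bwt(text):
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def fm_index(text):
    L = bwt(text)
    # C[c]: number of characters in the text strictly smaller than c
    C = {c: sum(x < c for x in L) for c in set(L)}
    # Occ[c][i]: occurrences of c in L[:i] (the rank table)
    Occ = {c: [0] for c in set(L)}
    for ch in L:
        for c in Occ:
            Occ[c].append(Occ[c][-1] + (ch == c))
    return C, Occ

def count(pattern, C, Occ, n):
    lo, hi = 0, n + 1  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + Occ[c][lo]  # each step: two irregular Occ lookups
        hi = C[c] + Occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

text = "ACGTACGTACCGTA"
C, Occ = fm_index(text)
print(count("ACG", C, Occ, len(text)))  # -> 2 occurrences
```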

    Bioinformatic Challenges Detecting Genetic Variation in Precision Medicine Programs

    Precision medicine programs to identify clinically relevant genetic variation have been revolutionized by access to increasingly affordable high-throughput sequencing technologies. A decade of continual drops in per-base sequencing costs means it is now feasible to sequence an individual patient genome and interrogate all classes of genetic variation for less than US$1,000. However, while advances in these technologies have greatly simplified the ability to obtain patient sequence information, the timely analysis and interpretation of variant information remain a challenge for the rollout of large-scale precision medicine programs. This review examines the challenges and potential solutions that exist in identifying predictive genetic biomarkers and pharmacogenetic variants in a patient, and discusses the larger bioinformatic challenges likely to emerge in the future. It examines how both software and hardware development aim to overcome issues in short-read mapping, variant detection, and variant interpretation. It discusses the current state of the art for genetic disease and the remaining challenges to overcome for complex disease. Success across all types of disease will require novel statistical models and software to ensure precision medicine programs realize their full potential now and into the future.