QuASeR -- Quantum Accelerated De Novo DNA Sequence Reconstruction
In this article, we present QuASeR, a reference-free DNA sequence
reconstruction implementation via de novo assembly on both gate-based and
quantum annealing platforms. Each one of the four steps of the implementation
(TSP, QUBO, Hamiltonians and QAOA) is explained with simple proof-of-concept
examples to target both the genomics research community and quantum application
developers in a self-contained manner. The details of the implementation are
discussed for the various layers of the quantum full-stack accelerator design.
We also highlight the limitations of current classical simulation and available
quantum hardware systems. The implementation is open-source and can be found on
https://github.com/prince-ph0en1x/QuASeR.
Comment: 24 pages
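To make the TSP step of the abstract more concrete, the following is a minimal classical sketch, not QuASeR's code. It assumes that reads are treated as graph nodes and that the suffix-prefix overlap between two reads is the edge weight, so assembly amounts to finding the read ordering with maximum total overlap. The function names (`overlap`, `best_order`, `assemble`) and the toy reads are hypothetical, and the brute-force search stands in for the QUBO/QAOA optimization described in the paper.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def best_order(reads):
    """Brute-force the read ordering with maximum total overlap.

    This is a TSP-like path search over the overlap graph; it is only
    feasible for a handful of reads.
    """
    best, best_score = None, -1
    for perm in permutations(range(len(reads))):
        score = sum(overlap(reads[i], reads[j]) for i, j in zip(perm, perm[1:]))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


def assemble(reads, order):
    """Merge reads in the given order, dropping the overlapping prefixes."""
    seq = reads[order[0]]
    for i, j in zip(order, order[1:]):
        seq += reads[j][overlap(reads[i], reads[j]):]
    return seq


# Toy reads (hypothetical data, chosen so that the overlaps are unambiguous).
reads = ["ATGGC", "GGCGT", "CGTGC"]
order, score = best_order(reads)
assembled = assemble(reads, order)
```

The factorial growth of the permutation search is what motivates mapping the problem to an optimization form that an annealer or a variational algorithm can attack.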
DOPA: GPU-based protein alignment using database and memory access optimizations
Background: The Smith-Waterman (S-W) algorithm is an optimal sequence alignment method for biological databases, but its computational complexity makes it too slow for practical purposes. Heuristic-based approximate methods like FASTA and BLAST provide faster solutions, but at the cost of reduced accuracy. Also, the expanding volume and varying lengths of sequences necessitate performance-efficient restructuring of these databases. Thus, to arrive at an accurate and fast solution, it is highly desirable to speed up the S-W algorithm.
Findings: This paper presents a high-performance protein sequence alignment implementation for Graphics Processing Units (GPUs). The new implementation improves performance by optimizing the database organization and reducing the number of memory accesses to eliminate bandwidth bottlenecks. The implementation is called Database Optimized Protein Alignment (DOPA), and it achieves a performance of 21.4 Giga Cell Updates Per Second (GCUPS), which is 1.13 times better than the fastest GPU implementation to date.
Conclusions: In the new GPU-based implementation for protein sequence alignment (DOPA), the database is organized in equal-length sequence sets. This distributes the workload equally among all the threads on the GPU's multiprocessors. The result is improved performance that exceeds the fastest available GPU implementation.
Microelectronics; Electrical Engineering, Mathematics and Computer Science
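As a reference point for the S-W algorithm named in the abstract, here is a minimal CPU sketch of the Smith-Waterman local alignment score. It is not DOPA's GPU kernel. It assumes a simple match/mismatch score and a linear gap penalty, whereas protein aligners normally use a substitution matrix and often affine gaps. The function name and the default scores are hypothetical choices for illustration.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `query` and `target`.

    Uses two rows of the dynamic-programming matrix, so memory is
    O(len(target)). Scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            # One "cell update": the quantity counted by the GCUPS metric.
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(0,                   # start a new local alignment
                          prev[j - 1] + sub,   # align the two residues
                          prev[j] + gap,       # gap in target
                          curr[j - 1] + gap)   # gap in query
            best = max(best, curr[j])
        prev = curr
    return best
```

Each query/target pair costs len(query) × len(target) cell updates, which is why sequence sets of unequal length leave some GPU threads idle while others are still working.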
Benchmarking Apache Arrow Flight -- A wire-speed protocol for data transfer, querying and microservices
Moving structured data between different big data frameworks and/or data
warehouses/storage systems often causes significant overhead. Most of the time,
more than 80% of the total time spent accessing data is consumed by the
serialization/de-serialization step. Columnar data formats are gaining
popularity in both analytics and transactional databases. Apache Arrow, a
unified columnar in-memory data format, promises to provide efficient data
storage, access, manipulation and transport. In addition, with the introduction
of the Arrow Flight communication capabilities, which are built on top of gRPC,
Arrow enables high performance data transfer over TCP networks. Arrow Flight
allows parallel Arrow RecordBatch transfer over networks in a platform and
language-independent way, and offers high performance, parallelism and security
based on open-source standards.
In this paper, we bring together some recently implemented use cases of Arrow
Flight with their benchmarking results. These use cases include bulk Arrow data
transfer, querying subsystems and Flight as a microservice integration into
different frameworks to show the throughput and scalability results of this
protocol. We show that Flight is able to achieve up to 6000 MB/s and 4800 MB/s
throughput for DoGet() and DoPut() operations, respectively. On Mellanox
ConnectX-3 or Connect-IB interconnect nodes, Flight can utilize up to 95% of the
total available bandwidth. Flight is scalable and can efficiently use up to half
of the available system cores for bidirectional communication. For query
systems like Dremio, Flight is an order of magnitude faster than the ODBC and
turbodbc protocols. The Arrow Flight based implementation on Dremio performs 20x
and 30x better than turbodbc and ODBC connections, respectively.
DFL: High-Performance Blockchain-Based Federated Learning
Many researchers are trying to replace the aggregation server in federated
learning with a blockchain system to achieve better privacy, robustness and
scalability. In this case, clients will upload their updated models to the
blockchain ledger, and use a smart contract on the blockchain system to perform
model averaging. However, running machine learning applications on the
blockchain is almost impossible because a blockchain system, which usually
takes over half a minute to generate a block, is extremely slow and unable to
support machine learning applications.
This paper proposes a completely new public blockchain architecture called
DFL, which is specially optimized for distributed federated machine learning.
This architecture inherits most traditional blockchain merits and achieves
extremely high performance with low resource consumption by waiving global
consensus. To characterize the performance and robustness of our architecture,
we implement the architecture as a prototype and test it on a physical
four-node network. To test more nodes and more complex situations, we build a
simulator to simulate the network. The LeNet results indicate our system can
reach over 90% accuracy for non-I.I.D. datasets even while facing model
poisoning attacks, with the blockchain consuming less than 5% of hardware
resources.
Comment: 11 pages, 17 figures
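The model averaging step mentioned in the DFL abstract can be illustrated with a small sketch. This is not DFL's implementation: it assumes each client model is a flat list of floats and that the average is weighted by each client's sample count (the usual federated averaging convention, which the abstract does not specify). The function name `federated_average` and the toy values are hypothetical.

```python
def federated_average(client_weights, sample_counts):
    """Sample-weighted average of client model parameters.

    client_weights: list of equal-length parameter lists, one per client.
    sample_counts:  number of training samples held by each client
                    (weighting scheme is an assumption, not from the paper).
    """
    total = sum(sample_counts)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, sample_counts)) / total
        for k in range(n_params)
    ]


# Two hypothetical clients with two parameters each.
clients = [[1.0, 2.0], [3.0, 4.0]]
counts = [1, 3]
averaged = federated_average(clients, counts)
```

In a server-based setup this function runs on the aggregation server; the approaches discussed in the abstract move the same computation into a smart contract or, in DFL's case, onto the participating nodes.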