83 research outputs found

    FPGA Acceleration of Gene Rearrangement Analysis

    In this paper we present our work toward FPGA acceleration of phylogenetic reconstruction, a type of analysis that is commonly performed in the fields of systematic biology and comparative genomics. In our initial study, we have targeted a specific application that reconstructs maximum-parsimony (MP) phylogenies for gene-rearrangement data. Like other prevalent applications in computational biology, this application relies on a control-dependent, memory-intensive, and non-arithmetic combinatorial optimization algorithm. To achieve hardware acceleration, we developed an FPGA core design that implements the application's primary bottleneck computation. Because our core is lightweight, we are able to synthesize multiple cores on a single FPGA. By using several cores in parallel, we have achieved a 25X end-to-end application speedup using simulated input data.

    High-Performance Heterogeneous Computing with the Convey HC-1

    Unlike other socket-based reconfigurable coprocessors, the Convey HC-1 contains nearly 40 field-programmable gate arrays, scatter-gather memory modules, a high-capacity crossbar switch, and a fully coherent memory system.

    Exploiting Matrix Symmetry to Improve FPGA-Accelerated Conjugate Gradient

    In this paper we describe a new approach for accelerating the Conjugate Gradient (CG) method using an FPGA co-processor. As in previous approaches, our co-processor performs a double-precision sparse matrix-vector multiplication. However, our implementation doubles the amount of computation per unit of input data by exploiting the symmetry of the input matrix and computing the upper and lower triangles of the input matrix in parallel. Using a Virtex-2 Pro 100 FPGA, we have achieved an observed computational throughput of 1155 MFLOPS.
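
The symmetry idea in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' FPGA design: it assumes the matrix is supplied as a list of upper-triangle entries `(i, j, value)` with `i <= j`, and the function name `symmetric_spmv` is hypothetical. Each stored off-diagonal entry is used twice, once for the upper triangle and once for its mirrored lower-triangle position, so every value read from memory feeds two multiply-accumulates.

```python
def symmetric_spmv(n, upper_entries, x):
    """Compute y = A @ x for a symmetric n-by-n matrix A.

    Only the upper triangle (including the diagonal) is stored, as
    (i, j, value) tuples with i <= j. This storage layout is an
    assumption made for illustration.
    """
    y = [0.0] * n
    for i, j, a in upper_entries:
        # Contribution of the stored upper-triangle entry A[i][j].
        y[i] += a * x[j]
        if i != j:
            # Mirrored lower-triangle entry A[j][i] == A[i][j]:
            # a second multiply-accumulate with no extra matrix read.
            y[j] += a * x[i]
    return y


# A = [[4, 1, 0],
#      [1, 3, 2],
#      [0, 2, 5]]
upper = [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 3.0), (1, 2, 2.0), (2, 2, 5.0)]
x = [1.0, 2.0, 3.0]
print(symmetric_spmv(3, upper, x))  # [6.0, 13.0, 19.0]
```

In this sketch the two updates run one after the other; the abstract describes computing both triangles in parallel in hardware.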

    Large-Scale Pairwise Sequence Alignments on a Large-Scale GPU Cluster

    This paper presents the design of a GPU kernel for performing pairwise sequence alignments on large-scale short-sequence datasets generated by next-generation sequencers. This kernel principally performs batch Needleman–Wunsch global alignments. When used with its MPI-based host software, the kernel is scalable and capable of achieving high-throughput alignment when run on a CPU-GPU cluster.
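
For readers unfamiliar with Needleman–Wunsch, the following is a minimal CPU sketch of the global alignment score computation. It is not the paper's GPU kernel: the scoring values (match +1, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration, and the "batch" here is an ordinary loop over independent sequence pairs.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the score matrix in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    return prev[-1]


def batch_scores(pairs):
    """Score a batch of independent (a, b) pairs, one alignment per pair."""
    return [needleman_wunsch_score(a, b) for a, b in pairs]


print(batch_scores([("GATTACA", "GATTACA"), ("ACGT", "AGT")]))  # [7, 2]
```

Because each pair is scored independently, the batch is straightforward to distribute across many workers, which is the property a batch-oriented kernel can exploit.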

    A Heuristic Scheduler for Port-Constrained Floating-Point Pipelines

    We describe a heuristic scheduling approach for optimizing floating-point pipelines subject to input port constraints. The objective of our technique is to maximize functional unit reuse while minimizing the following performance metrics in the generated circuit: (1) maximum multiplexer fan-in, (2) datapath fanout, (3) number of multiplexers, and (4) number of registers. For a set of Systems Biology Markup Language (SBML) benchmark expressions, we compare the resource usage of our method to that of a branch-and-bound enumeration of all valid schedules. Compared with the enumeration results, our heuristic requires on average 33.4% fewer multiplexer bits and 32.9% fewer register bits than the worst case, while requiring only 14% more multiplexer bits and 4.5% more register bits than the optimal case. We also compare our results against those of the state-of-the-art high-level synthesis tool Xilinx AutoESL. For the most complex of our benchmark expressions, our synthesis technique requires 20% fewer FPGA slices than AutoESL.

    A Special-Purpose Architecture for Solving the Breakpoint Median Problem

    In this paper, we describe the design for a co-processor for whole-genome phylogenetic reconstruction. Our current design performs a parallelized breakpoint median computation, which is an expensive component of the overall application. When implemented on a field-programmable gate array (FPGA), our hardware breakpoint median achieves a maximum speedup of 1005× over software. When the co-processor is used to accelerate the entire reconstruction procedure, we achieve a maximum application speedup of 417×. The results in this paper suggest that FPGA-based acceleration is a promising approach for computationally expensive phylogenetic problems, even though the algorithms involved are based on complex, control-dependent combinatorial optimization.
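
As background for the breakpoint median problem, the following is a minimal sketch of the breakpoint distance it is built on and of the score a candidate median receives. It assumes genomes are linear signed permutations given as lists of nonzero integers, with no end caps; the function names are hypothetical, and nothing here reflects the paper's hardware design or its search strategy.

```python
def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not conserved in g2.

    An adjacency (a, b) is conserved if g2 contains a followed by b,
    or -b followed by -a (the same adjacency read on the other strand).
    """
    conserved = set()
    for a, b in zip(g2, g2[1:]):
        conserved.add((a, b))
        conserved.add((-b, -a))
    return sum((a, b) not in conserved for a, b in zip(g1, g1[1:]))


def median_score(candidate, genomes):
    """Sum of breakpoint distances from a candidate to the given genomes.

    A breakpoint median of three genomes is a genome minimizing this sum.
    """
    return sum(breakpoint_distance(candidate, g) for g in genomes)


g1 = [1, 2, 3, 4]
g2 = [1, -3, -2, 4]   # g1 with the segment (2, 3) inverted
g3 = [1, 2, 3, 4]
print(breakpoint_distance(g1, g2))       # 2
print(median_score(g1, [g1, g2, g3]))    # 2
```

The sketch only evaluates one candidate; finding the candidate with the minimum score is the expensive search that the abstract describes accelerating.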

    Integrated Circuit Implementation for a GaN HFET Driver Circuit

    This paper presents the design and implementation of a new integrated circuit (IC) that is suitable for driving the new generation of high-frequency GaN HFETs. The circuit, based upon a resonant switching transition technique, is first briefly described and then discussed in detail, focusing on the practical considerations of the design process. A new level-shifter topology, used to generate the zero and negative gate-source voltages required to switch the GaN HFET, is introduced and analyzed. The experimental measurements included in this paper report the results of tests carried out on an IC designed and fabricated as part of a multiproject die in the high-voltage process H35B4 of Austriamicrosystems. They fully demonstrate the performance of the proposed driver, which opens the possibility of fully exploiting the wide capabilities and advantages of GaN devices in power electronics applications.

    Audio-Based Wildfire Detection on Embedded Systems

    The occurrence of wildfires often results in significant fatalities. Because wildfires are notorious for their high speed of spread, the ability to identify a wildfire at its early stage is essential for quickly obtaining control of the fire, reducing property loss, and preventing loss of life. This work presents a machine-learning data pipeline for wildfire detection that can be deployed on embedded systems in remote locations. The proposed data pipeline consists of three main steps: audio preprocessing, feature engineering, and classification. Experiments show that the proposed data pipeline detects wildfire with high precision and can distinguish wildfire sound from the forest's background soundscape. When deployed on a Raspberry Pi 4, the proposed data pipeline takes 66 milliseconds to process a 1 s sound clip. To the knowledge of the author, this is the first edge-computing implementation of an audio-based wildfire detection system.

    A Reconfigurable Distributed Computing Fabric Exploiting Multilevel Parallelism

    This paper presents a novel reconfigurable data flow processing architecture that promises high performance by explicitly targeting both fine- and coarse-grained parallelism. This architecture is based on multiple FPGAs organized in a scalable direct network that is substantially more interconnect-efficient than currently used crossbar technology. In addition, we discuss several ancillary issues and propose solutions required to support this architecture and achieve maximal performance for general-purpose applications; these include supporting IP, mapping techniques, and routing policies that enable greater flexibility for architectural evolution and code portability.