1,072 research outputs found

    Design and Development of an FPGA-based Distributed Computing Processing Platform

    This thesis presents two frameworks, a software framework and a hardware core manager framework, which together can be used to develop a processing platform from a distributed system of field-programmable gate array (FPGA) boards. The software framework provides users with the ability to easily develop applications that exploit the processing power of FPGAs, while the hardware core manager framework gives users the ability to configure and interact with multiple FPGA boards and/or hardware cores. This thesis describes the design and development of these frameworks and analyzes the performance of a system that was constructed using them. The performance analysis included measuring the effect of incorporating additional hardware components into the system and comparing the system to a software-only implementation. This work draws conclusions based on the results of the performance analysis and offers suggestions for future work.
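
    The abstract does not give the frameworks' actual APIs, but a minimal sketch of how such a core manager might be driven from the software side could look like the following; all names here (CoreManager, HardwareCore, configure, dispatch) are hypothetical illustrations, not the thesis's interfaces.

        # Hypothetical sketch of a software framework talking to a hardware core
        # manager across several FPGA boards. Names and behavior are illustrative
        # only; the thesis's real interfaces are not described in the abstract.

        class HardwareCore:
            def __init__(self, board_id: int, core_name: str):
                self.board_id = board_id
                self.core_name = core_name

            def process(self, data: bytes) -> bytes:
                # Placeholder: a real core manager would stream `data` to the FPGA
                # core and return its output; here we simply echo the input.
                return data

        class CoreManager:
            """Tracks the hardware cores configured on each FPGA board."""

            def __init__(self):
                self._cores: dict[str, list[HardwareCore]] = {}

            def configure(self, board_id: int, core_name: str) -> None:
                self._cores.setdefault(core_name, []).append(HardwareCore(board_id, core_name))

            def dispatch(self, core_name: str, chunks: list[bytes]) -> list[bytes]:
                # Round-robin work distribution over all boards offering this core.
                cores = self._cores[core_name]
                return [cores[i % len(cores)].process(c) for i, c in enumerate(chunks)]

        if __name__ == "__main__":
            mgr = CoreManager()
            mgr.configure(board_id=0, core_name="fir_filter")
            mgr.configure(board_id=1, core_name="fir_filter")
            print(mgr.dispatch("fir_filter", [b"block0", b"block1", b"block2"]))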

    FPGA acceleration of sequence analysis tools in bioinformatics

    Thesis (Ph.D.), Boston University. With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high-impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer enormous compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching, and so its acceleration is of fundamental importance. It is a complex, highly optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on the FPGA. Using our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved a 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA), the extension of pairwise sequence alignment to multiple sequences, is critical to solving many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately, many software heuristics do not map well to FPGAs, so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models, we have achieved significant speedups over the reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
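
    The prefilter-then-refine structure described above (an FPGA stage rapidly discards most of the database, and the unmodified software tool then processes what remains) can be illustrated with a short, purely software sketch; the filter predicate and scoring function below stand in for the FPGA engine and the original tool and are hypothetical.

        # Illustrative prefilter-then-refine pipeline. The FPGA engine from the
        # thesis is replaced by a cheap software predicate; only the overall shape
        # of the approach (filter most of the database, refine the survivors with
        # the original tool) is meant to be accurate.

        def seed_match(query: str, subject: str, k: int = 4) -> bool:
            """Coarse filter: does any k-mer of the query occur in the subject?"""
            kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
            return any(subject[i:i + k] in kmers for i in range(len(subject) - k + 1))

        def full_alignment_score(query: str, subject: str) -> int:
            """Stand-in for the original, expensive alignment code (e.g. BLAST)."""
            return sum(a == b for a, b in zip(query, subject))

        def prefilter_then_refine(query: str, database: list[str]) -> list[tuple[str, int]]:
            survivors = [s for s in database if seed_match(query, s)]        # FPGA stage in the thesis
            return [(s, full_alignment_score(query, s)) for s in survivors]  # software stage

        if __name__ == "__main__":
            db = ["ACGTACGTGG", "TTTTCCCCAA", "ACGTTTTTGG"]
            print(prefilter_then_refine("ACGTACGT", db))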

    Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) Krakow, Poland

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015

    The case for a Hardware Filesystem

    As secondary storage devices get faster with flash-based solid state drives (SSDs) and emerging technologies like phase change memories (PCM), overheads in system software like the operating system (OS) and filesystem become prominent and may limit the potential performance improvements. Moreover, with rapidly increasing on-chip core counts, monolithic operating systems will face scalability issues on these many-core chips. Future operating systems are likely to have a distributed nature, with operating system services separated amongst cores. Also, general-purpose processors are known to be both performance- and power-inefficient while executing operating system code. In the domain of high-performance computing with FPGAs, too, relying on the OS for file I/O transactions via slow embedded processors hinders performance. Migrating the filesystem into a dedicated hardware core has the potential to improve the performance of data-intensive applications by bypassing the OS stack to provide higher bandwidth and reduced latency while accessing disks. To test the feasibility of this idea, an FPGA-based Hardware Filesystem (HWFS) was designed with five basic operations (open, read, write, delete and seek). Furthermore, multi-disk and RAID-0 (striping) support has been implemented as an option in the filesystem. In order to reduce design complexity and facilitate easier testing of the HWFS, a RAM disk was used initially. The filesystem core has been integrated and tested with a hardware application core (BLAST) as well as a multi-node FPGA network to provide remote-disk access. Finally, a SATA IP core was developed and directly integrated with HWFS to test with SSDs. For evaluation, HWFS's performance was compared to an Ext2 filesystem, both on an FPGA-based soft processor as well as on a modern AMD Opteron Linux server, with sequential and random workloads. Results show that the Hardware Filesystem and supporting infrastructure provide a substantial performance improvement over software-only systems. The system is also resource efficient, consuming less than 3% of the logic and 5% of the Block RAMs of a Xilinx Virtex-6 chip.
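
    The abstract names the five HWFS operations (open, read, write, delete, seek) and an optional RAID-0 mode; a minimal software model of that interface, striping writes across two RAM-backed "disks", might look as follows. The block size, disk count and semantics here are assumptions for illustration, not the hardware core's real design.

        # Toy model of a five-operation filesystem front end with optional RAID-0
        # striping over RAM-backed "disks". Purely illustrative; block size, disk
        # count and semantics are assumptions, not the HWFS implementation.

        BLOCK = 512  # bytes per stripe unit (assumed)

        class ToyHWFS:
            def __init__(self, num_disks: int = 2):
                self.disks = [dict() for _ in range(num_disks)]  # RAM disks
                self.files: dict[str, int] = {}                  # name -> length
                self.pos: dict[str, int] = {}                    # name -> seek offset

            def open(self, name: str) -> None:
                self.files.setdefault(name, 0)
                self.pos[name] = 0

            def seek(self, name: str, offset: int) -> None:
                self.pos[name] = offset

            def write(self, name: str, data: bytes) -> None:
                start = self.pos[name]
                for i in range(0, len(data), BLOCK):
                    blk = (start + i) // BLOCK
                    disk = self.disks[blk % len(self.disks)]     # RAID-0: round-robin stripes
                    disk[(name, blk)] = data[i:i + BLOCK]
                self.pos[name] = start + len(data)
                self.files[name] = max(self.files[name], self.pos[name])

            def read(self, name: str, length: int) -> bytes:
                start = self.pos[name]
                out = b""
                for blk in range(start // BLOCK, (start + length - 1) // BLOCK + 1):
                    out += self.disks[blk % len(self.disks)].get((name, blk), b"")
                self.pos[name] = start + length
                return out[start % BLOCK:start % BLOCK + length]

            def delete(self, name: str) -> None:
                for disk in self.disks:
                    for key in [k for k in disk if k[0] == name]:
                        del disk[key]
                self.files.pop(name, None)
                self.pos.pop(name, None)

        if __name__ == "__main__":
            fs = ToyHWFS()
            fs.open("genome.dat")
            fs.write("genome.dat", b"ACGT" * 300)      # spans multiple stripe units
            fs.seek("genome.dat", 0)
            print(fs.read("genome.dat", 8))            # b'ACGTACGT'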

    ROACH accelerated BLAST

    Includes abstract. Includes bibliographical references (p. 115-118). Reconfigurable computing, in recent years, has been taking great strides in becoming part of mainstream computing, largely due to the rapid growth in the size of FPGAs and their ability to adapt to certain complex applications efficiently. This dissertation investigates the reuse of application-specific hardware developed for radio astronomy in accelerating a popular bioinformatics algorithm.

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

    Exploration of Power-Performance Tradeoffs through Parameterization of FPGA-based Multiprocessor Systems

    The design space of FPGA-based processor systems is huge, because many parameters can be modified at design time and runtime to achieve an efficient system solution in terms of performance, power and energy consumption. Such parameters are, for example, the number of processors and their configurations, the clock frequencies at design time, the use of dynamic frequency scaling at runtime, the application task distribution, and the FPGA type and size. The major contribution of this paper is the exploration of all these parameters and their impact on performance, power dissipation, and energy consumption for four different application scenarios. The goal is to introduce a first approach towards a developer's guideline that supports the choice of an optimized, application-specific system parameterization for FPGA-based multiprocessor systems-on-chip. The FPGAs used for these explorations were the Xilinx Virtex-4 and Xilinx Virtex-5. The performance results were measured on the FPGA, while the power consumption was estimated using the Xilinx XPower Analyzer tool. Finally, a novel runtime-adaptive multiprocessor architecture for dynamic clock frequency scaling is introduced and used for the performance, power and energy consumption evaluations.
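
    As a rough illustration of the kind of design-space exploration the paper describes, the sketch below enumerates a few of the named parameters and ranks the resulting configurations by energy, computed as power times execution time; the parameter values and cost model are invented placeholders, not the paper's measurements.

        # Illustrative design-space sweep over a few of the parameters named in the
        # abstract (processor count, clock frequency, dynamic frequency scaling).
        # The performance/power model below is a made-up placeholder, not data
        # from the paper, which measured performance on the FPGA and estimated
        # power with Xilinx XPower Analyzer.

        from itertools import product

        WORK = 1e9  # abstract "work units" of the target application (assumed)

        def exec_time_s(num_cpus: int, freq_mhz: int) -> float:
            return WORK / (num_cpus * freq_mhz * 1e6)           # ideal parallel scaling (assumed)

        def power_w(num_cpus: int, freq_mhz: int, dfs: bool) -> float:
            dynamic = 0.04 * num_cpus * (freq_mhz / 100.0)      # toy dynamic power model
            return 0.5 + (0.8 * dynamic if dfs else dynamic)    # static + dynamic; DFS assumed to save 20%

        results = []
        for cpus, mhz, dfs in product([1, 2, 4, 8], [50, 100, 150], [False, True]):
            t = exec_time_s(cpus, mhz)
            p = power_w(cpus, mhz, dfs)
            results.append((p * t, t, p, cpus, mhz, dfs))       # energy = power * time

        for energy, t, p, cpus, mhz, dfs in sorted(results)[:3]:
            print(f"{cpus} CPUs @ {mhz} MHz, DFS={dfs}: {t:.2f} s, {p:.2f} W, {energy:.2f} J")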

    MURAC: A unified machine model for heterogeneous computers

    Includes bibliographical references. Heterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching applications to the architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides the user application with a consistent interface for interacting with the underlying machine. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control flow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems have shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime, without the complex implementation details of system-level synchronisation and coordination.
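
    To make the idea of a single control flow over multiple execution models concrete, here is a small hypothetical sketch in which one application alternates between a software task and a simulated "hardware" task behind a uniform call interface over shared memory; the names and dispatch mechanism are invented for illustration and do not reproduce MURAC's actual programming model.

        # Hypothetical illustration of a unified programming model: the application
        # issues calls through one interface and a dispatcher routes each task to a
        # software or "hardware" execution model. This mimics the flavour of a
        # unified instruction stream and memory space but is not MURAC's real design.

        from typing import Callable, Dict

        memory: Dict[str, list] = {"samples": [3, 1, 4, 1, 5, 9, 2, 6]}  # shared ("unified") memory space

        def sw_sort(buf: str) -> None:
            memory[buf].sort()                        # executed by the general-purpose processor

        def hw_accumulate(buf: str) -> None:
            # Stand-in for a task handed to a reconfigurable accelerator.
            memory["sum"] = [sum(memory[buf])]

        EXECUTION_MODELS: Dict[str, Callable[[str], None]] = {
            "sort": sw_sort,                 # software execution model
            "accumulate": hw_accumulate,     # "hardware" execution model (simulated)
        }

        def run(program: list[tuple[str, str]]) -> None:
            """Single control flow: tasks run in order, whichever architecture backs them."""
            for task, buf in program:
                EXECUTION_MODELS[task](buf)

        if __name__ == "__main__":
            run([("sort", "samples"), ("accumulate", "samples")])
            print(memory["samples"], memory["sum"])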