1,478 research outputs found

    Document Classification Systems in Heterogeneous Computing Environments

    Get PDF
    Datacenter workloads demand high throughput, low cost and power efficient solutions. In most data centers the operating costs dominates the infrastructure cost. The ever growing amounts of data and the critical need for higher throughput, more energy efficient document classification solutions motivated us to investigate alternatives to the traditional homogeneous CPU based implementations of document classification systems. Several heterogeneous systems were investigated in the past where CPUs were combined with GPUs and FPGAs as system accelerators. The increasing complexity of FPGAs made them an interesting device in the heterogeneous computing environments and on the other hand difficult to program using Hardware Description languages. We explore the trade-offs when using high level synthesis and low level synthesis when programming FPGAs. Using low level synthesis results in less hardware resource usage on FPGAs and also offers the higher throughput compared to using HLS tool. While using HLS tool different heterogeneous computing devices such as multicore CPU and GPU targeted. Through our implementation experience and empirical results for data centric applications, we conclude that we can achieve power efficient results for these set of applications by either using low level synthesis or high level synthesis for programming FPGAs

    Automated optimization of reconfigurable designs

    Get PDF
    Currently, the optimization of reconfigurable design parameters is typically done manually and often involves substantial amount effort. The main focus of this thesis is to reduce this effort. The designer can focus on the implementation and design correctness, leaving the tools to carry out optimization. To address this, this thesis makes three main contributions. First, we present initial investigation of reconfigurable design optimization with the Machine Learning Optimizer (MLO) algorithm. The algorithm is based on surrogate model technology and particle swarm optimization. By using surrogate models the long hardware generation time is mitigated and automatic optimization is possible. For the first time, to the best of our knowledge, we show how those models can both predict when hardware generation will fail and how well will the design perform. Second, we introduce a new algorithm called Automatic Reconfigurable Design Efficient Global Optimization (ARDEGO), which is based on the Efficient Global Optimization (EGO) algorithm. Compared to MLO, it supports parallelism and uses a simpler optimization loop. As the ARDEGO algorithm uses multiple optimization compute nodes, its optimization speed is greatly improved relative to MLO. Hardware generation time is random in nature, two similar configurations can take vastly different amount of time to generate making parallelization complicated. The novelty is efficient use of the optimization compute nodes achieved through extension of the asynchronous parallel EGO algorithm to constrained problems. Third, we show how results of design synthesis and benchmarking can be reused when a design is ported to a different platform or when its code is revised. This is achieved through the new Auto-Transfer algorithm. A methodology to make the best use of available synthesis and benchmarking results is a novel contribution to design automation of reconfigurable systems.Open Acces

    FPGA ACCELERATION OF A CORTICAL AND A MATCHED FILTER-BASED ALGORITHM

    Get PDF
    Digital image processing is a widely used and diverse field. It is used in a broad array of areas such as tracking and detection, object avoidance, computer vision, and numerous other applications. For many image processing tasks, the computations can become time consuming. Therefore, a means for accelerating the computations would be beneficial. Using that as motivation, this thesis examines the acceleration of two distinctly different image processing applications. The first image processing application examined is a recent neocortex inspired cognitive model geared towards pattern recognition as seen in the visual cortex. For this model, both software and reconfigurable logic based FPGA implementations of the model are examined on a Cray XD1. Results indicate that hardware-acceleration can provide average throughput gains of 75 times over software-only implementations of the networks examined when utilizing the full resources of the Cray XD1. The second image processing application examined is matched filter-based position detection. This approach is at the heart of the automatic alignment algorithm currently being tested in the National Ignition Faculty presently under construction at the Lawrence Livermore National Laboratory. To reduce the processing time of the match filtering, a reconfigurable logic architecture was developed. Results show that the reconfigurable logic architecture provides a speedup of approximately 253 times over an optimized software implementation

    Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset

    Get PDF
    This paper presents and analyzes a heterogeneous implementation of an industrial use case based on K-means that targets symmetric multiprocessing (SMP), GPUs and FPGAs. We present how the application can be optimized from an algorithmic point of view and how this optimization performs on two heterogeneous platforms. The presented implementation relies on the OmpSs programming model, which introduces a simplified pragma-based syntax for the communication between the main processor and the accelerators. Performance improvement can be achieved by the programmer explicitly specifying the data memory accesses or copies. As expected, the newer SMP+GPU system studied is more powerful than the older SMP+FPGA system. However the latter is enough to fulfill the requirements of our use case and we show that uses less energy when considering only the active power of the execution.This work is partially supported by the European Union H2020 project AXIOM (grant agreement n. 645496), HiPEAC (grant agreement n. 687698), and Mont-Blanc (grant agreements n. 288777, 610402 and 671697), the Spanish Government Programa Severo Ochoa (SEV-2015-0493), the Spanish Ministry of Science and Technology (TIN2015- 65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programaci´o i Entorns d’Execució Paral·lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft
    • …
    corecore