    A fully parameterized virtual coarse grained reconfigurable array for high performance computing applications

    Field Programmable Gate Arrays (FPGAs) have proven their potential in accelerating High Performance Computing (HPC) applications. Conventionally, such accelerators predominantly use FPGAs that contain fine-grained elements such as LookUp Tables (LUTs), Switch Blocks (SBs) and Connection Blocks (CBs) as basic programmable logic blocks. However, this conventional implementation suffers from high reconfiguration and development costs. To solve this problem, programmable logic components are defined at a higher, virtual abstraction level. These components are called Processing Elements (PEs), and the group of PEs together with the interconnection network forms an architecture called a Virtual Coarse-Grained Reconfigurable Array (VCGRA). This abstraction allows the PEs to be reconfigured faster at the intermediate level than at the lower level of the FPGA. Conventional VCGRA implementations (built on top of the lower levels of the FPGA) use functional resources such as LUTs to establish the required connections within a PE (the intra-connect). In this paper, we propose to use the parameterized reconfiguration technique to implement the intra-connections of each PE, with the aim of reducing FPGA resource utilization (LUTs). The technique parameterizes the intra-connections with parameters that change value only infrequently (whenever a new VCGRA function has to be reconfigured) and that are implemented as constants. Since the design is optimized for the current constant values at every moment in time, resource utilization is reduced. Furthermore, the interconnections (the network between the multiple PEs) of the VCGRA grid can also be parameterized, so that both the inter- and intra-connect networks of the VCGRA grid can be mapped onto the physical switch blocks of the FPGA. For every change in parameter values, a specialized bitstream is generated on the fly and the FPGA is reconfigured using the parameterized run-time reconfiguration technique. Our results show a drastic reduction in FPGA LUT utilization, by at least 30% in the PE and by 31% in the intra-network of the PE, when implementing an HPC application.
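
    The core idea of parameterized configuration, specializing a design for parameters that change only rarely, has a direct software analogue in partial evaluation. The Python sketch below is only that analogy, not the paper's toolflow: the PE, its two multiplexer selects (sel_a, sel_b) and its two-operation ALU are hypothetical. generic_pe re-evaluates the routing on every call, as LUT-based multiplexers would, whereas specialize_pe folds the slowly changing parameters in once, much as a specialized bitstream is generated whenever the parameter values change.

```python
from typing import Callable, Sequence


def generic_pe(inputs: Sequence[int], sel_a: int, sel_b: int, op: str) -> int:
    """Hypothetical fully general PE.

    The routing (sel_a, sel_b) and the operation are inspected on every call,
    analogous to implementing the intra-connect with LUT-based multiplexers.
    """
    a, b = inputs[sel_a], inputs[sel_b]
    return a + b if op == "add" else a * b


def specialize_pe(sel_a: int, sel_b: int, op: str) -> Callable[[Sequence[int]], int]:
    """Hypothetical specialized PE.

    The infrequently changing parameters are resolved once, when the PE is
    (re)configured; the returned function only contains the selected datapath.
    """
    if op == "add":
        return lambda inputs: inputs[sel_a] + inputs[sel_b]
    return lambda inputs: inputs[sel_a] * inputs[sel_b]


# Reconfiguring means calling specialize_pe again with new parameter values.
mul_pe = specialize_pe(0, 2, "mul")
print(mul_pe([3, 5, 7]), generic_pe([3, 5, 7], 0, 2, "mul"))
```

    Both calls compute the same value (21 for the inputs shown); the difference is that the specialized version no longer carries the routing decisions at run time.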

    EDRA: A Hardware-assisted Decoupled Access/Execute Framework on the Digital Market

    EDRA was a Horizon 2020 FET Launchpad project that focused on commercializing the Decoupled Access Execution Reconfigurable (DAER) framework, developed within the FET-HPC EXTRA project, on Amazon's Elastic Compute Cloud (EC2) FPGA-based infrastructure. The delivered framework encapsulates DAER in an EC2 virtual machine (VM) and uses a simple, directive-based, high-level application programming interface (API) to facilitate mapping applications to the underlying hardware architecture. EDRA's Minimum Viable Product (MVP) is an accelerator for the Phylogenetic Likelihood Function (PLF), one of the cornerstone functions in most phylogenetic inference tools, and achieves up to 8x performance improvement over optimized software implementations. In preparation for market entry, research revealed that Europe is an extremely promising geographic region on which to focus the project's dissemination, MVP promotion and advertisement efforts.
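
    Decoupled access/execute architectures split a computation into an access stream, which generates addresses and fetches operands, and an execute stream, which only computes; the two communicate through a FIFO so that memory latency can overlap with computation. The Python sketch below illustrates that general principle with two threads and a bounded queue. It does not model DAER's actual hardware or its directive-based API; access_unit, execute_unit, run_dae and the placeholder squaring operation are hypothetical.

```python
import queue
import threading
from typing import List, Sequence

_DONE = object()  # sentinel: tells the execute side that the operand stream has ended


def access_unit(memory: Sequence[float], addresses: Sequence[int], fifo: queue.Queue) -> None:
    """Access side: walks the address stream and only fetches operands."""
    for addr in addresses:
        fifo.put(memory[addr])  # blocks when the FIFO is full (back-pressure)
    fifo.put(_DONE)


def execute_unit(fifo: queue.Queue, results: List[float]) -> None:
    """Execute side: only computes, consuming operands in arrival order."""
    while True:
        item = fifo.get()
        if item is _DONE:
            break
        results.append(item * item)  # placeholder computation


def run_dae(memory: Sequence[float], addresses: Sequence[int], depth: int = 4) -> List[float]:
    fifo: queue.Queue = queue.Queue(maxsize=depth)  # bounded FIFO decouples the two sides
    results: List[float] = []
    producer = threading.Thread(target=access_unit, args=(memory, addresses, fifo))
    consumer = threading.Thread(target=execute_unit, args=(fifo, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


print(run_dae([1.0, 2.0, 3.0, 4.0], [3, 0, 2]))
```

    The bounded queue depth plays the role of the hardware FIFO: the access side can run ahead of the execute side by at most that many operands.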

    Accelerating Phylogenetics Using FPGAs in the Cloud

    Phylogenetics studies the evolutionary history of organisms through an iterative process of creating and evaluating phylogenetic trees. This process is computationally very intensive; constructing a large phylogenetic tree requires hundreds to thousands of CPU hours. In this article, we describe an FPGA-based system that can be deployed on AWS EC2 F1 cloud instances to accelerate phylogenetic analyses by boosting the performance of the phylogenetic likelihood function, a widely employed tree-evaluation function that accounts for up to 95% of the overall analysis time. We exploit domain-specific knowledge to reduce the amount of transferred data, which limits overall system performance. Our proof-of-concept implementation reveals that the effective accelerator throughput nearly quadruples with optimized data movement, reaching up to 75% of its theoretical peak, and that processing is nearly 10× faster than on a CPU using AVX2 extensions.
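
    The phylogenetic likelihood function is commonly evaluated with Felsenstein's pruning algorithm: for each alignment site, the conditional likelihood vector of an inner node is computed from those of its two children and the transition-probability matrices of the two connecting branches, i.e. L_parent[s] = (Σ_x P_left[s][x]·L_left[x]) · (Σ_y P_right[s][y]·L_right[y]). The minimal sketch below shows that combination step for a single site, assuming DNA data (4 states) and plain Python lists; the function names are hypothetical and the code is not the authors' FPGA kernel.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def combine_children(p_left: Matrix, l_left: Sequence[float],
                     p_right: Matrix, l_right: Sequence[float]) -> List[float]:
    """Conditional likelihood vector of a parent node for one site.

    p_left[s][x] is the probability of moving from parent state s to child
    state x along the left branch; l_left[x] is the left child's conditional
    likelihood for state x (likewise for the right child).
    """
    n = len(l_left)
    parent = []
    for s in range(n):
        left = sum(p_left[s][x] * l_left[x] for x in range(n))
        right = sum(p_right[s][y] * l_right[y] for y in range(n))
        parent.append(left * right)
    return parent


def site_likelihood(freqs: Sequence[float], l_root: Sequence[float]) -> float:
    """Likelihood of one site: root vector weighted by equilibrium state frequencies."""
    return sum(f * l for f, l in zip(freqs, l_root))


# Two tips observed as A and C (one-hot vectors), uniform transition matrices.
uniform = [[0.25] * 4 for _ in range(4)]
root = combine_children(uniform, [1, 0, 0, 0], uniform, [0, 1, 0, 0])
print(root, site_likelihood([0.25] * 4, root))
```

    A full PLF evaluation repeats this step for every site and inner node; production implementations typically also handle several rate categories and rescale the vectors to avoid numerical underflow. The regular, site-independent loop structure is plausibly what makes the function a good candidate for hardware acceleration.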

    EXTRA: Towards an efficient open platform for reconfigurable High Performance Computing

    To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated even if they change regularly over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built in as a core feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly take reconfiguration as a central design concept, and applications that are tuned to benefit maximally from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field.

    Exploring Modern FPGA Platforms for Faster Phylogeny Reconstruction with RAxML

    The Phylogenetic Likelihood Function (PLF) is one of the cornerstone functions in most phylogenetic inference tools; its execution accounts for the majority of the time required to complete an analysis. This work proposes accelerating this function with reconfigurable hardware, targeting both systems-on-chip that integrate Field Programmable Gate Array (FPGA) resources and traditional High Performance Computing (HPC) systems that use FPGA-based accelerator cards. By taking the specific properties of each platform into account to exploit its processing capabilities, the proposed solutions provide significant performance gains. The measured acceleration of the PLF is up to 8x, while the overall time to complete a phylogenetic analysis with the popular RAxML software can be reduced by up to 3.2 times (with respect to a pure software implementation on a high-end server processor). Compared to other similar solutions proposed in the literature, our systems perform up to 65% faster.
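
    The two speedups quoted above (up to 8x for the PLF itself, up to 3.2x end to end) can be related through Amdahl's law. The derivation below is only a back-of-the-envelope reading under stated assumptions: the PLF is the only accelerated part, both figures come from the same configuration, and any host-accelerator transfer overhead is already included in the 8x figure. Here f denotes the fraction of the original runtime spent in the PLF and s its speedup.

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + f/s}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S_{\text{overall}}}{1 - 1/s}
  = \frac{1 - 1/3.2}{1 - 1/8}
  = \frac{0.6875}{0.875}
  \approx 0.79
```

    Under these assumptions, the end-to-end figure is consistent with the PLF occupying roughly 79% of the software runtime. This is an inferred value, not one reported by the authors; the remaining, unaccelerated part of the analysis bounds the overall gain.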

    RESENSE: An innovative, reconfigurable, powerful and energy efficient WSN node

    Wireless Sensor Networks (WSNs) have recently enjoyed a tremendous rise in popularity. Current WSN node offerings, however, need both more processing power and lower energy consumption to enable the full potential of such networks. To address these requirements, we explore the benefits of an innovative platform that combines a standard wireless node with very low-cost reconfigurable hardware. To evaluate the efficiency of this pioneering approach, three different networking and security protocols have been implemented on the system: a) Turbo coding, b) Blowfish encryption and c) XMesh routing. Our real-world experiments demonstrate that our prototype system provides performance comparable to existing microcontroller-based schemes (while a productized version could potentially be much faster) and, more importantly, that its overall energy consumption is 70% to 93% lower than that of a very widely used commercial WSN node executing exactly the same processing tasks. Presented at: IEEE International Conference on Communications (ICC)

    An open-source extendable, highly-accurate and security aware simulator for cloud applications

    In this demo, we present COSSIM, an open-source simulation framework for cloud applications. Our solution models the client and server computing devices as well as the network that together comprise the overall system, and thus provides cycle-accurate results, realistic communications and power/energy consumption estimates based on actual dynamic usage scenarios. The simulator provides the necessary hooks for security-testing software and can be extended through an IEEE-standardized interface to include additional tools, such as simulators of physical models. The application used to demonstrate COSSIM is mobile visual search, in which mobile nodes capture images, extract their compressed representation and dispatch a query to the cloud. A server compares the received query against a local database and sends back some of the corresponding results.

    A CAD Open Platform for High Performance Reconfigurable Systems in the EXTRA Project

    As the power wall has become one of the main limiting factors for the performance of general-purpose processors, the trend in High Performance Computing (HPC) is moving towards application-specific accelerators to meet the stringent performance requirements of exascale computing while still satisfying power budget constraints. Within this context, reconfigurable devices, and more specifically FPGA-based systems, represent a promising solution able to achieve highly energy-efficient computation without jeopardizing performance. Nevertheless, the exploitation of reconfigurable hardware is still limited by the hardware-software co-design challenges it poses, the time-consuming design space exploration process and the programming complexity. To overcome these challenges, the EXTRA European project treats the reconfigurability of such devices as a first-class feature, covering the entire stack from the system architecture up to the application. In this paper, we present the effort of the EXTRA project towards the definition of an adaptive open platform for the optimization and implementation of applications on high performance reconfigurable architectures. The underlying infrastructure of the platform is presented here, highlighting its capability to integrate modules from different developers in order to stimulate external contributions and open research.

    HEAP: A Highly Efficient Adaptive multi-Processor framework

    Writing parallel code is difficult, especially when starting from a sequential reference implementation. Our research efforts, as demonstrated in this paper, tackle this challenge directly by providing an innovative toolset that helps software developers profile and parallelize an existing sequential implementation by exploiting top-level pipeline-style parallelism. The innovation of our approach lies in the facts that (a) we use both automatic and profile-driven estimates of the available parallelism, (b) we refine those estimates using metric-driven verification techniques, and (c) we support dynamic recovery from excessively optimistic parallelization. The proposed toolset has been used to find an efficient parallel code organization for a number of representative real-world applications, and a version of the toolset is available as open source.
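
    Top-level pipeline-style parallelism, the kind of parallelism the HEAP toolset targets, splits a sequential loop body into stages that run concurrently on successive items and are connected by queues. The Python sketch below shows only that general pattern; run_pipeline and its arguments are hypothetical and are not part of the HEAP toolset. For stages f1 and f2 it produces the same results, in the same order, as the sequential loop [f2(f1(x)) for x in items].

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Sequence

_END = object()  # end-of-stream marker passed from stage to stage


def run_pipeline(items: Iterable[Any], stages: Sequence[Callable[[Any], Any]]) -> List[Any]:
    """Run each stage in its own thread; stage i feeds stage i + 1 through a queue."""
    queues: List[queue.Queue] = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage: Callable[[Any], Any], q_in: queue.Queue, q_out: queue.Queue) -> None:
        while True:
            item = q_in.get()
            if item is _END:
                q_out.put(_END)  # propagate end-of-stream downstream
                return
            q_out.put(stage(item))

    threads = [threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    for t in threads:
        t.start()

    # Feed the first stage, then collect results from the last queue.
    for item in items:
        queues[0].put(item)
    queues[0].put(_END)

    results: List[Any] = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10]))
```

    Under CPython's global interpreter lock, pure-Python stages will not actually run faster this way; the sketch illustrates the code organization that pipeline parallelization produces rather than the speedup it can yield with native or hardware stages.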

    COSSIM: An open-source integrated solution to address the simulator gap for systems of systems

    In an era of complex networked heterogeneous systems, simulating only parts, components or attributes of a system under design independently is not a viable, accurate or efficient option. The interactions are too numerous and too complex for isolated simulations to produce meaningful results, and the optimization opportunities are severely limited when each part of a system is considered in isolation. The presented COSSIM simulation framework is the first known open-source, high-performance simulator that can holistically handle systems-of-systems, including processors, peripherals and networks; such an approach is very appealing to designers and application developers of both CPS/IoT and Highly Parallel Heterogeneous Systems. Our highly integrated approach is further augmented with accurate power estimation and security sub-tools that can tap into all system components and perform security and robustness analysis of the overall networked system. Additionally, a GUI has been developed to provide easy simulation set-up, execution and visualization of results. COSSIM has been evaluated using real-world applications representing cloud (mobile visual search) and CPS (building management) systems, demonstrating high accuracy and performance that scales almost linearly with the number of CPUs dedicated to the simulator.