24 research outputs found

    RETHINK big: European roadmap for hardware and networking optimizations for big data

    Get PDF
    This paper discusses the results of the RETHINK big project, a 2-year Collaborative Support Action funded by the European Commission to write the European Roadmap for Hardware and Networking Optimizations for Big Data. This industry-driven project was led by the Barcelona Supercomputing Center (BSC) and included large industry partners, SMEs, and academia. The roadmap identifies business opportunities from 89 in-depth interviews with 70 European industry stakeholders in the area of Big Data, and predicts the future technologies that will disrupt the state of the art in Big Data processing in terms of hardware and networking optimizations. Moreover, it presents coordinated technology development recommendations (focused on networking and hardware optimizations) that it would be in the best interest of European Big Data companies to undertake in concert, as a matter of competitive advantage. This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement n° 619788. It has also been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Peer Reviewed. Postprint (author's final draft).

    ALU for mbedTLS Diffie-Hellman Parameters Generator on FPGA Embedded Processor System

    Get PDF
    A safe prime is a member of a special subset of the primes: a number p where both p and (p-1)/2 are prime. The widely used Diffie-Hellman key exchange algorithm relies on very large safe primes as its parameters. In practice, crypto software libraries implement a dedicated Diffie-Hellman parameter generator that searches for safe primes using the Miller-Rabin probabilistic primality test. With no proven theory to predict their occurrence among the natural numbers, generator programs typically start at a randomly seeded odd positive integer of a predetermined size and perform primality tests iteratively, incrementing the candidate until one succeeds. The staggeringly low density of safe primes means a prohibitive amount of computing resources must be dedicated to the generation process. As a result, power-conscious mobile and embedded devices can no longer compute standard 2048-bit safe primes without prolonged disruption to overall system performance. Based on hot-path analysis of the generator program, a parallelized and pipelined ALU is proposed and implemented on an FPGA embedded processor system. Utilizing merely 3% of the LUTs (584/17,600) and 20% of the DSP slices (16/80) available on the Xilinx Zynq 7010 All Programmable SoC, the suggested design is theoretically capable of offloading more than 90% of the CPU utilization needed for the entire safe-prime generation process. These results demonstrate the deficiency of today's general-purpose CPUs in handling certain complex, resource-intensive computations, and such scenarios strongly incentivize the integration of programmable hardware with fixed-design CPUs. Additional research is suggested in automating the processes of locating the specific CPU-intensive task, translating that task onto programmable hardware, and providing a software-accessible interface to enable fast development and deployment of hot-function-based programmable hardware designs. From there, programmable-hardware-assisted computing platforms can be further enhanced to dynamically program hardware modules based on real-time utilization, achieving even greater overall system performance. A new system design paradigm could potentially be introduced as a result.
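    The incremental search described above can be sketched in plain Python. This is an illustrative sketch of the generic algorithm only (small bit sizes, standard Miller-Rabin), not mbedTLS's actual implementation:

```python
# Sketch of a safe-prime search loop: seed a random odd candidate, then
# increment until both p and (p-1)/2 pass Miller-Rabin. Real generators
# use 2048-bit candidates; small sizes here keep the demo fast.
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n-1 as d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)          # modular exponentiation: a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # witness of compositeness found
    return True

def generate_safe_prime(bits):
    """The hot loop the paper profiles: test incrementing odd candidates
    until p and (p-1)//2 are both (probably) prime."""
    p = random.getrandbits(bits) | (1 << (bits - 1)) | 1  # odd, full width
    while not (is_probable_prime(p) and is_probable_prime((p - 1) // 2)):
        p += 2
    return p
```

    Nearly all of the runtime is spent inside the modular exponentiations of `is_probable_prime`, which is why that hot path is the natural target for a parallelized, pipelined hardware ALU.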

    OmpSs@cloudFPGA: An FPGA task-based programming model with message passing

    Get PDF
    Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost on high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Field-Programmable Gate Arrays (FPGAs) are one type of accelerator that is becoming widely available in data centers. This paper proposes OmpSs@cloudFPGA, which includes novel extensions to parallel task-based programming models that enable easy and efficient programming of heterogeneous clusters with FPGAs. The programmer only needs to annotate, with OpenMP-like pragmas, the tasks of the application that should be accelerated on the cluster of FPGAs. Next, the proposed programming-model framework automatically extracts the parts annotated with High-Level Synthesis (HLS) pragmas and synthesizes them into hardware accelerator cores for FPGAs. Additionally, our extensions include and support two novel features: 1) FPGA-to-FPGA direct communication, using an Application Programming Interface (API) similar to the Message Passing Interface (MPI) with one-to-one and collective communications, to alleviate the host communication bottleneck; and 2) creating and spawning work from inside the FPGAs to their own accelerator cores, based on an MPI-rank-like identification. These features break the classical host-accelerator model, where the host (typically the CPU) generates all the work and distributes it to each accelerator. We also present an evaluation of OmpSs@cloudFPGA for different parallel strategies of the N-body application on the IBM cloudFPGA research platform. Results show that for cluster sizes up to 56 FPGAs, performance scales linearly. To the best of our knowledge, this is the best performance obtained for N-body on FPGA platforms, reaching 344 Gpairs/s with 56 FPGAs. Finally, we compare the performance and power consumption of the proposed approach with those obtained by classical execution on the MareNostrum 4 supercomputer, demonstrating that our FPGA approach reduces power consumption by an order of magnitude. This work has been done in the context of the IBM/BSC Deep Learning Center initiative. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA), from the Spanish Government (PID2019-107255GB-C21/AEI/10.13039/501100011033), and from the Generalitat de Catalunya (2017-SGR-1414 and 2017-SGR-1328). Peer Reviewed. Postprint (author's final draft).
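    For context, the N-body workload used in such evaluations is, at its core, an all-pairs interaction kernel, and the "pairs" counted in Gpairs/s are these body-body interactions. Below is a minimal direct-sum sketch in Python of the generic formulation (with the gravitational constant set to 1 and an assumed softening term), not the authors' FPGA implementation:

```python
# Direct-sum N-body acceleration kernel: every body interacts with every
# other body, so one time step evaluates N*(N-1) pairs -- the unit behind
# the Gpairs/s throughput figure.
def nbody_accelerations(pos, mass, softening=1e-9):
    """pos: list of (x, y, z) tuples; mass: list of floats.
    Returns per-body acceleration vectors (G = 1 for simplicity)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx, dy, dz = pos[j][0] - xi, pos[j][1] - yi, pos[j][2] - zi
            # Softening avoids a division by zero for coincident bodies.
            r2 = dx * dx + dy * dy + dz * dz + softening
            inv_r3 = r2 ** -1.5
            acc[i][0] += mass[j] * dx * inv_r3
            acc[i][1] += mass[j] * dy * inv_r3
            acc[i][2] += mass[j] * dz * inv_r3
    return acc
```

    The outer loop over bodies is embarrassingly parallel, which is what makes the kernel a natural fit for distributing across many FPGA accelerator cores identified by MPI-rank-like IDs.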

    Myths and Legends in High-Performance Computing

    Full text link
    In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, papers, and other communications such as tweets, blogs, and news articles within and beyond our community. We believe they represent the zeitgeist of the current era of massive change, driven by the end of many scaling laws, such as Dennard scaling and Moore's law. While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research. Nevertheless, these myths are rarely based on scientific facts; rather, they rest on partial evidence or argumentation. In fact, we believe that this is the very reason for the existence of many myths and why they cannot be answered clearly. While it feels like there should be clear answers for each, some may remain endless philosophical debates, such as whether Beethoven was better than Mozart. We would like to see our collection of myths as a discussion of possible new directions for research and industry investment.

    Multi-Tenant Cloud FPGA: A Survey on Security

    Full text link
    With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures have evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search, optimization, and signal-processing tasks with advantages in power, latency, and processing speed. Many public cloud provider giants, including Amazon, Huawei, Microsoft, and Alibaba, have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, they also incur new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open backdoors for malicious attackers, potentially putting the cloud platform at risk. Considering these security risks, public cloud providers still do not offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming challenges in this field of study.