41 research outputs found

    Lessons learned from contrasting BLAS kernel implementations

    This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast the implementation effort and complexity of an optimized BLAS routine on CPU versus GPU, without considering performance. This work contributes a sample procedure for comparing BLAS kernel implementations, guidance on how to start using GPU libraries and offloading, and notes on how to analyze their performance, the issues faced, and how they were solved. WPDP - XIII Workshop procesamiento distribuido y paralelo. Red de Universidades con Carreras en Informática (RedUNCI)
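
    For readers unfamiliar with the operation named in the abstract above: SSPR computes the symmetric rank-one update A := alpha * x * x^T + A, where A is stored in packed form. The following is a minimal pure-Python sketch of the upper-triangle variant, assuming column-by-column packing as in the reference BLAS. It is not the paper's code, the function names `sspr_upper` and `unpack_upper` are hypothetical, and Python floats are double precision whereas SSPR is the single-precision routine.

```python
def sspr_upper(n, alpha, x, ap):
    """In-place A := alpha * x * x^T + A for a symmetric n x n matrix A.

    `ap` holds the upper triangle of A packed column by column, so
    element (i, j) with i <= j lives at index i + j * (j + 1) // 2.
    Illustrative sketch only; not an optimized or official kernel.
    """
    if len(x) != n or len(ap) != n * (n + 1) // 2:
        raise ValueError("size mismatch between n, x and packed matrix")
    k = 0  # running index into the packed array
    for j in range(n):
        t = alpha * x[j]
        for i in range(j + 1):  # rows 0..j of column j
            ap[k] += x[i] * t
            k += 1
    return ap


def unpack_upper(n, ap):
    """Expand a packed upper triangle into a dense symmetric matrix."""
    a = [[0.0] * n for _ in range(n)]
    k = 0
    for j in range(n):
        for i in range(j + 1):
            a[i][j] = a[j][i] = ap[k]
            k += 1
    return a


if __name__ == "__main__":
    packed = sspr_upper(2, 2.0, [1.0, 3.0], [0.0, 0.0, 0.0])
    print(unpack_upper(2, packed))
```

    Packed storage keeps only n(n+1)/2 entries, which is why the inner loop advances a single running index instead of computing two-dimensional offsets.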

    Modified honey encryption scheme for encoding natural language message

    Conventional encryption schemes are susceptible to brute-force attacks. This is because bytes encode UTF-8 (or ASCII) characters. Consequently, an adversary who intercepts a ciphertext and tries to decrypt it by brute-forcing with incorrect keys can filter out some of the candidate decryptions by observing that they are sequences of non-uniformly distributed characters with no plausible meaning. The honey encryption (HE) scheme was proposed to curtail this vulnerability of conventional encryption by producing ciphertexts that yield valid-looking, uniformly distributed but fake plaintexts upon decryption with incorrect keys. However, the scheme works only for passwords and PINs. Its adaptation to support encoding natural language messages (e-mails, human-generated documents) has remained an open problem. Existing proposals to extend the scheme to natural language messages reveal fragments of the plaintext in the ciphertext, making them susceptible to chosen-ciphertext attacks (CCA). In this paper, we modify the HE scheme to support the encoding of natural language messages using Natural Language Processing techniques. Our main contribution is a structure that allows a message to be encoded entirely in binary. As a result of this strategy, most binary strings produce syntactically correct messages, which are generated to deceive an attacker who attempts to decrypt a ciphertext using incorrect keys. We evaluate the security of our proposed scheme

    Virtualized FPGA accelerators for efficient cloud computing

    Hardware accelerators implement custom architectures to significantly speed up computations in a wide range of domains. As performance scaling in server-class CPUs slows, we propose the integration of hardware accelerators in the cloud as a way to maintain a positive performance trend. Field programmable gate arrays (FPGAs) represent the ideal way to integrate accelerators in the cloud, since they can be reprogrammed as needs change and allow multiple accelerators to share optimised communication infrastructure. We discuss a framework that integrates reconfigurable accelerators in a standard server with virtualised resource management and communication. We then present a case study that quantifies the efficiency benefits and break-even point for integrating FPGAs in the cloud

    Comparative analysis of baseband processors for implementing SDR transceivers (Порівняльний аналіз baseband-процесорів для реалізації SDR-трансиверів)

    A comparative analysis of computing platforms was carried out with a view to using them as the baseband processor of a Software Defined Radio transceiver. It was established that general-purpose processors paired with graphics processors can fill this role; they offer high design flexibility, but low performance and high power consumption. Dedicated signal processors have the advantage of better power consumption, which makes them suitable for the rapid development of portable transceivers at a sufficiently low price. For high-performance transceivers, programmable logic integrated circuits are the best choice, since their high parallelism yields a substantial gain in speed. The authors propose their own transceiver architecture, using a system-on-chip and a radio-frequency transceiver, for building a flexible system for transmitting information over a wireless channel.
    Software-defined Radio is a programmable transceiver capable of operating various wireless communication protocols without the need to change or update the hardware. Consequently, Software-defined Radio has attracted a lot of attention and is of great significance to academia, the military, and the aerospace industry. Components of Software-defined Radio (e.g. mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are implemented in software on a personal computer or embedded system. Signal processing operations are handed over to the baseband processor rather than being performed in special electronic circuits. Baseband processors are implemented on various types of hardware platforms, such as General Purpose Processors, Graphics Processing Units, Digital Signal Processors, and Field Programmable Gate Arrays. Each of these platforms has its own set of advantages and disadvantages.
    This paper compares state-of-the-art hardware platforms in the context of implementing Software-defined Radio transceivers. The comparison criteria are the computational power of the hardware platform, power consumption, development complexity, and the cost of tools and equipment. The first approach to realizing baseband processors is to use a General Purpose Processor accelerated by Graphics Processing Units. However, General Purpose Processors and Graphics Processing Units execute software instructions in sequential order. For this reason, General Purpose Processors are not well suited to high-throughput computing with real-time requirements. These hardware platforms also have high power consumption, which rules out their use in small and portable Software-defined Radio transceivers. On the other hand, General Purpose Processors are the hardware platform preferred by researchers and beginners due to their flexibility and programmability. Therefore, General Purpose Processors and Graphics Processing Units are highly recommended for prototyping Software-defined Radio platforms. Digital Signal Processors were reviewed as an alternative approach for implementing baseband processors. A Digital Signal Processor is a particular type of General Purpose Processor that is optimized to process digital signals. Digital Signal Processors share the disadvantage of insufficient computational power, but some manufacturers sell energy-optimized Digital Signal Processors. Consequently, Digital Signal Processors are commonly used in small and portable Software-defined Radio transceivers. Field Programmable Gate Arrays and System-on-Chips with a Field Programmable Gate Array are strongly recommended for high-performance Software-defined Radio platforms. These hardware platforms combine the flexibility of processors with the efficiency of a small Digital Signal Processor.
    Field Programmable Gate Arrays can achieve a high level of parallelism in executing digital signal processing. However, designers must have a high level of expertise in digital electronics and a good knowledge of hardware description languages. Following this research, the authors propose their own flexible Software-defined Radio transceiver architecture and methods for its development

    Benchmarking micro-core architectures for detecting disasters at the edge

    Leveraging real-time data to detect disasters such as wildfires, extreme weather, earthquakes, tsunamis, human health emergencies, or global diseases is an important opportunity. However, much of this data is generated in the field, and the volumes involved make it impractical to transmit back to a central data-centre for processing. Instead, edge devices are required to generate insights from the sensor data streaming in. Given the severe performance and power constraints that these devices must operate under, an important question is which CPU architecture is most suitable. One class of device that we believe has a significant role to play here is micro-cores, which combine many simple low-power cores in a single chip. However, there are many to choose from, and an important question is which is best suited to which situation. This paper presents the Eithne framework, designed to simplify benchmarking of micro-core architectures. Three benchmarks, LINPACK, DFT and FFT, have been implemented atop this framework, and we use these to explore the key characteristics and concerns of common micro-core designs within the context of operating on the edge for disaster detection. The result of this work is an extensible framework that the community can use to help develop and test these devices in the future. Comment: Preprint of paper accepted to IEEE/ACM Second International Workshop on the use of HPC for Urgent Decision Making (UrgentHPC)
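
    Two of the benchmarks named in the abstract above, DFT and FFT, compute the same transform at different cost. The following is a minimal pure-Python sketch contrasting a direct O(n^2) DFT with a recursive radix-2 FFT. It is not the Eithne benchmark code, the function names `dft` and `fft` are hypothetical, and the FFT here assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform, O(n^2) operations."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n) operations.

    Illustrative sketch: requires len(x) to be a power of two.
    """
    n = len(x)
    if n < 1:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    for a, b in zip(dft(samples), fft(samples)):
        print(a, b)
```

    On a small, memory-constrained core the direct form is simpler to fit, while the FFT trades extra control flow for far fewer arithmetic operations, which may be one reason to benchmark both.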