2,316 research outputs found

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

    A new TRNG based on coherent sampling with self-timed rings

    Get PDF
    Random numbers play a key role in applications such as industrial simulations, laboratory experimentation, computer games, and engineering problem solving. The design of new true random generators (TRNGs) has attracted the attention of the research community for many years. Designs with little hardware requirements and high throughput are demanded by new and powerful applications. In this paper, we introduce the design of a novel TRNG based on the coherent sampling (CS) phenomenon. Contrary to most designs based on this phenomenon, ours uses self-timed rings (STRs) instead of the commonly employed ring oscillators (ROs). Our design has two key advantages over existing proposals based on CS. It does not depend on the FPGA vendor used and does not need manual placement and routing in the manufacturing process, resulting in a highly portable generator. Our experiments show that the TRNG offers a very high throughput with a moderate cost in hardware. The results obtained with ENT, DIEHARD, and National Institute of Standards and Technology (NIST) statistical test suites evidence that the output bitstream behaves as a truly random variable.This work was supported in part by the Ministerio de Economia y Competitividad (MINECO), Security and Privacy in the Internet of You (SPINY), under Grant TIN2013-46469-R, and in part by the Comunidad de Madrid (CAM), Cybersecurity, Data, and Risks (CIBERDINE), underGrant S2013/ICE-3095

    Modelling Electronic Circuit Failures using a Xilinx FPGA System

    Get PDF
    FPGAs are a ubiquitous electronic component utilised in a wide range of electronic systems across many industries. Almost all modern FPGAs employ SRAM based configuration memory elements which are susceptible to radiation induced soft errors. In high altitude and space applications, as well as in the nuclear and defence industries, such circuits must operate reliably in radiation-rich environments. A range of soft error mitigation techniques have been proposed but testing and qualification of new fault tolerant circuits can be an expensive and time consuming process. A novel method for simulating radiation-induced soft errors is presented that operates entirely within a laboratory environment and requires no hazardous exposure to radiation or expensive airborne test rigs. A system utilising modular redundancy is then implemented and tested under the new method. The test system is further demonstrated in conjunction with a software flight simulator to test single electronic modules in the context of active service on board a passenger aircraft and the effects of failure under radiation induced soft errors are observed. Our research proposes a test regime in which design strategies for self-healing circuits can be compared and demonstrated to work

    True random number generator on FPGA

    Get PDF
    Tato práce se zabývá implementací hardwarového generátoru náhodných čísel na FPGA vývojové desce, který staví na páru kruhových oscilátorů a zahrnuje testování vlivu změn teploty a napájecího napětí na generovaný výstup. Vyhodnocení staví na NIST testech. Výsledky ukazují, že změny prostředí neovlivňují výstup generátoru žádným významným způsobem.This thesis deals with the implementation of a true random number generator on FPGA development board building on pair of ring oscillators and explores the influence of temperature and power supply changes on generated output, evaluated by NIST-inspired tests. The results show that environmental changes does not impact the ouput in any significant way

    A Hybrid Partially Reconfigurable Overlay Supporting Just-In-Time Assembly of Custom Accelerators on FPGAs

    Get PDF
    The state of the art in design and development flows for FPGAs are not sufficiently mature to allow programmers to implement their applications through traditional software development flows. The stipulation of synthesis as well as the requirement of background knowledge on the FPGAs\u27 low-level physical hardware structure are major challenges that prevent programmers from using FPGAs. The reconfigurable computing community is seeking solutions to raise the level of design abstraction at which programmers must operate, and move the synthesis process out of the programmers\u27 path through the use of overlays. A recent approach, Just-In-Time Assembly (JITA), was proposed that enables hardware accelerators to be assembled at runtime, all from within a traditional software compilation flow. The JITA approach presents a promising path to constructing hardware designs on FPGAs using pre-synthesized parallel programming patterns, but suffers from two major limitations. First, all variant programming patterns must be pre-synthesized. Second, conditional operations are not supported. In this thesis, I present a new reconfigurable overlay, URUK, that overcomes the two limitations imposed by the JITA approach. Similar to the original JITA approach, the proposed URUK overlay allows hardware accelerators to be constructed on FPGAs through software compilation flows. To this basic capability, URUK adds additional support to enable the assembly of presynthesized fine-grained computational operators to be assembled within the FPGA. This thesis provides analysis of URUK from three different perspectives; utilization, performance, and productivity. The analysis includes comparisons against High-Level Synthesis (HLS) and the state of the art approach to creating static overlays. The tradeoffs conclude that URUK can achieve approximately equivalent performance for algebra operations compared to HLS custom accelerators, which are designed with simple experience on FPGAs. Further, URUK shows a high degree of flexibility for runtime placement and routing of the primitive operations. The analysis shows how this flexibility can be leveraged to reduce communication overhead among tiles, compared to traditional static overlays. The results also show URUK can enable software programmers without any hardware skills to create hardware accelerators at productivity levels consistent with software development and compilation

    Delay Measurements and Self Characterisation on FPGAs

    No full text
    This thesis examines new timing measurement methods for self delay characterisation of Field-Programmable Gate Arrays (FPGAs) components and delay measurement of complex circuits on FPGAs. Two novel measurement techniques based on analysis of a circuit's output failure rate and transition probability is proposed for accurate, precise and efficient measurement of propagation delays. The transition probability based method is especially attractive, since it requires no modifications in the circuit-under-test and requires little hardware resources, making it an ideal method for physical delay analysis of FPGA circuits. The relentless advancements in process technology has led to smaller and denser transistors in integrated circuits. While FPGA users benefit from this in terms of increased hardware resources for more complex designs, the actual productivity with FPGA in terms of timing performance (operating frequency, latency and throughput) has lagged behind the potential improvements from the improved technology due to delay variability in FPGA components and the inaccuracy of timing models used in FPGA timing analysis. The ability to measure delay of any arbitrary circuit on FPGA offers many opportunities for on-chip characterisation and physical timing analysis, allowing delay variability to be accurately tracked and variation-aware optimisations to be developed, reducing the productivity gap observed in today's FPGA designs. The measurement techniques are developed into complete self measurement and characterisation platforms in this thesis, demonstrating their practical uses in actual FPGA hardware for cross-chip delay characterisation and accurate delay measurement of both complex combinatorial and sequential circuits, further reinforcing their positions in solving the delay variability problem in FPGAs

    A PUF based on transient effect ring oscillator and insensitive to locking phenomenon

    No full text
    International audienceThis paper presents a new silicon physical unclonable function (PUF) based on a transient effect ring oscillator (TERO). The proposed PUF has state of the art PUF characteristics with a good ratio of PUF response variability to response length. Unlike RO-PUF, it is not sensitive to the locking phenomenon, which challenges the use of ring oscillators for the design of both PUF and TRNG. The novel architecture using differential structures guarantees high stability of the TERO-PUF. The area of the TERO-PUF is relatively high, but is still comparable with other PUF designs. However, since the same piece of hardware can be used for both PUF and random number generation, the proposed principle offers an interesting low area mixed solution

    D2.1 - Report on Selected TRNG and PUF Principles

    Get PDF
    This report represents the final version of Deliverable 2.1 of the HECTOR work package WP2. It is a result of discussions and work on Task 2.1 of all HECTOR partners involved in WP2. The aim of the Deliverable 2.1 is to select principles of random number generators (RNGs) and physical unclonable functions (PUFs) that fulfill strict technology, design and security criteria. For example, the selected RNGs must be suitable for implementation in logic devices according to the German AIS20/31 standard. Correspondingly, the selected PUFs must be suitable for applying similar security approach. A standard PUF evaluation approach does not exist, yet, but it should be proposed in the framework of the project. Selected RNGs and PUFs should be then thoroughly evaluated from the point of view of security and the most suitable principles should be implemented in logic devices, such as Field Programmable Logic Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) during the next phases of the project
    corecore