    A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

    Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated light-weight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power-efficiency using our cost-model-based approach.

    MP-STREAM: A Memory Performance Benchmark for Design Space Exploration on Heterogeneous HPC Devices

    Sustained memory throughput is a key determinant of performance in HPC devices. Having an accurate estimate of this parameter is essential for manual or automated design space exploration for any HPC device. While there are benchmarks for measuring the sustained memory bandwidth of CPUs and GPUs, such a benchmark for FPGAs has been missing. We present MP-STREAM, an OpenCL-based synthetic micro-benchmark for measuring sustained memory bandwidth, optimized for FPGAs, but which can be used on multiple platforms. Our main contribution is the introduction of various generic as well as device-specific parameters that can be tuned to measure their effect on memory bandwidth. We present results of running our benchmark on a CPU, a GPU and two FPGA targets, and discuss our observations. The experiments underline the utility of our benchmark for optimizing HPC applications for FPGAs, and provide valuable optimization hints for FPGA programmers.
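
    As a rough illustration of what a sustained-bandwidth measurement involves, the sketch below times a STREAM-style copy on the host in plain Python and sweeps the array size as one example of a tunable generic parameter. It is not MP-STREAM and does not use OpenCL; the function names `measure_copy_bandwidth` and `sweep_sizes`, the default sizes, and the repeat count are all assumptions made for this example.

```python
import time


def measure_copy_bandwidth(n_bytes=8 * 1024 * 1024, repeats=5):
    """Return the best observed copy bandwidth in bytes per second.

    Hypothetical helper: a host-side stand-in for a STREAM-style COPY kernel.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # the "copy" kernel: read n_bytes, write n_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer.
    best = max(best, 1e-9)
    # A copy moves each byte twice: one read plus one write.
    return 2 * n_bytes / best


def sweep_sizes(sizes, repeats=5):
    """Measure bandwidth for each array size (one tunable parameter)."""
    return {n: measure_copy_bandwidth(n, repeats) for n in sizes}


if __name__ == "__main__":
    for size, bw in sweep_sizes([1 << 16, 1 << 20, 1 << 24]).items():
        print(f"{size:>10d} bytes: {bw / 1e9:.2f} GB/s")
```

    Taking the best of several repeats follows the usual STREAM convention of reporting the highest sustained rate; on a real device the same sweep would be repeated over further parameters such as access pattern or data type.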

    An Intermediate Language and Estimator for Automated Design Space Exploration on FPGAs

    We present the TyTra-IR, a new intermediate language intended as a compilation target for high-level language compilers and a front-end for HDL code generators. We develop the requirements of this new language based on the design-space of FPGAs that it should be able to express and the estimation-space in which each configuration from the design-space should be mappable in an automated design flow. We use a simple kernel to illustrate multiple configurations using the semantics of TyTra-IR. The key novelty of this work is the cost model for resource-costs and throughput for different configurations of interest for a particular kernel. Through the realistic example of a Successive Over-Relaxation kernel implemented both in TyTra-IR and HDL, we demonstrate both the expressiveness of the IR and the accuracy of our cost model. Comment: Pre-print and extended version of poster paper accepted at international symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2015) Boston, MA, USA, June 1-2, 201

    Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs

    In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.

    A coarse-grained dynamically reconfigurable MAC processor for power-sensitive multi-standard devices

    DRMP, a Dynamically Reconfigurable MAC Processor, is an innovative, dynamically reconfigurable System-on-Chip architecture. The architecture exploits substantial overlaps in the functionality of different wireless MAC layers. Its flexibility is specialized for addressing the requirements of the MAC layer of wireless standards. It is targeted at consumer, multi-standard, handheld devices, and its design is meant to address the balance of flexibility and power-efficiency that this target market demands. The DRMP reconfigures packet-by-packet on the fly, allowing execution of concurrent protocol modes on a single hardware co-processor. An interrupt-driven programming model has also been presented and shown to implement the protocol state-machine of the three protocols on a CPU. These features will allow the DRMP to replace three MAC processors in a hand-held device. The most innovative component of the DRMP architecture is its Interface and Reconfiguration Controller. It uses a combination of asynchronous controllers to dynamically reconfigure the functional units in the architecture and delegate MAC tasks to them. The architecture has been modeled in Simulink at cycle-approximate abstraction. Results of simulations involving transmission and reception of packets have been presented, showing that the platform concurrently handles three protocol streams, reconfigures dynamically, yet meets and exceeds the protocol timing constraints, all at a moderate frequency. Its heterogeneous and coarse-grained functional units, the limited connectivity requirements between these units, and the proportionally large time that these resources are idle promise a very modest power consumption, suitable for mobile devices, while offering the flexibility to implement different MAC protocols.

    A Reconfigurable Vector Instruction Processor for Accelerating a Convection Parametrization Model on FPGAs

    High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive alternative to GPGPUs for use as co-processors, but they are still far from being mainstream due to a number of challenges faced when using FPGA-based platforms. Our research aims to make FPGA-based high performance computing more accessible to the scientific community. In this work we present the results of investigating the acceleration of a particular atmospheric model, Flexpart, on FPGAs. We focus on accelerating the most computationally intensive kernel from this model. The key contribution of our work is the architectural exploration we undertook to arrive at a solution that best exploits the parallelism available in the legacy code, and is also convenient to program, so that eventually the compilation of high-level legacy code to our architecture can be fully automated. We present the three different types of architecture, comparing their resource utilization and performance, and propose that an architecture where there are a number of computational cores, each built along the lines of a vector instruction processor, works best in this particular scenario, and is a promising candidate for a generic FPGA-based platform for scientific computation. We also present the results of experiments done with various configuration parameters of the proposed architecture, to show its utility in adapting to a range of scientific applications. Comment: This is an extended pre-print version of work that was presented at the international symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2014), Sendai, Japan, June 9-11, 201

    Automatic pipelining and vectorization of scientific code for FPGAs

    There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Translating such legacy code by hand into device-specific parallel code requires significant effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler, TyTra, to overcome this obstacle. The TyTra flow aims to compile legacy Fortran code automatically for FPGA-based acceleration, while applying suitable optimizations. We present the flow with a focus on two key optimizations, automatic pipelining and vectorization. Our compiler frontend extracts patterns from legacy Fortran code that can be pipelined and vectorized. The backend first creates fine and coarse-grained pipelines and then automatically vectorizes both the memory access and the datapath based on a cost model, generating an OpenCL-HDL hybrid working solution for FPGA targets on the Amazon cloud. Our results show up to 4.2× performance improvement over baseline OpenCL code.
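
    To make the idea of cost-model-driven vectorization concrete, the toy sketch below picks a vector width by estimating resource use and throughput for each candidate. It is not the TyTra cost model: the linear resource scaling, the one-element-per-lane-per-cycle pipeline assumption, and every name and parameter (`estimate`, `pick_width`, `resources_per_lane`, and so on) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    width: int         # number of parallel lanes
    resources: float   # estimated fraction of the device used
    throughput: float  # estimated elements processed per second


def estimate(width, resources_per_lane, clock_hz, bytes_per_element, mem_bandwidth):
    """Toy estimate: resources scale linearly with width, and throughput is
    capped by whichever of compute or memory bandwidth is lower."""
    compute_rate = width * clock_hz  # one element per lane per cycle
    memory_rate = mem_bandwidth / bytes_per_element
    return Estimate(width, width * resources_per_lane, min(compute_rate, memory_rate))


def pick_width(widths, **params):
    """Return the feasible estimate with the highest throughput,
    preferring the cheaper design when throughput ties."""
    feasible = [e for e in (estimate(w, **params) for w in widths) if e.resources <= 1.0]
    if not feasible:
        return None
    return max(feasible, key=lambda e: (e.throughput, -e.resources))


if __name__ == "__main__":
    best = pick_width(
        [1, 2, 4, 8, 16],
        resources_per_lane=0.2,
        clock_hz=200e6,
        bytes_per_element=8,
        mem_bandwidth=6.4e9,
    )
    print(best)
```

    The tie-break on resources reflects a common design choice: once a kernel is memory-bound, widening the datapath further costs area without adding throughput.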

    Grounded Ontology Methodology – Illustrating the Seed Ontology Creation

    This paper extends an earlier paper that proposed Grounded Ontology (GO) as a new methodology for ontology engineering. To a summarized discussion of the GO Methodology from another paper, it adds an example of applying the first two stages of the methodology to create an initial (seed) ontology. Its efficacy in deriving entities and their relationships directly from the data, along with ontologization, is illustrated through a step-by-step example. The GO Methodology proposes that ‘a domain ontology developed using text-coding technique contributes in conceptualizing and representing state-of-the-art as given by published research in a particular domain.’ The motivation behind the GO Methodology is to make the state of the art available to the researchers of a particular domain and to help them reach a common understanding through an ontology. Existing ontology engineering methods give the ontology developer a leading role. This has led to the general observation that the personal perspective of the ontology developer and/or expert has a dominating influence on the resultant ontology. However, if the data are coded such that entities and their relationships are obtained directly from, and are closely linked to, the text of the published research, the resultant ontology stands a better chance of being unbiased. Therefore, a new methodology (Grounded Ontology, GO) was proposed for deriving an ontology directly from the text of published research. Such an ontology will not only help bring forth the research already done by others but can also help highlight areas where new research efforts are needed.

    Sufi Method of Treatment & Physical Illness Healing in Hindu Pak Sufis

    Every aspect of human experience, including health and illness, has a spiritual component. Spirituality is now recognized as one of the key factors influencing health, and it is no longer just the domain of mysticism and religion. Spirituality has become a focus of neuroscience study in recent years, and it appears to hold great promise for improving therapies as well as our understanding of psychiatric morbidity. Sufism has been a well-known spiritual movement in Islam, drawing inspiration from major world faiths like Christianity and Hinduism and making a significant contribution to the spiritual health of many people both inside and outside the Muslim world. Sufism began in the early days of Islam and had many notable Sufis, but it was not until the mediaeval era that it rose to its greatest height, culminating in a number of Sufi groups and their leading proponents. Sufism holds God to be the sole source of genuine existence as well as the cause of all existence, and it seeks communion with God through spiritual realization, with the soul serving as the medium for this communion. It might offer a crucial connection for comprehending the origin of religious experience and how it affects mental health. In this connection, the author attempts to address the Sufis of the 18th and 19th centuries, including the well-known Sufi Sain Baba (RA), who benefited from Haji Ali Shah Buskhari.
