38 research outputs found

    FPGA Acceleration of Monte Carlo-based Financial Simulation: Design Challenges and Lessons Learnt

    Get PDF
    The simulation of interest rate derivatives is a powerful tool for coping with current market fluctuations. However, the complexity of the financial models and the way they are processed require exorbitant computation times, which is in clear conflict with the need for processing times as short as possible to operate in the financial market. To shorten the computation time of financial derivatives, the use of hardware accelerators becomes a must.
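
    A minimal sketch of the kind of workload described above, assuming nothing about the paper's actual models: a Monte Carlo estimate of a zero-coupon bond price under a Vasicek short-rate model in plain NumPy. The model choice and all parameter values are illustrative, not taken from the paper.

```python
# Minimal sketch (not from the paper): Monte Carlo pricing of a zero-coupon
# bond under a Vasicek short-rate model, to illustrate the kind of workload
# FPGA accelerators target. All parameter values are illustrative.
import numpy as np

def vasicek_zcb_price(r0, kappa, theta, sigma, T, n_paths=100_000, n_steps=250, seed=0):
    """Estimate P(0, T) = E[exp(-integral of r_t dt)] by Euler simulation."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, r0)
    integral = np.zeros(n_paths)
    for _ in range(n_steps):
        integral += r * dt
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        r = r + kappa * (theta - r) * dt + sigma * dw   # Euler step of dr = kappa(theta - r)dt + sigma dW
    return np.exp(-integral).mean()

print(vasicek_zcb_price(r0=0.03, kappa=0.5, theta=0.04, sigma=0.01, T=1.0))
```

    The cost grows with paths times time steps per instrument, which is why deeply pipelined hardware datapaths pay off for this class of simulation.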

    A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging

    Get PDF
    As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor-constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simulation. Furthermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices and by extending our original design to a multi-FPGA system; a nearly linear increase in acceleration with logic resources was achieved.
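
    As a rough analogy to the multi-device scaling described above (the thesis targets FPGAs and SPECT photon transport, neither of which is reproduced here), the toy sketch below splits a Monte Carlo sample budget across worker processes and averages the partial results; throughput grows roughly linearly with the number of independent workers.

```python
# Toy sketch (not the thesis's SPECT kernel): splitting a Monte Carlo sample
# budget across worker processes, analogous to scaling across multiple FPGAs.
# Each worker runs an independent random stream; results are averaged.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def partial_estimate(args):
    seed, n_samples = args
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    return np.mean(np.sqrt(1.0 - x * x))   # quarter-circle area, estimates pi/4

def parallel_estimate(total_samples=4_000_000, n_workers=4):
    per_worker = total_samples // n_workers
    jobs = [(seed, per_worker) for seed in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_estimate, jobs))
    return 4.0 * np.mean(parts)

if __name__ == "__main__":
    print(parallel_estimate())
```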

    Design Exploration of an FPGA-Based Multivariate Gaussian Random Number Generator

    No full text
    Monte Carlo simulation is one of the most widely used techniques for computationally intensive simulations in a variety of applications, including mathematical analysis and modelling and statistical physics. A multivariate Gaussian random number generator (MVGRNG) is one of the main building blocks of such a system. Field-Programmable Gate Arrays (FPGAs) are gaining popularity as an alternative to traditional general-purpose processors for accelerating the computationally expensive random number generator block, owing to their fine-grained parallelism, reconfigurability, and lower power consumption. As well as achieving hardware designs with high throughput, it is also desirable to produce designs with the flexibility to control resource usage in order to meet given resource constraints. This work proposes a novel approach for mapping an MVGRNG onto an FPGA by optimizing the computational path in terms of hardware resource usage, subject to an acceptable error in the approximation of the distribution of interest. The impact of the error due to truncation/rounding operations along the computational path is analysed, and an analytical expression for the error inserted into the system is presented. The proposed algorithm is further extended with a novel methodology for mapping many multivariate Gaussian random number generators onto a single FPGA; the effective resource-sharing techniques introduced in this thesis allow further reductions in hardware resource usage. MVGRNGs are used in a wide range of applications, especially financial applications involving many correlated assets. This work demonstrates that the choice of objective function employed for the hardware optimization of the MVGRNG core has a considerable impact on the final performance of the application of interest. Two of the most important financial applications, Value-at-Risk estimation and option pricing, are considered.
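
    A minimal software sketch of the core idea, not the thesis's optimized datapath: correlated Gaussian samples are produced as z = L·u, where L is the Cholesky factor of the target covariance, and quantizing the coefficients of L imitates the truncation/rounding error a fixed-point FPGA implementation would introduce. The covariance matrix and bit widths below are illustrative.

```python
# Minimal sketch (not the thesis's optimized datapath): multivariate Gaussian
# samples as z = L u, with L's coefficients quantized to mimic a fixed-point
# hardware implementation's truncation error.
import numpy as np

def mvgrng(cov, n_samples, frac_bits=8, seed=0):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)
    scale = 2 ** frac_bits
    L_q = np.round(L * scale) / scale          # fixed-point-style quantization of coefficients
    u = rng.standard_normal((cov.shape[0], n_samples))
    return L_q @ u

cov = np.array([[1.0, 0.6], [0.6, 1.0]])
z = mvgrng(cov, 200_000, frac_bits=6)
print(np.cov(z))   # approximates cov; the error grows as frac_bits shrinks
```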

    Multi-Objective Design Automation for Reconfigurable Multi-Processor Systems

    Get PDF
    Ph.D. (Doctor of Philosophy)

    Approximate logic synthesis: a survey

    Get PDF
    Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy, offers benefits in terms of design area and power consumption. This paradigm is particularly attractive in applications where the underlying computation has inherent resilience to small errors. Such applications are abundant in many domains, including machine learning, computer vision, and signal processing. In circuit design, a major challenge is the ability to synthesize approximate circuits automatically, without relying on the manual expertise of designers. In this work, we review methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold. We summarize strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that yield the largest benefit for a given amount of induced error. We then review circuit simplification methods that operate at the gate or Boolean level, including those that leverage classical Boolean synthesis techniques to realize the approximations. We also summarize strategies that take high-level descriptions, such as C or behavioral Verilog, and synthesize approximate circuits from them.
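
    As a concrete, illustrative example of the error evaluation such flows rely on (not a method taken from the survey), the sketch below exhaustively measures the error rate, mean error, and worst-case error of an approximate 8-bit adder whose low carry chain has been cut; metrics like these decide whether a candidate circuit simplification is accepted.

```python
# Illustrative sketch (not a method from the survey): error statistics for an
# approximate 8-bit adder, the kind of metric that guides which circuit
# simplifications an approximate-synthesis flow accepts.
def approx_add(a, b, cut=3):
    """Lower-part-OR adder: the 'cut' low bits are OR'ed instead of added."""
    mask = (1 << cut) - 1
    low = (a | b) & mask                        # cheap, inexact low bits
    high = ((a >> cut) + (b >> cut)) << cut     # exact high bits, carry chain cut
    return high | low

errors = [abs((a + b) - approx_add(a, b)) for a in range(256) for b in range(256)]

print("error rate :", sum(e > 0 for e in errors) / len(errors))
print("mean error :", sum(errors) / len(errors))
print("worst case :", max(errors))
```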

    Performance Optimization Strategies for Transactional Memory Applications

    Get PDF
    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To this end, the thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge, and uses them to develop new methods and strategies for the optimization of TM applications.
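
    For readers unfamiliar with TM, the toy sketch below (not one of the thesis's tools) shows the optimistic read/validate/commit pattern a software transaction follows and counts aborts per thread, the kind of run-time behavior a TM profiler would surface to guide optimization.

```python
# Toy sketch (not one of the thesis's tools): an optimistic, software-TM-style
# increment that validates a version counter before committing and retries on
# conflict, while counting aborts per thread.
import threading

class VersionedBox:
    """A shared value guarded by a version counter and a commit lock."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()

def worker(box, n_ops, abort_counts):
    aborts = 0
    for _ in range(n_ops):
        while True:
            seen_version = box.version                 # read version first (seqlock-style)
            seen_value = box.value                     # then the value it protects
            new_value = seen_value + 1                 # transaction body
            with box.lock:
                if box.version == seen_version:        # validate: no writer committed in between
                    box.value, box.version = new_value, seen_version + 1
                    break
            aborts += 1                                # conflict detected: abort and retry
    abort_counts.append(aborts)

box, abort_counts = VersionedBox(0), []
threads = [threading.Thread(target=worker, args=(box, 1000, abort_counts)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("final value:", box.value, "aborts:", sum(abort_counts))
```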

    Methodology for complex dataflow application development

    Get PDF
    This thesis addresses problems inherent to the development of complex applications for reconfigurable systems. Many projects fail to complete, or take much longer than originally estimated, by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic development of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identified bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications, and both achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to heterogeneous physical memories, with special attention to die boundaries, is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation, while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte Carlo-based simulation of dose accumulation in human tissue is accelerated using the proposed methodology to meet the real-time requirements of adaptive radiotherapy.
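
    To make the memory-mapping problem concrete, here is a deliberately naive first-fit baseline, not the algorithm proposed in the thesis: logical memories are assigned to physical blocks grouped by die and type, and placement fails when no remaining block is large enough. All names and sizes are invented for illustration.

```python
# Naive sketch (not the thesis's algorithm): first-fit-decreasing assignment of
# logical memories to physical memory blocks grouped by die, illustrating the
# mapping problem the proposed algorithm solves more robustly.
def first_fit_map(logical, physical):
    """logical: {name: size}; physical: list of dicts with 'die', 'type', 'capacity'."""
    placement = {}
    free = [dict(block, free=block["capacity"]) for block in physical]
    for name, size in sorted(logical.items(), key=lambda kv: -kv[1]):  # largest first
        block = next((b for b in free if b["free"] >= size), None)
        if block is None:
            raise RuntimeError(f"no physical memory left for {name}")
        block["free"] -= size
        placement[name] = (block["die"], block["type"])
    return placement

# Illustrative sizes in kilobits; two dies with different block types.
logical = {"coeff_rom": 18, "sample_fifo": 36, "frame_buf": 256, "lut_tab": 9}
physical = [{"die": 0, "type": "BRAM", "capacity": 36} for _ in range(4)] + \
           [{"die": 1, "type": "URAM", "capacity": 288} for _ in range(2)]
print(first_fit_map(logical, physical))
```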

    Hardware Acceleration of Electronic Design Automation Algorithms

    Get PDF
    With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single-core microprocessors, now faces the tough challenge of taking advantage of this parallelism made available by the scaling of hardware. The work presented in this dissertation studies the acceleration of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field-programmable gate arrays (FPGAs), and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heavily used in the VLSI design flow and also have varying degrees of inherent parallelism in them. In particular, Boolean satisfiability, Monte Carlo-based statistical static timing analysis, circuit simulation, fault simulation, and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single-core microprocessor) are studied. In addition, this dissertation also presents an automated approach to accelerating uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU's hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative platforms such as custom-designed ICs, FPGAs, and graphics processors, obtaining speedups of up to 800X. The approaches in this dissertation collectively aim to enable the computer-aided design (CAD) community to accelerate EDA algorithms on arbitrary hardware platforms.
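
    As an illustration of why Monte Carlo-based statistical static timing analysis maps well onto GPUs (this is not the dissertation's kernel), the sketch below draws independent random gate delays per sample and propagates arrival times through a tiny gate-level DAG; every sample is independent, so thousands of them can run in parallel.

```python
# Illustrative sketch (not the dissertation's GPU kernel): Monte Carlo
# statistical static timing analysis on a tiny gate-level DAG. Gate names,
# delays, and sigma are invented for illustration.
import numpy as np

FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"], "out": ["g3", "g4"]}
TOPO = ["g1", "g2", "g3", "g4", "out"]                             # topological order
MEAN = {"g1": 1.0, "g2": 1.2, "g3": 0.8, "g4": 1.5, "out": 0.5}    # nominal delays (ns)
SIGMA = 0.1                                                        # process variation

def mc_ssta(n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    arrival = {}
    for gate in TOPO:
        delay = rng.normal(MEAN[gate], SIGMA, n_samples)           # one delay per MC trial
        fanin_max = (np.max([arrival[f] for f in FANIN[gate]], axis=0)
                     if FANIN[gate] else 0.0)                      # latest input arrival
        arrival[gate] = fanin_max + delay
    return arrival["out"]

d = mc_ssta()
print(f"mean delay {d.mean():.2f} ns, 99th percentile {np.percentile(d, 99):.2f} ns")
```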