156 research outputs found

    Hardware Acceleration of Electronic Design Automation Algorithms

    Get PDF
    With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single core microprocessors, now faces the tough challenge of taking advantage of this parallelism, made available by the scaling of hardware. The work presented in this dissertation studies the acceleration of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heavily used in the VLSI design flow, and also have varying degrees of inherent parallelism in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing analysis, circuit simulation, fault simulation and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single core microprocessor) are studied. In addition, this dissertation also presents an automated approach to accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU?s hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative platforms such as custom designed ICs, FPGAs and graphic processors, and obtain speedups upto 800X. The approaches in this dissertation collectively aim to contribute towards enabling the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary hardware platforms

    Methoden und Beschreibungssprachen zur Modellierung und Verifikation vonSchaltungen und Systemen: MBMV 2015 - Tagungsband, Chemnitz, 03. - 04. März 2015

    Get PDF
    Der Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV 2015) findet nun schon zum 18. mal statt. Ausrichter sind in diesem Jahr die Professur Schaltkreis- und Systementwurf der Technischen Universität Chemnitz und das Steinbeis-Forschungszentrum Systementwurf und Test. Der Workshop hat es sich zum Ziel gesetzt, neueste Trends, Ergebnisse und aktuelle Probleme auf dem Gebiet der Methoden zur Modellierung und Verifikation sowie der Beschreibungssprachen digitaler, analoger und Mixed-Signal-Schaltungen zu diskutieren. Er soll somit ein Forum zum Ideenaustausch sein. Weiterhin bietet der Workshop eine Plattform für den Austausch zwischen Forschung und Industrie sowie zur Pflege bestehender und zur Knüpfung neuer Kontakte. Jungen Wissenschaftlern erlaubt er, ihre Ideen und Ansätze einem breiten Publikum aus Wissenschaft und Wirtschaft zu präsentieren und im Rahmen der Veranstaltung auch fundiert zu diskutieren. Sein langjähriges Bestehen hat ihn zu einer festen Größe in vielen Veranstaltungskalendern gemacht. Traditionell sind auch die Treffen der ITGFachgruppen an den Workshop angegliedert. In diesem Jahr nutzen zwei im Rahmen der InnoProfile-Transfer-Initiative durch das Bundesministerium für Bildung und Forschung geförderte Projekte den Workshop, um in zwei eigenen Tracks ihre Forschungsergebnisse einem breiten Publikum zu präsentieren. Vertreter der Projekte Generische Plattform für Systemzuverlässigkeit und Verifikation (GPZV) und GINKO - Generische Infrastruktur zur nahtlosen energetischen Kopplung von Elektrofahrzeugen stellen Teile ihrer gegenwärtigen Arbeiten vor. Dies bereichert denWorkshop durch zusätzliche Themenschwerpunkte und bietet eine wertvolle Ergänzung zu den Beiträgen der Autoren. [... aus dem Vorwort

    A SAT Based Test Generation Method for Delay Fault Testing of Macro Based Circuits

    Full text link

    Synthesis of FPGA-based accelerators implementing recursive algorithms

    Get PDF
    Doutoramento em Engenharia InformáticaO desenvolvimento de sistemas computacionais é um processo complexo, com múltiplas etapas, que requer uma análise profunda do problema, levando em consideração as limitações e os requisitos aplicáveis. Tal tarefa envolve a exploração de técnicas alternativas e de algoritmos computacionais para optimizar o sistema e satisfazer os requisitos estabelecidos. Neste contexto, uma das mais importantes etapas é a análise e implementação de algoritmos computacionais. Enormes avanços tecnológicos no âmbito das FPGAs (Field-Programmable Gate Arrays) tornaram possível o desenvolvimento de sistemas de engenharia extremamente complexos. Contudo, o número de transístores disponíveis por chip está a crescer mais rapidamente do que a capacidade que temos para desenvolver sistemas que tirem proveito desse crescimento. Esta limitação já bem conhecida, antes de se revelar com FPGAs, já se verificava com ASICs (Application-Specific Integrated Circuits) e tem vindo a aumentar continuamente. O desenvolvimento de sistemas com base em FPGAs de alta capacidade envolve uma grande variedade de ferramentas, incluindo métodos para a implementação eficiente de algoritmos computacionais. Esta tese pretende proporcionar uma contribuição nesta área, tirando partido da reutilização, do aumento do nível de abstracção e de especificações algorítmicas mais automatizadas e claras. Mais especificamente, é apresentado um estudo que foi levado a cabo no sentido de obter critérios relativos à implementação em hardware de algoritmos recursivos versus iterativos. Depois de serem apresentadas algumas das estratégias para implementar recursividade em hardware mais significativas, descreve-se, em pormenor, um conjunto de algoritmos para resolver problemas de pesquisa combinatória (considerados enquanto exemplos de aplicação). Versões recursivas e iterativas destes algoritmos foram implementados e testados em FPGA. Com base nos resultados obtidos, é feita uma cuidada análise comparativa. Novas ferramentas e técnicas de investigação que foram desenvolvidas no âmbito desta tese são também discutidas e demonstradas.Design of computational systems is a complex multistage process which requires a deep analysis of the problem, taking into account relevant limitations and constraints as well as software/hardware co-design. Such task involves exploring competitive techniques and computational algorithms, enabling the system to be optimized while satisfying given requirements. In this context, one of the most important stages is analysis and implementation of computational algorithms. Tremendous progress in the scope of FPGA (Field-Programmable Gate Array) technology has made it possible to design very complicated engineering systems. However, the number of available transistors grows faster than the ability to meaningfully design with them. This situation is a well known design productivity gap, which was inherited by FPGA from ASIC (Application-Specific Integrated Circuit) and which is increasing continuously. Developing engineering systems on the basis of high capacity FPGAs involves a wide variety of design tools, including methods for efficient implementation of computational algorithms. The thesis is intended to provide a contribution in this area by aiming at reuse, high level abstraction, automation, and clearness of algorithmic specifications. More specifically, it presents research studies which have been carried out in order to obtain criteria regarding implementation of recursive vs. iterative algorithms in hardware. After describing some of the most relevant strategies for implementing recursion in hardware, a selection of algorithms for solving combinatorial search problems (considered as application examples) are also described in detail. Iterative and recursive versions of these algorithms have been implemented and tested in FPGA. Taking into consideration the results obtained, a careful comparative analysis is given. New research-oriented tools and techniques for hardware design which have been developed in the scope of this thesis are also discussed and demonstrated

    Study of Fine-Grained, Irregular Parallel Applications on a Many-Core Processor

    Get PDF
    This dissertation demonstrates the possibility of obtaining strong speedups for a variety of parallel applications versus the best serial and parallel implementations on commodity platforms. These results were obtained using the PRAM-inspired Explicit Multi-Threading (XMT) many-core computing platform, which is designed to efficiently support execution of both serial and parallel code and switching between the two. Biconnectivity: For finding the biconnected components of a graph, we demonstrate speedups of 9x to 33x on XMT relative to the best serial algorithm using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two contemporary NVIDIA GPUs of similar or greater silicon area. Prior studies of parallel biconnectivity algorithms achieved at most a 4x speedup, but we could not find biconnectivity code for GPUs to compare biconnectivity against them. Triconnectivity: We present a parallel solution to the problem of determining the triconnected components of an undirected graph. We obtain significant speedups on XMT over the only published optimal (linear-time) serial implementation of a triconnected components algorithm running on a modern CPU. To our knowledge, no other parallel implementation of a triconnected components algorithm has been published for any platform. Burrows-Wheeler compression: We present novel work-optimal parallel algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet and their empirical evaluation. To validate these theoretical algorithms, we implement them on XMT and show speedups of up to 25x for compression, and 13x for decompression, versus bzip2, the de facto standard implementation of Burrows-Wheeler compression. Fast Fourier transform (FFT): Using FFT as an example, we examine the impact that adoption of some enabling technologies, including silicon photonics, would have on the performance of a many-core architecture. The results show that a single-chip many-core processor could potentially outperform a large high-performance computing cluster. Boosted decision trees: This chapter focuses on the hybrid memory architecture of the XMT computer platform, a key part of which is a flexible all-to-all interconnection network that connects processors to shared memory modules. First, to understand some recent advances in GPU memory architecture and how they relate to this hybrid memory architecture, we use microbenchmarks including list ranking. Then, we contrast the scalability of applications with that of routines. In particular, regardless of the scalability needs of full applications, some routines may involve smaller problem sizes, and in particular smaller levels of parallelism, perhaps even serial. To see how a hybrid memory architecture can benefit such applications, we simulate a computer with such an architecture and demonstrate the potential for a speedup of 3.3X over NVIDIA's most powerful GPU to date for XGBoost, an implementation of boosted decision trees, a timely machine learning approach. Boolean satisfiability (SAT): SAT is an important performance-hungry problem with applications in many problem domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassing parallelism. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers. We show the potential for speedups of up to 382X across a variety of problem instances. We hope that these results will stimulate future research

    Precision analysis for hardware acceleration of numerical algorithms

    No full text
    The precision used in an algorithm affects the error and performance of individual computations, the memory usage, and the potential parallelism for a fixed hardware budget. However, when migrating an algorithm onto hardware, the potential improvements that can be obtained by tuning the precision throughout an algorithm to meet a range or error specification are often overlooked; the major reason is that it is hard to choose a number system which can guarantee any such specification can be met. Instead, the problem is mitigated by opting to use IEEE standard double precision arithmetic so as to be ‘no worse’ than a software implementation. However, the flexibility in the number representation is one of the key factors that can be exploited on reconfigurable hardware such as FPGAs, and hence ignoring this potential significantly limits the performance achievable. In order to optimise the performance of hardware reliably, we require a method that can tractably calculate tight bounds for the error or range of any variable within an algorithm, but currently only a handful of methods to calculate such bounds exist, and these either sacrifice tightness or tractability, whilst simulation-based methods cannot guarantee the given error estimate. This thesis presents a new method to calculate these bounds, taking into account both input ranges and finite precision effects, which we show to be, in general, tighter in comparison to existing methods; this in turn can be used to tune the hardware to the algorithm specifications. We demonstrate the use of this software to optimise hardware for various algorithms to accelerate the solution of a system of linear equations, which forms the basis of many problems in engineering and science, and show that significant performance gains can be obtained by using this new approach in conjunction with more traditional hardware optimisations

    Hardware synthesis from high-level scenario specifications

    Get PDF
    PhD ThesisThe behaviour of many systems can be partitioned into scenarios. These facilitate engineers’ understanding of the specifications, and can be composed into efficient implementations via a form of high-level synthesis. In this work, we focus on highly concurrent systems, whose scenarios are typically described using concurrency models such as partial orders, Petri nets and data-flow structures. In this thesis, we study different aspects of hardware synthesis from high-level scenario specifications. We propose new formal models to simplify the specification of concurrent systems, and algorithms for hardware synthesis and verification of the scenario-based models of such systems. We also propose solutions for mapping scenariobased systems on silicon and evaluate their efficiency. Our experiments show that the proposed approaches improve the design of concurrent systems. The new formalisms can break down complex specifications into significantly simpler scenarios automatically, and can be used to fully model the dataflow of operations of reconfigurable event-driven systems. The proposed heuristics for mapping the scenarios of a system to a digital circuit supports encoding constraints, unlike existing methods, and can cope with specifications comprising hundreds of scenarios at the cost of only 5% of area overhead compared to exact algorithms. These experiments are driven by three case studies: (1) hardware synthesis of control architectures, e.g. microprocessor control units; (2) acceleration of the ordinal pattern encoding, i.e. an algorithm for detecting repetitive patterns within data streams; (3) and acceleration of computational drug discovery, i.e. computation of shortest paths in large protein-interaction networks. Our findings are employed to design two prototypes, which have a practical value for the considered case studies. The ordinal pattern encoding accelerator is asynchronous, highly resilient to unstable voltage supply, and designed to perform a range of computations via runtime reconfiguration. The drug discovery accelerator is synchronous, and up to three orders of magnitude faster than conventional software implementations

    A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors

    Get PDF
    With the continued success of the open RISC-V architecture, practical deployment of RISC-V processors necessitates an in-depth consideration of their testability, safety and security aspects. This survey provides an overview of recent developments in this quickly-evolving field. We start with discussing the application of state-of-the-art functional and system-level test solutions to RISC-V processors. Then, we discuss the use of RISC-V processors for safety-related applications; to this end, we outline the essential techniques necessary to obtain safety both in the functional and in the timing domain and review recent processor designs with safety features. Finally, we survey the different aspects of security with respect to RISC-V implementations and discuss the relationship between cryptographic protocols and primitives on the one hand and the RISC-V processor architecture and hardware implementation on the other. We also comment on the role of a RISC-V processor for system security and its resilience against side-channel attacks
    corecore