138 research outputs found

    High-Level Annotation of Routing Congestion for Xilinx Vivado HLS Designs

    Ever since transistor cost stopped decreasing, customized programmable platforms such as field-programmable gate arrays (FPGAs) have become a major way to improve software execution performance and reduce energy consumption. While software developers can use high-level synthesis (HLS) to speed up register-transfer level (RTL) code generation from C++ or OpenCL source code, placement and routing issues such as congestion can still prevent achieving an FPGA programming bitstream or dramatically reduce the performance of the FPGA implementation. Congestion reports from physical design tools refer to thousands of RTL signal names instead of developer-accessible identifiers and statements, considerably complicating developers' understanding and resolution of the issues at the source level. We propose a high-level back-annotation flow that summarizes the routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools. Our flow describes congestion using comments back-annotated on the source code and identifies whether the congestion is caused by the on-chip memories or by the DSP units (multipliers/adders), the shared resources most often associated with routing problems on FPGAs. We demonstrate on realistic large designs how the information provided by our flow helps to quickly spot congestion causes at the source level and to solve them using appropriate HLS directives.
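    As a rough illustration of what such a back-annotation flow does, the following C++ sketch maps congested RTL net names from a physical-design congestion report to source-level identifiers via an HLS name map and prints the kind of comment that would be attached to the source code. The file names and formats are assumptions made for this example, not the actual Vivado or Vivado HLS report formats, and the real flow additionally classifies whether on-chip memories or DSP units are the cause.

        // Hedged sketch: the file names and formats below are illustrative assumptions.
        #include <fstream>
        #include <iostream>
        #include <map>
        #include <string>

        int main() {
            // Assumed HLS debug file: one "<rtl_net_name> <source_identifier:line>" pair per line.
            std::map<std::string, std::string> rtl_to_source;
            std::ifstream name_map("hls_name_map.txt");        // hypothetical file
            for (std::string rtl, src; name_map >> rtl >> src; )
                rtl_to_source[rtl] = src;

            // Assumed congestion report: one congested RTL net name per line.
            std::ifstream report("congestion_report.txt");     // hypothetical file
            std::map<std::string, int> hits;                    // congested nets per source identifier
            for (std::string net; std::getline(report, net); ) {
                auto it = rtl_to_source.find(net);
                if (it != rtl_to_source.end()) ++hits[it->second];
            }

            // Emit the kind of comment the flow would back-annotate onto the source.
            for (const auto& [src, n] : hits)
                std::cout << "// CONGESTION: " << n << " congested RTL nets map to " << src << "\n";
        }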

    Hybrid FPGA: Architecture and Interface

    Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources with different granularities, together with domain-specific coarse-grained units. This thesis proposes a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture, we examine three aspects to optimise speed and area for domain-specific applications. First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs, (2) position of the EBs in the FPGA, (3) I/O pin arrangement of EBs, (4) interconnect flexibility of EBs, and (5) location of additional embedded elements such as memory. Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and high-density EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications. We then propose three routing optimisation methods to meet the additional routing demand introduced by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We study and compare the trade-offs in delay, area and routability of these three optimisation methods. Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors, multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking into account both architectural and system-level issues. Furthermore, we investigate the trade-offs between granularities and performance by composing small FPUs into a large FPU. The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements by optimising for speed, area, or a combination of the two.

    Time-power-energy balance of BLAS kernels in modern FPGAs

    Conference proceedings 2022: High Performance Computing, 9th Latin American Conference, CARLA 2022, Porto Alegre, Brazil, 26-30 September 2022, Revised Selected Papers. Numerical Linear Algebra (NLA) is a research field that in the last decades has been characterized by the use of kernel libraries that are de facto standards. One of the most remarkable examples, in particular in the HPC field, is the Basic Linear Algebra Subroutines (BLAS). Most BLAS operations are fundamental in multiple scientific algorithms because they generally constitute the most computationally expensive stage. For this reason, numerous efforts have been made to optimize such operations on various hardware platforms. There is a growing concern in the high-performance computing world about power consumption, making energy efficiency an extremely important quality when evaluating hardware platforms. Due to their greater energy efficiency, Field-Programmable Gate Arrays (FPGAs) are available today as an interesting alternative to other hardware platforms for the acceleration of this type of operation. Our study focuses on the evaluation of FPGAs to address dense NLA operations. Specifically, in this work we explore and evaluate the available options for two of the most representative kernels of BLAS, i.e. GEMV and GEMM. The experimental evaluation is carried out on an Alveo U50 accelerator card from Xilinx and an Intel Xeon Silver multicore CPU. Our findings show that even in kernels where the CPU reaches better runtimes, the FPGA counterpart is more energy efficient. The researchers were supported by the Universidad de la República and PEDECIBA. The authors thank the ANII – MPG Independent Research Groups: "Efficient Heterogeneous Computing" – CSC group.
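    As a point of reference for the kind of kernel evaluated above, the sketch below shows a GEMV (y = A*x) written in HLS-style C++ with a typical Vivado HLS pipeline directive. The matrix size, interface and directive choices are illustrative assumptions rather than the configuration used in the paper; in practice the floating-point accumulation is usually split into partial sums so the pipeline can actually reach an initiation interval of 1.

        // Hedged GEMV sketch in HLS-style C++; sizes and directives are illustrative.
        constexpr int N = 256;

        void gemv(const float A[N][N], const float x[N], float y[N]) {
            float x_buf[N];
        copy_x:
            for (int j = 0; j < N; ++j) x_buf[j] = x[j];   // buffer x on chip so it is reused N times

        rows:
            for (int i = 0; i < N; ++i) {
                float acc = 0.0f;
        cols:
                for (int j = 0; j < N; ++j) {
        #pragma HLS PIPELINE II=1
                    acc += A[i][j] * x_buf[j];             // loop-carried float accumulation
                }
                y[i] = acc;
            }
        }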

    Lexicographic path searches for FPGA routing

    This dissertation reports on studies of the application of lexicographic graph searches to solve problems in FPGA detailed routing. Our contributions include the derivation of iteration limits for scalar implementations of negotiated congestion for standard floating-point types and the identification of pathological cases for path choice. In the study of the routability-driven detailed FPGA routing problem, we show that universal detailed routability is NP-complete, based on a related proof by Lee and Wong. We describe the design of a lexicographic composition operator over totally-ordered monoids used as path cost metrics and show its optimality under an adapted A* search. Our new router, CornNC, based on the lexicographic composition of congestion and wirelength, established a new minimum track count for the FPGA Place and Route Challenge. For the problem of long-path timing-driven FPGA detailed routing, we show that long-path budgeted detailed routability is NP-complete by reduction to universal detailed routability. We generalise the lexicographic composition to any finite length and verify its optimality under A* search. The timing budget solution of Ghiasi et al. is applied to solve the long-path timing budget problem for FPGA connections. Our delay-clamped spiral lexicographic composition design, SpiralRoute, ensures that connection-based budgets are always met and thus achieves timing closure whenever it routes successfully. For 113 test routing instances derived from standard benchmarks, SpiralRoute found 13 routable instances with timing closure that were unroutable by a scalar negotiated congestion router, and achieved timing closure in another 27 cases where the scalar router did not, at the expense of increased runtime. We also study techniques to improve SpiralRoute runtimes, including a data structure of a trie augmented by data stacks for minimum element retrieval, and the technique of step tomonoid elimination for reducing the retrieval depth in a trie-of-stacks structure.
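    As an illustration of the central idea, the sketch below composes two additive cost components lexicographically (congestion first, wirelength as tie-breaker) and uses the composite as the priority in a plain Dijkstra expansion; an A* variant would add an admissible lexicographic heuristic. This is a simplified sketch of the concept, not the CornNC or SpiralRoute implementation.

        #include <limits>
        #include <queue>
        #include <utility>
        #include <vector>

        // Lexicographic composition of two totally-ordered, additive metrics.
        struct LexCost {
            double congestion = 0.0;   // primary component
            double wirelength = 0.0;   // compared only to break ties
            bool operator<(const LexCost& o) const {
                return congestion != o.congestion ? congestion < o.congestion
                                                  : wirelength < o.wirelength;
            }
            LexCost operator+(const LexCost& o) const {            // componentwise monoid operation
                return {congestion + o.congestion, wirelength + o.wirelength};
            }
        };

        struct Edge { int to; LexCost cost; };

        std::vector<LexCost> shortest_paths(const std::vector<std::vector<Edge>>& g, int src) {
            const double inf = std::numeric_limits<double>::infinity();
            std::vector<LexCost> dist(g.size(), LexCost{inf, inf});
            using Item = std::pair<LexCost, int>;                   // (cost so far, node)
            auto worse = [](const Item& a, const Item& b) { return b.first < a.first; };
            std::priority_queue<Item, std::vector<Item>, decltype(worse)> pq(worse);
            dist[src] = LexCost{};
            pq.push({dist[src], src});
            while (!pq.empty()) {
                auto [d, u] = pq.top();
                pq.pop();
                if (dist[u] < d) continue;                          // stale queue entry
                for (const Edge& e : g[u]) {
                    LexCost nd = d + e.cost;
                    if (nd < dist[e.to]) { dist[e.to] = nd; pq.push({nd, e.to}); }
                }
            }
            return dist;
        }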

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault- and defect-tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault- and defect-tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10^-6, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10^-6. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While this is larger than in current FT designs, the architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing. It is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.
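    To make the flavour of such yield modelling concrete, the sketch below computes the yield of a module protected by simple k-of-n redundancy with a perfect voter, assuming independent device failures. This is a deliberately simplified stand-in for illustration only; the NAND-multiplexing model developed in the work is more involved and, unlike this sketch, accounts for dependence between inputs.

        #include <cmath>
        #include <cstdio>

        // Yield of one non-redundant copy: all `devices` devices must work.
        double copy_yield(double p_device_fail, double devices) {
            return std::pow(1.0 - p_device_fail, devices);
        }

        // Yield of a module that works if at least k of its n independent copies work.
        double k_of_n_yield(double p_copy, int k, int n) {
            double y = 0.0;
            for (int i = k; i <= n; ++i) {
                double binom = 1.0;                                   // n choose i
                for (int j = 0; j < i; ++j) binom = binom * (n - j) / (j + 1);
                y += binom * std::pow(p_copy, i) * std::pow(1.0 - p_copy, n - i);
            }
            return y;
        }

        int main() {
            const double p = 3e-6;                  // device failure probability from the abstract
            const double y1 = copy_yield(p, 1e5);   // hypothetical module with 100,000 devices
            std::printf("single-copy yield : %.4f\n", y1);
            std::printf("2-of-3 (TMR) yield: %.4f\n", k_of_n_yield(y1, 2, 3));
        }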

    FieldPlacer - A flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures

    The field of placement methods for components of integrated circuits, especially in the domain of reconfigurable chip architectures, is largely dominated by a handful of concepts. While some of these are easy to apply but difficult to adapt to new situations, others are more flexible but rather complex to realize. This work presents the FieldPlacer framework, a flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures, in particular for the ever more important heterogeneous FPGAs. In contrast to many other force-directed placers, this approach is called ‘unconstrained’ as it does not require a priori fixed logic elements in order to calculate a force equilibrium as the solution to a system of equations. Instead, it is based on a free spring-embedder simulation of a graph representation which includes all logic block types of a design simultaneously. The FieldPlacer framework offers great flexibility in applying different distance norms (e.g., the Manhattan distance) for the force-directed layout and aims at creating adapted layouts for various objective functions, e.g., highest performance or improved routability. Depending on the individual situation, a runtime-quality trade-off can be considered to either produce a decent placement in a very short time or to generate an exceptionally good placement, which takes longer. An extensive comparison with the latest simulated annealing placement method from the well-known Versatile Place and Route (VPR) framework shows that the FieldPlacer approach can create placements of comparable quality much faster than VPR or, alternatively, generate better placements in the same time. The flexibility in defining arbitrary objective functions and the intuitive adaptability of the method, which, among others, includes different concepts from the field of graph drawing, should facilitate further developments with this framework, e.g., for new optimization targets such as the energy consumption of an implemented design.
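    The sketch below shows one iteration of the kind of free spring-embedder simulation such a placer builds on: connected blocks attract, all block pairs repel, and positions move a small step along the net force. The constants, the Euclidean force law and the O(n^2) repulsion are illustrative simplifications; FieldPlacer itself supports other distance norms (e.g., the Manhattan distance), handles heterogeneous block types and legalises the result onto the FPGA grid.

        #include <cstddef>
        #include <vector>

        struct Point { double x = 0.0, y = 0.0; };
        struct Net   { int a, b; };                   // two-pin connection between blocks a and b

        void spring_embedder_step(std::vector<Point>& pos, const std::vector<Net>& nets,
                                  double attract = 0.05, double repel = 1.0, double step = 0.1) {
            std::vector<Point> force(pos.size());
            for (const Net& n : nets) {                                // attraction along nets
                double dx = pos[n.b].x - pos[n.a].x, dy = pos[n.b].y - pos[n.a].y;
                force[n.a].x += attract * dx;  force[n.a].y += attract * dy;
                force[n.b].x -= attract * dx;  force[n.b].y -= attract * dy;
            }
            for (std::size_t i = 0; i < pos.size(); ++i)               // pairwise repulsion
                for (std::size_t j = i + 1; j < pos.size(); ++j) {
                    double dx = pos[j].x - pos[i].x, dy = pos[j].y - pos[i].y;
                    double d2 = dx * dx + dy * dy + 1e-9;              // avoid division by zero
                    force[i].x -= repel * dx / d2;  force[i].y -= repel * dy / d2;
                    force[j].x += repel * dx / d2;  force[j].y += repel * dy / d2;
                }
            for (std::size_t i = 0; i < pos.size(); ++i) {             // move blocks along the force
                pos[i].x += step * force[i].x;
                pos[i].y += step * force[i].y;
            }
        }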

    FPGA-Design unter besonderer Berücksichtigung von Multiplexer-Struktur und Verdrahtbarkeit

    Publication of the Wilhelm-Schickard-Institut, Universität Tübingen.

    Instrumentation of CdZnTe detectors for measuring prompt gamma-rays emitted during particle therapy

Background: The irradiation of cancer patients with charged particles, mainly protons and carbon ions, has become an established method for the treatment of specific types of tumors. In comparison with the use of X-rays or gamma-rays, particle therapy has the advantage that the dose distribution in the patient can be controlled precisely, so that tissue and organs lying near the tumor are spared. The treatment plan can be verified against the actual dose deposition by assessing the range of the particle beam; for this purpose, prompt gamma-rays are detected, which are emitted by the irradiated target volume during treatment. Motivation: The detection of prompt gamma-rays is a radiation detection and measurement task. Nuclear methods in medicine are used in particular for in vivo diagnostics, and the spatially resolved measurement of gamma-rays is an essential technique in nuclear imaging; however, the technical requirements of radiation measurement during particle therapy are far more challenging than those of classical applications. Appropriate instruments beyond the state of the art therefore need to be developed and tested for detecting prompt gamma-rays, and the success of a method for range assessment of particle beams is largely determined by the implementation of its electronics. In practice, this means that a suitable detector material must be combined with adapted readout electronics, signal and information processing, and a data interface. The parameters of the system (e.g. segmentation, time or energy resolution) can then be optimized for the chosen method (e.g. slit camera, time-of-flight measurement or Compton camera). Regardless of the method, the detector system must have a high count-rate capability and a large measuring range (>7 MeV), and for a subsequent evaluation of a suitable imaging method these parameters must not be limited by the electronics. Digital signal processing is well suited to such multipurpose tasks, and the performance of such an implementation has to be determined with respect to these demands. Materials and methods: In this study, the instrumentation of a detector system for prompt gamma-rays emitted during particle therapy is restricted to a cadmium zinc telluride (CdZnTe, CZT) semiconductor detector whose crystal is divided into an 8x8 pixel array by segmented electrodes. Analog and digital signal processing are tested exemplarily with this type of detector, aiming at the application of a Compton camera for range assessment. The electronics are implemented with commercial off-the-shelf (COTS) components. Where possible, functional units of the detector system were digitized and implemented in a field-programmable gate array (FPGA); an efficient implementation of the algorithms in terms of timing and logic utilization is fundamental to the design of the digital circuits. The measurement system is characterized with radioactive sources to determine its dynamic range and resolution. Finally, the performance is examined against the requirements of particle therapy with experiments at particle accelerators. Results: A detector system based on a CZT pixel detector has been developed and tested. Although the use of an application-specific integrated circuit would be convenient, this approach was rejected because no available circuit met the requirements.
Instead, a multichannel, compact, and low-noise analog amplifier circuit has been implemented with COTS components. The 65 information channels of a detector are digitized, processed and visualized. Advanced digital signal processing transfers the traditional approaches of nuclear electronics into algorithms and digital filter structures for an FPGA. With regard to the characteristic signals of a CZT pixel detector (e.g. varying rise times, depth-dependent energy measurement), it could be shown that digital pulse processing achieves a very good energy resolution (~2% FWHM at 511 keV) and permits time measurements in the range of a few tens of nanoseconds. Furthermore, the experimental results have shown that the dynamic range of the detector system could be significantly improved compared to the existing prototype of the Compton camera (~10 keV to 7 MeV). Count rates of about 100 kcps in a high-energy beam could ultimately be processed with the CZT pixel detector; this limit stems from the detector volume, not from the electronics. In addition, the versatility of the digital signal processing has been demonstrated with other detector materials (e.g. CeBr3). In anticipation of the high data throughput of a distributed data acquisition with multiple detectors, a Gigabit Ethernet link has been implemented as the data interface. Conclusions: To fully exploit the capabilities of a CZT pixel detector, digital signal processing is indispensable. A decisive advantage of the digital approach is its ease of use in a multichannel system; with this digitalization, a necessary step has been taken toward mastering the complexity of a Compton camera. Furthermore, the technology benchmark shows that a CZT pixel detector meets the requirements of measuring prompt gamma-rays during particle therapy. The previously used orthogonal strip detector must be replaced by the pixel detector in favor of increased efficiency and improved energy resolution. With the integration of the developed digital detector system into a Compton camera, it must ultimately be proven whether this method is applicable for range assessment in particle therapy. Even if another method turns out to be more practical in a clinical environment, its detector system may still benefit from the instrumentation of a digital signal processing system for nuclear applications presented here. Contents: 1. Introduction 1.1. Aim of this work 2. Analog front-end electronics 2.1. State-of-the-art 2.2. Basic design considerations 2.2.1. CZT detector assembly 2.2.2. Electrical characteristics of a CZT pixel detector 2.2.3. High voltage biasing and grounding 2.2.4. Signal formation in CZT detectors 2.2.5. Readout concepts 2.2.6. Operational amplifier 2.3. Circuit design of a charge-sensitive amplifier 2.3.1. Circuit analysis 2.3.2. Charge-to-voltage transfer function 2.3.3. Input coupling of the CSA 2.3.4. Noise 2.4. Implementation and Test 2.5. Results 2.5.1. Test pulse input 2.5.2. Pixel detector 2.6. Conclusion 3. Digital signal processing 3.1. Unfolding-synthesis technique 3.2. Digital deconvolution 3.2.1. Prior work 3.2.2. Discrete-time inverse amplifier transfer function 3.2.3. Application to measured signals 3.2.4. Implementation of a higher order IIR filter 3.2.5. Conclusion 3.3. Digital pulse synthesis 3.3.1. Prior work 3.3.2. FIR filter structures for FPGAs 3.3.3. Optimized fixed-point arithmetic 3.3.4. Conclusion 4. Data interface 4.1. State-of-the-art 4.2. Embedded Gigabit Ethernet protocol stack 4.3. Implementation 4.3.1. System overview 4.3.2. Media Access Control 4.3.3. Embedded protocol stack 4.3.4. Clock synchronization 4.4. Measurements and results 4.4.1. Throughput performance 4.4.2. Synchronization 4.4.3. Resource utilization 4.5. Conclusion 5. Experimental results 5.1. Digital pulse shapers 5.1.1. Spectroscopy application 5.1.2. Timing applications 5.2. Gamma-ray spectroscopy 5.2.1. Energy resolution of scintillation detectors 5.2.2. Energy resolution of a CZT pixel detector 5.3. Gamma-ray timing 5.3.1. Timing performance of scintillation detectors 5.3.2. Timing performance of CZT pixel detectors 5.4. Measurements with a particle beam 5.4.1. Bremsstrahlung Facility at ELBE 6. Discussion 7. Summary 8. Zusammenfassung
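As a concrete illustration of the unfolding-synthesis idea mentioned above (deconvolve the analog front-end response, then synthesise a new pulse shape with an FIR stage), the sketch below inverts a single-pole exponential decay of an assumed charge-sensitive preamplifier and applies a simple moving-sum shaper. The time constant, sampling period and the use of floating point are illustrative; the FPGA implementation described in the dissertation uses higher-order IIR deconvolution and fixed-point FIR structures.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Recover per-sample charge from a sampled exponential preamplifier tail:
    // if the analog stage is a single pole with decay exp(-Ts/tau), then
    // q[n] = v[n] - a * v[n-1] with a = exp(-Ts/tau) undoes that decay.
    std::vector<double> deconvolve(const std::vector<double>& v, double tau, double Ts) {
        const double a = std::exp(-Ts / tau);
        std::vector<double> q(v.size(), 0.0);
        for (std::size_t n = 1; n < v.size(); ++n) q[n] = v[n] - a * v[n - 1];
        return q;
    }

    // Moving-sum (boxcar) FIR shaper over the deconvolved samples; the sum over
    // `width` samples integrates the recovered charge of one pulse.
    std::vector<double> boxcar(const std::vector<double>& q, std::size_t width) {
        std::vector<double> out(q.size(), 0.0);
        double acc = 0.0;
        for (std::size_t n = 0; n < q.size(); ++n) {
            acc += q[n];
            if (n >= width) acc -= q[n - width];
            out[n] = acc;
        }
        return out;
    }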

    A Finite Domain Constraint Approach for Placement and Routing of Coarse-Grained Reconfigurable Architectures

    Scheduling, placement, and routing are important steps in Very Large Scale Integration (VLSI) design. Researchers have developed numerous techniques to solve placement and routing problems. As the complexity of Application Specific Integrated Circuits (ASICs) increased over the past decades, so did the demand for improved place-and-route techniques. The primary objective of these place-and-route approaches has typically been wirelength minimization, due to its impact on signal delay and design performance. With the advent of Field Programmable Gate Arrays (FPGAs), the same place-and-route techniques were applied to FPGA-based design. However, traditional place-and-route techniques may not work for Coarse-Grained Reconfigurable Architectures (CGRAs), which are reconfigurable devices offering wider path widths than FPGAs and more flexibility than ASICs, because of differences in architecture and routing network. Further, the routing network of several types of CGRAs, including the Field Programmable Object Array (FPOA), has deterministic timing, in contrast to the routing fabric of most ASICs and FPGAs reported in the literature. This necessitates a fresh look at alternative approaches to placing and routing designs. This dissertation presents a finite domain constraint-based, delay-aware placement and routing methodology targeting an FPOA. The proposed methodology takes advantage of the deterministic routing network of CGRAs to perform delay-aware placement.
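    To make the finite-domain formulation tangible, the sketch below models placement as one finite-domain variable per operation (its site), with an all-different constraint on sites and a delay budget on each connection, assuming the deterministic routing delay grows with the Manhattan distance between sites. The naive backtracking search and the delay model are illustrative assumptions; they stand in for the actual constraint solver and FPOA timing model used in the dissertation.

        #include <cstddef>
        #include <cstdio>
        #include <cstdlib>
        #include <vector>

        struct Site { int x, y; };
        struct Conn { int u, v, max_delay; };   // connection u -> v with a delay budget

        // Assumed deterministic routing delay: proportional to Manhattan distance.
        int delay(const Site& a, const Site& b) { return std::abs(a.x - b.x) + std::abs(a.y - b.y); }

        // Naive finite-domain backtracking: try a site for operation `op`, pruning
        // assignments that reuse a site or violate a connection's delay budget.
        bool place(std::size_t op, std::vector<int>& assign, const std::vector<Site>& sites,
                   const std::vector<Conn>& conns) {
            if (op == assign.size()) return true;
            for (int s = 0; s < static_cast<int>(sites.size()); ++s) {
                bool ok = true;
                for (std::size_t o = 0; o < op && ok; ++o) ok = assign[o] != s;     // all-different
                for (const Conn& c : conns) {                                       // delay budgets
                    if (!ok) break;
                    int a = c.u == static_cast<int>(op) ? s : (c.u < static_cast<int>(op) ? assign[c.u] : -1);
                    int b = c.v == static_cast<int>(op) ? s : (c.v < static_cast<int>(op) ? assign[c.v] : -1);
                    if (a >= 0 && b >= 0) ok = delay(sites[a], sites[b]) <= c.max_delay;
                }
                if (ok) {
                    assign[op] = s;
                    if (place(op + 1, assign, sites, conns)) return true;
                }
            }
            return false;
        }

        int main() {
            std::vector<Site> sites = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};   // 2x2 array of sites
            std::vector<Conn> conns = {{0, 1, 1}, {1, 2, 1}};             // chain with tight budgets
            std::vector<int> assign(3, -1);                               // three operations to place
            if (place(0, assign, sites, conns))
                std::printf("op0->site%d op1->site%d op2->site%d\n", assign[0], assign[1], assign[2]);
        }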

    Efficient quadratic placement for FPGAs.

    Field Programmable Gate Arrays (FPGAs) are widely used in industry because they can implement any digital circuit on site simply by specifying the programmable logic and its interconnections. However, this rapid-prototyping advantage may be undermined by the long compile time, which is dominated by placement and routing. This issue is of great importance, especially as the logic capacities of FPGAs continue to grow. This thesis focuses on the placement phase of the FPGA Computer Aided Design (CAD) flow and presents a fast, high-quality, wirelength-driven placement algorithm for FPGAs that is based on the quadratic placement approach. In this thesis, multiple iterations of the equation-solving process, together with a linear wirelength reduction technique, are introduced. The proposed algorithm efficiently handles the main problems of the quadratic placement approach and produces fast, high-quality placements. Experimental results, using twenty benchmark circuits, show that this algorithm can achieve comparable total wirelength and, on average, 5x faster run time when compared to an existing, state-of-the-art placement tool. This thesis also shows that the proposed algorithm delivers promising preliminary results in minimizing the critical path delay while maintaining high placement quality. Thesis (M.A.Sc.), Dept. of Electrical and Computer Engineering, University of Windsor (Canada), 2005.
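    The core of the quadratic approach can be shown in a few lines: minimising the sum of squared, weighted distances over all two-pin connections with some terminals fixed yields a sparse linear system, solved here by Gauss-Seidel sweeps in one dimension. This is only a sketch of the underlying formulation under simplifying assumptions (two-pin nets, one dimension, no cell spreading or legalisation), not the thesis's algorithm with its repeated solves and linear wirelength reduction technique.

        #include <cstddef>
        #include <cstdio>
        #include <vector>

        struct Net { int a, b; double w; };   // two-pin net between cells a and b with weight w

        // Minimise sum over nets of w * (x_a - x_b)^2 with some cells fixed: each movable
        // cell's optimum is the weighted average of its neighbours, so Gauss-Seidel
        // sweeps converge to the solution of the underlying linear system.
        void quadratic_place(std::vector<double>& x, const std::vector<bool>& fixed,
                             const std::vector<Net>& nets, int sweeps = 100) {
            for (int it = 0; it < sweeps; ++it)
                for (std::size_t i = 0; i < x.size(); ++i) {
                    if (fixed[i]) continue;
                    double wsum = 0.0, pull = 0.0;
                    for (const Net& n : nets) {
                        if (n.a == static_cast<int>(i)) { wsum += n.w; pull += n.w * x[n.b]; }
                        if (n.b == static_cast<int>(i)) { wsum += n.w; pull += n.w * x[n.a]; }
                    }
                    if (wsum > 0.0) x[i] = pull / wsum;
                }
        }

        int main() {
            // Two fixed pads at 0 and 10 with two movable cells chained between them.
            std::vector<double> x   = {0.0, 5.0, 5.0, 10.0};
            std::vector<bool> fixed = {true, false, false, true};
            std::vector<Net> nets   = {{0, 1, 1.0}, {1, 2, 1.0}, {2, 3, 1.0}};
            quadratic_place(x, fixed, nets);
            std::printf("x1 = %.3f  x2 = %.3f\n", x[1], x[2]);   // approaches 3.333 and 6.667
        }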