389 research outputs found

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    Proceedings for the ICASE Workshop on Heterogeneous Boundary Conditions

    Domain decomposition is a complex problem with many interesting aspects. The decomposition can be chosen according to many different criteria, and the possible choices of interface or internal boundary conditions are numerous. The various regions under study may have different dynamical balances, indicating that different physical processes dominate the flow in these regions. This conference was called in recognition of the need to define the nature of these complex problems more clearly. These proceedings are a collection of the presentations and the discussion groups.

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15x and 130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
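For readers unfamiliar with the recurrence named in the abstract above, the following is a minimal software sketch of the textbook Nussinov base-pair maximization recurrence. It is the standard sequential formulation, not the FPGA design described in the work. The function name `nussinov`, the `min_loop` parameter, and the restriction to Watson-Crick pairs are assumptions made for illustration.

```python
def nussinov(seq, min_loop=0):
    """Return the maximum number of nested base pairs in an RNA sequence.

    Textbook Nussinov recurrence (illustrative sketch, not the paper's design).
    min_loop is a hypothetical parameter: positions i and j may pair only if
    j - i > min_loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    # Fill by increasing span (anti-diagonals). Every cell on one
    # anti-diagonal depends only on cells with a smaller span, so the cells
    # of a diagonal are mutually independent.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(table[i + 1][j], table[i][j - 1])  # i or j unpaired
            if j - i > min_loop and (seq[i], seq[j]) in pairs:
                inner = table[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two sub-structures
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table[0][n - 1]
```

For example, `nussinov("GGGAAACCC", min_loop=3)` returns 3, since each G can pair with one C around the unpaired A loop. The independence of cells along an anti-diagonal is the kind of fine-grained parallelism a hardware array can exploit.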

    Instrumentation of CdZnTe detectors for measuring prompt gamma-rays emitted during particle therapy

    Background: The irradiation of cancer patients with charged particles, mainly protons and carbon ions, has become an established method for the treatment of specific types of tumors. In comparison with the use of X-rays or gamma-rays, particle therapy has the advantage that the dose distribution in the patient can be controlled more precisely, so that tissue and organs lying near the tumor are spared. The treatment plan can be verified against the actual dose deposition by measuring the range of the particle beam. For this purpose, prompt gamma-rays are detected, which are emitted by the irradiated target volume during treatment. Motivation: The detection of prompt gamma-rays is a task of radiation detection and measurement. Nuclear applications in medicine are found in particular in in vivo diagnosis. The spatially resolved measurement of gamma-rays is already an essential technique in nuclear imaging; however, the technical requirements of radiation measurement during particle therapy are much more challenging than those of classical applications. For this purpose, appropriate instruments beyond the state of the art need to be developed and tested for detecting prompt gamma-rays. The success of a method for range assessment of particle beams is therefore largely determined by the implementation of the electronics. In practice, this means that a suitable detector material with adapted readout electronics, signal and information processing, and data interface must be used. The parameters of the system (e.g. segmentation, time or energy resolution) can then be optimized depending on the method (e.g. slit camera, time-of-flight measurement or Compton camera). Regardless of the method, the detector system must have a high count rate capability and a large measuring range (>7 MeV). 
For the subsequent evaluation of a suitable imaging method, the parameters mentioned must not be restricted by the electronics. Digital signal processing is well suited to multipurpose tasks, and the performance of such an implementation has to be determined with respect to the stated requirements. Materials and methods: In this study, the instrumentation of a detector system for prompt gamma-rays emitted during particle therapy is limited to the use of a cadmium zinc telluride (CdZnTe, CZT) semiconductor detector. The detector crystal is divided into an 8x8 pixel array by segmented electrodes. Analog and digital signal processing are tested with this type of detector as an example, with the aim of applying a Compton camera to range assessment. The electronics are implemented with commercial off-the-shelf (COTS) components. Where applicable, functional units of the detector system were digitized and implemented in a field-programmable gate array (FPGA). An efficient implementation of the algorithms in terms of timing and logic utilization is fundamental to the design of the digital circuits. The measurement system is characterized with radioactive sources to determine the dynamic range and resolution of the measurement. Finally, the performance is examined against the requirements of particle therapy in experiments at particle accelerators. Results: A detector system based on a CZT pixel detector has been developed and tested. Although the use of an application-specific integrated circuit would be convenient, this approach was rejected because no available circuit met the requirements. Instead, a multichannel, compact, and low-noise analog amplifier circuit has been implemented with COTS components. Finally, the 65 information channels of a detector are digitized, processed and visualized. Advanced digital signal processing translates the traditional approaches of nuclear electronics into algorithms and digital filter structures for an FPGA. 
With regard to the characteristic signals of a CZT pixel detector (e.g. varying rise times, depth-dependent energy measurement), it could be shown that digital pulse processing yields a very good energy resolution (~2% FWHM at 511 keV) and permits a time measurement in the range of some tens of nanoseconds. Furthermore, the experimental results have shown that the dynamic range of the detector system could be significantly improved compared to the existing prototype of the Compton camera (~10 keV to 7 MeV). Even count rates of ~100 kcps in a high-energy beam could ultimately be processed with the CZT pixel detector. This is merely a limit of the detector due to its volume, and is not related to the electronics. In addition, the versatility of digital signal processing has been demonstrated with other detector materials (e.g. CeBr3). In anticipation of high data throughput in a distributed data acquisition from multiple detectors, a Gigabit Ethernet link has been implemented as the data interface. Conclusions: To fully exploit the capabilities of a CZT pixel detector, digital signal processing is essential. A decisive advantage of the digital approach is its ease of use in a multichannel system. With digitization, a necessary step has thus been taken toward mastering the complexity of a Compton camera. Furthermore, the technology assessment shows that a CZT pixel detector meets the requirements of measuring prompt gamma-rays during particle therapy. The previously used orthogonal strip detector must be replaced by the pixel detector in favor of increased efficiency and improved energy resolution. With the integration of the developed digital detector system into a Compton camera, it remains to be shown whether this method is applicable to range assessment in particle therapy. 
Even if another method turns out to be more practical in a clinical environment, the detector system of that method may still benefit from the instrumentation of a digital signal processing system for nuclear applications shown here.
1. Introduction 1.1. Aim of this work 2. Analog front-end electronics 2.1. State-of-the-art 2.2. Basic design considerations 2.2.1. CZT detector assembly 2.2.2. Electrical characteristics of a CZT pixel detector 2.2.3. High voltage biasing and grounding 2.2.4. Signal formation in CZT detectors 2.2.5. Readout concepts 2.2.6. Operational amplifier 2.3. Circuit design of a charge-sensitive amplifier 2.3.1. Circuit analysis 2.3.2. Charge-to-voltage transfer function 2.3.3. Input coupling of the CSA 2.3.4. Noise 2.4. Implementation and Test 2.5. Results 2.5.1. Test pulse input 2.5.2. Pixel detector 2.6. Conclusion 3. Digital signal processing 3.1. Unfolding-synthesis technique 3.2. Digital deconvolution 3.2.1. Prior work 3.2.2. Discrete-time inverse amplifier transfer function 3.2.3. Application to measured signals 3.2.4. Implementation of a higher order IIR filter 3.2.5. Conclusion 3.3. Digital pulse synthesis 3.3.1. Prior work 3.3.2. FIR filter structures for FPGAs 3.3.3. Optimized fixed-point arithmetic 3.3.4. Conclusion 4. Data interface 4.1. State-of-the-art 4.2. Embedded Gigabit Ethernet protocol stack 4.3. Implementation 4.3.1. System overview 4.3.2. Media Access Control 4.3.3. Embedded protocol stack 4.3.4. Clock synchronization 4.4. Measurements and results 4.4.1. Throughput performance 4.4.2. Synchronization 4.4.3. Resource utilization 4.5. Conclusion 5. Experimental results 5.1. Digital pulse shapers 5.1.1. Spectroscopy application 5.1.2. Timing applications 5.2. Gamma-ray spectroscopy 5.2.1. Energy resolution of scintillation detectors 5.2.2. Energy resolution of a CZT pixel detector 5.3. Gamma-ray timing 5.3.1. Timing performance of scintillation detectors 5.3.2. Timing performance of CZT pixel detectors 5.4. 
Measurements with a particle beam 5.4.1. Bremsstrahlung Facility at ELBE 6. Discussion 7. Summary 8. Zusammenfassung
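To put the quoted energy resolution in absolute terms, the short calculation below converts the relative FWHM at 511 keV into an absolute width. The conversion to a standard deviation assumes a Gaussian peak shape, which the abstract does not state.

```latex
\Delta E_{\mathrm{FWHM}} \approx 0.02 \times 511\,\mathrm{keV} \approx 10.2\,\mathrm{keV},
\qquad
\sigma_E = \frac{\Delta E_{\mathrm{FWHM}}}{2\sqrt{2\ln 2}} \approx \frac{10.2\,\mathrm{keV}}{2.355} \approx 4.3\,\mathrm{keV}
```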

    Analysis of aortic-valve blood flow using computational fluid dynamics


    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive, and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrarily sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application- and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.
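As background for the Hestenes-Jacobi method mentioned in the contributions above, here is a minimal pure-Python sketch of the one-sided Jacobi iteration for singular values. It illustrates the general textbook method only and says nothing about the dissertation's FPGA architecture. The function name, the `tol` and `max_sweeps` parameters, and the list-of-rows matrix layout are assumptions made for this example.

```python
import math


def hestenes_jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of a dense matrix by one-sided (Hestenes) Jacobi.

    Illustrative sketch: `a` is a list of rows; tol and max_sweeps are
    hypothetical convergence controls. Returns values in descending order.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # work on a copy; columns get orthogonalized
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[r][p] * u[r][p] for r in range(m))
                beta = sum(u[r][q] * u[r][q] for r in range(m))
                gamma = sum(u[r][p] * u[r][q] for r in range(m))
                # Skip column pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that zeroes the inner product of p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta)
                )
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for r in range(m):
                    up, uq = u[r][p], u[r][q]
                    u[r][p] = c * up - s * uq
                    u[r][q] = s * up + c * uq
        if not rotated:
            break
    # After convergence the column norms are the singular values.
    norms = [math.sqrt(sum(u[r][j] ** 2 for r in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)
```

Each rotation touches only two columns, so rotations on disjoint column pairs can proceed independently, which is one reason the method is a common candidate for parallel hardware.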