53 research outputs found

    Using embedded hardware monitor cores in critical computer systems

    Get PDF
    The integration of FPGA devices in many different architectures and services makes monitoring and real time detection of errors an important concern in FPGA system design. A monitor is a tool, or a set of tools, that facilitate analytic measurements in observing a given system. The goal of these observations is usually the performance analysis and optimisation, or the surveillance of the system. However, System-on-Chip (SoC) based designs leave few points to attach external tools such as logic analysers. Thus, an embedded error detection core that allows observation of critical system nodes (such as processor cores and buses) should enforce the operation of the FPGA-based system, in order to prevent system failures. The core should not interfere with system performance and must ensure timely detection of errors. This thesis is an investigation onto how a robust hardware-monitoring module can be efficiently integrated in a target PCI board (with FPGA-based application processing features) which is part of a critical computing system. [Continues.

    Development and analysis of a verstile, reusable, high speed, DMA controller for custom embedded applications using the PCI bus

    Get PDF
    This thesis investigates the plausibility of designing and developing a versatile, reusable, high speed interface for custom computing applications, based on the Peripheral Component Interface (PCI) Bus. A PCI I/O board was developed, utilizing mainly Complex Programmable Logic Devices (CPLD\u27s), which included a custom Direct Memory Access (DMA) Controller to take advantage of the unique feature set of the PCI bus. The arbitration mechanisms and performance characteristics of the PCI bus are taken advantage of in order to achieve a maximum burst throughput rate of 66 Megabytes per second. Performance characteristics of the I/O board are analyzed for two separate PCI host systems. In the faster of the two systems, a 166MHz Pentium PC, a maximum aggregate throughput rate of 54 Megabytes per second for PCI burst writes was achieved. In all cases throughput increased as a function of transfer size. Due to buffering implementations in the host systems write performance was always superior to read performance. In addition to exceptional throughput capability, this implementation provides a design engineer with a versatile interface which can be mated to a number of high performance applications. The PCI I/O board\u27s external interface is implemented with a CPLD which can be quickly and easily modified to meet the needs of practically any custom interface without decreasing PCI bus performance. Using the on-board latency timer and programmable FIFO\u27s the board can be fine tuned to meet a variety of application requirements. The two main design goals were to provide unlimited bursting capability and to transfer 32-bits of data on every clock. The first was achieved through the implementation of a 32-bit burst Transfer Count register. The second goal had to be reduced by 50% due to a timing margin violation discovered during board debug

    Development of a PXI Express Peripheral Module and Data Transfer Platform

    No full text
    Magritek, a company who specialise in NMR and MRI devices, required a new backplane communication solution for transmission of data. Possible options were evaluated and it was decided to move to the PXI Express instrumentation standard. As a first step of moving to this system, an FPGA based PXI Express Peripheral Module was designed and constructed. In order to produce this device, details on PXI Express boards and the signals required were researched, and schematics produced. These were then passed onto the board designer who incorporated the design with other design work at Magritek to produce a PXI Express Peripheral Module for use as an NMR transceiver board. With the board designed, the FPGA was configured to provide PXI Express functionality. This was designed to allow PCI Express transfers at high data speeds using Direct Memory Access (DMA). The PXI Express Peripheral board was then tested and found to function correctly, providing Memory Write speeds of 228 MB/s and Memory Read speeds of 162 MB/s. Also, to provide a test system for this physical and FPGA design, backplanes were designed to test communication between PXI Express modules

    Memory hierarchy and data communication in heterogeneous reconfigurable SoCs

    Get PDF
    The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented

    Systematische Transaction-Level-Kommunikations-Modellierung mit SystemC

    Get PDF
    An emerging approach to embedded system design is to assemble them from a library of hardware and software component models (IP, intellectual property) using a system description language, such as SystemC. SystemC allows describing the communication among IPs in terms of abstract operations (transactions). The promise is that with transaction-level modeling (TLM), future systems-on-chip with one billion transistors and more can be composed out of IPs as simply as playing with LEGO bricks. However, reality is far out. In fact, each IP vendor promotes another proprietary interface standard and the provided design tools lack compatibility, such that heterogeneous IPs cannot be integrated efficiently. A novel generic interconnect fabric for TLM is presented which aims at enabling inter-operation between models of different levels of abstraction (mixed-mode) and models with different interfaces (heterogeneous components), with as little overhead as possible. A generic, protocol independent representation of transactions is developed, among with an abstraction level formalism. This approach is shown to support systematic simulation of state-of-the-art buses and networks-on-chip such as IBM CoreConnect and PCI Express over several levels of TLM abstraction. A layered simulation framework for SystemC, GreenBus, is developed to examine the proposed concepts. The thesis discusses new implementation techniques for communication modeling with SystemC which outperform the existing approaches in terms of flexibility, simulation accuracy, and performance. Based on these techniques, advanced concepts for TLM-based hardware/software co-design and FPGA prototyping are examined. Several experiments and a video processor case study highlight the efficiency of the approach and show its applicability in a TLM design flow.Eingebettete Systeme werden zunehmend auf Basis vorgefertigter Hard- und Softwarebausteine entwickelt, die in Form von Modellen (IP, Intellectual Property) vorliegen. Hierzu werden Systembeschreibungssprachen wie SystemC eingesetzt. SystemC ermöglicht, die Kommunikation zwischen IPs durch abstrakte Operationen, sog. Transaktionen zu beschreiben. Mit dieser Transaction-Level-Modellierung (TLM) sollen auch zukünftige Systeme mit 1 Milliarde Transistoren und mehr effizient entwickelt werden können. Idealerweise sollte das Hantieren mit IPs dabei so einfach sein wie das Spielen mit LEGO-Steinen. In der Realität sind jedoch IPs unterschiedlicher Hersteller nicht ohne weiteres integrierbar, und auch die Entwurfswerkzeuge sind nicht kompatibel. In dieser Doktorarbeit wird ein neuer, generischer Ansatz für die Transaction-Level-Modellierung mit SystemC vorgestellt, der Kommunikation zwischen Modellen auf unterschiedlichen Abstraktionsebenen (Mixed-Mode) und mit unterschiedlichen Schnittstellen (heterogene Komponenten) möglich macht. Der zusätzlich benötigte Simulations- und Code-Aufwand ist minimal. Ein protokollunabhängiges Transaktionsmodell und ein formaler Ansatz zur Beschreibung von Abstraktionsebenen werden vorgestellt, mit denen verschiedenartige Busse und Networks-on-Chip wie IBM CoreConnect und PCI Express auf verschiedenen TLM-Abstraktionsebenen simuliert werden können. Ein modulares Simulationsframework für SystemC wird entwickelt (GreenBus), um die vorgeschlagenen Konzepte zu untersuchen. Anhand von GreenBus werden neue Implementierungstechniken diskutiert, die den existierenden Ansätzen in Flexibilität, Simulationsgenauigkeit und -geschwindigkeit überlegen sind. Die Vor- und Nachteile der entwickelten Techniken werden mit Experimenten belegt, und eine Videoprozessor-Fallstudie demonstriert die Effizienz des Ansatzes in einem TLM-basierten Entwurfsfluss

    The ATLAS ROBIN – A High-Performance Data-Acquisition Module

    Get PDF
    This work presents the re-configurable processor ROBIN, which is a key element of the data-acquisition-system of the ATLAS experiment, located at the new LHC at CERN. The ATLAS detector provides data over 1600 channels simultaneously towards the DAQ system. The ATLAS dataflow model follows the “PULL” strategy in contrast to the commonly used “PUSH” strategy. The data volume transported is reduced by a factor of 10, however the data must be temporarily stored at the entry to the DAQ system. The input layer consists of approx. 160 ROS read-out units comprising 1 PC and 4 ROBIN modules. Each ROBIN device acquires detector data via 3 input channels and performs local buffering. Board control is done via a 64-bit PCI interface. Event selection and data transmission runs via PCI in the baseline bus-based ROS. Alternatively, a local GE interface can take over part or all of the data traffic in the switch-based ROS, in order to reduce the load on the host PC. The performance of the ROBIN module stems from the close cooperation of a fast embedded processor with a complex FPGA. The efficient task-distribution lets the processor handle all complex management functionality, programmed in “C” while all movement of data is performed by the FPGA via multiple, concurrently operating DMA engines. The ROBIN-project was carried-out by and international team and comprises the design specification, the development of the ROBIN hardware, firmware (VHDL and C-Code), host-code (C++), prototyping, volume production and installation of 700 boards. The project was led by the author of this thesis. The hardware platform is an evolution of a FPGA processor previously designed by the author. He has contributed elementary concepts of the communication mechanisms and the “C”-coded embedded application software. He also organised and supervised the prototype and series productions including the various design reports and presentations. The results show that the ROBIN-module is able to meet its ambitious requirements of 100kHz incoming fragment rate per channel with a concurrent outgoing fragment rate of 21kHz per channel. At the system level, each ROS unit (12 channels) operates at the same rates, however for a subset of the channels only. The ATLAS DAQ system – with 640 ROBIN modules installed – has performed a successful data-taking phase at the start-up of the LHC in September

    A general-purpose pulse sequencer for quantum computing

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 165-170).Quantum mechanics presents a more general and potentially more powerful model of computation than classical systems. Quantum bits have many physically different representations which nonetheless share a common need for modulating pulses of electromagnetic waves. This thesis presents the design and evaluates the implementation of a general-purpose sequencer which supports fast, programmable pulses; a flexible, open design; and feedback operation for adaptive algorithms. The sequencer achieves a timing resolution, minimum pulse duration, and minimum delay of 10 nanoseconds; it has 64 simultaneously-switching, independent digital outputs and 8 digital inputs for triggering or feedback. Multiple devices can operate in a daisy chain to facilitate adding and removing channels. An FPGA is used to implement a firmware network stack and a specialized pulse processor core whose modules are all interconnected using the Wishbone bus standard. Users can write pulse programs in an assembly language and control the device from a host computer over an Ethernet network. An embedded web server provides an intuitive, graphical user interface, while a non-interactive, efficient UDP protocol provides programmatic access to third-party software. The performance characteristics, tolerances, and cost of the device are measured and compared with those of contemporary research and commercial offerings. Future improvements and extensions are suggested. All circuit schematics, PCB layouts, source code, and design documents are released under an open source license.by Paul Tân Thế Phạm.M.Eng

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
    • …
    corecore