16 research outputs found
Generic low power reconfigurable distributed arithmetic processor
Higher performance, lower cost, ever-smaller integrated circuit components, and
higher packaging density of chips are ongoing goals of the microelectronic and computer
industry. As these goals are achieved, however, power consumption and flexibility are
increasingly becoming bottlenecks that must be addressed by new technology in Very
Large-Scale Integration (VLSI) design.
Modern systems need more energy to support the computational capability that growing
requirements demand, and these requirements drive changes in standards not only in audio
and video broadcasting but also in communication, such as wireless connections and
network protocols. High flexibility and low power consumption pull in opposite
directions, but combining them in one system is the ultimate goal of designers.
A generic domain-specific low-power reconfigurable processor for the distributed
arithmetic algorithm is presented in this dissertation. This domain-specific reconfigurable processor
features high efficiency in terms of area, power and delay, which approaches the
performance of an ASIC design, while retaining the flexibility of programmable platforms.
The architecture not only supports the typical distributed arithmetic algorithms
found in most still picture compression and video conferencing standards, but
can also implement other distributed arithmetic algorithms found in
digital signal processing, telecommunication protocols and automatic control.
In this processor, a simple reconfigurable low power control unit is implemented with
good performance in area, power and timing. The generic character of the architecture
makes it applicable to any small or medium-sized finite state machine, which can serve
as a control unit to implement complex system behaviour and can be found in almost all
engineering disciplines. Furthermore, to map target applications efficiently onto the
proposed architecture, a new algorithm is introduced that searches for the best set of
common shared terms, keeping the area and power consumption of the implementation
low. The software implementation of this algorithm is presented; it can be used
not only for the architecture proposed in this dissertation but also for any
implementation of adder-based distributed arithmetic algorithms. In addition, some low
power design techniques are applied in the architecture, such as an unsymmetrical design
style including unsymmetrical interconnection arrangement, unsymmetrical PTB selection
and unsymmetrical mapping of basic computing units. All these design techniques yield
substantial power savings. It is believed that they can be extended to more
low power designs and architectures.
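To illustrate what adder-based distributed arithmetic and shared terms mean, here is a minimal Python sketch. It is not the dissertation's search algorithm, which the abstract does not describe. It assumes unsigned fixed coefficients, and the function names (`bit_planes`, `adder_based_da`, `most_shared_pair`) and example values are hypothetical. The inner product is computed with additions and shifts only, and a simple count shows which pair of inputs is added together in the most bit-planes and is therefore a candidate for a shared adder.

```python
from collections import Counter
from itertools import combinations


def bit_planes(coeffs, width):
    """For each bit position k, list the input indices whose coefficient has bit k set."""
    return [
        [i for i, c in enumerate(coeffs) if (c >> k) & 1]
        for k in range(width)
    ]


def adder_based_da(coeffs, xs, width):
    """Compute sum(c_i * x_i) using only additions and shifts (unsigned coefficients)."""
    total = 0
    for k, plane in enumerate(bit_planes(coeffs, width)):
        partial = sum(xs[i] for i in plane)  # adder tree for bit-plane k
        total += partial << k                # weight the plane by 2**k
    return total


def most_shared_pair(coeffs, width):
    """Return ((i, j), count) for the input pair that appears in the most bit-planes."""
    counts = Counter()
    for plane in bit_planes(coeffs, width):
        counts.update(combinations(plane, 2))
    return counts.most_common(1)[0] if counts else None


# Illustrative values only: four 3-bit constant coefficients and four inputs.
coeffs = [5, 7, 3, 6]
xs = [2, 4, 1, 3]
result = adder_based_da(coeffs, xs, width=3)
pair = most_shared_pair(coeffs, width=3)
```

A pair that occurs in several bit-planes only needs to be added once and can then be reused, which is the kind of saving a shared-term search aims for.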
The processor presented in this dissertation can be used to implement complex, high
performance distributed arithmetic algorithms for communication and image processing
applications at lower cost in area and power than traditional methods.
Embedded electronic systems driven by run-time reconfigurable hardware
Abstract
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology, available through SRAM-based FPGA/SoC devices, with the aim of helping to improve people's quality of life. It investigates the system architecture and the reconfiguration engine that gives the FPGA the capability of dynamic partial reconfiguration, in order to synthesize, by means of hardware/software co-design, a given application partitioned into processing tasks that are multiplexed in time and space. This optimizes its physical implementation (silicon area, processing time, complexity, flexibility, functional density, cost and power consumption) in comparison with alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of this technology is evaluated by prototyping several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a level of maturity high enough for its exploitation in industry.
Programmable flexible cores for SoC applications
Master's thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 200
Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems
Current FPGAs can implement large systems because of the high density of
reconfigurable logic resources in a single chip. FPGAs are comprehensive devices
that combine flexibility and high performance in the same platform compared to
other platforms such as General-Purpose Processors (GPPs) and Application Specific
Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by
the Dynamic Partial Reconfiguration (DPR) feature, which allows for
changing the functionality of part of the system while other parts are functioning.
FPGAs have become an important platform for digital image processing applications
because of the aforementioned features. They can fulfil the need for efficient and
flexible platforms that execute imaging tasks efficiently and reliably with
low power, high performance and high flexibility. The use of FPGAs as accelerators
for image processing outperforms most of the current solutions. Current FPGA
solutions can load the part of the imaging application that needs high computational
power onto dedicated reconfigurable hardware accelerators while other parts
run on the traditional solution, increasing system performance. Moreover,
the use of the DPR feature enhances the flexibility of image processing further by
swapping accelerators in and out at run-time. The use of fault mitigation techniques
in FPGAs enables imaging applications to operate in harsh environments, given
that FPGAs are sensitive to radiation and extreme conditions.
The aim of this thesis is to present a platform for efficient implementations of
imaging tasks. The research uses FPGAs as the key component of this platform and
uses the concept of DPR to increase performance and flexibility, to reduce power
dissipation and to expand the range of possible imaging applications. In this context,
it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP)
stages, the core part of most imaging devices. The thesis introduces a number of novel
concepts. The first is the use of the FPGA hardware environment and the
DPR feature to increase parallelism and achieve high flexibility. This concept also
increases performance and reduces power consumption and area utilisation.
Based on this concept, the following implementations are presented in this thesis: an
implementation of the Adams Hamilton Demosaicing algorithm for camera colour
interpolation, which exploits FPGA parallelism to outperform other equivalents,
and an implementation of Automatic White Balance (AWB), another IPP
stage, which employs the DPR feature to demonstrate the same novelty aspects. Another
novel concept, presented in chapter 6, uses the DPR feature to
develop a novel flexible imaging system that requires less logic and can be
implemented in small FPGAs. The system can be employed as a template for any
imaging application with no limitation. The thesis also discusses a novel
reliable version of the imaging system that adopts novel techniques including
scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to
detect and correct errors using the Internal Configuration Access Port (ICAP)
primitive. These techniques exploit the datapath-based nature of the implemented
imaging system to improve the system's overall reliability. The thesis presents a
proposal for integrating the imaging system with the Robust Reliable Reconfigurable
Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the
system. The proposal shows the suitability of the proposed DPR imaging system for
use as part of the core system of autonomous cars because of its unbounded
flexibility. These novel works are presented in a number of publications, as listed in section
1.3 of the thesis.
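As a small illustration of the Triple Modular Redundancy idea mentioned in this abstract, the following Python sketch models a bitwise majority voter over three redundant copies of a data word. It is a software model under stated assumptions, not the thesis's hardware design: the function names `tmr_vote` and `faulty_copies` and the example bit patterns are hypothetical, and scrubbing through the ICAP is not modelled.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def faulty_copies(a, b, c):
    """Indices of the copies that disagree with the voted value."""
    voted = tmr_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


# Illustrative values only: one copy suffers a single-bit upset.
golden = 0b10110010
upset = golden ^ 0b00001000
voted = tmr_vote(golden, upset, golden)
bad = faulty_copies(golden, upset, golden)
```

The voter masks a fault in any single copy; identifying which copy disagrees is what would let a repair mechanism such as scrubbing target the affected module.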
Development of FPGA-based High-Speed serial links for High Energy Physics Experiments
High Energy Physics (HEP) experiments generate high volumes of data which need to be transferred over long distances. Reliable, high-speed links are therefore necessary for data readout. Over the years, due to their extremely high bandwidth, serial links (especially optical ones) have been preferred over parallel ones. As a result, high-speed serial links are now commonly used in the Trigger and Data Acquisition (TDAQ) systems of HEP experiments, not only for data transfer but also for the distribution of trigger and control signals.
Examples of their wide use can be found at CERN, where each of the four big experiments on the Large Hadron Collider (LHC) uses a huge number of serial links in its readout system. Also at the LHC, the Timing, Trigger and Control (TTC) system, which broadcasts the timing signals from the LHC machine to the experiments, uses optical serial links to distribute signals over kilometers of distance (the circumference of the LHC is 27 km). For the LHC upgrades, physical layer components and protocol chips (ASICs) have also been designed and are now under development: the Versatile Link and the GBT protocol (and ASICs), whose distinguishing feature is their radiation hardness.
This PhD project is intended to respond to the requests of HEP experiments, developing:
- a high-speed self-adapting serial link, which can be easily used in different application fields;
- the serial interface of a read out board in the end-cap region of ATLAS Experiment at LHC;
- the interface board for the barrel read out system of the ATLAS Experiment.
Both of the latter two projects required the development of fixed-latency, high-speed serial links.
In order to take advantage of the flexibility, re-programmability and system integration of SRAM-based Field Programmable Gate Array (FPGA) devices, their embedded serializer-deserializer (SERDES) modules were chosen for the development of the links. As a drawback, however, FPGA-embedded SERDESes are typically designed for applications that do not require deterministic latency. An accurate study of their architecture was therefore necessary in order to find a configuration and a clocking scheme that guarantee a deterministic transmission delay in data transfers.
The frequency-agile, auto-adaptive serial link is capable of analyzing the incoming data stream by scanning the Unit Interval, and of finding the highest transmission line rate that meets a given tolerated Bit Error Ratio (BER).
It uses a new feature (RX eye margin analysis) of the RX side of the Xilinx 7 series FPGAs high-speed transceivers (GTX/GTH), in order to measure and display the receiver eye margin after the equalizer.
When the new eye scan functionality is running, an additional sampler is activated in the GTX. It acquires a new sample (Offset Sample), with programmable (horizontal and vertical) offsets from the data sample point (Data Sample) used in standard operation.
An eye scan measurement run is performed by acquiring a large number of Data Samples (which can range from tens of thousands to 10^14 or more) and by counting the number of times the Offset Sample has a different value from the Data Sample; the former number is the Sample Count and the latter is often called the Error Count. The BER at a specific vertical and horizontal offset is given by the ratio between the Error Count and the Sample Count. By repeating the eye scan measurement for each horizontal and vertical offset in the Unit Interval (or in a part of the U.I.), a 2-D BER map can be produced, which is usually called a Statistical Eye.
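The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Xilinx eye scan code: the names `ber_map` and `open_points`, the grid layout (rows as vertical offsets, columns as horizontal offsets) and all counts are assumptions made for the example.

```python
def ber_map(error_counts, sample_count):
    """Build a 2-D BER map: Error Count / Sample Count at each (vertical, horizontal) offset."""
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    return [[errors / sample_count for errors in row] for row in error_counts]


def open_points(ber, target_ber):
    """Count the offsets whose measured BER is at or below the target."""
    return sum(1 for row in ber for value in row if value <= target_ber)


# Illustrative counts only: 3 vertical offsets x 5 horizontal offsets,
# with the same Sample Count acquired at every offset.
error_counts = [
    [5000, 120, 0, 90, 5000],
    [4000, 3, 0, 1, 4200],
    [5000, 200, 0, 150, 5000],
]
sample_count = 10_000
ber = ber_map(error_counts, sample_count)
eye_size = open_points(ber, target_ber=1e-3)
```

Note that a measured BER of zero only means no errors were seen in the samples acquired; with N samples the measurement cannot resolve a BER much below 1/N, which is why very large Sample Counts are needed for low target BERs.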
The auto-adaptive serial link is designed around an FPGA-embedded microprocessor, which drives the programmable ports of the GTX in order to perform a 2-D eye scan, and takes care of the reconfiguration of the GTX parameters in order to fully benefit from the available link bandwidth.
Xilinx provides a standalone tool that performs the Eye Scan Analysis on the receiver side of the GTX/GTH transceiver, using the MicroBlaze Micro Controller System macro; the toolkit also includes the Eye Scan algorithm (provided as C code). Moreover, Xilinx supplies the hardware source files for the implementation of a link based on the XAUI protocol, in which the GTXs are arranged in a loopback configuration.
The original contribution of this work consists in the build-up, design and optimization of a full architecture, on top of the basic Xilinx tool, which:
- drives the programmable ports of the GTX in order to modify the line rate of the link;
- runs consecutive eye scans at various line rates;
- analyses the results of the different scans, in order to find the maximum line rate sustainable by the link;
- manages the synchronization between the transmitter and the receiver of the link, which is needed at each line rate change.
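The steps listed above can be sketched as a simple search loop. This Python sketch is a hedged illustration, not the firmware described in the thesis: `max_sustainable_rate` is a hypothetical name, and `measure_ber` stands in for the whole hardware sequence (reconfiguring the transceiver, resynchronizing transmitter and receiver, running an eye scan) and is replaced here by a lookup in made-up data.

```python
def max_sustainable_rate(line_rates, measure_ber, tolerated_ber):
    """Return the highest line rate whose measured BER is within tolerance, or None."""
    best = None
    for rate in sorted(line_rates):
        # On real hardware this call would change the line rate, wait for the
        # transmitter and receiver to resynchronize, and run an eye scan.
        if measure_ber(rate) <= tolerated_ber:
            best = rate
    return best


# Made-up measurements (line rate in Gb/s -> BER at the sampling point).
fake_scan_results = {1.25: 0.0, 2.5: 0.0, 5.0: 1e-13, 10.0: 1e-6}
best_rate = max_sustainable_rate(
    fake_scan_results.keys(),
    lambda rate: fake_scan_results[rate],
    tolerated_ber=1e-12,
)
```

The loop checks every candidate rate instead of stopping at the first failure, so it does not assume that BER grows monotonically with line rate.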
The application can be deployed as a monitoring tool in HEP experiments, in order to remotely monitor a transmission system or detect issues in the serial link physical layer. An application example could be some of the many experiments at Large Hadron Collider (LHC) at CERN, which have been intensively using different serial links, both for transmission of TTC signals and for trigger and data readout.
Moreover, this solution can easily be adapted to a wide range of frameworks: it can be used on top of any user’s existing link, since it places no specific requirement on link specification or protocol.
The other two serial interfaces developed in this project belong to the framework of the ATLAS experiment.
ATLAS is one of the four detectors installed on the LHC proton-proton collider built at CERN. The LHC was designed to collide two opposing particle beams at an energy of 14 TeV and to reach a luminosity of 10^34 cm^-2 s^-1. In order to reach the design parameters, the LHC system will be upgraded in several phases. To take advantage of the improved LHC operation, the ATLAS detector must be upgraded following the same schedule as the LHC upgrade.
The main focus of the Phase-I ATLAS upgrade (to be completed by 2018) is on the Level-1 trigger where upgrades are planned for both the muon and the calorimeter trigger systems.
In particular, for the end-cap region of the muon spectrometer, the installation of a new set of precision tracking and trigger detectors, called the ‘New Small Wheels’ (NSW), was approved. It will be instrumented with micro-mesh gaseous structure detectors (MM) and small-strip Thin Gap Chambers (sTGC). These detectors will address two issues of particular importance at high luminosity: the high rate of fake high-pT Level-1 muon triggers, and the high L1 muon rate with the current momentum threshold. With the introduction of new detectors, new electronics need to be developed, in particular new trigger electronics for both the MM and sTGC. I was involved in the development of the serial interface of the FPGA-based sTGC trigger board that uses information from the coarse sTGC readout pads.
The sTGC pad trigger board receives serial data from 24 front-end chips at 4.8 Gb/s. On the board, data are deserialised, aligned and analyzed by the trigger algorithm. The trigger logic processes the data and chooses two candidates at each Bunch Crossing. The result is then serialised and used for selective fine-grained strip readout. I developed the pad trigger board interface logic. The data format from the front-end chips has been agreed upon, and defines the requirements on the receiver and decoding logic. The number of output lines is 24 and the data are 8B/10B formatted. While the receiver uses the Xilinx Kintex-7 GTX transceivers, the output lines are driven by double data rate (DDR) shift registers at 640 Mb/s. A fixed latency in the sTGC trigger chain was guaranteed through the implementation and configuration of all serialisers and deserialisers. In order to test the project, I also developed a simple microprocessor-based protocol for accessing the board via a terminal (RS-232). A demonstrator board is now being developed.
Another Phase-I Level-1 trigger upgrade consists of a new Muon to Central Trigger Processor Interface (MUCTPI). The MUCTPI receives muon candidate information from each of the muon detectors, selects muon candidates and sends them to the Central Trigger Processor (CTP). In the first runs of ATLAS, the L1 Barrel trigger candidate data were transferred to the MUCTPI via copper cables. In order to cope with the trigger upgrade, serial optical links are necessary. The optical links will provide a much higher bandwidth (up to 6.4 Gb/s), which will be used to transfer additional information from the sector logic modules, for example data for more than two muon candidates. They will also provide a lower transmission latency. I developed the interface board between the new MUCTPI and the Resistive Plate Chambers (RPC) muon trigger, using the Xilinx Artix-7 FPGA GTP transceivers. I carried out the feasibility study of the new serial optical transmitter and developed the logic for the new data format. In this case too, fixed latency was a requirement to be fulfilled.
New Hardware Architecture for Low-Cost Functional Test Systems Applications to HDMI generation
Development of a new test hardware architecture for functional test systems, intended for test equipment used in functional test machines for PCBs. Development of a proof-of-concept prototype for HDMI generation.