43 research outputs found
Dynamically reconfigurable bio-inspired hardware
During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis
802.16e System Profile for NASA Extra-Vehicular Activities
This report identifies an 802.16e system profile that is applicable to a lunar surface wireless network, and specifically for meeting extra-vehicular activity (EVA) data flow requirements. EVA suit communication needs are addressed. Design-driving operational scenarios are considered. These scenarios are then used to identify a configuration of the 802.16e system (system profile) that meets EVA requirements, but also aim to make the radio realizable within EVA constraints. Limitations of this system configuration are highlighted. An overview and development status is presented by Toyon Research Corporation concerning the development of an 802.16e compatible modem under NASA s Small Business Innovative Research (SBIR) Program. This modem is based on the recommended system profile developed as part of this report. Last, a path forward is outlined that presents an evolvable solution for the EVA radio system and lunar surface radio networks. This solution is based on a custom link layer, and 802.16e compliant physical layer compliant to the identified system profile, and a later progression to a fully interoperable 802.16e system
High-performance hardware accelerators for image processing in space applications
Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable.
In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars.
The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris.
The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high.
A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase.
In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions.
Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware.
For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations.
The thesis is organized in four main chapters.
Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms.
Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions.
Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation.
Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions
Técnicas de pré-codificação para sistemas multicelulares coordenados
Doutoramento em TelecomunicaçõesCoordenação Multicélula é um tópico de investigação em rápido
crescimento e uma solução promissora para controlar a interferência entre
células em sistemas celulares, melhorando a equidade do sistema e
aumentando a sua capacidade. Esta tecnologia já está em estudo no LTEAdvanced
sob o conceito de coordenação multiponto (COMP). Existem
várias abordagens sobre coordenação multicélula, dependendo da
quantidade e do tipo de informação partilhada pelas estações base, através
da rede de suporte (backhaul network), e do local onde essa informação é
processada, i.e., numa unidade de processamento central ou de uma forma
distribuída em cada estação base.
Nesta tese, são propostas técnicas de pré-codificação e alocação de
potência considerando várias estratégias: centralizada, todo o
processamento é feito na unidade de processamento central; semidistribuída,
neste caso apenas parte do processamento é executado na
unidade de processamento central, nomeadamente a potência alocada a
cada utilizador servido por cada estação base; e distribuída em que o
processamento é feito localmente em cada estação base. Os esquemas
propostos são projectados em duas fases: primeiro são propostas soluções
de pré-codificação para mitigar ou eliminar a interferência entre células,
de seguida o sistema é melhorado através do desenvolvimento de vários
esquemas de alocação de potência. São propostas três esquemas de
alocação de potência centralizada condicionada a cada estação base e com
diferentes relações entre desempenho e complexidade. São também
derivados esquemas de alocação distribuídos, assumindo que um sistema
multicelular pode ser visto como a sobreposição de vários sistemas com
uma única célula. Com base neste conceito foi definido uma taxa de erro
média virtual para cada um desses sistemas de célula única que compõem
o sistema multicelular, permitindo assim projectar esquemas de alocação
de potência completamente distribuídos.
Todos os esquemas propostos foram avaliados em cenários realistas,
bastante próximos dos considerados no LTE. Os resultados mostram que
os esquemas propostos são eficientes a remover a interferência entre
células e que o desempenho das técnicas de alocação de potência
propostas é claramente superior ao caso de não alocação de potência. O
desempenho dos sistemas completamente distribuídos é inferior aos
baseados num processamento centralizado, mas em contrapartida podem
ser usados em sistemas em que a rede de suporte não permita a troca de
grandes quantidades de informação.Multicell coordination is a promising solution for cellular wireless systems
to mitigate inter-cell interference, improving system fairness and
increasing capacity and thus is already under study in LTE-A under the
coordinated multipoint (CoMP) concept. There are several coordinated
transmission approaches depending on the amount of information shared
by the transmitters through the backhaul network and where the
processing takes place i.e. in a central processing unit or in a distributed
way on each base station.
In this thesis, we propose joint precoding and power allocation techniques
considering different strategies: Full-centralized, where all the processing
takes place at the central unit; Semi-distributed, in this case only some
process related with power allocation is done at the central unit; and Fulldistributed,
where all the processing is done locally at each base station.
The methods are designed in two phases: first the inter-cell interference is
removed by applying a set of centralized or distributed precoding vectors;
then the system is further optimized by centralized or distributed power
allocation schemes. Three centralized power allocation algorithms with
per-BS power constraint and different complexity tradeoffs are proposed.
Also distributed power allocation schemes are proposed by considering
the multicell system as superposition of single cell systems, where we
define the average virtual bit error rate (BER) of interference-free single
cell system, allowing us to compute the power allocation coefficients in a
distributed manner at each BS.
All proposed schemes are evaluated in realistic scenarios considering LTE
specifications. The numerical evaluations show that the proposed schemes
are efficient in removing inter-cell interference and improve system
performance comparing to equal power allocation. Furthermore, fulldistributed
schemes can be used when the amounts of information to be
exchanged over the backhaul is restricted, although system performance is
slightly degraded from semi-distributed and full-centralized schemes, but
the complexity is considerably lower. Besides that for high degrees of
freedom distributed schemes show similar behaviour to centralized ones
FPGA structures for high speed and low overhead dynamic circuit specialization
A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part in the configuration memory with partial bitstreams during run-time. The reconfiguration
is achieved by swapping in partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the
configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized ci rcuits that are more compact and more efficient t han t heir b ulky c ounterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients. To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses
the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. The study of how DCS behaves and what is its overhead in the evolution of the three FPGA platforms is the non-trivial basis to discover the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curbing the power hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types: the conventional VCGRA, partially parameterized VCGRA and fully parameterized VCGRA depending upon the level of parameterization. I have designed two variants of VCGRA grids for HPC image processing applications,
namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a
significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures
Survey of FPGA applications in the period 2000 – 2015 (Technical Report)
Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report).; 2017.Since their introduction, FPGAs can be seen in more and more different fields of applications. The key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements to the used computational architecture. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units like e.g. CPUs
Recommended from our members
Adaptive evolution in static and dynamic environments
This thesis provides a framework for describing a canonical evolutionary system. Populations of individuals are envisaged as traversing a search space structured by genetic and developmental operators under the influence of selection. Selection acts on individuals' phenotypic expressions, guiding the population over an evaluation landscape, which describes an idealised evaluation surface over the phenotypic space. The corresponding valuation landscape describes evaluations over the genotypic space and may be transformed by within generation adaptive (learning) or maladaptive (fault induction) local search.
Populations subjected to particular genetic and selection operators are claimed to evolve towards a region of the valuation landscape with a characteristic local ruggedness, as given by the runtime operator correlation coefficient. This corresponds to the view of evolution discovering an evolutionarily stable population, or quasi-species, held in a state of dynamic equilibrium by the operator set and evaluation function. This is demonstrated by genetic algorithm experiments using the NK landscapes and a novel, evolvable evaluation function, The Tower of Babel. In fluctuating environments of varying temporal ruggedness, different operator sets are correspondingly more or less adapted.
Quantitative genetics analyses of populations in sinusoidally fluctuating conditions are shown to describe certain well known electronic filters. This observation suggests the notion of Evolutionary Signal Processing. Genetic algorithm experiments in which a population tracks a sinusoidally fluctuating optimum support this view. Using a self-adaptive mutation rate, it is possible to tune the evolutionary filter to the environmental frequency. For a time varying frequency, the mutation rate reacts accordingly. With local search, the valuation landscape is transformed through temporal smoothing. By coevolving modifier genes for individual learning and the rate at which the benefits may be directly transmitted to the next generation, the relative adaptedness of individual learning and cultural inheritance according to the rate of environmental change is demonstrated
Intrinsic Hardware Evolution on the Transistor Level
This thesis presents a novel approach to the automated synthesis of analog circuits. Evolutionary algorithms are used in conjunction with a fitness evaluation on a dedicated ASIC that serves as the analog substrate for the newly bred candidate solutions. The advantage of evaluating the candidate circuits directly in hardware is twofold. First, it may speed up the evolutionary algorithms, because hardware tests can usually be performed faster than simulations. Second, the evolved circuits are guaranteed to work on a real piece of silicon. The proposed approach is realized as a hardware evolution system consisting of an IBM compatible general purpose computer that hosts the evolutionary algorithm, an FPGA-based mixed signal test board, and the analog substrate. The latter one is designed as a Field Programmable Transistor Array (FPTA) whose programmable transistor cells can be almost freely connected. The transistor cells can be configured to adopt one out of 75 different channel geometries. The chip was produced in a 0.6µm CMOS process and provides ample means for the input and output of analog signals. The configuration is stored in SRAM cells embedded in the programmable transistor cells. The hardware evolution system is used for numerous evolution experiments targeted at a wide variety of different circuit functionalities. These comprise logic gates, Gaussian function circuits, D/A converters, low- and highpass filters, tone discriminators, and comparators. The experimental results are thoroughly analyzed and discussed with respect to related work