292 research outputs found

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces

    Industrial applications of the Kalman filter:a review

    Get PDF
    International audienc

    Intrinsically Evolvable Artificial Neural Networks

    Get PDF
    Dedicated hardware implementations of neural networks promise to provide faster, lower power operation when compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNN), the type of artificial neural networks implemented here, are grid-based networks neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for RC implementations of BbNNs has also been presented

    High-performance hardware accelerators for image processing in space applications

    Get PDF
    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable. In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars. The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high. A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions. Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware. For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions

    Aspects of parallel processing and control engineering

    Get PDF
    The concept of parallel processing is not a new one, but the application of it to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated several processors/CPUs when combined can form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-processor system was not justified. With the advent of high demand systems, such as highly fault tolerant flight controllers and fast robotic controllers, parallel processing became a viable option. Nonetheless, the software interfacing of control laws onto parallel systems has remained somewhat of an impasse. There are no software compilers at present which allow a programmer to specify a control law in pure mathematical terminology and then decompose it into a flow diagram of concurrent processes which may then be implemented on, say, a target Transputer system, liiere are several parallel programming languages with which a programmer can generate parallel processes but, generally, in order to realise a control algorithm in parallel the programmer must have intimate knowledge of the algorithm. Therefore, efficiency is based on the ability of the programmer to recognise inherent parellelism. Some attempts are being made to create intelligent partition and scheduling compilers but this usually means significantly extra overheads on the multiprocessor system. In the absence of an automated technique control algorithms must be decomposed by inspection. The research presented in this thesis is founded upon the application of both parallel and pipelining techniques to particular control strategies. Parallelism is tackled objectively and by creating a tailored terminology it is defined mathematically, and consequently related concepts, such as bounded parallelism and algorithm speedup, are also quantified in a numerical sense. A pipelined explicit Self Tuning Regulator (STR) controller is developed and tested on systems of different order. Under the governance of the parallelism terminology the effectiveness of the parallel STR is evaluated and numerically quantified in terms of relevant performance indices. A parallel simulator is presented for the Puma 560 robotic manipulator. By exploiting parallelism and pipelinability in the robot model a significant increase in execution speed is achieved over the sequential model. The use of Transputers is examined and graphical results obtained for several performance indices, including speedup, processor efficiency and bounded parallelism. By the same analytical technique a parallel computed torque feedforward controller incorporating proportional derivative feedback control for the Puma 560 manipulator is developed and appraised. The performance of a Transputer system in hosting the controller is graphically analysed and as in the case of the parallel simulator the more important performance indices are examined under both optimal conditions and conditions of varying hardware constraints

    Design and debugging of multi-step analog to digital converters

    Get PDF
    With the fast advancement of CMOS fabrication technology, more and more signal-processing functions are implemented in the digital domain for a lower cost, lower power consumption, higher yield, and higher re-configurability. The trend of increasing integration level for integrated circuits has forced the A/D converter interface to reside on the same silicon in complex mixed-signal ICs containing mostly digital blocks for DSP and control. However, specifications of the converters in various applications emphasize high dynamic range and low spurious spectral performance. It is nontrivial to achieve this level of linearity in a monolithic environment where post-fabrication component trimming or calibration is cumbersome to implement for certain applications or/and for cost and manufacturability reasons. Additionally, as CMOS integrated circuits are accomplishing unprecedented integration levels, potential problems associated with device scaling – the short-channel effects – are also looming large as technology strides into the deep-submicron regime. The A/D conversion process involves sampling the applied analog input signal and quantizing it to its digital representation by comparing it to reference voltages before further signal processing in subsequent digital systems. Depending on how these functions are combined, different A/D converter architectures can be implemented with different requirements on each function. Practical realizations show the trend that to a first order, converter power is directly proportional to sampling rate. However, power dissipation required becomes nonlinear as the speed capabilities of a process technology are pushed to the limit. Pipeline and two-step/multi-step converters tend to be the most efficient at achieving a given resolution and sampling rate specification. This thesis is in a sense unique work as it covers the whole spectrum of design, test, debugging and calibration of multi-step A/D converters; it incorporates development of circuit techniques and algorithms to enhance the resolution and attainable sample rate of an A/D converter and to enhance testing and debugging potential to detect errors dynamically, to isolate and confine faults, and to recover and compensate for the errors continuously. The power proficiency for high resolution of multi-step converter by combining parallelism and calibration and exploiting low-voltage circuit techniques is demonstrated with a 1.8 V, 12-bit, 80 MS/s, 100 mW analog to-digital converter fabricated in five-metal layers 0.18-”m CMOS process. Lower power supply voltages significantly reduce noise margins and increase variations in process, device and design parameters. Consequently, it is steadily more difficult to control the fabrication process precisely enough to maintain uniformity. Microscopic particles present in the manufacturing environment and slight variations in the parameters of manufacturing steps can all lead to the geometrical and electrical properties of an IC to deviate from those generated at the end of the design process. Those defects can cause various types of malfunctioning, depending on the IC topology and the nature of the defect. To relive the burden placed on IC design and manufacturing originated with ever-increasing costs associated with testing and debugging of complex mixed-signal electronic systems, several circuit techniques and algorithms are developed and incorporated in proposed ATPG, DfT and BIST methodologies. Process variation cannot be solved by improving manufacturing tolerances; variability must be reduced by new device technology or managed by design in order for scaling to continue. Similarly, within-die performance variation also imposes new challenges for test methods. With the use of dedicated sensors, which exploit knowledge of the circuit structure and the specific defect mechanisms, the method described in this thesis facilitates early and fast identification of excessive process parameter variation effects. The expectation-maximization algorithm makes the estimation problem more tractable and also yields good estimates of the parameters for small sample sizes. To allow the test guidance with the information obtained through monitoring process variations implemented adjusted support vector machine classifier simultaneously minimize the empirical classification error and maximize the geometric margin. On a positive note, the use of digital enhancing calibration techniques reduces the need for expensive technologies with special fabrication steps. Indeed, the extra cost of digital processing is normally affordable as the use of submicron mixed signal technologies allows for efficient usage of silicon area even for relatively complex algorithms. Employed adaptive filtering algorithm for error estimation offers the small number of operations per iteration and does not require correlation function calculation nor matrix inversions. The presented foreground calibration algorithm does not need any dedicated test signal and does not require a part of the conversion time. It works continuously and with every signal applied to the A/D converter. The feasibility of the method for on-line and off-line debugging and calibration has been verified by experimental measurements from the silicon prototype fabricated in standard single poly, six metal 0.09-”m CMOS process
    • 

    corecore