    Discrete wavelet transform realisation using run-time reconfiguration of field programmable gate arrays (FPGAs)

    Abstract: Designing a universal embedded hardware architecture for discrete wavelet transform is a challenging problem because of the diversity among wavelet kernel filters. In this work, the authors present three different hardware architectures for implementing multiple wavelet kernels. The first scheme utilises fixed, parallel hardware for all the required wavelet kernels, whereas the second scheme employs a processing element (PE)-based datapath that can be configured for multiple wavelet filters during run-time. The third scheme makes use of partial run-time configuration of FPGA units for dynamically programming any desired wavelet filter. As a case study, the authors present FPGA synthesis results for simultaneous implementation of six different wavelets for the proposed methods. Performance analysis and comparison of area, timing and power results are presented for the Virtex-II Pro FPGA implementations
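
The idea of one datapath serving several wavelet kernels can be illustrated in software. The sketch below is not the authors' architecture: it only shows a single analysis routine whose filter coefficients are selected at run time from a table. The names `KERNELS`, `highpass_from_lowpass` and `dwt_level` are hypothetical, and the choice of Haar and Daubechies-2 kernels with periodic extension is an assumption made for illustration.

```python
import math

_S3 = math.sqrt(3.0)
_N = 4.0 * math.sqrt(2.0)

# Hypothetical kernel table: orthonormal low-pass coefficients.
KERNELS = {
    "haar": [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)],
    "db2": [(1 + _S3) / _N, (3 + _S3) / _N, (3 - _S3) / _N, (1 - _S3) / _N],
}


def highpass_from_lowpass(h):
    """Derive the high-pass filter via g[k] = (-1)^k * h[L-1-k]."""
    length = len(h)
    return [((-1) ** k) * h[length - 1 - k] for k in range(length)]


def dwt_level(x, kernel_name):
    """One analysis level with periodic extension; len(x) must be even."""
    h = KERNELS[kernel_name]          # "reconfiguration": pick the kernel
    g = highpass_from_lowpass(h)
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):          # filter and downsample by two
        approx.append(sum(h[k] * x[(i + k) % n] for k in range(len(h))))
        detail.append(sum(g[k] * x[(i + k) % n] for k in range(len(g))))
    return approx, detail
```

Calling `dwt_level(samples, "haar")` and `dwt_level(samples, "db2")` reuses the same loop with different coefficients, which is the software analogue of loading a different filter into a configurable processing element.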

    A Pipeline VLSI Architecture for High-Speed Computation of the 1-D Discrete Wavelet Transform

    In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the number and period of clock cycles for the DWT computation with little or no overhead on the hardware resources by maximizing the inter- and intrastage parallelisms of the pipeline. The interstage parallelism is enhanced by optimally mapping the computational load associated with the various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations. The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks that can be performed independently in parallel and by optimally organizing the bitwise operations for performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of the output sample for the filtering operation is minimized. It is shown that an architecture designed based on the proposed scheme requires a smaller number of clock cycles compared to that of the architectures employing comparable hardware resources. In fact, the requirement on the hardware resources of the architecture designed by using the proposed scheme also gets improved due to a smaller number of registers that need to be employed. Based on the proposed scheme, a specific example of designing an architecture for the DWT computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme, the architecture thus designed is simulated and implemented on a field-programmable gate-array board. It is seen that the simulation and implementation results conform to the stated goals of the proposed scheme, thus making the scheme a viable approach for designing a practical and realizable architecture for real-time DWT computation
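
Mapping decomposition levels to pipeline stages matters because the workload shrinks by half at each level. The sketch below is a software illustration only, not the proposed pipeline: it runs a multi-level decomposition with an unnormalised Haar filter (an assumption chosen for simplicity) and records how many input samples each level consumes. The function names are hypothetical.

```python
def haar_step(x):
    """One unnormalised Haar level: pairwise averages and differences."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def dwt_multilevel(x, levels):
    """Return the final approximation, the detail bands, and per-level load.

    len(x) must be divisible by 2**levels.
    """
    details, load = [], []
    approx = list(x)
    for _ in range(levels):
        load.append(len(approx))      # samples this level has to process
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details, load
```

For a 16-sample input and three levels, the recorded load is 16, 8 and 4 samples, so a stage dedicated to level 1 does as much filtering work as all deeper levels combined could ever require.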

    Single event upset hardened embedded domain specific reconfigurable architecture

    FPGAs in Industrial Control Applications

    The aim of this paper is to review the state of the art of Field Programmable Gate Array (FPGA) technologies and their contribution to industrial control applications. The authors start by addressing various research fields which can exploit the advantages of FPGAs. The features of these devices are then presented, followed by their corresponding design tools. To illustrate the benefits of using FPGAs in complex control applications, a sensorless motor controller is treated. This controller is based on the Extended Kalman Filter. Its development followed a dedicated design methodology, which is also discussed. The use of FPGAs to implement artificial intelligence-based industrial controllers is then briefly reviewed. The final section presents two short case studies of neural network control system designs targeting FPGAs

    Design of a flexible, re-usable hardware component for the 2D discrete wavelet transform

    This paper deals with the implementation of the 2D discrete wavelet transform in the form of a reusable, flexible hardware component. This component is compliant with the JPEG2000 standard and targets a variety of embedded imaging applications. A novel design methodology based on emerging high-level synthesis tools allowed us to achieve a high degree of flexibility in the specification and synthesis of this component. Customization of functional parameters is supported (choice of the lifting-based filter bank, number of decomposition levels) as well as communication constraints (pixel ordering and I/O scheduling) and performance constraints (computation speed and parallelism), facilitating reuse in various applications and integration environments. In this paper, we first provide a summary of the recent trends in embedded system design. Then the theoretical bases of the discrete wavelet transform and the classical approaches for implementing it in hardware are briefly presented. After presenting the principles of our design methodology, we detail the successive design stages, from the algorithm to the architectures, in the case of the 2D lifting-based discrete wavelet transform. We conclude with synthesis results demonstrating the effectiveness of our approach for designing highly flexible hardware components.
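
A lifting-based filter bank computes the transform as a predict step followed by an update step. The abstract does not say which filter banks the component supports; the sketch below assumes the reversible 5/3 integer filter, which is one of the filters used in JPEG2000, applied to a 1-D signal with symmetric boundary extension. A 2D transform would apply it along rows and then columns. All function names are hypothetical.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension index for a signal of length n."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift53_forward(x):
    """Forward reversible 5/3 lifting on a list of ints (even length >= 2)."""
    n = len(x)
    half = n // 2
    # Predict: odd sample minus the floored mean of its even neighbours.
    d = [x[2 * k + 1] - ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
         for k in range(half)]
    # Update: even sample plus a rounded quarter of the neighbouring details.
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2) for k in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by reversing the two lifting steps."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):             # undo update, recovering even samples
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
    for k in range(half):             # undo predict, recovering odd samples
        x[2 * k + 1] = d[k] + ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
    return x
```

Because each lifting step is undone by subtracting the same integer quantity that was added, reconstruction is exact in integer arithmetic, which is why this structure suits lossless coding and compact hardware.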

    VLSI design concepts for iterative algorithms

    Circuit design becomes more and more complicated, especially as the Very Large Scale Integration (VLSI) manufacturing technology node keeps shrinking down to the nanoscale level. New challenges come up, such as an increasing gap between design productivity and Moore's Law. Leakage power becomes a major factor in power consumption, and traditional shared-bus transmission is the critical bottleneck in billion-transistor Multi-Processor System-on-Chip (MPSoC) designs. These issues lead us to discuss the impact on the design of iterative algorithms. This thesis presents several strategies that satisfy various design constraints, which can be used to explore superior solutions for the circuit design of iterative algorithms. Four selected examples of iterative algorithms are elaborated in this respect: hardware implementation of a COordinate Rotation DIgital Computer (CORDIC) processor for signal processing, configurable DCT and integer transformations based on the CORDIC algorithm for image/video compression, a parallel Jacobi Eigenvalue Decomposition (EVD) method with arbitrary iterations for communication, and acceleration of parallel Sparse Matrix-Vector Multiplication (SMVM) operations based on a Network-on-Chip (NoC) for solving systems of linear equations. These four applications of iterative methods have been chosen since they cover a wide area of current signal processing tasks. Each method has its own unique design criteria when it comes to direct implementation at the circuit level. Therefore, a balanced solution between various design tradeoffs is elaborated for each method. These tradeoffs are between throughput and power consumption, computational complexity and transformation accuracy, the number of inner/outer iterations and energy consumption, and data structure and network topology. It is shown that all of these algorithms can be implemented efficiently on FPGA devices or as ASICs
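
CORDIC is a good example of why iterative algorithms map well to hardware: each iteration needs only shifts, additions and a small lookup table. The sketch below shows the standard rotation mode computing cosine and sine. It is a generic illustration, not the processor described in the thesis, and it uses floating point for readability where hardware would use fixed-point shifts. The function name and the default of 24 iterations are assumptions.

```python
import math


def cordic_rotate(angle, iterations=24):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2."""
    # Lookup table of elementary rotation angles atan(2^-i).
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Accumulated gain of the micro-rotations; pre-divide to compensate.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0      # rotate towards z == 0
        shift = 2.0 ** -i                    # a right shift in fixed point
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * atans[i]
    return x, y
```

Each extra iteration adds roughly one bit of accuracy, so the iteration count is a direct knob for the throughput versus accuracy tradeoff that the abstract refers to.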

    Optimization of a hardware/software coprocessing platform for EEG eyeblink detection and removal

    The feasibility of implementing a real-time system for removing eyeblink artifacts from electroencephalogram (EEG) recordings utilizing a hardware/software coprocessing platform was investigated. A software-based wavelet and independent component analysis (ICA) eyeblink detection and removal process was extended to enable variation in its processing parameters. Exploiting the efficiency of hardware and the reconfigurability of software, it was ported to a field programmable gate array (FPGA) development platform which was found to be capable of implementing the revised algorithm, although not in real time. The implemented hardware and software solution was applied to a collection of both simulated and clinically acquired EEG data with known artifact and waveform characteristics to assess its speed and accuracy. Configured for optimal accuracy in terms of minimal false positives and negatives as well as maintaining the integrity of the underlying EEG, especially when encountering EEG waveform patterns with an appearance similar to eyeblink artifacts, the system was capable of processing a 10-second EEG epoch in an average of 123 seconds. Configured for efficiency, but with diminished accuracy, the system required an average of 34 seconds. Varying the ICA contrast function showed that the Gaussian nonlinearity provided the best combination of reliability and accuracy, albeit with a long execution time. The cubic nonlinearity was fast, but unreliable, while the hyperbolic tangent contrast function frequently diverged. It is believed that the utilization of programmable logic with increased logic capacity and processing speed may enable this approach to achieve the objective of real-time operation

    FlexWAFE - eine Architektur für rekonfigurierbare Bildverarbeitungssysteme (FlexWAFE: an architecture for reconfigurable image-processing systems)

    Recently there has been an increase in demand for high-resolution digital media content in both the cinema and television industries. Currently existing equipment does not meet the requirements, or is too costly. New hardware systems and new programming techniques are needed in order to meet the high-resolution, high-quality image requirements and reduce costs. The industry seeks a flexible architecture capable of running multiple applications on top of standard off-the-shelf components, with reduced development time. Until now, standard practice has been to develop specialized architectures and systems that target a single application. This offers little flexibility and leads to high development costs, since every new application is designed almost from scratch. Our focus was to develop an architecture that is suited to image stream processing and has the flexibility to run multiple applications using the same FPGA-based hardware platform. The novelty in our approach is that we reconfigure parts of the architecture at run-time, but without incurring the time and added-constraint penalties of FPGA partial-reconfiguration techniques. The architecture uses a hierarchical control structure that is well suited to parallel processing, and allows single-cycle-latency reconfiguration of parts of the processing pipeline. This is achieved using relatively few resources for the distributed control structures. To test the developed architecture, a complex film-grain noise reduction algorithm was implemented on an off-the-shelf hardware platform developed by Thomson-Grass Valley. The system met all the requirements and placed very little load on the hierarchical control structures, leaving headroom for much more complex control demands. The architecture has been ported to other hardware platforms, and other applications have been implemented as well. The run-time reconfigurability has proven to be a key factor in the success of the FlexWAFE.
