21 research outputs found

    Towards Optimised FPGA Realisation of Microprogrammed Control Unit Based FIR Filters

    Get PDF
    Finite impulse response (FIR) filter is one of the most common type of digital filter used in digital signal processing (DSP) applications. An FIR filter is usually realised in hardware using multipliers, adders and registers. Field programmable gate arrays (FPGAs) have been widely explored for the hardware realisation of FIR filters using different algorithms and techniques. One such technique that has recently gained considerable attention is the use of microprogrammed control unit (MPCU) in designing FIR filters. In this chapter, we further explore MPCU technique for optimised hardware realisation of digital FIR filter. To evaluate the performance, two different architectures of FIR filter are designed using Wallace tree multiplier. Both the architectures are coded in Verilog hardware description language (HDL). The performance is analysed by evaluating the resource utilisation and timing reports of Virtex-5 FPGA generated by the Synopsys Synplify Pro tool. Based on the implementation results, as compared to conventional design, Wallace tree multiplier using carry skip adder (CSKA) provides optimal digital FIR filter

    Control Theory in Engineering

    Get PDF
    The subject matter of this book ranges from new control design methods to control theory applications in electrical and mechanical engineering and computers. The book covers certain aspects of control theory, including new methodologies, techniques, and applications. It promotes control theory in practical applications of these engineering domains and shows the way to disseminate researchers’ contributions in the field. This project presents applications that improve the properties and performance of control systems in analysis and design using a higher technical level of scientific attainment. The authors have included worked examples and case studies resulting from their research in the field. Readers will benefit from new solutions and answers to questions related to the emerging realm of control theory in engineering applications and its implementation

    Acquisition systems and decoding algorithms of peripheral neural signals for prosthetic applications

    Get PDF
    During the years, neuroprosthetic applications have obtained a great deal of attention by the international research, especially in the bioengineering field, thanks to the huge investments on several proposed projects funded by the political institutions which consider the treatment of this particular disease of fundamental importance for the global community. The aim of these projects is to find a possible solution to restore the functionalities lost by a patient subjected to an upper limb amputation trying to develop, according to physiological considerations, a communication link between the brain in which the significant signals are generated and a motor prosthesis device able to perform the desired action. Moreover, the designed system must be able to give back to the brain a sensory feedback about the surrounding world in terms of pressure or temperature acquired by tactile biosensors placed at the surface of the cybernetic hand. It in fact allows to execute involuntarymovements when for example the armcomes in contact with hot objects. The development of such a closed-loop architecture involves the need to address some critical issues which depend on the chosen approach. Several solutions have been proposed by the researches of the field, each one differing with respect to where the neural signals are acquired, either at the central nervous systemor at the peripheral one,most of themfollowing the former even that the latter is always considered by the amputees amore natural way to handle the artificial limb. This research work is based on the use of intrafascicular electrodes directly implanted in the residual peripheral nerves of the stump which represents a good compromise choice in terms of invasiveness and selectivity extracting electroneurographic (ENG) signals from which it is possible to identify the significant activity of a quite limited number of neuronal cells. In the perspective of the hardware implementation of the resulting solution which can work autonomously without any intervention by the amputee in an adaptive way according to the current characteristics of the processed signal and by using batteries as power source allowing portability, it is necessary to fulfill the tight constraints imposed by the application under consideration involved in each of the various phases which compose the considered closed-loop system. Regarding to the recording phase, the implementation must be able to remove the unwanted interferences mainly due to the electro-stimulations of themuscles placed near the electrodes featured by an order of magnitude much greater in comparison to that of the signals of interest amplifying the frequency components belonging to the significant bandwidth, and to convert them with a high resolution in order to obtain good performance at the next processing phases. To this aim, a recording module for peripheral neural signals will be presented, based on the use of a sigma-delta architecture which is composed by two main parts: an analog front-end stage for neural signal acquisition, pre-filtering and sigma-delta modulation and a digital unit for sigma-delta decimation and system configuration. Hardware/software cosimulations exploiting the Xilinx System Generator tool in Matlab Simulink environment and then transistor-level simulations confirmed that the system is capable of recording neural signals in the order of magnitude of tens of μV rejecting the huge low-frequency noise due to electromyographic interferences. The same architecture has been then exploited to implement a prototype of an 8-channel implantable electronic bi-directional interface between the peripheral nervous system and the neuro-controlled hand prosthesis. The solution includes a custom designed Integrated Circuit (0.35μm CMOS technology), responsible of the signal pre-filtering and sigma-delta modulation for each channel and the neural stimuli generation (in the opposite path) based on the directives sent by a digital control systemmapped on a low-cost Xilinx FPGA Spartan-3E 1600 development board which also involves the multi-channel sigma-delta decimation with a high-order band-pass filter as first stage in order to totally remove the unwanted interferences. In this way, the analog chip can be implanted near the electrodes thanks to its limited size avoiding to add a huge noise to theweak neural signals due to longwires connections and to cause heat-related infections, shifting the complexity to the digital part which can be hosted on a separated device in the stump of the amputeewithout using complex laboratory instrumentations. The system has been successfully tested from the electrical point of view and with in-vivo experiments exposing good results in terms of output resolution and noise rejection even in case of critical conditions. The various output channels at the Nyquist sampling frequency coming from the acquisition system must be processed in order to decode the intentions of movements of the amputee, applying the correspondent electro-mechanical stimulation in input to the cybernetic hand in order to perform the desired motor action. Different decoding approaches have been presented in the past, the majority of them were conceived starting from the relative implementation and performance evaluation of their off-line version. At the end of the research, it is necessary to develop these solutions on embedded systems performing an online processing of the peripheral neural signals. However, it is often possible only by using complex hardware platforms clocked at very high operating frequencies which are not be compliant with the low-power requirements needed to allow portability for the prosthetic device. At present, in fact, the important aspect of the real-time implementation of sophisticated signal processing algorithms on embedded systems has been often overlooked, notwithstanding the impact that limited resources of the former may have on the efficiency/effectiveness of any given algorithm. In this research work it has been addressed the optimization of a state-of-the-art algorithmfor PNS signals decoding that is a step forward for its real-time, full implementation onto a floating-point Digital Signal Processor (DSP). Beyond low-level optimizations, different solutions have been proposed at an high level in order to find the best trade-off in terms of effectiveness/efficiency. A latency model, obtained through cycle accurate profiling of the different code sections, has been drawn in order to perform a fair performance assessment. The proposed optimized real-time algorithmachieves up to 96% of correct classification on real PNS signals acquired through tf-LIFE electrodes on animals, and performs as the best off-line algorithmfor spike clustering on a synthetic cortical dataset characterized by a reasonable dissimilarity between the spikemorphologies of different neurons. When the real-time requirements are joined to the fulfilment of area and power minimization for implantable/portable applications, such as for the target neuroprosthetic devices, only custom VLSI implementations can be adopted. In this case, every part of the algorithmshould be carefully tuned. To this aim, the first preprocessing stage of the decoding algorithmbased on the use of aWavelet Denoising solution able to remove also the in-band noise sources has been deeply analysed in order to obtain an optimal hardware implementation. In particular, the usually overlooked part related to threshold estimation has been evaluated in terms of required hardware resources and functionality, exploiting the commercial Xilinx System Generator tool for the design of the architecture and the co-simulation. The analysis has revealed how the widely used Median Absolute Deviation (MAD) could lead o hardware implementations highly inefficient compared to other dispersion estimators demonstrating better scalability, relatively to the specific application. Finally, two different hardware implementations of the reference decoding algorithm have been presented highlighting pros and cons of each one of them. Firstly, a novel approach based on high-level dataflow description and automatic hardware generation is presented and evaluated on the on-line template-matching spike sorting algorithmwhich represents the most complex processing stage. It starts from the identification of the single kernels with the greater computational complexity and using their dataflow description to generate the HDL implementation of a coarse-grained reconfigurable global kernel characterized by theminimumresources in order to reduce the area and the energy dissipation for the fulfilment of the low-power requirements imposed by the application. Results in the best case have revealed a 71%of area saving compared tomore traditional solutions,without any accuracy penalty. With respect to single kernels execution, better latency performance are achievable stillminimizing the number of adopted resources. The performance in terms of latency can also be improved by tuning the implemented parallelismin the light of a defined number of channels and real-time constraints, by using more than one reconfigurable global kernel in order that they can be exploited to perform the same or different kernels at the same time in a parallel way, due to the fact that each one can execute the relative processing only in a sequential way. For this reason, a second FPGA-based prototype has been proposed based on the use of aMulti-Processor System-on-Chip (MPSoC) embedded architecture. This prototype is capable of respecting the real-time constraints posed by the application when clocked at less than 50 MHz, in comparison to 300 MHz of the previous DSP implementation. Considering that the application workload is extremely data dependent and unpredictable due to the sparsity of the neural signals, the architecture has to be dimensioned taking into account critical worst-case operating conditions in order to always ensure the correct functionality. To compensate the resulting overprovisioning of the system architecture, a software-controllable power management based on the use of clock gating techniques has been integrated in order tominimize the dynamic power consumption of the resulting solution. Summarizing, this research work can be considered a sort of proof-of-concept for the proposed techniques considering all the design issues which characterize each stage of the closed-loop system in the perspective of a portable low-power real-time hardware implementation of the neuro-controlled prosthetic device

    Acquisition systems and decoding algorithms of peripheral neural signals for prosthetic applications

    Get PDF
    During the years, neuroprosthetic applications have obtained a great deal of attention by the international research, especially in the bioengineering field, thanks to the huge investments on several proposed projects funded by the political institutions which consider the treatment of this particular disease of fundamental importance for the global community. The aim of these projects is to find a possible solution to restore the functionalities lost by a patient subjected to an upper limb amputation trying to develop, according to physiological considerations, a communication link between the brain in which the significant signals are generated and a motor prosthesis device able to perform the desired action. Moreover, the designed system must be able to give back to the brain a sensory feedback about the surrounding world in terms of pressure or temperature acquired by tactile biosensors placed at the surface of the cybernetic hand. It in fact allows to execute involuntarymovements when for example the armcomes in contact with hot objects. The development of such a closed-loop architecture involves the need to address some critical issues which depend on the chosen approach. Several solutions have been proposed by the researches of the field, each one differing with respect to where the neural signals are acquired, either at the central nervous systemor at the peripheral one,most of themfollowing the former even that the latter is always considered by the amputees amore natural way to handle the artificial limb. This research work is based on the use of intrafascicular electrodes directly implanted in the residual peripheral nerves of the stump which represents a good compromise choice in terms of invasiveness and selectivity extracting electroneurographic (ENG) signals from which it is possible to identify the significant activity of a quite limited number of neuronal cells. In the perspective of the hardware implementation of the resulting solution which can work autonomously without any intervention by the amputee in an adaptive way according to the current characteristics of the processed signal and by using batteries as power source allowing portability, it is necessary to fulfill the tight constraints imposed by the application under consideration involved in each of the various phases which compose the considered closed-loop system. Regarding to the recording phase, the implementation must be able to remove the unwanted interferences mainly due to the electro-stimulations of themuscles placed near the electrodes featured by an order of magnitude much greater in comparison to that of the signals of interest amplifying the frequency components belonging to the significant bandwidth, and to convert them with a high resolution in order to obtain good performance at the next processing phases. To this aim, a recording module for peripheral neural signals will be presented, based on the use of a sigma-delta architecture which is composed by two main parts: an analog front-end stage for neural signal acquisition, pre-filtering and sigma-delta modulation and a digital unit for sigma-delta decimation and system configuration. Hardware/software cosimulations exploiting the Xilinx System Generator tool in Matlab Simulink environment and then transistor-level simulations confirmed that the system is capable of recording neural signals in the order of magnitude of tens of μV rejecting the huge low-frequency noise due to electromyographic interferences. The same architecture has been then exploited to implement a prototype of an 8-channel implantable electronic bi-directional interface between the peripheral nervous system and the neuro-controlled hand prosthesis. The solution includes a custom designed Integrated Circuit (0.35μm CMOS technology), responsible of the signal pre-filtering and sigma-delta modulation for each channel and the neural stimuli generation (in the opposite path) based on the directives sent by a digital control systemmapped on a low-cost Xilinx FPGA Spartan-3E 1600 development board which also involves the multi-channel sigma-delta decimation with a high-order band-pass filter as first stage in order to totally remove the unwanted interferences. In this way, the analog chip can be implanted near the electrodes thanks to its limited size avoiding to add a huge noise to theweak neural signals due to longwires connections and to cause heat-related infections, shifting the complexity to the digital part which can be hosted on a separated device in the stump of the amputeewithout using complex laboratory instrumentations. The system has been successfully tested from the electrical point of view and with in-vivo experiments exposing good results in terms of output resolution and noise rejection even in case of critical conditions. The various output channels at the Nyquist sampling frequency coming from the acquisition system must be processed in order to decode the intentions of movements of the amputee, applying the correspondent electro-mechanical stimulation in input to the cybernetic hand in order to perform the desired motor action. Different decoding approaches have been presented in the past, the majority of them were conceived starting from the relative implementation and performance evaluation of their off-line version. At the end of the research, it is necessary to develop these solutions on embedded systems performing an online processing of the peripheral neural signals. However, it is often possible only by using complex hardware platforms clocked at very high operating frequencies which are not be compliant with the low-power requirements needed to allow portability for the prosthetic device. At present, in fact, the important aspect of the real-time implementation of sophisticated signal processing algorithms on embedded systems has been often overlooked, notwithstanding the impact that limited resources of the former may have on the efficiency/effectiveness of any given algorithm. In this research work it has been addressed the optimization of a state-of-the-art algorithmfor PNS signals decoding that is a step forward for its real-time, full implementation onto a floating-point Digital Signal Processor (DSP). Beyond low-level optimizations, different solutions have been proposed at an high level in order to find the best trade-off in terms of effectiveness/efficiency. A latency model, obtained through cycle accurate profiling of the different code sections, has been drawn in order to perform a fair performance assessment. The proposed optimized real-time algorithmachieves up to 96% of correct classification on real PNS signals acquired through tf-LIFE electrodes on animals, and performs as the best off-line algorithmfor spike clustering on a synthetic cortical dataset characterized by a reasonable dissimilarity between the spikemorphologies of different neurons. When the real-time requirements are joined to the fulfilment of area and power minimization for implantable/portable applications, such as for the target neuroprosthetic devices, only custom VLSI implementations can be adopted. In this case, every part of the algorithmshould be carefully tuned. To this aim, the first preprocessing stage of the decoding algorithmbased on the use of aWavelet Denoising solution able to remove also the in-band noise sources has been deeply analysed in order to obtain an optimal hardware implementation. In particular, the usually overlooked part related to threshold estimation has been evaluated in terms of required hardware resources and functionality, exploiting the commercial Xilinx System Generator tool for the design of the architecture and the co-simulation. The analysis has revealed how the widely used Median Absolute Deviation (MAD) could lead o hardware implementations highly inefficient compared to other dispersion estimators demonstrating better scalability, relatively to the specific application. Finally, two different hardware implementations of the reference decoding algorithm have been presented highlighting pros and cons of each one of them. Firstly, a novel approach based on high-level dataflow description and automatic hardware generation is presented and evaluated on the on-line template-matching spike sorting algorithmwhich represents the most complex processing stage. It starts from the identification of the single kernels with the greater computational complexity and using their dataflow description to generate the HDL implementation of a coarse-grained reconfigurable global kernel characterized by theminimumresources in order to reduce the area and the energy dissipation for the fulfilment of the low-power requirements imposed by the application. Results in the best case have revealed a 71%of area saving compared tomore traditional solutions,without any accuracy penalty. With respect to single kernels execution, better latency performance are achievable stillminimizing the number of adopted resources. The performance in terms of latency can also be improved by tuning the implemented parallelismin the light of a defined number of channels and real-time constraints, by using more than one reconfigurable global kernel in order that they can be exploited to perform the same or different kernels at the same time in a parallel way, due to the fact that each one can execute the relative processing only in a sequential way. For this reason, a second FPGA-based prototype has been proposed based on the use of aMulti-Processor System-on-Chip (MPSoC) embedded architecture. This prototype is capable of respecting the real-time constraints posed by the application when clocked at less than 50 MHz, in comparison to 300 MHz of the previous DSP implementation. Considering that the application workload is extremely data dependent and unpredictable due to the sparsity of the neural signals, the architecture has to be dimensioned taking into account critical worst-case operating conditions in order to always ensure the correct functionality. To compensate the resulting overprovisioning of the system architecture, a software-controllable power management based on the use of clock gating techniques has been integrated in order tominimize the dynamic power consumption of the resulting solution. Summarizing, this research work can be considered a sort of proof-of-concept for the proposed techniques considering all the design issues which characterize each stage of the closed-loop system in the perspective of a portable low-power real-time hardware implementation of the neuro-controlled prosthetic device

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    A hardware-software codesign framework for cellular computing

    Get PDF
    Until recently, the ever-increasing demand of computing power has been met on one hand by increasing the operating frequency of processors and on the other hand by designing architectures capable of exploiting parallelism at the instruction level through hardware mechanisms such as super-scalar execution. However, both these approaches seem to have reached a plateau, mainly due to issues related to design complexity and cost-effectiveness. To face the stabilization of performance of single-threaded processors, the current trend in processor design seems to favor a switch to coarser-grain parallelization, typically at the thread level. In other words, high computational power is achieved not only by a single, very fast and very complex processor, but through the parallel operation of several processors, each executing a different thread. Extrapolating this trend to take into account the vast amount of on-chip hardware resources that will be available in the next few decades (either through further shrinkage of silicon fabrication processes or by the introduction of molecular-scale devices), together with the predicted features of such devices (e.g., the impossibility of global synchronization or higher failure rates), it seems reasonable to foretell that current design techniques will not be able to cope with the requirements of next-generation electronic devices and that novel design tools and programming methods will have to be devised. A tempting source of inspiration to solve the problems implied by a massively parallel organization and inherently error-prone substrates is biology. In fact, living beings possess characteristics, such as robustness to damage and self-organization, which were shown in previous research as interesting to be implemented in hardware. For instance, it was possible to realize relatively simple systems, such as a self-repairing watch. Overall, these bio-inspired approaches seem very promising but their interest for a wider audience is problematic because their heavily hardware-oriented designs lack some of the flexibility achievable with a general purpose processor. In the context of this thesis, we will introduce a processor-grade processing element at the heart of a bio-inspired hardware system. This processor, based on a single-instruction, features some key properties that allow it to maintain the versatility required by the implementation of bio-inspired mechanisms and to realize general computation. We will also demonstrate that the flexibility of such a processor enables it to be evolved so it can be tailored to different types of applications. In the second half of this thesis, we will analyze how the implementation of a large number of these processors can be used on a hardware platform to explore various bio-inspired mechanisms. Based on an extensible platform of many FPGAs, configured as a networked structure of processors, the hardware part of this computing framework is backed by an open library of software components that provides primitives for efficient inter-processor communication and distributed computation. We will show that this dual software–hardware approach allows a very quick exploration of different ways to solve computational problems using bio-inspired techniques. In addition, we also show that the flexibility of our approach allows it to exploit replication as a solution to issues that concern standard embedded applications

    Design and evaluation of a VLIW processor for real-time systems

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2016.Atualmente, aplicações de tempo estão tornando-se cada vez mais complexas e, conforme os requisitos destes sistemas aumentam, maior é a demanda por capacidade de processamento. Contudo, o correto funcionamento destas aplicações não está em função somente da correta resposta lógica, mas também no tempo que ela é produzida. O projeto de processadores de propósito geral gera dificuldades para análises de tempo real devido ao seu comportamento não determinista causado pelo uso de memórias cache, previsores de fluxo dinâmicos, execução especulativa e fora de ordem. Nesta tese, investiga-se uma arquitetura de processador Very-Long Instruction Word (VLIW) especificamente projetada para sistemas de tempo real considerando sua análise do pior tempo de computação (Worst-case Execution Time WCET). Técnicas para obtenção do WCET para máquinas VLIW são consideradas e quantifica-se a importância de técnicas de hardware como previsor de fluxo estático, predicação, bem como velocidade do processador para instruções complexas como acesso a memória e multiplicação. Arquitetura de memória não faz parte do escopo deste trabalho e para tal utilizamos uma estrutura determinista formada por uma memória cache com mapeamento direto para instruções e uma memória de rascunho (scratchpad) para dados. Nós também consideramos a implementação em VHDL do protótipo para inferir suas características temporais mantendo compatibilidade com o conjunto de instruções (ISA) HP VLIW ST231. Em termos de avaliação, foi utilizado um conjunto representativo de código exemplos da Universidade de Mälardalen que é amplamente utilizado em avaliações de sistemas de tempo real.Abstract : Nowadays, many real-time applications are very complex and as the complexity and the requirements of those applications become more demanding, more hardware processing capacity is necessary. The correct functioning of real-time systems depends not only on the logically correct response, but also on the time when it is produced. General purpose processor design fails to deliver analyzability due to their non-deterministic behavior caused by the use of cache memories, dynamic branch prediction, speculative execution and out-of-order pipelines. In this thesis, we design and evaluate the performance of VLIW (Very Long Instruction Word) architectures for real-time systems with an in-order pipeline considering WCET (Worst-case Execution Time) performance. Techniques on obtaining the WCET of VLIW machines are also considered and we make a quantification on how important are hardware techniques such as static branch prediction, predication, pipeline speed of complex operations such as memory access and multiplication for high-performance real-time systems. The memory hierarchy is out of scope of this thesis and we used a classic deterministic structure formed by a direct mapped instruction cache and a data scratchpad memory. A VLIW prototype was implemented in VHDL from scratch considering the HP VLIW ST231 ISA. We also show some compiler insights and we use a representative subset of the Mälardalen s WCET benchmarks for validation and performance quantification. Supporting our objective to investigate and evaluate hardware features which reconcile determinism and performance, we made the following contributions: design space investigation and evaluation regarding VLIW processors, complete WCET analysis for the proposed design, complete VHDL design and timing characterization, detailed branch architecture, low-overhead full-predication system for VLIW processors

    General Course Catalog [January-June 2015]

    Get PDF
    Undergraduate Course Catalog, January-June 2015https://repository.stcloudstate.edu/undergencat/1121/thumbnail.jp
    corecore