12 research outputs found

A Scalable and Adaptive Network on Chip for Many-Core Architectures

Author: Heißwolf Jan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault-tolerance concept makes it possible to handle permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs are introduced.

KITopen

Fault-tolerant satellite computing with modern semiconductors

Author: Fuchs C.M.
Publication date: 17/12/2019

Miniaturized satellites enable a variety of space missions which were in the past infeasible, impractical or uneconomical with traditionally designed, heavier spacecraft. CubeSats in particular can be manufactured and launched rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but are also responsible for their low reliability: until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis. Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA reconfiguration and mixed criticality. Experimentally, we achieve 1.94 W power consumption at 300 MHz with a Xilinx Kintex UltraScale+ proof-of-concept, which is well within the power budget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.
European Space Agency; Computer Systems, Imagery and Medi

Leiden University Scholarly Publications

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020

KITopen

Acquisition systems and decoding algorithms of peripheral neural signals for prosthetic applications

Author: Carta Nicola
Publication date: 14/04/2014

Over the years, neuroprosthetic applications have received a great deal of attention from the international research community, especially in the bioengineering field, thanks to large investments in several projects funded by political institutions that consider the treatment of this particular condition to be of fundamental importance for the global community. The aim of these projects is to restore the functionality lost by a patient after an upper-limb amputation by developing, according to physiological considerations, a communication link between the brain, where the significant signals are generated, and a motor prosthesis able to perform the desired action. Moreover, the designed system must be able to return sensory feedback about the surrounding world to the brain, in terms of pressure or temperature acquired by tactile biosensors placed on the surface of the cybernetic hand. This feedback makes it possible to execute involuntary movements when, for example, the arm comes into contact with hot objects. The development of such a closed-loop architecture requires addressing some critical issues that depend on the chosen approach. Several solutions have been proposed by researchers in the field, differing in where the neural signals are acquired, either at the central nervous system or at the peripheral one; most of them follow the former, even though the latter is always considered by amputees a more natural way to handle the artificial limb. This research work is based on the use of intrafascicular electrodes directly implanted in the residual peripheral nerves of the stump, which represent a good compromise in terms of invasiveness and selectivity, extracting electroneurographic (ENG) signals from which it is possible to identify the significant activity of a quite limited number of neuronal cells.
With a view to a hardware implementation of the resulting solution that can work autonomously, without any intervention by the amputee, adapting to the current characteristics of the processed signal, and that is battery-powered to allow portability, it is necessary to fulfill the tight constraints imposed by the application in each of the phases that compose the closed-loop system. Regarding the recording phase, the implementation must be able to remove the unwanted interference, mainly due to the electrical stimulation of the muscles placed near the electrodes, whose magnitude is much greater than that of the signals of interest; to amplify the frequency components belonging to the significant bandwidth; and to convert them with high resolution in order to obtain good performance in the subsequent processing phases. To this end, a recording module for peripheral neural signals is presented, based on a sigma-delta architecture composed of two main parts: an analog front-end stage for neural signal acquisition, pre-filtering and sigma-delta modulation, and a digital unit for sigma-delta decimation and system configuration. Hardware/software co-simulations exploiting the Xilinx System Generator tool in the Matlab Simulink environment, followed by transistor-level simulations, confirmed that the system is capable of recording neural signals on the order of tens of μV while rejecting the large low-frequency noise due to electromyographic interference. The same architecture was then exploited to implement a prototype of an 8-channel implantable electronic bi-directional interface between the peripheral nervous system and the neuro-controlled hand prosthesis.
The solution includes a custom-designed integrated circuit (0.35 μm CMOS technology), responsible for the signal pre-filtering and sigma-delta modulation of each channel and for the neural stimulus generation (in the opposite path), based on the directives sent by a digital control system mapped onto a low-cost Xilinx Spartan-3E 1600 FPGA development board. The digital system also performs the multi-channel sigma-delta decimation, with a high-order band-pass filter as first stage in order to completely remove the unwanted interference. In this way, the analog chip can be implanted near the electrodes thanks to its limited size, which avoids adding significant noise to the weak neural signals through long wire connections and avoids causing heat-related infections, while the complexity is shifted to the digital part, which can be hosted on a separate device in the stump of the amputee without using complex laboratory instrumentation. The system has been successfully tested electrically and with in-vivo experiments, showing good results in terms of output resolution and noise rejection even under critical conditions. The output channels at the Nyquist sampling frequency coming from the acquisition system must be processed in order to decode the amputee's movement intentions, applying the corresponding electro-mechanical stimulation at the input of the cybernetic hand in order to perform the desired motor action. Different decoding approaches have been presented in the past, the majority of which were conceived starting from the implementation and performance evaluation of their off-line version. At the end of the research, it is necessary to develop these solutions on embedded systems performing online processing of the peripheral neural signals. However, this is often possible only by using complex hardware platforms clocked at very high operating frequencies, which are not compliant with the low-power requirements needed to allow portability of the prosthetic device.
At present, in fact, the important aspect of the real-time implementation of sophisticated signal processing algorithms on embedded systems has often been overlooked, notwithstanding the impact that the limited resources of such systems may have on the efficiency and effectiveness of any given algorithm. This research work addresses the optimization of a state-of-the-art algorithm for PNS signal decoding, which is a step towards its full real-time implementation on a floating-point Digital Signal Processor (DSP). Beyond low-level optimizations, different solutions have been proposed at a high level in order to find the best trade-off between effectiveness and efficiency. A latency model, obtained through cycle-accurate profiling of the different code sections, has been derived in order to perform a fair performance assessment. The proposed optimized real-time algorithm achieves up to 96% correct classification on real PNS signals acquired through tf-LIFE electrodes on animals, and performs as well as the best off-line algorithm for spike clustering on a synthetic cortical dataset characterized by a reasonable dissimilarity between the spike morphologies of different neurons. When the real-time requirements are combined with area and power minimization for implantable/portable applications, such as the target neuroprosthetic devices, only custom VLSI implementations can be adopted. In this case, every part of the algorithm should be carefully tuned. To this end, the first preprocessing stage of the decoding algorithm, based on a Wavelet Denoising solution that can also remove in-band noise sources, has been analysed in depth in order to obtain an optimal hardware implementation. In particular, the usually overlooked part related to threshold estimation has been evaluated in terms of required hardware resources and functionality, exploiting the commercial Xilinx System Generator tool for the design of the architecture and the co-simulation.
The analysis has revealed how the widely used Median Absolute Deviation (MAD) could lead to hardware implementations that are highly inefficient compared to other dispersion estimators demonstrating better scalability, relative to the specific application. Finally, two different hardware implementations of the reference decoding algorithm have been presented, highlighting the pros and cons of each. Firstly, a novel approach based on high-level dataflow description and automatic hardware generation is presented and evaluated on the on-line template-matching spike sorting algorithm, which represents the most complex processing stage. It starts from the identification of the single kernels with the greatest computational complexity and uses their dataflow description to generate the HDL implementation of a coarse-grained reconfigurable global kernel characterized by the minimum resources, in order to reduce the area and the energy dissipation and so fulfil the low-power requirements imposed by the application. In the best case, results have revealed a 71% area saving compared to more traditional solutions, without any accuracy penalty. With respect to single-kernel execution, better latency performance is achievable while still minimizing the number of adopted resources. Latency can also be improved by tuning the implemented parallelism in light of a defined number of channels and real-time constraints, using more than one reconfigurable global kernel so that they can execute the same or different kernels in parallel, since each one can execute its processing only sequentially. For this reason, a second FPGA-based prototype has been proposed, based on a Multi-Processor System-on-Chip (MPSoC) embedded architecture.
This prototype is capable of meeting the real-time constraints posed by the application when clocked at less than 50 MHz, compared to the 300 MHz of the previous DSP implementation. Considering that the application workload is extremely data dependent and unpredictable due to the sparsity of the neural signals, the architecture has to be dimensioned taking into account critical worst-case operating conditions in order to always ensure correct functionality. To compensate for the resulting overprovisioning of the system architecture, software-controllable power management based on clock gating techniques has been integrated in order to minimize the dynamic power consumption of the resulting solution. In summary, this research work can be considered a proof of concept for the proposed techniques, considering all the design issues that characterize each stage of the closed-loop system with a view to a portable, low-power, real-time hardware implementation of the neuro-controlled prosthetic device.
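The abstract's point about threshold estimation can be illustrated with a small sketch. It is not the thesis's implementation: it assumes the textbook universal threshold sigma * sqrt(2 ln N) for wavelet denoising, and it picks the mean absolute value as one possible alternative dispersion estimator (the abstract does not say which estimators were compared). The function names are hypothetical. The MAD needs medians, which imply sorting or selection, whereas the mean absolute value needs only an accumulator, which hints at why it may map better to hardware.

```python
import math
import random
import statistics


def sigma_mad(coeffs):
    """Noise estimate from the Median Absolute Deviation (Gaussian-consistent)."""
    med = statistics.median(coeffs)
    mad = statistics.median([abs(c - med) for c in coeffs])
    return mad / 0.6745  # 0.6745 is the 75th percentile of the standard normal


def sigma_mean_abs(coeffs):
    """Assumed alternative: scaled mean absolute value, no sorting required."""
    # For zero-mean Gaussian noise, E|x| = sigma * sqrt(2 / pi).
    return math.sqrt(math.pi / 2) * sum(abs(c) for c in coeffs) / len(coeffs)


def universal_threshold(sigma, n):
    """Universal (VisuShrink-style) threshold for n coefficients."""
    return sigma * math.sqrt(2 * math.log(n))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for noise-only detail coefficients (true sigma = 2.0).
    detail = [rng.gauss(0.0, 2.0) for _ in range(4096)]
    for name, estimator in (("MAD", sigma_mad), ("mean-abs", sigma_mean_abs)):
        sigma = estimator(detail)
        print(name, round(sigma, 3), round(universal_threshold(sigma, len(detail)), 3))
```

On pure Gaussian noise both estimators should land close to the true sigma; they differ in robustness to spikes, which is a trade-off the sketch does not explore.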

Archivio istituzionale della ricerca - Università di Cagliari

UniCA Eprints

Acquisition systems and decoding algorithms of peripheral neural signals for prosthetic applications

Publication venue: Università degli Studi di Cagliari
Publication date: 14/04/2014

Archivio istituzionale della ricerca - Università di Cagliari

Contributions to Phase Two of AGATA electronics

Author: Collado Ruiz Javier
Publication date: 01/01/2020

In the field of nuclear physics, high-resolution gamma-ray spectroscopy is a precise method for studying the structure of the nucleus, extracting the energy and angular distribution of the gamma photons emitted in transitions between nuclear states. To obtain nuclei in an excited state, so that they emit gamma rays, matter must be made to collide, producing nuclear reactions (in-beam spectroscopy), or radioactive decays must be used (decay spectroscopy). High-purity germanium (HPGe) semiconductor detectors have been shown to respond well when interacting with gamma rays. Like other semiconductor-based detectors, when subjected to high voltage, HPGe detectors produce a high measurement current proportional to the energy of the incident gamma rays. The AGATA (Advanced GAmma Tracking Array) HPGe multi-detector is one of the most advanced high-resolution gamma spectrometers in existence dedicated to the study of nuclear physics. To maximize sensitivity, the AGATA HPGe detectors have their outer contacts divided into 36 segments, so that the position of the photon and the energy deposited in each of these parts can be determined. With the information on the position and energy of the photons, it is possible to reconstruct the gamma-ray interactions by means of tracking algorithms. Thanks to this technique, it is possible to maximize the sensitivity of the detector (energy resolution and P/T factor) without needing to use part of the detection solid angle for other detectors dedicated to Compton suppression. In addition to the detectors themselves, position-sensitive HPGe detectors require sampling electronics with spectroscopic-quality signal-to-noise ratios, which capture and digitize the traces to be processed by the Pulse Shape Analysis algorithms.
To achieve maximum sensitivity and efficiency, the AGATA project aims to build the multi-detector covering a total surface with 4π of solid angle, optimizing the information obtained, which is especially critical in experiments that use costly radioactive ion beams. Another goal in the construction of AGATA is its mobility. The AGATA multi-detector is installed in different laboratories to take advantage of the variety of beams and complementary instrumentation available at the different European centres. The AGATA project is currently in its Phase 1, which aims to cover up to 1π of solid angle and is operating with the second generation of electronics. The 45 detectors currently installed partly use the previous generation, or Phase 0, of electronics, which was designed and produced between 2005 and 2007. The main electronics goal of the AGATA collaboration is the development of the new generation for Phase 2, which aims to instrument 180 detectors and which has been partially developed in this thesis. The main objectives of the Phase 2 electronics are integration into a single device, from digitization to data output, and the Ethernet protocol as the communication channel for that output. Ethernet technology will allow a multipoint connection and the possibility of reading the data from anywhere in the AGATA processing farm. Ease of maintenance and avoiding obsolescence of the components used have also been taken into account in the design. One of the major problems encountered in the integration of the AGATA electronic system is the optimization of the FPGA resources used by the pre-processing. As technology has advanced, despite the increase in the data rate of the high-speed transceivers in these devices (between 16 and 32 Gbps), the number of transceivers in FPGAs has not increased substantially.
Moreover, the cost of FPGA devices increases considerably with the number of transceivers. This is a critical problem for AGATA, since it requires a large number of digitized channels per device, but not at an especially high rate (about 2 Gbps). To reduce system complexity, cost and total power, the number of high-speed lines has been optimized through data aggregation by time multiplexing, increasing the data rate but reducing the total number of lines by 4 to 1. This solution has been implemented through the Input Data Mezzanine board, conceived and developed entirely in this thesis. The main objective from the scientific point of view is to demonstrate the possibility of reading 40 channels under the JESD204 protocol or an equivalent one, via optical fibre or physical cable, with only 10 high-speed transceivers of an FPGA, thanks to the time-division multiplexing technique. The starting point is the current AGATA electronics, supported by state-of-the-art technology in hardware and software design for FPGAs, high-speed digital design and digital communications. Although this design was carried out mainly for the AGATA project, we consider that this technology will be of interest for other instruments and applications.
In the field of nuclear physics, high-resolution gamma-ray spectroscopy is an accurate method for nuclear structure studies, retrieving the energy and angular distributions of gamma photons emitted in the transitions between nuclear states. In order to obtain the nucleus in an excited state, such that it will emit gamma rays, matter must be made to collide, producing nuclear reactions (in-beam spectroscopy), or radioactive decay must be used (decay spectroscopy). High Purity Germanium (HPGe) semiconductor detectors have been shown to provide a good response as gamma-ray detectors.
Like other semiconductor detectors, HPGe detectors produce, with high sensitivity, a current proportional to the gamma-ray energy while they are subject to high-voltage reverse bias, in cryogenic conditions. The AGATA (Advanced GAmma Tracking Array) HPGe detector array is a state-of-the-art detector array for gamma-ray spectroscopy in nuclear physics. In order to improve the sensitivity, AGATA HPGe detectors have the outer contact divided into 36 segments, so that the photon position and the energy deposited in each segment can be determined. With the interaction energy and position information, it is possible to reconstruct (track) the gamma-ray interaction sequence using tracking algorithms. With this technique it is possible to maximize the sensitivity of the detector array (energy resolution and P/T) without using part of the detection solid angle for anti-Compton active shields. In addition to the segmented detectors, position-sensitive HPGe arrays require sampling electronics with spectroscopic signal-to-noise ratios, which provide the traces to be processed by the Pulse Shape Analysis algorithms. To provide maximum efficiency and sensitivity, the AGATA project aims to construct a 4π solid angle detector array. This geometry also optimizes the information obtained, which is especially important in experiments using expensive radioactive ion beams. Another goal in the construction of AGATA is the mobility of the array. AGATA is installed in different laboratories to take advantage of the variety of beams and complementary instrumentation existing in different European centres. The AGATA project is currently in its Phase 1, using second-generation electronics, and aims at 1π solid angle coverage. This requires 45 detectors, which today are partly instrumented with the previous Phase 0 electronics, mostly designed and produced in the period from 2005 to 2007.
Presently, the main goal of the AGATA collaboration regarding electronics is the development of the Phase 2 version, with the objective of instrumenting 180 detectors, which is partly accomplished by the work described in this thesis. The main improvements in the Phase 2 electronics are the integration of all the electronics from digitizers to readout, including pre-processing, in one standalone system, and the use of Ethernet as the readout protocol. Ethernet technology will enable a multipoint connection and the possibility of distributing the data anywhere within the AGATA processing farm. One of the main problems found in the integration of the whole system is the optimization of the FPGA resources used in the pre-processing. Despite the increase in the high-speed transceiver data rates of the latest FPGAs developed in the industry, the number of transceivers on the devices is limited. Furthermore, FPGA cost increases significantly with the number of transceivers, which is an issue for the AGATA detectors, which need a large number of transceivers but not at an especially high data rate. To reduce system complexity, cost and power, the number of high-speed digital lines is optimized through data aggregation, increasing the data rate of each line but with a 4-to-1 reduction in the total number of transceiver lines. The solution is implemented through the Input Data Mezzanine board, conceived and developed entirely in this thesis work. From a technological point of view, the main objective of the thesis is to prove the possibility of reading up to 40 optical or copper low-rate inputs, using JESD204 or an equivalent protocol, in the FPGA using only 10 transceivers through a time-division multiplexing technique. The work draws on the state of the art in hardware-software FPGA design, high-speed digital design and digital communications, as well as on know-how of the current AGATA electronics.
Although this device is designed for AGATA, we consider that this technology will be of interest for other instruments and applications.
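The 4-to-1 aggregation described in this abstract can be sketched as plain round-robin time-division multiplexing. This is only an illustration under stated assumptions: the helper names are hypothetical, words are modelled as list items, and line encoding and framing overhead of JESD204-style links are ignored, so the rate arithmetic is a lower bound on the lane rate rather than a figure from the thesis.

```python
def tdm_mux(channels):
    """Interleave equal-length channels round-robin into one stream."""
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must carry the same number of words")
    # One frame holds one word from every channel, in channel order.
    return [word for frame in zip(*channels) for word in frame]


def tdm_demux(stream, n_channels):
    """Recover the original channels from an interleaved stream."""
    return [stream[i::n_channels] for i in range(n_channels)]


def lanes_needed(n_inputs, mux_ratio):
    """High-speed lanes required after mux_ratio-to-1 aggregation (ceiling)."""
    return -(-n_inputs // mux_ratio)


if __name__ == "__main__":
    # Four low-rate inputs, three words each (labels are illustrative only).
    inputs = [[f"ch{c}-w{w}" for w in range(3)] for c in range(4)]
    lane = tdm_mux(inputs)
    assert tdm_demux(lane, 4) == inputs
    print("lanes for 40 inputs at 4-to-1:", lanes_needed(40, 4))
    # Assumed per-input payload of about 2 Gbps, as quoted in the abstract.
    print("payload per lane in Gbps:", 4 * 2.0)
```

With the abstract's figures, 40 inputs at a 4-to-1 ratio need 10 lanes, each carrying at least four times the per-input payload, which stays below the 16 to 32 Gbps transceiver rates the abstract mentions.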

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Digital.CSIC

3rd Many-core Applications Research Community (MARC) Symposium. (KIT Scientific Reports ; 7598)

Author: Becker Jürgen, Göhringer Diana, Hübner Michael
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2011

This manuscript includes recent scientific work on the Intel Single-chip Cloud Computer and describes novel approaches for programming and run-time organization.

KITopen

Methoden zur applikationsspezifischen Effizienzsteigerung adaptiver Prozessorplattformen

Author: Tradowsky Carsten
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016

General-purpose processors are optimized for the average use case, so available resources are not used efficiently. This work investigates to what extent a general-purpose processor can be adapted to individual applications in order to increase efficiency. The adaptation can be performed at run time by the processor or the runtime system, based on the respective system parameters, in order to achieve an efficiency gain.

KITopen

Runtime Management of Dynamic Dataflows with Partially Reconfigurable Pipelines on FPGAs

Author: Mätas Kaspar
Publication date: 31/12/2023

The University of Manchester - Institutional Repository