118 research outputs found

    Real-Time neural signal decoding on heterogeneous MPSocs based on VLIW ASIPs

    Get PDF
    An important research problem, at the basis of the development of embedded systems for neuroprosthetic applications, is the development of algorithms and platforms able to extract the patient's motion intention by decoding the information encoded in neural signals. At the state of the art, no portable and reliable integrated solutions implementing such a decoding task have been identified. To this aim, in this paper, we investigate the possibility of using the MPSoC paradigm in this application domain. We perform a design space exploration that compares different custom MPSoC embedded architectures, implementing two versions of a on-line neural signal decoding algorithm, respectively targeting decoding of single and multiple acquisition channels. Each considered design points features a different application configuration, with a specific partitioning and mapping of parallel software tasks, executed on customized VLIW ASIP processing cores. Experimental results, obtained by means of FPGA-based prototyping and post-floorplanning power evaluation on a 40nm technology library, assess the performance and hardware-related costs of the considered configurations. The reported power figures demonstrate the usability of the MPSoC paradigm within the processing of bio-electrical signals and show the benefits achievable by the exploitation of the instruction-level parallelism within tasks

    Low power architectures for streaming applications

    Get PDF

    Turbo NOC: a framework for the design of Network-on-Chip-basedturbo decoder architectures

    Get PDF
    This paper proposes a general framework for the design and simulation of network-on-chip-based turbo decoder architectures. Several parameters in the design space are investigated, namely, network topology, parallelism degree, the rate at which messages are sent by processing nodes over the network, and routing strategy. The main results of this analysis are as follows: 1) the most suited topologies to achieve high throughput with a limited complexity overhead are generalized de Bruijn and generalized Kautz topologies and 2) depending on the throughput requirements, different parallelism degrees, message injection rates, and routing algorithms can be used to minimize the network area overhead

    Energy analysis and optimisation techniques for automatically synthesised coprocessors

    Get PDF
    The primary outcome of this research project is the development of a methodology enabling fast automated early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such capability is essential to the process of selecting energy efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors. Several optimisations have been identified, specifically targeting the most significant blocks in terms of energy consumption. Instruction encoding mechanism reduces both the energy and area requirements of the instruction cache; modifications to the multiplier unit reduce energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings. The aforementioned techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality, from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for system-on-chip platforms

    Architectures multi-Asip pour turbo récepteur flexible

    Get PDF
    Rapidly evolving wireless standards use modern techniques such as turbo codes, Bit Interleaved coded Modulation (BICM), high order QAM constellation, Signal Space Diversity (SSD), Multi-Input Multi-Output (MIMO) Spatial Multiplexing (SM) and Space Time Codes (STC) with different parameters for reliable high rate data transmissions. Adoption of such techniques in the transmitter can impact the receiver architecture in three ways: (1) the complex processing related to advanced techniques such as turbo codes, encourage to perform iterative processing in the receiver to improve error rate performance (2) to satisfy high throughput requirement for an iterative receiver, parallel processing is mandatory and finally (3) to allow the support of different techniques and parameters imposed, programmable yet high throughput hardware processing elements are required. In this thesis, to address the high throughput requirement with turbo processing, first of all a study of parallelism on turbo decoding is extended for turbo demodulation and turbo equalization. Based on the results acquired from the parallelism study a flexible high throughput heterogeneous multi-ASIP NoC based unified turbo receiver is proposed. The proposed architecture fulfils the target requirements in a way that: (a) Application Specific Instruction-set Processor (ASIP) exploits metric generation level parallelism and implements the required flexibility, (b) throughputs beyond the capacity of single ASIP in a turbo process are achieved through multiple ASIP elements implementing sub-block parallelism and shuffled processing and finally (c) Network on Chip is used to handle communication conflicts during parallel processing of multiple ASIPs. In pursuit to achieve a hardware model of the proposed architecture two ASIPs are conceived where the first one, namely EquASIP, is dedicated for MMSE-IC equalization and provides a flexible solution for multiple MIMO techniques adopted in multiple wireless standards with a capability to work in turbo equalization context. The second ASIP, named as DemASIP, is a flexible demapper which can be used in MIMO or single antenna environment for any modulation till 256-QAM with or without iterative demodulation. Using available TurbASIP and NoC components, the thesis concludes on an FPGA prototype of heterogeneous multi-ASIP NoC based unified turbo receiver which integrates 9 instances of 3 different ASIPs with 2 NoCs.Les normes de communication sans fil, sans cesse en évolution, imposent l'utilisation de techniques modernes telles que les turbocodes, modulation codée à entrelacement bit (BICM), constellation MAQ d'ordre élevé, diversité de constellation (SSD), multiplexage spatial et codage espace-temps multi-antennes (MIMO) avec des paramètres différents pour des transmissions fiables et de haut débit. L'adoption de ces techniques dans l'émetteur peut influencer l'architecture du récepteur de trois façons: (1) les traitement complexes relatifs aux techniques avancées comme les turbocodes, encourage à effectuer un traitement itératif dans le récepteur pour améliorer la performance en termes de taux d'erreur (2) pour satisfaire l'exigence de haut débit avec un récepteur itératif, le recours au parallélisme est obligatoire et enfin (3) pour assurer le support des différentes techniques et paramètres imposées, des processeurs de traitement matériel flexibles, mais aussi de haute performance, sont nécessaires. Dans cette thèse, pour répondre aux besoins de haut débit dans un contexte de traitement itératif, tout d'abord une étude de parallélisme sur le turbo décodage a été étendue aux applications de turbo démodulation et turbo égalisation. Partant des résultats obtenus à partir de l'étude du parallélisme, un récepteur itératif unifié basé sur un modèle d'architecture multi-ASIP hétérogène intégrant un réseau sur puce (NoC) a été proposé. L'architecture proposée répond aux exigences visées d'une manière où: (a) le concept de processeur à jeu d'instruction dédié à l'application (ASIP) exploite le parallélisme du niveau de génération de métriques et met en oeuvre la flexibilité nécessaire, (b) les débits au-delà de la capacité d'un seul ASIP dans un processus itératif sont obtenus au moyen de multiples ASIP implémentant le parallélisme de sous-blocs et le traitement combiné et enfin (c) le concept de réseau sur puce (NoC) est utilisé pour gérer les conflits de communication au cours du traitement parallèle itératif multi-ASIP. Dans le but de parvenir à un modèle matériel de l'architecture proposée, deux ASIP ont été conçus où le premier, nommé EquASIP, est dédié à l'égalisation MMSE-IC et fournit une solution flexible pour de multiples techniques multi-antennes adoptés dans plusieurs normes sans fil avec la capacité de travailler dans un contexte de turbo égalisation. Le deuxième ASIP, nommé DemASIP, est un démappeur flexible qui peut être utilisé dans un environnement multi-antennes et pour tout type de modulation jusqu'à MAQ-256 avec ou sans démodulation itérative. En intégrant ces ASIP, en plus des NoC et TurbASIP disponibles à Télécom Bretagne, la thèse conclut sur un prototype FPGA d'un récepteur itératif unifié multi-ASIP qui intègre 9 coeurs de 3 différents types d'ASIP avec 2 NoC

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Cost-Efficient Soft-Error Resiliency for ASIP-based Embedded Systems

    Full text link
    Recent decades have witnessed the rapid growth of embedded systems. At present, embedded systems are widely applied in a broad range of critical applications including automotive electronics, telecommunication, healthcare, industrial electronics, consumer electronics military and aerospace. Human society will continue to be greatly transformed by the pervasive deployment of embedded systems. Consequently, substantial amount of efforts from both industry and academic communities have contributed to the research and development of embedded systems. Application-specific instruction-set processor (ASIP) is one of the key advances in embedded processor technology, and a crucial component in some embedded systems. Soft errors have been directly observed since the 1970s. As devices scale, the exponential increase in the integration of computing systems occurs, which leads to correspondingly decrease in the reliability of computing systems. Today, major research forums state that soft errors are one of the major design technology challenges at and beyond the 22 nm technology node. Therefore, a large number of soft-error solutions, including error detection and recovery, have been proposed from differing perspectives. Nonetheless, most of the existing solutions are designed for general or high-performance systems which are different to embedded systems. For embedded systems, the soft-error solutions must be cost-efficient, which requires the tailoring of the processor architecture with respect to the feature of the target application. This thesis embodies a series of explorations for cost-efficient soft-error solutions for ASIP-based embedded systems. In this exploration, five major solutions are proposed. The first proposed solution realizes checkpoint recovery in ASIPs. By generating customized instructions, ASIP-implemented checkpoint recovery can perform at a finer granularity than what was previously possible. The fault-free performance overhead of this solution is only 1.45% on average. The recovery delay is only 62 cycles at the worst case. The area and leakage power overheads are 44.4% and 45.6% on average. The second solution explores utilizing two primitive error recovery techniques jointly. This solution includes three application-specific optimization methodologies. This solution generates the optimized error-resilient ASIPs, based on the characteristics of primitive error recovery techniques, static reliability analysis and design constraints. The resultant ASIP can be configured to perform at runtime according to the optimized recovery scheme. This solution can strategically enhance cost-efficiency for error recovery. In order to guarantee cost-efficiency in unpredictable runtime situations, the third solution explores runtime adaptation for error recovery. This solution aims to budget and adapt the error recovery operations, so as to spend the resources intelligently and to tolerate adverse influences of runtime variations. The resultant ASIP can make runtime decisions to determine the activation of spatial and temporal redundancies, according to the runtime situations. At the best case, this solution can achieve almost 50x reliability gain over the state of the art solutions. Given the increasing demand for multi-core computing systems, the last two proposed solutions target error recovery in multi-core ASIPs. The first solution of these two explores ASIP-implemented fine-grained process migration. This solution is a key infrastructure, which allows cost-efficient task management, for realizing cost-efficient soft-error recovery in multi-core ASIPs. The average time cost is only 289 machine cycles to perform process migration. The last solution explores using dynamic and adaptive mapping to assign heterogeneous recovery operations to the tasks in the multi-core context. This solution allows each individual ASIP-based processing core to dynamically adapt its specific error recovery functionality according to the corresponding task's characteristics, in terms of soft error vulnerability and execution time deadline. This solution can significantly improve the reliability of the system by almost two times, with graceful constraint penalty, in comparison to the state-of-the-art counterparts
    corecore