263 research outputs found

    Method for run time hardware code profiling for algorithm acceleration

    Get PDF
    In this paper we propose a method for run time profiling of applications on instruction level by analysis of loops. Instead of looking for coarse grain blocks we concentrate on fine grain but still costly blocks in terms of execution times. Most code profiling is done in software by introducing code into the application under profile witch has time overhead, while in this work data for the position of a loop, loop body, size and number of executions is stored and analysed using a small non intrusive hardware block. The paper describes the system mapping to runtime reconfigurable systems. The fine grain code detector block synthesis results and its functionality verification are also presented in the paper. To demonstrate the concept MediaBench multimedia benchmark running on the chosen development platform is use

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

    Energy Consumption Saving in Embedded Microprocessors Using Hardware Accelerators

    Get PDF
    This paper deals with the reduction of power consumption in embedded microprocessors. Computing power and energy efficiency are becoming the main challenges for embedded system applications. This is, in particular, the caseof wearable systems. When the power supply is provided by batteries, an important requirement for these systems is the long service life. This work investigates a method for the reduction of microprocessor energy consumption, based on the use of hardware accelerators. Their use allows to reduce the execution time and to decrease the clock frequency, so reducing the power consumption. In order to provide experimental results, authors analyze a case of study in the field of wearable devices for the processing of ECG signals. The experimental results show that the use of hardware accelerator significantly reduces the power consumption

    A Reconfigurable Processor for Heterogeneous Multi-Core Architectures

    Get PDF
    A reconfigurable processor is a general-purpose processor coupled with an FPGA-like reconfigurable fabric. By deploying application-specific accelerators, performance for a wide range of applications can be improved with such a system. In this work concepts are designed for the use of reconfigurable processors in multi-tasking scenarios and as part of multi-core systems

    Ein flexibles, heterogenes Bildverarbeitungs-Framework fĂĽr weltraumbasierte, rekonfigurierbare Datenverarbeitungsmodule

    Get PDF
    Scientific instruments as payload of current space missions are often equipped with high-resolution sensors. Thereby, especially camera-based instruments produce a vast amount of data. To obtain the desired scientific information, this data usually is processed on ground. Due to the high distance of missions within the solar system, the data rate for downlink to the ground station is strictly limited. The volume of scientific relevant data is usually less compared to the obtained raw data. Therefore, processing already has to be carried out on-board the spacecraft. An example of such an instrument is the Polarimetric and Helioseismic Imager (PHI) on-board Solar Orbiter. For acquisition, storage and processing of images, the instrument is equipped with a Data Processing Module (DPM). It makes use of heterogeneous computing based on a dedicated LEON3 processor in combination with two reconfigurable Xilinx Virtex-4 Field-Programmable Gate Arrays (FPGAs). The thesis will provide an overview of the available space-grade processing components (processors and FPGAs) which fulfill the requirements of deepspace missions. It also presents existing processing platforms which are based upon a heterogeneous system combining processors and FPGAs. This also includes the DPM of the PHI instrument, whose architecture will be introduced in detail. As core contribution of this thesis, a framework will be presented which enables high-performance image processing on such hardware-based systems while retaining software-like flexibility. This framework mainly consists of a variety of modules for hardware acceleration which are integrated seamlessly into the data flow of the on-board software. Supplementary, it makes extensive use of the dynamic in-flight reconfigurability of the used Virtex-4 FPGAs. The flexibility of the presented framework is proven by means of multiple examples from within the image processing of the PHI instrument. The framework is analyzed with respect to processing performance as well as power consumption.Wissenschaftliche Instrumente auf aktuellen Raumfahrtmissionen sind oft mit hochauflösenden Sensoren ausgestattet. Insbesondere kamerabasierte Instrumente produzieren dabei eine große Menge an Daten. Diese werden üblicherweise nach dem Empfang auf der Erde weiterverarbeitet, um daraus wissenschaftlich relevante Informationen zu gewinnen. Aufgrund der großen Entfernung von Missionen innerhalb unseres Sonnensystems ist die Datenrate zur Übertragung an die Bodenstation oft sehr begrenzt. Das Volumen der wissenschaftlich relevanten Daten ist meist deutlich kleiner als die aufgenommenen Rohdaten. Daher ist es vorteilhaft, diese bereits an Board der Sonde zu verarbeiten. Ein Beispiel für solch ein Instrument ist der Polarimetric and Helioseismic Imager (PHI) an Bord von Solar Orbiter. Um die Daten aufzunehmen, zu speichern und zu verarbeiten, ist das Instrument mit einem Data Processing Module (DPM) ausgestattet. Dieses nutzt ein heterogenes Rechnersystem aus einem dedizierten LEON3 Prozessor, zusammen mit zwei rekonfigurierbaren Xilinx Virtex-4 Field-Programmable Gate Arrays (FPGAs). Die folgende Arbeit gibt einen Überblick über verfügbare Komponenten zur Datenverarbeitung (Prozessoren und FPGAs), die den Anforderungen von Raumfahrtmissionen gerecht werden, und stellt einige existierende Plattformen vor, die auf einem heterogenen System aus Prozessor und FPGA basieren. Hierzu gehört auch das Data Processing Module des PHI Instrumentes, dessen Architektur im Verlauf dieser Arbeit beschrieben wird. Als Kernelement der Dissertation wird ein Framework vorgestellt, das sowohl eine performante, als auch eine flexible Bilddatenverarbeitung auf einem solchen System ermöglicht. Dieses Framework besteht aus verschiedenen Modulen zur Hardwarebeschleunigung und bindet diese nahtlos in den Datenfluss der On-Board Software ein. Dabei wird außerdem die Möglichkeit genutzt, die eingesetzten Virtex-4 FPGAs dynamisch zur Laufzeit zu rekonfigurieren. Die Flexibilität des vorgestellten Frameworks wird anhand mehrerer Fallbeispiele aus der Bildverarbeitung von PHI dargestellt. Das Framework wird bezüglich der Verarbeitungsgeschwindigkeit und Energieeffizienz analysiert

    Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    Full text link
    For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate a FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47fps whilst consuming under 4W of power. Unlike previous FPGA based designs, we take advantage of both components on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low power, real-time systems.Comment: 8 pages, 7 figures, 2 table

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    Crypto-processeur architecture, programmation et évaluation de la sécurité

    Get PDF
    Les architectures des processeurs et coprocesseurs cryptographiques se montrent fréquemment vulnérables aux différents types d attaques ; en particulier, celles qui ciblent une révélation des clés chiffrées. Il est bien connu qu une manipulation des clés confidentielles comme des données standards par un processeur peut être considérée comme une menace. Ceci a lieu par exemple lors d un changement du code logiciel (malintentionné ou involontaire) qui peut provoquer que la clé confidentielle sorte en clair de la zone sécurisée. En conséquence, la sécurité de tout le système serait irréparablement menacée. L objectif que nous nous sommes fixé dans le travail présenté, était la recherche d architectures matérielles reconfigurables qui peuvent fournir une sécurité élevée des clés confidentielles pendant leur génération, leur enregistrement et leur échanges en implantant des modes cryptographiques de clés symétriques et des protocoles. La première partie de ce travail est destinée à introduire les connaissances de base de la cryptographie appliquée ainsi que de l électronique pour assurer une bonne compréhension des chapitres suivants. Deuxièmement, nous présentons un état de l art des menaces sur la confidentialité des clés secrètes dans le cas où ces dernières sont stockées et traitées dans un système embarqué. Pour lutter contre les menaces mentionnées, nous proposons alors de nouvelles règles au niveau du design de l architecture qui peuvent augmenter la résistance des processeurs et coprocesseurs cryptographiques contre les attaques logicielles. Ces règles prévoient une séparation des registres dédiés à l enregistrement de clés et ceux dédiés à l enregistrement de données : nous proposons de diviser le système en zones : de données, du chiffreur et des clés et à isoler ces zones les unes des autres au niveau du protocole, du système, de l architecture et au niveau physique. Ensuite, nous présentons un nouveau crypto-processeur intitulé HCrypt, qui intègre ces règles de séparation et qui assure ainsi une gestion sécurisée des clés. Mises à part les instructions relatives à la gestion sécurisée de clés, quelques instructions supplémentaires sont dédiées à une réalisation simple des modes de chiffrement et des protocoles cryptographiques. Dans les chapitres suivants, nous explicitons le fait que les règles de séparation suggérées, peuvent également être étendues à l architecture d un processeur généraliste et coprocesseur. Nous proposons ainsi un crypto-coprocesseur sécurisé qui est en mesure d être utilisé en relation avec d autres processeurs généralistes. Afin de démontrer sa flexibilité, le crypto-coprocesseur est interconnecté avec les processeurs soft-cores de NIOS II, de MicroBlaze et de Cortex M1. Par la suite, la résistance du crypto-processeur par rapport aux attaques DPA est testée. Sur la base de ces analyses, l architecture du processeur HCrypt est modifiée afin de simplifier sa protection contre les attaques par canaux cachés (SCA) et les attaques par injection de fautes (FIA). Nous expliquons aussi le fait qu une réorganisation des blocs au niveau macroarchitecture du processeur HCrypt, augmente la résistance du nouveau processeur HCrypt2 par rapport aux attaques de type DPA et FIA. Nous étudions ensuite les possibilités pour pouvoir reconfigurer dynamiquement les parties sélectionnées de l architecture du processeur crypto-coprocesseur. La reconfiguration dynamique peut être très utile lorsque l algorithme de chiffrement ou ses implantations doivent être changés en raison de l apparition d une vulnérabilité Finalement, la dernière partie de ces travaux de thèse, est destinée à l exécution des tests de fonctionnalité et des optimisations stricts des deux versions du cryptoprocesseur HCryptArchitectures of cryptographic processors and coprocessors are often vulnerable to different kinds of attacks, especially those targeting the disclosure of encryption keys. It is well known that manipulating confidential keys by the processor as ordinary data can represent a threat: a change in the program code (malicious or unintentional) can cause the unencrypted confidential key to leave the security area. This way, the security of the whole system would be irrecoverably compromised. The aim of our work was to search for flexible and reconfigurable hardware architectures, which can provide high security of confidential keys during their generation, storage and exchange while implementing common symmetric key cryptographic modes and protocols. In the first part of the manuscript, we introduce the bases of applied cryptography and of reconfigurable computing that are necessary for better understanding of the work. Second, we present threats to security of confidential keys when stored and processed within an embedded system. To counteract these threats, novel design rules increasing robustness of cryptographic processors and coprocessors against software attacks are presented. The rules suggest separating registers dedicated to key storage from those dedicated to data storage: we propose to partition the system into the data, cipher and key zone and to isolate the zones from each other at protocol, system, architectural and physical levels. Next, we present a novel HCrypt crypto-processor complying with the separation rules and thus ensuring secure key management. Besides instructions dedicated to secure key management, some additional instructions are dedicated to easy realization of block cipher modes and cryptographic protocols in general. In the next part of the manuscript, we show that the proposed separation principles can be extended also to a processor-coprocessor architecture. We propose a secure crypto-coprocessor, which can be used in conjunction with any general-purpose processor. To demonstrate its flexibility, the crypto-coprocessor is interconnected with the NIOS II, MicroBlaze and Cortex M1 soft-core processors. In the following part of the work, we examine the resistance of the HCrypt cryptoprocessor to differential power analysis (DPA) attacks. Following this analysis, we modify the architecture of the HCrypt processor in order to simplify its protection against side channel attacks (SCA) and fault injection attacks (FIA). We show that by rearranging blocks of the HCrypt processor at macroarchitecture level, the new HCrypt2 processor becomes natively more robust to DPA and FIA. Next, we study possibilities of dynamically reconfiguring selected parts of the processor - crypto-coprocessor architecture. The dynamic reconfiguration feature can be very useful when the cipher algorithm or its implementation must be changed in response to appearance of some vulnerability. Finally, the last part of the manuscript is dedicated to thorough testing and optimizations of both versions of the HCrypt crypto-processor. Architectures of crypto-processors and crypto-coprocessors are often vulnerable to software attacks targeting the disclosure of encryption keys. The thesis introduces separation rules enabling crypto-processor/coprocessors to support secure key management. Separation rules are implemented on novel HCrypt crypto-processor resistant to software attacks targetting the disclosure of encryption keysST ETIENNE-Bib. électronique (422189901) / SudocSudocFranceF
    • …
    corecore