
    Implantation sur circuit SoC-FPGA d'un système de chiffrement/déchiffrement AES-128 bits en utilisant deux approches de différents niveaux d'abstraction

    Data security is a top priority in the technological world. To ensure data security and privacy, the use of encryption/decryption systems has become a necessity in many areas. In this dissertation, we present a simple architecture for a 128-bit Advanced Encryption Standard system in counter mode (AES-CTR-128), implemented on a PYNQ-Z2 board to encrypt/decrypt electrocardiogram (ECG) signals from the MIT-BIH database. The system uses only 13% of the hardware resources of the Xilinx ZYNQ XC7Z020 chip, consumes 43 mW of power, and operates at a maximum frequency of 109.43 MHz, which corresponds to a maximum throughput of 14 Gbps. Encrypting and decrypting a comma-separated values (CSV) file takes about half the time of processing the equivalent text (TXT) file on both platforms, which use two approaches with different abstraction levels: the first uses low-level programming via the Xilinx Vitis platform, while the second uses the Jupyter/Python tool.
The proposed hardware architecture is about four times faster than the software implementation, and there is only a slight difference in execution time between the two platforms on which our architecture was deployed (Vivado/Vitis and Jupyter/Python). We also tested the hardware architecture with other types of data, such as audio signals and images, using the Jupyter/Python platform for its ease of use. Encrypting and decrypting a 7-second audio signal sampled at 8 kHz takes 4.6 ms and 4.87 ms, respectively, compared with 16.18 ms and 15.8 ms for the software implementation. Similar gains hold for color and grayscale images; in addition, encrypting a color image takes three to four times as long as encrypting a grayscale image in both the software and hardware implementations. The presented hardware architecture can be used in a wide range of embedded applications, and the reported results show that it outperforms the other existing FPGA-based implementations. Keywords: Cryptography, AES, FPGA circuit, ECG signal, ZYNQ circuit, Encryption/Decryption.
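
For context, the sketch below shows what a pure-software AES-CTR-128 baseline of the kind compared against here might look like, and checks the quoted peak figure (one 128-bit block per cycle at 109.43 MHz is about 14 Gbit/s). It assumes the PyCryptodome library; the key, nonce and the random stand-in data are placeholders, not the thesis' actual setup.

```python
# Software AES-CTR-128 baseline and peak-throughput estimate (illustrative only;
# assumes the PyCryptodome library, and random bytes stand in for the ECG data).
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import time

key = get_random_bytes(16)             # 128-bit key
nonce = get_random_bytes(8)            # CTR-mode nonce
plaintext = get_random_bytes(1 << 20)  # 1 MiB stand-in for an exported MIT-BIH CSV file

t0 = time.perf_counter()
ciphertext = AES.new(key, AES.MODE_CTR, nonce=nonce).encrypt(plaintext)
t_enc = time.perf_counter() - t0

t0 = time.perf_counter()
recovered = AES.new(key, AES.MODE_CTR, nonce=nonce).decrypt(ciphertext)
t_dec = time.perf_counter() - t0
assert recovered == plaintext          # CTR decryption applies the same keystream XOR

print(f"software encrypt: {t_enc*1e3:.2f} ms, decrypt: {t_dec*1e3:.2f} ms")

# Peak hardware figure quoted in the abstract: one 128-bit block per cycle
# at 109.43 MHz gives 128 * 109.43e6 ≈ 14 Gbit/s.
print(f"theoretical peak: {128 * 109.43e6 / 1e9:.1f} Gbit/s")
```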

    Fast and Clean: Auditable high-performance assembly via constraint solving

    Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, the highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice. In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture. We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72 microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability.
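
To make the idea of superoptimizing instruction schedules concrete, here is a toy sketch that enumerates dependency-respecting orderings of a handful of made-up instructions and picks the one with the fewest stall cycles under a simple in-order, single-issue model. SLOTHY itself works on real assembly with an external constraint solver and detailed microarchitecture models; none of the instruction names, latencies or the brute-force search below are taken from the tool.

```python
# Toy illustration of instruction scheduling as search (not SLOTHY's actual engine).
from itertools import permutations

# (latency, dependencies) for a made-up dependency graph
instrs = {
    "ld0": (3, []),
    "ld1": (3, []),
    "mul": (2, ["ld0"]),
    "add": (1, ["ld1", "mul"]),
    "st":  (1, ["add"]),
}

def schedule_length(order):
    """In-order, single-issue model: an instruction stalls until its operands are ready."""
    ready, issue = {}, 0
    for name in order:
        lat, deps = instrs[name]
        issue = max([issue + 1] + [ready[d] for d in deps])  # issue cycle
        ready[name] = issue + lat                            # result available
    return max(ready.values())

best_len, best_order = None, None
for order in permutations(instrs):
    # only dependency-respecting (topological) orders are valid schedules
    pos = {n: i for i, n in enumerate(order)}
    if all(pos[d] < pos[n] for n, (_, deps) in instrs.items() for d in deps):
        length = schedule_length(order)
        if best_len is None or length < best_len:
            best_len, best_order = length, order

print("best schedule:", best_order, "finishes in", best_len, "cycles")
```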

    Modulare, verteilte Hardware-Software-Architektur für humanoide Roboter

    Humanoid robots are highly complex systems. They are characterized by a very heterogeneous sensor and actuator system, which in turn places very high and broad demands on the architecture used. This work describes the design of a functional control architecture, the software framework used, and the mapping onto a dedicated hardware architecture.

    Hardware/software architectures for iris biometrics

    Nowadays, the need to identify users of facilities and services has become quite important, not only to determine who accesses a system and/or service, but also to determine which privileges should be granted to each user. To achieve such identification, Biometrics is emerging as a technology that provides a high level of security, as well as being convenient and comfortable for the citizen. Most biometric systems are based on computer solutions, where the identification process is performed by servers or workstations, whose cost and processing time make them infeasible in some situations. However, Microelectronics can provide a suitable solution without the need for complex and expensive computer systems. Microelectronics is a subfield of Electronics and, as the name suggests, is related to the study, development and/or manufacturing of electronic components, i.e. integrated circuits (ICs). We have focused our research on a concrete field of Microelectronics: hardware/software co-design. This technique is widely used for developing specific devices with high computational cost. It relies on using hardware and software together in an effective way, obtaining devices that are faster than software-only solutions and smaller than devices that use dedicated hardware for all the processing. The question of how to obtain an effective solution for Biometrics is addressed by considering all the different aspects of these systems. In this Thesis, we have made two important contributions, both related to Iris Biometrics: the first is a verification system based on an ID token, and the second is a search engine used for massive recognition systems. The first relevant contribution is a biometric system architecture proposal based on ID tokens in a distributed system. In this contribution, we specify considerations to be taken into account in the system and describe the functionalities of the elements that form it, such as the central servers and/or the terminals. The terminal's main functionality is reduced to acquiring the initial raw biometric data, which is transmitted to the token under cryptographic protection; the token then performs the entire biometric process. The ID token architecture is based on hardware/software co-design. The proposed architecture, independent of the biometric modality, divides the biometric process between hardware and software in order to provide more functionality than existing tokens. This partition considers not only the reduction in computation time that hardware can provide, but also the reduction of area and power consumption, the increase in security levels, and the effects on recognition performance across the whole design. To validate the proposal, we have implemented an ID token based on Iris Biometrics following these premises. We have developed the modules of an iris algorithm on both hardware and software platforms to obtain the data needed for an effective combination of the two. We have also studied different alternatives for solving the partitioning problem in hardware/software co-design (genetic algorithms, simulated annealing, and tabu search), with results that point to tabu search as the fastest algorithm for this purpose. Finally, with all the data obtained, we derived different architectures according to different constraints. We present architectures where time is the main requirement, obtaining 30% less processing time than an all-software solution.
Likewise, another solution has been proposed that provides lower area and power consumption. When considering recognition performance as the most important constraint, two architectures are presented: one that also tries to minimize processing time, and another that reduces hardware area and power consumption. Regarding security, we also present two architectures that treat time and hardware area as secondary requirements. Finally, we present an architecture in which all of these factors are considered together. These architectures have allowed us to study how hardware improves security against authentication attacks, how recognition performance is influenced by the lack of floating-point operations in hardware modules, and how hardware reduces processing time while software reduces hardware area and power consumption. The other singular contribution is the development of a search engine for massive identification schemes, where time is a major constraint because the comparison must be performed over millions of users. We initially propose two implementations: the first follows a centralized architecture, where the memories are connected to the microprocessor, although the comparison is performed by a dedicated hardware co-processor; in the second, the memory driver is connected directly to the hardware co-processor. This last architecture shows the importance of correctly connecting the elements used when time is a major requirement. A graphical representation of the different aspects covered in this Thesis is presented in Fig. 1, where the relation between the different topics studied can be seen. The main topics, Biometrics and Hardware/Software Co-design, have been studied and several aspects of them described, such as the different biometric modalities, with a focus on Iris Biometrics, and the security related to these systems. Hardware/Software Co-design has been studied by presenting different design alternatives and by identifying the most suitable configuration for ID tokens. All the data obtained from this analysis has allowed us to make two main proposals: the first focuses on the development of a fast search-engine device, and the second combines all the factors related to both fields with regard to ID tokens, whose hardware/software design combines the different aspects studied. Both approaches have been implemented to show the feasibility of our proposal. Finally, as a result of the investigation performed and presented in this thesis, conclusions and lines of further work are presented.
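
Since the partitioning study points to tabu search, the following toy sketch illustrates tabu search applied to a hardware/software partitioning problem of this kind: each module is assigned to hardware or software so as to minimize processing time under an area budget. The module names, timings, areas and penalty weight are invented for illustration and are not the measurements from the thesis.

```python
# Toy tabu search for hardware/software partitioning (illustrative only;
# module timings, areas and the penalty weight are made up).

# module: (software time, hardware time, hardware area)
modules = {
    "segmentation":  (40.0,  9.0, 120),
    "normalisation": (15.0,  4.0,  60),
    "feature_extr":  (55.0, 12.0, 200),
    "matching":      (20.0,  6.0,  80),
}
AREA_BUDGET = 300

def cost(assign):
    """Total processing time, heavily penalised if the HW area budget is exceeded."""
    time = sum((hw if assign[m] else sw) for m, (sw, hw, _) in modules.items())
    area = sum(a for m, (_, _, a) in modules.items() if assign[m])
    return time + 1000 * max(0, area - AREA_BUDGET)

names = list(modules)
current = {m: 0 for m in names}            # start with an all-software partition
best, best_cost = dict(current), cost(current)
tabu, TABU_LEN = [], 2

for _ in range(50):
    # best non-tabu single-flip neighbour (may be worse than the current solution)
    candidates = [m for m in names if m not in tabu]
    move = min(candidates, key=lambda m: cost({**current, m: 1 - current[m]}))
    current[move] = 1 - current[move]
    tabu = (tabu + [move])[-TABU_LEN:]     # forbid undoing recent moves
    if cost(current) < best_cost:
        best, best_cost = dict(current), cost(current)

print("partition:", {m: ("HW" if v else "SW") for m, v in best.items()},
      "cost:", best_cost)
```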

    Induction heating converter's design, control and modeling applied to continuous wire heating

    Induction heating is a heating method for electrically conductive materials that takes advantage of the heat generated by the eddy currents induced by a varying magnetic field. Since Michael Faraday discovered electromagnetic induction in 1831, this phenomenon has been widely studied in many applications, such as the design of transformers, motors, and generators. At the turn of the 20th century, induction started to be studied as a heating method, leading to the construction of the first industrial induction melting equipment by the Electric Furnace Company in 1927. At first, the varying magnetic fields were obtained with spark-gap generators, vacuum-tube generators, and low-frequency motor-generator sets. With the emergence of reliable semiconductors in the late 1960s, motor-generators were replaced by solid-state converters for low-frequency applications. With regard to the characterization of the inductor-workpiece system, the first models used to understand the load's behavior were based on analytical methods. These methods were useful for analyzing the overall behavior of the load, but they were not accurate enough for precise analysis and were limited to simple geometries. With the emergence of computers, numerical methods experienced tremendous growth in the 1990s and started to be applied in the induction heating field. Nowadays, the availability of commercial software that allows this type of analysis has started to make numerical methods popular among research centers and companies. Such software allows a great variety of complex analyses with high precision, thereby reducing the trial-and-error process. The research carried out in recent decades, the increased use of numerical modeling, and the appearance and improvement of semiconductor devices, with their corresponding cost reduction, have caused induction heating to spread to many fields. Induction heating equipment can be found in many applications, from domestic cookers to high-power aluminum melting furnaces or automotive sealing equipment, and is becoming more and more popular thanks to its easy control, quick heating, and the energy savings obtained. The present thesis focuses on the application of induction heating to wire heating. Wire heating is a continuous heating method in which the wire continuously feeds through the heating inductor. This heating method allows high production rates with reduced space requirements and is usually found in medium- to high-power industrial processes running 24 hours per day. The first chapters of this study introduce the induction heating phenomenon, its modeling, and the converters and resonant tanks used. Afterwards, a multichannel converter for high-power and high-frequency applications is designed and implemented with the aim of making the converter modular and reducing its design time, production cost, and maintenance. Moreover, this type of structure provides reliability to the system and enables short repair times, which is an extremely interesting feature for 24-hour processes. Additionally, a software phase-locked loop for induction heating applications is designed and implemented to prove its flexibility and reliability. This type of control allows the same hardware to be used for different applications, which is attractive for industrial deployments.
This phase-locked loop is then used to design and implement a load-adaptive control that varies the references to maintain soft switching as the load changes, improving the converter's performance. Finally, a continuous induction wire-hardening system is modeled, addressing the difficulty of accounting for the mutual influence between the thermal, electromagnetic, and electrical parameters. In this thesis, a continuous process is modeled and tested using numerical methods, considering the converter's operation and its influence on the process.
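
As a rough illustration of what a software phase-locked loop for an induction heating converter does, the sketch below tracks the resonance of a series RLC tank by driving the measured voltage/current phase toward a slightly lagging target. The tank values, loop gain, target phase and the simple proportional loop filter are assumptions for the example, not the controller designed in the thesis.

```python
# Minimal software PLL sketch for a series resonant tank (illustrative only;
# component values, gain and target phase are made up).
import math

R, L, C = 0.5, 20e-6, 0.5e-6               # hypothetical series RLC load
f_res = 1 / (2 * math.pi * math.sqrt(L * C))

def tank_phase(f):
    """Phase of the tank current relative to the inverter voltage at frequency f."""
    w = 2 * math.pi * f
    return -math.atan((w * L - 1 / (w * C)) / R)

target = math.radians(-10)                  # slightly lagging current -> soft switching
kp, dt = 2.0e6, 1e-4                        # proportional gain and control period
f = 1.1 * f_res                             # start the inverter above resonance

for _ in range(300):
    err = tank_phase(f) - target            # phase-detector output each control period
    f += kp * err * dt                      # adjust the switching frequency

print(f"resonance {f_res/1e3:.1f} kHz, locked at {f/1e3:.1f} kHz, "
      f"phase {math.degrees(tank_phase(f)):.1f} deg")
```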

    Heterogeneity-awareness in multithreaded multicore processors

    During the last decades, Computer Architecture has experienced a great series of revolutionary changes. The increasing transistor count on a single chip has led to some of the main milestones in the field, from the release of the first superscalar processor (1965) to state-of-the-art multithreaded multicore architectures, like the Intel Core i7 (2009). Moore's Law has continued for almost half a century and is not expected to stop for at least another decade, and perhaps much longer. Moore observed a trend in process technology advances: the number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years. Nevertheless, having more transistors available cannot always be translated directly into having more performance. The complexity of state-of-the-art software has reached heights unthinkable in prior ages, both in terms of the amount of computation and the complexity involved. If we analyze this complexity in depth, we realize that software is composed of smaller execution processes that, although maintaining a certain spatial/temporal locality, exhibit inherently heterogeneous behavior. That is, during execution the hardware runs very different portions of software, with huge differences in behavior and hardware requirements. This heterogeneity in the behavior of software is not specific to the latest video game, but is inherent to software programming itself, since the very beginning of Algorithmics. In this PhD dissertation we analyze in depth the inherent heterogeneity present in software behavior. We identify the main issues and sources of this heterogeneity, which prevent most state-of-the-art processor designs from reaching their maximum potential. Hence, the heterogeneity in software makes most current processors, commonly called general-purpose processors, overdesigned: they have many more hardware resources than are really needed to execute the software running on them. This would not be a major problem if we were not concerned about the additional power consumption involved in software computation. The final goal of this PhD dissertation is to assign each portion of software exactly the amount of hardware resources it really needs to fully exploit its potential, without consuming more energy than strictly necessary; that is, to obtain complexity-effective executions using the inherent heterogeneity in software behavior as the steering indicator. We therefore start by analyzing in depth the heterogeneous behavior of software run on general-purpose processors, and then match it onto heterogeneously distributed hardware that explicitly exploits heterogeneous hardware requirements. Only by being heterogeneity-aware in software, and appropriately matching this software heterogeneity onto hardware heterogeneity, can we effectively obtain better processor designs. The PhD dissertation comprises four main contributions that cover both multithreaded single-core (hdSMT) and multicore (TCA Algorithm, hTCA Framework and MFLUSH) scenarios, explained in depth in their corresponding chapters of the dissertation. Overall, these contributions cover a significant range of the heterogeneity-aware processor design space.
Within this design space, we have focused on the state-of-the-art trend in processor design: multithreaded multicore (CMP+SMT) processors. We place special emphasis on the MPsim simulation tool, specifically designed and developed for this PhD dissertation. This tool has already gone beyond this dissertation, becoming a reference tool for an important group of researchers across the Computer Architecture Department (DAC) at the Polytechnic University of Catalonia (UPC), the Barcelona Supercomputing Center (BSC), and the University of Las Palmas de Gran Canaria (ULPGC).
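
The following toy sketch illustrates the kind of heterogeneity-aware matching the dissertation argues for: threads with different resource demands are mapped onto a mix of big and little cores, and the aware mapping is compared with an oblivious one. All thread demands, core capabilities and power numbers are invented for illustration; they are not results from the dissertation or from MPsim.

```python
# Toy heterogeneity-aware thread-to-core mapping (illustrative numbers only).

# per-thread "hardware demand": achievable IPC if given an unbounded core
threads = {"t0": 2.6, "t1": 0.7, "t2": 1.8, "t3": 0.4}
# heterogeneous cores: (peak IPC the core can sustain, power in watts)
cores = {"big0": (3.0, 4.0), "big1": (3.0, 4.0),
         "little0": (1.0, 0.6), "little1": (1.0, 0.6)}

def evaluate(mapping):
    """Aggregate IPC and power: a thread cannot exceed its core's peak IPC."""
    ipc = sum(min(threads[t], cores[c][0]) for t, c in mapping.items())
    power = sum(cores[c][1] for c in mapping.values())
    return ipc, power

# Heterogeneity-aware policy: the most demanding threads get the most capable cores.
aware = dict(zip(sorted(threads, key=threads.get, reverse=True),
                 sorted(cores, key=lambda c: cores[c][0], reverse=True)))

# Heterogeneity-oblivious policy: arbitrary (alphabetical) assignment.
oblivious = dict(zip(sorted(threads), sorted(cores)))

for name, mapping in [("aware", aware), ("oblivious", oblivious)]:
    ipc, power = evaluate(mapping)
    print(f"{name:9s} {mapping}  IPC={ipc:.1f}  IPC/W={ipc/power:.2f}")
```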

    Specialization and reconfiguration of lightweight mobile processors for data-parallel applications

    The worldwide use of mobile devices makes low-power mobile processors the leading segment of the entire computer industry. Customers demand low-cost, high-performance, and energy-efficient mobile devices that execute sophisticated mobile applications such as multimedia and 3D games. State-of-the-art mobile devices already use chip multiprocessors (CMPs) with dedicated accelerators that exploit the data-level parallelism (DLP) in these applications. Such heterogeneous system design enables mobile processors to deliver the desired performance and efficiency. The heterogeneity, however, increases the processor's complexity and manufacturing cost, since extra special-purpose hardware must be added for the accelerators. In this thesis, we propose new hardware techniques that leverage the available resources of a mobile CMP to achieve cost-effective acceleration of DLP workloads. Our techniques are inspired by classic vector architectures and by the latest reconfigurable architectures, both of which achieve high power efficiency when running DLP workloads. Their high demand for additional resources, however, limits their applicability beyond high-performance computers. To obtain their advantages in mobile devices, we propose techniques that: 1) specialize the lightweight mobile cores for classic vector execution of DLP workloads; 2) dynamically tune the number of cores used for the specialized execution; and 3) reconfigure a bulk of the existing general-purpose execution resources into a compute hardware accelerator. Specialization enables one or more cores to process configurably large vector operands with new special-purpose vector instructions. Reconfiguration goes one step further and allows the compute hardware in the mobile cores to dynamically implement the entire functionality of diverse compute algorithms. The proposed specialization and reconfiguration techniques are applicable to a wide range of general-purpose processors available in today's mobile devices. However, we chose to implement and evaluate them on a lightweight processor based on the Explicit Data Graph Execution (EDGE) architecture, which we find promising for research on low-power processors. The implemented techniques improve the mobile processor's performance and the efficiency of its existing general-purpose resources. With the specialization/reconfiguration techniques enabled, the processor efficiently exploits DLP without the extra cost of special-purpose accelerators.
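
As a back-of-the-envelope illustration of why vector specialization pays off on DLP workloads, the sketch below compares a scalar cycle estimate for a small kernel with estimates for several configurable vector lengths, including a one-off specialization/reconfiguration overhead. The cycle counts and overheads are invented for illustration and are not measurements from the EDGE-based prototype.

```python
# Simple cycle model for vector specialization of a DLP kernel (invented numbers).
import math

N = 4096               # elements in a data-parallel kernel (e.g. a pixel filter)
OPS_PER_ELEM = 4       # arithmetic operations applied to each element
SCALAR_CPO = 1         # cycles per scalar operation
VECTOR_CPO = 1         # cycles per vector operation (covering VL elements at once)
CONFIG_OVERHEAD = 200  # one-off cycles to specialise/reconfigure the cores

scalar_cycles = N * OPS_PER_ELEM * SCALAR_CPO
print(f"scalar: {scalar_cycles} cycles")

for vl in (8, 32, 128, 512):  # configurable vector length
    vector_cycles = math.ceil(N / vl) * OPS_PER_ELEM * VECTOR_CPO + CONFIG_OVERHEAD
    print(f"VL={vl:4d}: {vector_cycles:6d} cycles, "
          f"speedup {scalar_cycles / vector_cycles:5.1f}x")
```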