197 research outputs found

    Accuracy-Guaranteed Fixed-Point Optimization in Hardware Synthesis and Processor Customization

    SUMMARY Nowadays, computing with fractional numbers is essential in a wide range of signal and image processing applications. In numerical computation, a fractional number can be represented using fixed-point or floating-point arithmetic. Fixed-point arithmetic is widely considered preferable to floating-point for dedicated hardware architectures because of its lower implementation complexity. In hardware implementation, the word-length assigned to the various signals has a significant impact on metrics such as resources (transistors), speed and power consumption. Fixed-point word-length optimization (WLO) is a well-known research area that aims to optimize datapaths by adjusting the word-lengths assigned to signals. A fixed-point number is composed of an integer part and a fractional part. There is a lower bound on the number of bits allocated to the integer part, so as to prevent overflow for each signal. This bound depends on the range of values that the signal can take. The number of bits in the fractional part, in turn, determines the size of the finite-precision error that is introduced in the computations. There is a trade-off between accuracy and hardware efficiency in selecting the number of fractional bits. The process of assigning the number of fractional bits involves two important procedures: quantization error modeling and fractional word-length selection. Existing work on WLO has focused on hardwired circuits as the target platform. In this thesis, we introduce new methodologies, techniques and algorithms to improve the implementation of fixed-point computations in hardwired circuits and customizable processors. 
The thesis proposes an improved error modeling approach, based on affine arithmetic, that addresses some problems of existing methods and improves their accuracy. The thesis also introduces an acceleration technique and two semi-analytical algorithms for fractional width selection in hardwired circuit design. While the first algorithm follows a progressive search strategy, the second uses a tree-shaped search method for fractional width optimization. The algorithms offer two trade-off options between computational complexity and resulting cost. The first algorithm has polynomial complexity and obtains results comparable to existing heuristic approaches. The second algorithm has exponential complexity, but gives near-optimal results compared to an exhaustive search. This thesis also proposes a method for combining word-length optimization with configurable processor design. The width and depth of the register files and the architecture of the functional units are the main objectives targeted by this optimization. A new optimization algorithm was developed to find the best combination of word-lengths and other configurable parameters in the proposed method. The accuracy requirements, defined as the worst-case error, must be met by every solution. To facilitate evaluation and implementation of the selected solutions, a new processor design environment was also developed. This environment, called PolyCuSP, supports a wide range of parameters, including those needed to evaluate the solutions proposed by the optimization algorithm. 
The PolyCuSP environment supports rapid exploration of the solution space and the ability to model different instruction sets to enable effective comparisons.
ABSTRACT Fixed-point arithmetic is broadly preferred to floating-point in hardware development due to the reduced hardware complexity of fixed-point circuits. In hardware implementation, the bitwidth allocated to the data elements has a significant impact on efficiency metrics for the circuits, including area usage, speed and power consumption. Fixed-point word-length optimization (WLO) is a well-known research area. It aims to optimize fixed-point computational circuits through the adjustment of the allocated bitwidths of their internal and output signals. A fixed-point number is composed of an integer part and a fractional part. There is a minimum number of bits for the integer part that guarantees overflow and underflow avoidance in each signal. This value depends on the range of values that the signal may take. The fractional word-length determines the amount of finite-precision error that is introduced in the computations. There is a trade-off between accuracy and hardware cost in fractional word-length selection. The process of allocating the fractional word-length requires two important procedures: finite-precision error modeling and fractional word-length selection. Existing works on WLO have focused on hardwired circuits as the target implementation platform. In this thesis, we introduce new methodologies, techniques and algorithms to improve the hardware realization of fixed-point computations in hardwired circuits and customizable processors. The thesis proposes an enhanced error modeling approach based on affine arithmetic that addresses some shortcomings of the existing methods and improves their accuracy. The thesis also introduces an acceleration technique and two semi-analytical fractional bitwidth selection algorithms for WLO in hardwired circuit design. 
While the first algorithm follows a progressive search strategy, the second one uses a tree-shaped search method for fractional width optimization. The algorithms offer two different time-complexity/cost efficiency trade-off options. The first algorithm has polynomial complexity and achieves comparable results with existing heuristic approaches. The second algorithm has exponential complexity but achieves near-optimal results compared to an exhaustive search. The thesis further proposes a method to combine word-length optimization with application-specific processor customization. The supported datatype word-length, the size of register-files and the architecture of the functional units are the main target objectives to be optimized. A new optimization algorithm is developed to find the best combination of word-length and other customizable parameters in the proposed method. Accuracy requirements, defined as the worst-case error bound, are the key consideration that must be met by any solution. To facilitate evaluation and implementation of the selected solutions, a new processor design environment was developed. This environment, which is called PolyCuSP, supports necessary customization flexibility to realize and evaluate the solutions given by the optimization algorithm. PolyCuSP supports rapid design space exploration and capability to model different instruction-set architectures to enable effective compari
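
The two word-length ingredients named in the abstract above (an integer part sized from the signal range, and a fractional part that sets the finite-precision error) can be illustrated with a minimal sketch. This is not the thesis's error model or selection algorithm; the function names `integer_bits`, `quantize` and `rounding_error_bound` are hypothetical, and a signed two's-complement format with round-to-nearest is assumed.

```python
def integer_bits(lo, hi):
    """Smallest number of integer bits (sign bit included) of an assumed
    signed two's-complement format whose range covers [lo, hi].
    Sketch only: rounding at the very top of the range is ignored."""
    i = 1
    while not (-(2 ** (i - 1)) <= lo and hi < 2 ** (i - 1)):
        i += 1
    return i


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale


def rounding_error_bound(frac_bits):
    """Worst-case absolute error of round-to-nearest with frac_bits
    fractional bits: half of one least-significant bit."""
    return 2.0 ** -(frac_bits + 1)


if __name__ == "__main__":
    # A signal known to stay within [-3.2, 5.7], stored with 4 fractional bits.
    print(integer_bits(-3.2, 5.7))
    print(quantize(0.3, 4), rounding_error_bound(4))
```

Each extra fractional bit halves the bound returned by `rounding_error_bound`, which is the accuracy side of the accuracy/cost trade-off the abstract mentions.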

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. 
These include a Discrete Cosine Transform (DCT) core, Motion Estimation (ME) engine, Finite Impulse Response (FIR) Filter, Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault-recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.

    Simulated annealing based datapath synthesis

    Block-level test scheduling under power dissipation constraints

    As device technologies such as VLSI and Multichip Module (MCM) become mature, and larger and denser memory ICs are implemented for high-performance digital systems, power dissipation becomes a critical factor and can no longer be ignored either in normal operation of the system or under test conditions. One of the major recent considerations in test scheduling is the fact that heat dissipated during test application is significantly higher than during normal operation (sometimes 100 - 200% higher). Test scheduling is strongly related to test concurrency. Test concurrency is a design property which strongly impacts testability and power dissipation. To satisfy high fault coverage goals with reduced test application time under certain power dissipation constraints, the testing of all components on the system should be performed in parallel to the greatest extent possible. Some theoretical analysis of this problem has been carried out, but only at IC level. The problem was basically described as a compatible test clustering, where the compatibility among tests was given by test resource and power dissipation conflicts at the same time. From an implementation point of view this problem was identified as an NP-complete problem. In this thesis, an efficient scheme for overlaying the block-tests, called the extended tree growing technique, is proposed together with classical scheduling algorithms to search for power-constrained block-test scheduling (PTS) profiles in polynomial time. Classical algorithms like list-based scheduling and distribution-graph based scheduling are employed to tackle the PTS problem at a high level. This approach exploits test parallelism under power constraints. 
This is achieved by overlaying the block-test intervals of compatible subcircuits to test as many of them as possible concurrently so that the maximum accumulated power dissipation is balanced and does not exceed the given limit. The test scheduling discipline assumed here is partitioned testing with run to completion. A constant additive model is employed for power dissipation analysis and estimation throughout the algorithm.
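
The idea of overlaying block-test intervals under a power limit can be sketched with a simple greedy list scheduler. This is an illustrative stand-in, not the extended tree growing technique of the thesis: it assumes a constant additive power model, run-to-completion tests, and no test-resource conflicts, and the names `fits` and `schedule` are hypothetical.

```python
def fits(placed, start, end, power, p_max):
    """Check that adding a test over [start, end) keeps the summed power
    within p_max. The load is piecewise constant, so it is enough to check
    at `start` and at every already-placed start time inside the interval."""
    points = [start] + [s for s, _, _ in placed if start < s < end]
    for t in points:
        load = sum(pw for s, e, pw in placed if s <= t < e)
        if load + power > p_max:
            return False
    return True


def schedule(tests, p_max):
    """Greedy list scheduling, longest test first.

    tests: dict mapping test name -> (length, power), constant power per test.
    Returns a dict mapping test name -> start time.
    """
    placed = []   # (start, end, power) of tests scheduled so far
    starts = {}
    for name, (length, power) in sorted(tests.items(), key=lambda kv: -kv[1][0]):
        if power > p_max:
            raise ValueError(f"test {name} alone exceeds the power limit")
        # Candidate start times: time 0 and the end of every placed test.
        candidates = sorted({0} | {end for _, end, _ in placed})
        for t in candidates:
            if fits(placed, t, t + length, power, p_max):
                starts[name] = t
                placed.append((t, t + length, power))
                break
    return starts


if __name__ == "__main__":
    blocks = {"A": (4, 3), "B": (3, 3), "C": (2, 2)}
    print(schedule(blocks, p_max=5))
```

In this example blocks A and C overlap because their combined power equals the limit, while B has to wait until A completes.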

    MLCAD: A Survey of Research in Machine Learning for CAD Keynote Paper

    Modelling and Test Generation for Crosstalk Faults in DSM Chips

    In the era of deep submicron (DSM) technology, many System-on-Chip (SoC) applications require the components to operate at high clock speeds. With shrinking feature sizes and ever-increasing clock frequencies, DSM technology has led to the well-known problem of Signal Integrity (SI), especially in the interconnect layout design. The increasing aspect ratios of metal wires and the increasing ratio of coupling capacitance to substrate capacitance result in electrical coupling of interconnects, which leads to crosstalk problems. This thesis first presents the work carried out to model the crosstalk behaviour between aggressor and victim by considering the distributed RLGC parameters of the interconnect and the coupling capacitance and mutual conductance between the two nets. The proposed model also considers the RC linear models of the CMOS drivers and receivers. The behaviour of crosstalk in the case of an under-etching problem has been studied and modelled by distributing and approximating the defect behaviour throughout the nets. Next, the proposed model has been extended to the case where one victim is influenced by several aggressors, assuming that all aggressors have a similar (worst-case) effect on the victim. In all the above cases simulation experiments were carried out and compared with the well-known circuit simulation tool PSPICE. The generated crosstalk model is shown to be faster, with results within a 10% error margin of the latter simulation tool. Because of its accuracy and speed, the proposed model is very useful for both SoC designers and test engineers analysing crosstalk behaviour. Each manufactured device needs to be tested thoroughly to ensure its functionality before delivery. Test pattern generation for crosstalk faults is therefore also necessary. 
In this thesis, the well-known PODEM algorithm for stuck-at faults is extended to generate test patterns for crosstalk faults between a single aggressor and a single victim. To apply the modified PODEM to crosstalk faults, the transition behaviour is divided into two logic parts: before transition and after transition. After the required test patterns for before transition and after transition are found individually, the generated logic vectors are appended to create transition test patterns for crosstalk faults. The developed algorithm is also applied to a few ISCAS 85 benchmark circuits, and the fault coverage is found to be excellent in most circuits. With the incorporation of the proposed algorithm into ATPG tools, the efficiency of testing will be improved by generating test patterns for crosstalk faults in addition to those for conventional stuck-at faults. When generating test patterns for crosstalk faults on a single victim due to multiple aggressors, the modified PODEM algorithm is found to be more time consuming. The ability of Genetic Algorithms (GA) to search for the required combination of several input factors in an optimization problem motivated their use for test pattern generation, since generating a test pattern is similar to finding the required vector among several input transitions. Initially the GA is applied to generate test patterns for stuck-at faults and the results are compared with the PODEM algorithm. As the fault coverage is almost the same as that of the deterministic PODEM algorithm, the GA developed for stuck-at faults is extended to find test patterns for crosstalk faults between a single aggressor and a single victim. The elitist GA is also applied to a few ISCAS 85 benchmark circuits. Later the algorithm is extended to generate test patterns for worst-case crosstalk faults. 
The elitist GA developed in this thesis is shown to be very useful in generating test patterns for crosstalk faults, especially for multiple-aggressor, single-victim crosstalk faults.

    Hardware design of cryptographic algorithms for low-cost RFID tags

    International Mention in the doctoral degree. Radio Frequency Identification (RFID) is a wireless technology for automatic identification that has experienced notable growth in recent years. RFID is an important part of the new trend named Internet of Things (IoT), which describes a near future where all objects are connected to the Internet and can interact with one another. The massive deployment of RFID technology depends on device costs and dependability. In order to make these systems dependable, security needs to be added to RFID implementations, as RF communications can be accessed by an attacker who could extract or manipulate private information from the objects. On the other hand, reduced costs usually imply resource-constrained environments. Due to the resource limitations inherent in low-cost implementations, typical cryptographic primitives cannot be used to secure low-cost RFID systems. A new concept emerged from this necessity: Lightweight Cryptography. This term was used for the first time in 2003 by Vajda et al., and the topic has been widely researched in the last decade. Several proposals oriented to low-cost RFID systems have been reported in the literature. Many of these proposals do not realistically tackle the multiple restrictions required by the technology or the specifications imposed by the different standards that have arisen for these technologies. The objective of this thesis is to contribute to the field of lightweight cryptography oriented to low-cost RFID tags from the microelectronics point of view. First, a study of the implementation of lightweight cryptographic primitives is presented, focusing on the area used in the implementation, which is one of the most important requirements of the technology as it is directly related to the cost. After this analysis, a footprint area estimator for lightweight algorithms has been developed. 
This estimator calculates an upper bound on the area used in the implementation. It will help in making some choices at the algorithmic level, even for designers without hardware design skills. Second, two pseudo-random number generators have been proposed. Pseudo-random number generators are essential cryptographic blocks in RFID systems. According to the most widespread RFID standard, EPC Class-1 Gen-2, it is mandatory to include a generator in RFID tags. Several architectures for the two proposed generators have been presented in this thesis; they have been integrated in two authentication protocols, and the main metrics (area, throughput and power consumption) have been analysed. Finally, the topic of True Random Number Generators is studied. These generators are also very important in secure RFID, and are currently a trending research line. A novel generator, presented by Cherkaoui et al., has been evaluated under different attack scenarios. A new true random number generator based on coherent sampling and suitable for low-cost RFID systems has been proposed.
Radio Frequency Identification technology, better known by its acronym RFID, has become one of the most important auto-identification technologies within the new identification trend known as the Internet of Things (IoT). This new trend describes a future where all objects are connected to the internet and are able to identify themselves to other objects. The massive deployment of RFID systems is currently limited by device cost and dependability. For this kind of system to be dependable, security must be added to RFID implementations, since radio-frequency communications can easily be attacked and the information about objects compromised. 
On the other hand, for all objects to be connected, the cost of the identification technology must be very low, which implies severe resource limitations in various respects. Given the resource limitation required in low-cost implementations, typical cryptographic primitives cannot be used to secure a low-cost RFID system. The concept of a lightweight cryptographic primitive was first introduced in 2003 by Vajda et al. and has been developed extensively in recent years, resulting in a series of lightweight cryptographic algorithms suitable for use in low-cost RFID technology. The main problem with many of the algorithms presented is that they do not realistically address the multiple limitations of the technology. The objective of this thesis is to contribute to the field of lightweight cryptography oriented to low-cost RFID tags from the microelectronics point of view. First, a study of the implementation of the most widely used lightweight cryptographic primitives is presented, specifically analysing the area occupied by these primitives, since it is one of the critical parameters considered when including such primitives in low-cost RFID devices. Following the analysis of these primitives, an area estimator for ultra-lightweight cryptographic algorithms has been developed that aims to give an upper bound on the total area occupied by the algorithm (including registers and control logic). This estimator allows the designer, at early design stages and without any knowledge of implementations, to know whether the algorithm is within the area limits imposed by RFID technology. Two pseudo-random number generators are also proposed. These generators are among the most important cryptographic blocks in an RFID system. 
The most widespread RFID standard in industry, EPC Class-1 Gen-2, mandates the use of this type of generator in RFID tags. The proposed generators have been implemented and integrated into two RFID-oriented communication protocols, obtaining good results in the main system characteristics. Finally, the topic of random number generators has been studied. Generators of this type are frequently used in RFID security, and this line of research is currently very popular. In this thesis, the security of a novel TRNG, presented by Cherkaoui et al., has been evaluated against typical attacks considered in the literature. In addition, a new low-cost TRNG based on the pairwise sampling technique has been presented.
Official Doctoral Program in Electrical, Electronic and Automation Engineering. President: Teresa Riesgo Alcaide. Secretary: Emilio Olías Ruiz. Member: Giorgio di Natal