17 research outputs found

    Architectural Exploration of KeyRing Self-Timed Processors

    Get PDF
    RÉSUMÉ Les dernières décennies ont vu l’augmentation des performances des processeurs contraintes par les limites imposées par la consommation d’énergie des systèmes électroniques : des très basses consommations requises pour les objets connectés, aux budgets de dépenses électriques des serveurs, en passant par les limitations thermiques et la durée de vie des batteries des appareils mobiles. Cette forte demande en processeurs efficients en énergie, couplée avec les limitations de la réduction d’échelle des transistors—qui ne permet plus d’améliorer les performances à densité de puissance constante—, conduit les concepteurs de circuits intégrés à explorer de nouvelles microarchitectures permettant d’obtenir de meilleures performances pour un budget énergétique donné. Cette thèse s’inscrit dans cette tendance en proposant une nouvelle microarchitecture de processeur, appelée KeyRing, conçue avec l’intention de réduire la consommation d’énergie des processeurs. La fréquence d’opération des transistors dans les circuits intégrés est proportionnelle à leur consommation dynamique d’énergie. Par conséquent, les techniques de conception permettant de réduire dynamiquement le nombre de transistors en opération sont très largement adoptées pour améliorer l’efficience énergétique des processeurs. La technique de clock-gating est particulièrement usitée dans les circuits synchrones, car elle réduit l’impact de l’horloge globale, qui est la principale source d’activité. La microarchitecture KeyRing présentée dans cette thèse utilise une méthode de synchronisation décentralisée et asynchrone pour réduire l’activité des circuits. Elle est dérivée du processeur AnARM, un processeur développé par Octasic sur la base d’une microarchitecture asynchrone ad hoc. Bien qu’il soit plus efficient en énergie que des alternatives synchrones, le AnARM est essentiellement incompatible avec les méthodes de synthèse et d’analyse temporelle statique standards. De plus, sa technique de conception ad hoc ne s’inscrit que partiellement dans les paradigmes de conceptions asynchrones. Cette thèse propose une approche rigoureuse pour définir les principes généraux de cette technique de conception ad hoc, en faisant levier sur la littérature asynchrone. La microarchitecture KeyRing qui en résulte est développée en association avec une méthode de conception automatisée, qui permet de s’affranchir des incompatibilités natives existant entre les outils de conception et les systèmes asynchrones. La méthode proposée permet de pleinement mettre à profit les flots de conception standards de l’industrie microélectronique pour réaliser la synthèse et la vérification des circuits KeyRing. Cette thèse propose également des protocoles expérimentaux, dont le but est de renforcer la relation de causalité entre la microarchitecture KeyRing et une réduction de la consommation énergétique des processeurs, comparativement à des alternatives synchrones équivalentes.----------ABSTRACT Over the last years, microprocessors have had to increase their performances while keeping their power envelope within tight bounds, as dictated by the needs of various markets: from the ultra-low power requirements of the IoT, to the electrical power consumption budget in enterprise servers, by way of passive cooling and day-long battery life in mobile devices. This high demand for power-efficient processors, coupled with the limitations of technology scaling—which no longer provides improved performances at constant power densities—, is leading designers to explore new microarchitectures with the goal of pulling more performances out of a fixed power budget. This work enters into this trend by proposing a new processor microarchitecture, called KeyRing, having a low-power design intent. The switching activity of integrated circuits—i.e. transistors switching on and off—directly affects their dynamic power consumption. Circuit-level design techniques such as clock-gating are widely adopted as they dramatically reduce the impact of the global clock in synchronous circuits, which constitutes the main source of switching activity. The KeyRing microarchitecture presented in this work uses an asynchronous clocking scheme that relies on decentralized synchronization mechanisms to reduce the switching activity of circuits. It is derived from the AnARM, a power-efficient ARM processor developed by Octasic using an ad hoc asynchronous microarchitecture. Although it delivers better power-efficiency than synchronous alternatives, it is for the most part incompatible with standard timing-driven synthesis and Static Timing Analysis (STA). In addition, its design style does not fit well within the existing asynchronous design paradigms. This work lays the foundations for a more rigorous definition of this rather unorthodox design style, using circuits and methods coming from the asynchronous literature. The resulting KeyRing microarchitecture is developed in combination with Electronic Design Automation (EDA) methods that alleviate incompatibility issues related to ad hoc clocking, enabling timing-driven optimizations and verifications of KeyRing circuits using industry-standard design flows. In addition to bridging the gap with standard design practices, this work also proposes comprehensive experimental protocols that aims to strengthen the causal relation between the reported asynchronous microarchitecture and a reduced power consumption compared with synchronous alternatives. The main achievement of this work is a framework that enables the architectural exploration of circuits using the KeyRing microarchitecture

    New FPGA design tools and architectures

    Get PDF

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    Get PDF
    This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. Especially the computational performance is evaluated thoroughly, and the results show that the REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.Siirretty Doriast

    A Verified Software Toolchain for Quantum Programming

    Get PDF
    Quantum computing is steadily moving from theory into practice, with small-scale quantum computers available for public use. Now quantum programmers are faced with a classical problem: How can they be sure that their code does what they intend it to do? I aim to show that techniques for classical program verification can be adapted to the quantum setting, allowing for the development of high-assurance quantum software, without sacrificing performance or programmability. In support of this thesis, I present several results in the application of formal methods to the domain of quantum programming, aiming to provide a high-assurance software toolchain for quantum programming. I begin by presenting SQIR, a small quantum intermediate representation deeply embedded in the Coq proof assistant, which has been used to implement and prove correct quantum algorithms such as Grover’s search and Shor’s factorization algorithm. Next, I present VOQC, a verified optimizer for quantum circuits that contains state-of-the-art SQIR program optimizations with performance on par with unverified tools. I additionally discuss VQO, a framework for specifying and verifying oracle programs, which can then be optimized with VOQC. Finally, I present exploratory work on providing high assurance for a high-level industry quantum programming language, Q#, in the F* proof assistant

    Interpreted graph models

    Get PDF
    A model class called an Interpreted Graph Model (IGM) is defined. This class includes a large number of graph-based models that are used in asynchronous circuit design and other applications of concurrecy. The defining characteristic of this model class is an underlying static graph-like structure where behavioural semantics are attached using additional entities, such as tokens or node/arc states. The similarities in notation and expressive power allow a number of operations on these formalisms, such as visualisation, interactive simulation, serialisation, schematic entry and model conversion to be generalised. A software framework called Workcraft was developed to take advantage of these properties of IGMs. Workcraft provides an environment for rapid prototyping of graph-like models and related tools. It provides a large set of standardised functions that considerably facilitate the task of providing tool support for any IGM. The concept of Interpreted Graph Models is the result of research on methods of application of lower level models, such as Petri nets, as a back-end for simulation and verification of higher level models that are more easily manipulated. The goal is to achieve a high degree of automation of this process. In particular, a method for verification of speed-independence of asynchronous circuits is presented. Using this method, the circuit is specified as a gate netlist and its environment is specified as a Signal Transition Graph. The circuit is then automatically translated into a behaviourally equivalent Petri net model. This model is then composed with the specification of the environment. A number of important properties can be established on this compound model, such as the absence of deadlocks and hazards. If a trace is found that violates the required property, it is automatically interpreted in terms of switching of the gates in the original gate-level circuit specification and may be presented visually to the circuit designer. A similar technique is also used for the verification of a model called Static Data Flow Structure (SDFS). This high level model describes the behaviour of an asynchronous data path. SDFS is particularly interesting because it models complex behaviours such as preemption, early evaluation and speculation. Preemption is a technique which allows to destroy data objects in a computation pipeline if the result of computation is no longer needed, reducing the power consumption. Early evaluation allows a circuit to compute the output using a subset of its inputs and preempting the inputs which are not needed. In speculation, all conflicting branches of computation run concurrently without waiting for the selecting condition; once the selecting condition is computed the unneeded branches are preempted. The automated Petri net based verification technique is especially useful in this case because of the complex nature of these features. As a result of this work, a number of cases are presented where the concept of IGMs and the Workcraft tool were instrumental. These include the design of two different types of arbiter circuits, the design and debugging of the SDFS model, synthesis of asynchronous circuits from the Conditional Partial Order Graph model and the modification of the workflow of Balsa asynchronous circuit synthesis system.EThOS - Electronic Theses Online ServiceEPSRCGBUnited Kingdo

    A Behavioral Design Flow for Synthesis and Optimization of Asynchronous Systems

    Get PDF
    Asynchronous or clockless design is believed to hold the promise of alleviating many of the challenges currently facing microelectronic design. Distributing a high-speed clock signal across an entire chip is an increasing challenge, particularly as the number of transistors on chip continues to rise. With increasing heterogeneity in massively multi- core processors, the top-level system integration is already elastic in nature. Future computing technologies (e.g., nano, quantum, etc.) are expected to have unpredictable timing as well. Therefore, asynchronous design techniques are gaining relevance in mainstream design. Unfortunately, the field of asynchronous design lacks mature design tools for creating large-scale, high-performance or energy-efficient systems. This thesis attempts to fill the void by contributing a set of design methods and automated tools for synthesizing asynchronous systems from high-level specifications. In particular, this thesis provides methods and tools for: (i) generating high-speed pipelined implementations from behavioral specifications, (ii) sharing and scheduling resources to conserve area while providing high performance, and (iii) incorporating energy and power considerations into high-level design. These methods are incorporated into a comprehensive design flow that provides a choice of synthesis paths to the designer, and a mechanism to explore the spectrum between them. The first path specifically targets the highest-performance implementations using data-driven pipelined circuits. The second path provides an alternative approach that targets low-area implementations, providing for optimal resource sharing and optimal scheduling techniques to achieve performance targets. Finally, the third path through the design flow allows the entire spectrum between the two extremes to be explored. In particular, it is a hybrid approach that preserves a pipelined architecture but still allows sharing of resources. By varying performance targets, a wide range of designs can be realized. A variety of metrics are incorporated as constraints or cost functions: area, latency, cycle time, energy consumption, and peak power. Experimental results demonstrate the capability of the proposed design flow to quickly produce optimized specifications. By automating synthesis and optimization, this thesis shows that the designer effort necessary to produce a high-quality solution can be significantly reduced. It is hoped that this work provides a path towards more mature automation and design tools for asynchronous design
    corecore