    Electronic System-Level Synthesis Methodologies

    Multipumping flexible DSP blocks for resource reduction on Xilinx FPGAs

    For complex datapaths, resource sharing can help reduce area consumption. Traditionally, resource sharing is applied when the same resource can be scheduled for different uses in different cycles, often resulting in a longer schedule. Multipumping is a method whereby a resource is clocked at a frequency that is a multiple of that of the surrounding circuit, thereby offering multiple executions per global clock cycle. This allows a single resource to be shared among multiple uses in the same cycle. This concept maps well to modern field-programmable gate arrays (FPGAs), where hard macro blocks are typically capable of running at higher frequencies than most designs implemented in the logic fabric. While this technique has been demonstrated for static resources, modern digital signal processing (DSP) blocks are flexible, supporting varied operations at runtime. In this paper, we demonstrate multipumping for resource sharing of the flexible DSP48E1 macros in Xilinx FPGAs. We exploit their dynamic programmability to enable resource sharing for the full set of supported DSP block operations, and compare this to multipumping only multipliers and DSP blocks with fixed configurations. The proposed approach saves 48% of DSP blocks on average at a cost of 74% more LUTs, effectively saving 30% of equivalent LUT area, and is feasible for the majority of designs, in which the clock frequency is typically below half the maximum supported by the DSP blocks.
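
    The sharing idea described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the operation table `OPS`, the request format, and every function name are hypothetical and do not reflect the DSP48E1 configuration interface or the authors' implementation. It shows how a block clocked at `pump_factor` times the system clock can serve several differently configured operations within one system cycle, and how that bounds the usable system clock.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for runtime-selectable DSP block modes (assumption).
OPS: Dict[str, object] = {
    "mul": lambda a, b, c: a * b,
    "mac": lambda a, b, c: a * b + c,
    "add": lambda a, b, c: a + c,
}

Request = Tuple[str, int, int, int]  # (mode, a, b, c)


def dsp_blocks_needed(ops_per_cycle: int, pump_factor: int) -> int:
    """Blocks required when each block executes pump_factor ops per system cycle."""
    return -(-ops_per_cycle // pump_factor)  # ceiling division


def max_system_clock_mhz(block_fmax_mhz: float, pump_factor: int = 2) -> float:
    """Highest system clock for which the shared block can still run pump_factor times faster."""
    return block_fmax_mhz / pump_factor


def multipumped_cycle(requests: List[Request], pump_factor: int = 2):
    """Execute one system cycle of requests on shared blocks.

    Returns (results in request order, number of blocks used).
    """
    blocks = dsp_blocks_needed(len(requests), pump_factor)
    results = {}
    for fast_tick in range(pump_factor):      # ticks of the fast (block) clock
        for block in range(blocks):           # each block takes one request per tick
            idx = block * pump_factor + fast_tick
            if idx < len(requests):
                mode, a, b, c = requests[idx]
                results[idx] = OPS[mode](a, b, c)  # mode may change every fast tick
    return [results[i] for i in range(len(requests))], blocks
```

    In this model, three operations per system cycle need two blocks at a pump factor of 2 instead of three, and the system clock must stay at or below half of the block's maximum frequency, which mirrors the feasibility condition stated in the abstract.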

    Block level voltage

    In recent years, state-of-the-art power optimization methods have moved towards higher abstraction levels, which yield more efficient power savings. Among existing power optimization approaches, dynamic power management (DPM) is considered one of the most effective strategies. Depending on the abstraction level, DPM can be implemented in different forms; here we focus on scheduling, which is better suited to real-time system design. Unlike existing scheduling approaches, which start from either the HLS (High-Level Synthesis) or the RTS (Real-Time System) point of view, we propose a solution that combines both, namely block-level voltage/frequency scheduling (BLVFS). BLVFS offers a generic solution for low-power SoC (System on Chip) design, whereas approaches in the HLS and RTS categories depend strongly on the system functionality. Treating an SoC as a combination of heterogeneous functional blocks, our approach provides efficient power savings by dynamically scheduling voltage and frequency scaling at the same time. Simulation results indicate that heuristic-based strategies can achieve significant power savings.
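
    As a rough illustration of per-block voltage/frequency selection, the sketch below uses the standard dynamic-energy model E = C_eff * V^2 * cycles and a simple greedy rule: each block gets the lowest-energy operating point that still meets its deadline. The data classes, field names, and the greedy rule itself are assumptions made for illustration; the abstract does not specify the heuristics used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatingPoint:
    voltage: float   # supply voltage in volts
    freq_hz: float   # clock frequency in hertz


@dataclass(frozen=True)
class Block:
    name: str
    cycles: int        # work to complete, in clock cycles
    deadline_s: float  # time budget in seconds
    c_eff: float       # effective switched capacitance in farads (assumed known)


def dynamic_energy(block: Block, op: OperatingPoint) -> float:
    """Dynamic energy of running the block's workload at the given operating point."""
    return block.c_eff * op.voltage ** 2 * block.cycles


def pick_operating_point(block: Block, points: List[OperatingPoint]) -> OperatingPoint:
    """Greedy choice: the lowest-energy point whose runtime fits the deadline."""
    feasible = [p for p in points if block.cycles / p.freq_hz <= block.deadline_s]
    if not feasible:
        raise ValueError(f"no operating point meets the deadline of {block.name}")
    return min(feasible, key=lambda p: dynamic_energy(block, p))
```

    Because energy in this model scales with the square of the voltage, running a block at 1.0 V instead of 1.2 V for the same cycle count cuts its dynamic energy by roughly 30%, provided the lower frequency still meets the deadline.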

    System-level power optimization: techniques and tools

    This tutorial surveys design methods for energy-efficient system-level design. We consider electronic systems consisting of a hardware platform and software layers. We consider the three major constituents of hardware that consume energy, namely computation, communication, and storage units, and we review methods for reducing their energy consumption. We also study models for analyzing the energy cost of software, and methods for energy-efficient software design and compilation. This survey is organized around three main phases of system design: conceptualization and modeling, design and implementation, and runtime management. For each phase, we review recent techniques for energy-efficient design of both hardware and software.

    Automatic low-cost IP watermarking technique based on output mark insertions

    Today, although intellectual properties (IP) and their reuse are common, their use raises design security issues: illegal copying, counterfeiting, and reverse engineering. IP watermarking is an efficient way to detect an unauthorized IP copy or a counterfeit. In this context, many interesting solutions have been proposed. However, few combine the watermarking process with synthesis. This article presents a new solution, namely automatic low-cost IP watermarking included in the high-level synthesis process. The proposed method differs from those cited in the literature in that the marking is not material, but is based on mathematical relationships between numeric values at the inputs and outputs at specified times. Implementation results on a Xilinx Virtex-5 FPGA show that the proposed solution requires lower area and timing overhead than existing solutions.

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are 15 to 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
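
    For readers unfamiliar with the Nussinov recurrence mentioned above, the following is a plain software reference of the textbook algorithm, which maximizes the number of non-crossing base pairs. It is not the polyhedral array design from the abstract. The table is filled one anti-diagonal at a time; cells on the same diagonal depend only on earlier diagonals, which is the fine-grained parallelism a hardware array can exploit. The pairing rule (Watson-Crick plus G-U wobble) and the `min_loop` parameter are assumptions of this sketch.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 0) -> int:
    """Maximum number of non-crossing base pairs in seq.

    min_loop is the minimum number of unpaired bases required between
    the two ends of a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    for d in range(1, n):              # d = j - i, one anti-diagonal per iteration
        for i in range(n - d):         # cells on the same diagonal are independent
            j = i + d
            best = max(table[i + 1][j], table[i][j - 1])   # leave i or j unpaired
            if d > min_loop and (seq[i], seq[j]) in PAIRS:
                inner = table[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)                # pair i with j
            for k in range(i + 1, j):                      # split into two subproblems
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table[0][n - 1]
```

    The inner split loop makes the sequential cost cubic in the sequence length, which is why mapping each diagonal onto parallel hardware pays off.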

    Design Techniques for Energy-Quality Scalable Digital Systems

    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations with energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing. Despite these high expectations, the current state-of-the-art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become widespread in commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing “dynamic” systems in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than “static” solutions, in which the output quality is fixed at design-time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods also apply the principles of EQ scalable design to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of benefits and overheads deriving from EQ scalability, using industrial-level models, and to the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption. More specifically, the contribution of this thesis is divided into three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction.
These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement, by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display.
For each of these three topics, results show that the aforementioned goal of building EQ scalable systems that are compatible with existing best practices and mature enough to be integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff.
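
    To make the idea of an error-tolerant serial bus encoding more concrete, here is a deliberately simplified sketch. It is not the thesis's Approximate Differential Encoding or Serial-T0, whose details are not given in the abstract. It assumes a scheme in which a sample that stays within `max_error` of the last transmitted value is sent as an all-zero word, so the serial line does not toggle, and any other sample is sent as a bitwise difference (XOR). Bit transitions on the line serve as a crude proxy for bus energy, and `max_error` plays the role of the runtime quality knob. All names are hypothetical.

```python
from typing import List


def encode(samples: List[int], max_error: int) -> List[int]:
    """Encode samples as XOR differences, suppressing changes within max_error."""
    last = 0                      # last value the receiver is known to hold
    words = []
    for x in samples:
        if abs(x - last) <= max_error:
            words.append(0)       # receiver keeps its previous value
        else:
            words.append(x ^ last)
            last = x
    return words


def decode(words: List[int]) -> List[int]:
    """Rebuild the (possibly approximated) samples from the XOR differences."""
    last = 0
    out = []
    for w in words:
        last ^= w
        out.append(last)
    return out


def transitions(words: List[int], width: int = 8) -> int:
    """Count bit toggles when the words are sent MSB-first on one serial line."""
    bits = [(w >> b) & 1 for w in words for b in range(width - 1, -1, -1)]
    return sum(a != b for a, b in zip(bits, bits[1:]))
```

    With `max_error` set to 0 the scheme is lossless; raising it trades a bounded reconstruction error for fewer line transitions on slowly varying data such as sensor readings.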

    Automated and dynamic multi-level negotiation framework applied to an efficient cloud provisioning

    Cloud provisioning is the process of deployment and management of applications on public cloud infrastructures. Cloud provisioning is used increasingly because it enables business providers to focus on their business without having to manage and invest in infrastructure. Cloud provisioning includes two levels of interaction: (1) between end-users and business providers for application provisioning; and (2) between business providers and resource providers for virtual resource provisioning. The cloud market nowadays is a complex environment where business providers need to maximize their monetary profit, and where end-users look for the most efficient services with the lowest prices. With the growth of competition in the cloud, business providers must ensure efficient provisioning that maximizes customer satisfaction and optimizes the providers’ profit. So, both providers and users must be satisfied in spite of their conflicting needs. Negotiation is an appealing solution to solve conflicts and bridge the gap between providers’ capabilities and users’ requirements. Intuitively, automated Service Level Agreement (SLA) negotiation helps in reaching an agreement that satisfies both parties.
However, to be efficient, automated negotiation should consider the properties of cloud provisioning, mainly the two interaction levels, and the complexities related to dynamicity (e.g., dynamically changing resource availability, dynamic pricing, dynamic market factors related to offers and demands), which greatly impact the success of the negotiation. The main contributions of this thesis tackling the challenge of multi-level negotiation in a dynamic context are as follows: (1) We propose a generic negotiator model that considers the dynamic nature of cloud provisioning and its potential impact on the decision-making outcome. Then, we build a multi-layer negotiation framework upon that model by instantiating it among Cloud layers. The framework includes negotiator agents. These agents are in communication with the provisioning modules that have an impact on the quality and the price of the service to be provisioned (e.g., the scheduler, the monitor, the market prospector). (2) We propose a bilateral negotiation approach between end-users and business providers extending an existing provisioning approach. The proposed decision-making strategies for negotiation are based on communication with the provisioning modules (the scheduler and the VM provisioner) in order to optimize the business provider’s profit and maximize customer satisfaction. (3) In order to maximize the number of clients, we propose an adaptive and concurrent negotiation approach as an extension of the bilateral negotiation. We propose to harness the workload changes in terms of resource availability and pricing in order to renegotiate simultaneously with multiple non-accepted users (i.e., rejected during the first negotiation session) before the establishment of the SLA. (4) In order to handle any potential SLA violation, we propose a proactive renegotiation approach after SLA establishment.
The renegotiation is launched upon detecting an unexpected event (e.g., resource failure) during the provisioning process. The proposed renegotiation decision-making strategies aim to minimize the loss in profit for the provider and to ensure the continuity of the service for the consumer. The proposed approaches are implemented and experiments prove the benefits of adding (re)negotiation to the provisioning process. The use of (re)negotiation improves the provider’s profit, the number of accepted requests, and the client’s satisfaction.

    Enhancing Power Efficient Design Techniques in Deep Submicron Era

    Excessive power dissipation has been one of the major bottlenecks for design and manufacturing in the past couple of decades. Power-efficient design has become more and more challenging as technology scales down to the deep submicron era, which features the dominance of leakage, manufacturing variation, on-chip temperature variation and higher reliability requirements, among others. Most of the computer-aided design (CAD) tools and algorithms currently used in industry were developed in the pre-deep-submicron era and did not consider these new features explicitly and adequately. Recent research advances in deep submicron design, such as the mechanisms of leakage, the source and characterization of manufacturing variation, and the cause and models of on-chip temperature variation, provide us with the opportunity to incorporate these important issues in power-efficient design. We explore this opportunity in this dissertation by demonstrating that significant power reduction can be achieved with only minor modifications to the existing CAD tools and algorithms. First, we consider peak current, which has become critical for circuit reliability in deep submicron design. Traditional low-power design techniques focus on the reduction of average power. We propose to reduce peak current while keeping the overhead on average power as small as possible. Second, the dual Vt technique and gate sizing have been used simultaneously for leakage savings. However, this approach becomes less effective in deep submicron design. We propose to use the newly developed process-induced mechanical stress to enhance its performance. Finally, in deep submicron design, the impact of on-chip temperature variation on leakage and performance becomes more and more significant. We propose a temperature-aware dual Vt approach to alleviate hot spots and achieve further leakage reduction. We also consider this leakage-temperature dependency in the dynamic voltage scaling approach and discover that a commonly accepted result is incorrect for the current technology. We conduct extensive experiments with popular design benchmarks, using the latest industry CAD tools and design libraries. The results show that our proposed enhancements are promising for power saving and offer a practical way to address the low-power design challenges of the deep submicron era.

    Fault-Tolerant Distributed Deployment of Embedded Control Software

