23 research outputs found

    Performance Tuning of Streaming Applications via Search-space Decomposition

    Get PDF
    High-performance streaming applications are typically pipelined and deployed on architecturally diverse (hybrid)systems. Developers of such applications are interested in customizing components used, so as to benefit application performance. We present an efficient and automatic technique for design-space exploration of applications in this problem domain. We solve performance tuning as an optimization problem by formulating cost functions using results from queueing theory. This results in a mixed-integer nonlinear optimization problem which is NP-hard. We reduce the search complexity by decomposing the search space. We have developed a domain-specific decomposition technique using topological information of the application embodied in the queueing network models. Our analysis includes when our decomposition preserves optimality. Our preliminary empirical results confirm two-fold benefits--solving a problem that is currently not solvable using state-of-the-art solvers and in some problem instances, improving initial solution value from the solver by over two orders of magnitude

    Robust low-power digital circuit design in nano-CMOS technologies

    Get PDF
    Device scaling has resulted in large scale integrated, high performance, low-power, and low cost systems. However the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications on digital circuit design by causing timing uncertainties in combinational circuits, degrading yield and reliability of memory elements, and increasing power density due to slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate increased variability, however, they incur large power and performance loss as the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability that can significantly minimize power and performance losses. We demonstrated by simulations two delay sensor designs to detect timing failures in advance that can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations that requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6 transistors (6T), 7 transistors (7T), and 8 transistors (8T)-SRAM cells that enable variability tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage due to device mismatch arising from high variability increases delay and power consumption of SRAM design. We have proposed two novel design techniques to reduce offset voltage dependent delays providing a high speed low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to a low-power reliable design. We have investigated novel segmented supply voltage architecture to reduce leakage power of the SRAM caches since they occupy bulk of the total chip area and power. Future work involves developing leakage reduction methods for the combination logic designs including SRAM peripherals

    Architecting a One-to-many Traffic-Aware and Secure Millimeter-Wave Wireless Network-in-Package Interconnect for Multichip Systems

    Get PDF
    With the aggressive scaling of device geometries, the yield of complex Multi Core Single Chip(MCSC) systems with many cores will decrease due to the higher probability of manufacturing defects especially, in dies with a large area. Disintegration of large System-on-Chips(SoCs) into smaller chips called chiplets has shown to improve the yield and cost of complex systems. Therefore, platform-based computing modules such as embedded systems and micro-servers have already adopted Multi Core Multi Chip (MCMC) architectures overMCSC architectures. Due to the scaling of memory intensive parallel applications in such systems, data is more likely to be shared among various cores residing in different chips resulting in a significant increase in chip-to-chip traffic, especially one-to-many traffic. This one-to-many traffic is originated mainly to maintain cache-coherence between many cores residing in multiple chips. Besides, one-to-many traffics are also exploited by many parallel programming models, system-level synchronization mechanisms, and control signals. How-ever, state-of-the-art Network-on-Chip (NoC)-based wired interconnection architectures do not provide enough support as they handle such one-to-many traffic as multiple unicast trafficusing a multi-hop MCMC communication fabric. As a result, even a small portion of such one-to-many traffic can significantly reduce system performance as traditional NoC-basedinterconnect cannot mask the high latency and energy consumption caused by chip-to-chipwired I/Os. Moreover, with the increase in memory intensive applications and scaling of MCMC systems, traditional NoC-based wired interconnects fail to provide a scalable inter-connection solution required to support the increased cache-coherence and synchronization generated one-to-many traffic in future MCMC-based High-Performance Computing (HPC) nodes. Therefore, these computation and memory intensive MCMC systems need an energy-efficient, low latency, and scalable one-to-many (broadcast/multicast) traffic-aware interconnection infrastructure to ensure high-performance. Research in recent years has shown that Wireless Network-in-Package (WiNiP) architectures with CMOS compatible Millimeter-Wave (mm-wave) transceivers can provide a scalable, low latency, and energy-efficient interconnect solution for on and off-chip communication. In this dissertation, a one-to-many traffic-aware WiNiP interconnection architecture with a starvation-free hybrid Medium Access Control (MAC), an asymmetric topology, and a novel flow control has been proposed. The different components of the proposed architecture are individually one-to-many traffic-aware and as a system, they collaborate with each other to provide required support for one-to-many traffic communication in a MCMC environment. It has been shown that such interconnection architecture can reduce energy consumption and average packet latency by 46.96% and 47.08% respectively for MCMC systems. Despite providing performance enhancements, wireless channel, being an unguided medium, is vulnerable to various security attacks such as jamming induced Denial-of-Service (DoS), eavesdropping, and spoofing. Further, to minimize the time-to-market and design costs, modern SoCs often use Third Party IPs (3PIPs) from untrusted organizations. An adversary either at the foundry or at the 3PIP design house can introduce a malicious circuitry, to jeopardize an SoC. Such malicious circuitry is known as a Hardware Trojan (HT). An HTplanted in the WiNiP from a vulnerable design or manufacturing process can compromise a Wireless Interface (WI) to enable illegitimate transmission through the infected WI resulting in a potential DoS attack for other WIs in the MCMC system. Moreover, HTs can be used for various other malicious purposes, including battery exhaustion, functionality subversion, and information leakage. This information when leaked to a malicious external attackercan reveals important information regarding the application suites running on the system, thereby compromising the user profile. To address persistent jamming-based DoS attack in WiNiP, in this dissertation, a secure WiNiP interconnection architecture for MCMC systems has been proposed that re-uses the one-to-many traffic-aware MAC and existing Design for Testability (DFT) hardware along with Machine Learning (ML) approach. Furthermore, a novel Simulated Annealing (SA)-based routing obfuscation mechanism was also proposed toprotect against an HT-assisted novel traffic analysis attack. Simulation results show that,the ML classifiers can achieve an accuracy of 99.87% for DoS attack detection while SA-basedrouting obfuscation could reduce application detection accuracy to only 15% for HT-assistedtraffic analysis attack and hence, secure the WiNiP fabric from age-old and emerging attacks

    Biomaterials for bone regeneration

    Get PDF
    The purpose of this Ph.D. research is to investigate and improve two classes of hydroxyapatite (HA)-based biomaterials for bone repair: calcium phosphate microspheres and bioactive silicate glass scaffolds. These biomaterials were prepared with modified compositions and microstructures and then were evaluated for bone regeneration. The open HA microspheres with dense convex surfaces and rough and porous concave surfaces were obtained by sectioning closed hollow HA microspheres. Bone regeneration with the open HA microspheres was greater than with the closed HA microsphere at 12 weeks. Hollow biphasic calcium phosphate (BCP) microspheres have been prepared with different fractions of HA and β-TCP (tricalcium phosphate) and their in vitro and in vivo reactivities determined. The BCP microspheres with higher ß-TCP/HA ratio (70/30) had faster degradation rates both in vitro and in vivo and a better capacity to regenerate bone. Moreover, the more reactive BCP microspheres were associated with significantly more blood vessel formation in the subcutaneous implants. 13-93 glass scaffolds with curved filaments stimulated a greater amount of new bone formation than straight filament scaffolds in rat calvarial defect at six weeks. Scaffolds with thin (6 ± 1 μm) HA-like surface layers were more effective at stimulating new bone formation, with the curved-filament structures again showing significant improvement in new bone growth compared to the surface-modified straight-filament structures --Abstract, page iv

    Méthodologies de conception ASIC pour des systèmes sur puce 3D hétérogènes à base de réseaux sur puce 3D

    Get PDF
    Dans cette thèse, nous étudions les architectures 3D NoC grâce à des implémentations de conception physiques en utilisant la technologie 3D réel mis en oeuvre dans l'industrie. Sur la base des listes d'interconnexions en déroute, nous procédons à l'analyse des performances d'évaluer le bénéfice de l'architecture 3D par rapport à sa mise en oeuvre 2D. Sur la base du flot de conception 3D proposé en se concentrant sur la vérification temporelle tirant parti de l'avantage du retard négligeable de la structure de microbilles pour les connexions verticales, nous avons mené techniques de partitionnement de NoC 3D basé sur l'architecture MPSoC y compris empilement homogène et hétérogène en utilisant Tezzaron 3D IC technlogy. Conception et mise en oeuvre de compromis dans les deux méthodes de partitionnement est étudiée pour avoir un meilleur aperçu sur l'architecture 3D de sorte qu'il peut être exploitée pour des performances optimales. En utilisant l'approche 3D homogène empilage, NoC topologies est explorée afin d'identifier la meilleure topologie entre la topologie 2D et 3D pour la mise en œuvre MPSoC 3D sous l'hypothèse que les chemins critiques est fondée sur les liens inter-routeur. Les explorations architecturales ont également examiné les différentes technologies de traitement. mettant en évidence l'effet de la technologie des procédés à la performance d'architecture 3D en particulier pour l'interconnexion dominant du design. En outre, nous avons effectué hétérogène 3D d'empilage pour la mise en oeuvre MPSoC avec l'approche GALS de style et présenté plusieurs analyses de conception physiques connexes concernant la conception 3D et la mise en œuvre MPSoC utilisant des outils de CAO 2D. Une analyse plus approfondie de l'effet microbilles pas à la performance de l'architecture 3D à l'aide face-à-face d'empilement est également signalé l'identification des problèmes et des limitations à prendre en considération pendant le processus de conception.In this thesis, we study the exploration 3D NoC architectures through physical design implementations using real 3D technology used in the industry. Based on the proposed 3D design flow focusing on timing verification by leveraging the benefit of negligible delay of microbumps structure for vertical connections, we have conducted partitioning techniques for 3D NoC-based MPSoC architecture including homogeneous and heterogeneous stacking using Tezzaron 3D IC technlogy. Design and implementation trade-off in both partitioning methods is investigated to have better insight about 3D architecture so that it can be exploited for optimal performance. Using homogeneous 3D stacking approach, NoC architectures are explored to identify the best topology between 2D and 3D topology for 3D MPSoC implementation. The architectural explorations have also considered different process technologies highlighting the wire delay effect to the 3D architecture performance especially for interconnect-dominated design. Additionally, we performed heterogeneous 3D stacking of NoC-based MPSoC implementation with GALS style approach and presented several physical designs related analyses regarding 3D MPSoC design and implementation using 2D EDA tools. Finally we conducted an exploration of 2D EDA tool on different 3D architecture to evaluate the impact of 2D EDA tools on the 3D architecture performance. Since there is no commercialize 3D design tool until now, the experiment is important on the basis that designing 3D architecture using 2D EDA tools does not have a strong and direct impact to the 3D architecture performance mainly because the tools is dedicated for 2D architecture design.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

    Reliability in the face of variability in nanometer embedded memories

    Get PDF
    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

    Energy-efficient architectures for chip-scale networks and memory systems using silicon-photonics technology

    Full text link
    Today's supercomputers and cloud systems run many data-centric applications such as machine learning, graph algorithms, and cognitive processing, which have large data footprints and complex data access patterns. With computational capacity of large-scale systems projected to rise up to 50GFLOPS/W, the target energy-per-bit budget for data movement is expected to reach as low as 0.1pJ/bit, assuming 200bits/FLOP for data transfers. This tight energy budget impacts the design of both chip-scale networks and main memory systems. Conventional electrical links used in chip-scale networks (0.5-3pJ/bit) and DRAM systems used in main memory (>30pJ/bit) fail to provide sustained performance at low energy budgets. This thesis builds on the promising research on silicon-photonic technology to design system architectures and system management policies for chip-scale networks and main memory systems. The adoption of silicon-photonic links as chip-scale networks, however, is hampered by the high sensitivity of optical devices towards thermal and process variations. These device sensitivities result in high power overheads at high-speed communications. Moreover, applications differ in their resource utilization, resulting in application-specific thermal profiles and bandwidth needs. Similarly, optically-controlled memory systems designed using conventional electrical-based architectures require additional circuitry for electrical-to-optical and optical-to-electrical conversions within memory. These conversions increase the energy and latency per memory access. Due to these issues, chip-scale networks and memory systems designed using silicon-photonics technology leave much of their benefits underutilized. This thesis argues for the need to rearchitect memory systems and redesign network management policies such that they are aware of the application variability and the underlying device characteristics of silicon-photonic technology. We claim that such a cross-layer design enables a high-throughput and energy-efficient unified silicon-photonic link and main memory system. This thesis undertakes the cross-layer design with silicon-photonic technology in two fronts. First, we study the varying network bandwidth requirements across different applications and also within a given application. To address this variability, we develop bandwidth allocation policies that account for application needs and device sensitivities to ensure power-efficient operation of silicon-photonic links. Second, we design a novel architecture of an optically-controlled main memory system that is directly interfaced with silicon-photonic links using a novel read and write access protocol. Such a system ensures low-energy and high-throughput access from the processor to a high-density memory. To further address the diversity in application memory characteristics, we explore heterogeneous memory systems with multiple memory modules that provide varied power-performance benefits. We design a memory management policy for such systems that allocates pages at the granularity of memory objects within an application

    Compact on-chip optical interconnects on silicon by heterogeneous integration of III-V microsources and detectors

    Get PDF

    Ultra-low-power SRAM design in high variability advanced CMOS

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 163-181).Embedded SRAMs are a critical component in modern digital systems, and their role is preferentially increasing. As a result, SRAMs strongly impact the overall power, performance, and area, and, in order to manage these severely constrained trade-offs, they must be specially designed for target applications. Highly energy-constrained systems (e.g. implantable biomedical devices, multimedia handsets, etc.) are an important class of applications driving ultra-low-power SRAMs. This thesis analyzes the energy of an SRAM sub-array. Since supply- and threshold-voltage have a strong effect, targets for these are established in order to optimize energy. Despite the heavy emphasis on leakage-energy, analysis of a high-density 256x256 sub-array in 45nm LP CMOS points to two necessary optimizations: (1) aggressive supply-voltage reduction (in addition to Vt elevation), and (2) performance enhancement. Important SRAM metrics, including read/write/hold-margin and read-current, are also investigated to identify trade-offs of these optimizations. Based on the need to lower supply-voltage, a 0.35V 256kb SRAM is demonstrated in 65nm LP CMOS. It uses an 8T bit-cell with peripheral circuit-assists to improve write-margin and bit-line leakage. Additionally, redundancy, to manage the increasing impact of variability in the periphery, is proposed to improve the area-offset trade-off of sense-amplifiers, demonstrating promise for highly advanced technology nodes. Based on the need to improve performance, which is limited by density constraints, a 64kb SRAM, using an offset-compensating sense-amplifier, is demonstrated in 45nm LP CMOS with high-density 0.25[mu]m2 bit-cells.(cont.) The sense-amplifier is regenerative, but non -strobed, overcoming timing uncertainties limiting performance, and it is single-ended, for compatibility with 8T cells. Compared to a conventional strobed sense-amplifier, it achieves 34% improvement in worst-case access-time and 4x improvement in the standard deviation of the access-time.by Naveen Verma.Ph.D
    corecore