50 research outputs found

    Application Centric Networks-On-Chip Design Solutions for Future Multicore Systems

    Get PDF
    With advances in technology, future multicore systems scaled to 100s and 1000s of cores/accelerators are being touted as an effective solution for extracting huge performance gains using parallel programming paradigms. However with the failure of Dennard Scaling all the components on the chip cannot be run simultaneously without breaking the power and thermal constraints leading to strict chip power envelops. The scaling up of the number of on chip components has also brought upon Networks-On-Chip (NoC) based interconnect designs like 2D mesh. The contribution of NoC to the total on chip power and overall performance has been increasing steadily and hence high performance power-efficient NoC designs are becoming crucial. Future multicore paradigms can be broadly classified, based on the applications they are tailored to, into traditional Chip Multi processor(CMP) based application based systems, characterized by low core and NoC utilization, and emerging big data application based systems, characterized by large amounts of data movement necessitating high throughput requirements. To this order, we propose NoC design solutions for power-savings in future CMPs tailored to traditional applications and higher effective throughput gains in multicore systems tailored to bandwidth intensive applications. First, we propose Fly-over, a light-weight distributed mechanism for power-gating routers attached to switched off cores to reduce NoC power consumption in low load CMP environment. Secondly, we plan on utilizing a promising next generation memory technology, Spin-Transfer Torque Magnetic RAM(STT-MRAM), to achieve enhanced NoC performance to satisfy the high throughput demands in emerging bandwidth intensive applications, while reducing the power consumption simultaneously. Thirdly, we present a hardware data approximation framework for NoCs, APPROX-NoC, with an online data error control mechanism, which can leverage the approximate computing paradigm in the emerging data intensive big data applications to attain higher performance per watt

    Low Energy Solutions for FIFOs in Networks on Chip

    Get PDF
    To continue the progress of Moore's law at the end of Dennard Scaling, computer architects turned to multi-core systems, connected by networks-on-chip (NoCs). As this trend persisted, NoCs became a leading energy consumer in modern multi-core processors, with a significant percent originating from the large number of virtual channel (FIFO) buffers. In this work, two orthogonal methods to reduce the use-phase energy of these FIFO buffers are discussed. The first is a reservation based circuit-switching multi-hop routing design, multi-hop segmented circuit switching (MSCS). In a 2D arrangement of an NoC, MSCS performs network control at most once in each dimension for a packet, compared to leading multi-hop approaches which often require multiple arbitration steps. This design resulted in a reduction of FIFO buffer storage by 50% over the leading multi-hop scheme with a nominal latency improvement (1.4%). The second method discussed is the intelligent replacement of SRAM with Domain-Wall Memory (DWM) FIFOs, enabled by novel control schemes which leverage the ''shift-register'' nature of spintronic DWM to create extremely low-energy FIFO queues. The most efficiently designed shift-based buffer used a dual-nanowire approach to more effectively align read and writes to the FIFO with the limited access ports

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

    Get PDF
    Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions

    On the Edge of Secure Connectivity via Software-Defined Networking

    Get PDF
    Securing communication in computer networks has been an essential feature ever since the Internet, as we know it today, was started. One of the best known and most common methods for secure communication is to use a Virtual Private Network (VPN) solution, mainly operating with an IP security (IPsec) protocol suite originally published in 1995 (RFC1825). It is clear that the Internet, and networks in general, have changed dramatically since then. In particular, the onset of the Cloud and the Internet-of-Things (IoT) have placed new demands on secure networking. Even though the IPsec suite has been updated over the years, it is starting to reach the limits of its capabilities in its present form. Recent advances in networking have thrown up Software-Defined Networking (SDN), which decouples the control and data planes, and thus centralizes the network control. SDN provides arbitrary network topologies and elastic packet forwarding that have enabled useful innovations at the network level. This thesis studies SDN-powered VPN networking and explains the benefits of this combination. Even though the main context is the Cloud, the approaches described here are also valid for non-Cloud operation and are thus suitable for a variety of other use cases for both SMEs and large corporations. In addition to IPsec, open source TLS-based VPN (e.g. OpenVPN) solutions are often used to establish secure tunnels. Research shows that a full-mesh VPN network between multiple sites can be provided using OpenVPN and it can be utilized by SDN to create a seamless, resilient layer-2 overlay for multiple purposes, including the Cloud. However, such a VPN tunnel suffers from resiliency problems and cannot meet the increasing availability requirements. The network setup proposed here is similar to Software-Defined WAN (SD-WAN) solutions and is extremely useful for applications with strict requirements for resiliency and security, even if best-effort ISP is used. IPsec is still preferred over OpenVPN for some use cases, especially by smaller enterprises. Therefore, this research also examines the possibilities for high availability, load balancing, and faster operational speeds for IPsec. We present a novel approach involving the separation of the Internet Key Exchange (IKE) and the Encapsulation Security Payload (ESP) in SDN fashion to operate from separate devices. This allows central management for the IKE while several separate ESP devices can concentrate on the heavy processing. Initially, our research relied on software solutions for ESP processing. Despite the ingenuity of the architectural concept, and although it provided high availability and good load balancing, there was no anti-replay protection. Since anti-replay protection is vital for secure communication, another approach was required. It thus became clear that the ideal solution for such large IPsec tunneling would be to have a pool of fast ESP devices, but to confine the IKE operation to a single centralized device. This would obviate the need for load balancing but still allow high availability via the device pool. The focus of this research thus turned to the study of pure hardware solutions on an FPGA, and their feasibility and production readiness for application in the Cloud context. Our research shows that FPGA works fluently in an SDN network as a standalone IPsec accelerator for ESP packets. The proposed architecture has 10 Gbps throughput, yet the latency is less than 10 µs, meaning that this architecture is especially efficient for data center use and offers increased performance and latency requirements. The high demands of the network packet processing can be met using several different approaches, so this approach is not just limited to the topics presented in this thesis. Global network traffic is growing all the time, so the development of more efficient methods and devices is inevitable. The increasing number of IoT devices will result in a lot of network traffic utilising the Cloud infrastructures in the near future. Based on the latest research, once SDN and hardware acceleration have become fully integrated into the Cloud, the future for secure networking looks promising. SDN technology will open up a wide range of new possibilities for data forwarding, while hardware acceleration will satisfy the increased performance requirements. Although it still remains to be seen whether SDN can answer all the requirements for performance, high availability and resiliency, this thesis shows that it is a very competent technology, even though we have explored only a minor fraction of its capabilities

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria
    corecore