530 research outputs found

    Packet Transactions: High-level Programming for Line-Rate Switches

    Many algorithms for congestion control, scheduling, network measurement, active queue management, security, and load balancing require custom processing of packets as they traverse the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chipsets. The key challenge is that these algorithms create and modify algorithmic state. The key idea to achieve line-rate programmability for stateful algorithms is the notion of a packet transaction: a sequential code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient and natural way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated die-area overhead. Comment: 16 pages
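    To make the packet-transaction idea concrete, the sketch below models one in Python. It is an illustration only, not Domino syntax: the flowlet-style load-balancing logic, the names `Packet`, `FlowletState`, and `flowlet_transaction`, and the `gap` and `num_hops` parameters are all assumptions chosen for the example. Atomicity and isolation are modeled simply by running the whole function body for one packet before the next packet is processed.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    flow_id: int
    arrival: int        # arrival time in arbitrary ticks (assumed unit)
    next_hop: int = -1  # filled in by the transaction


@dataclass
class FlowletState:
    # Hypothetical switch state that persists across packets.
    num_hops: int = 4
    gap: int = 5  # idle time after which a flow may change its hop
    last_time: dict = field(default_factory=dict)
    saved_hop: dict = field(default_factory=dict)


def flowlet_transaction(state: FlowletState, pkt: Packet) -> Packet:
    """Sequential block run to completion for one packet.

    Reads and updates shared state with no interleaving from other
    packets, which is the atomic and isolated behavior described above.
    """
    last = state.last_time.get(pkt.flow_id)
    if last is None or pkt.arrival - last > state.gap:
        # Start a new flowlet: pick a (toy, deterministic) next hop.
        state.saved_hop[pkt.flow_id] = (pkt.flow_id + pkt.arrival) % state.num_hops
    state.last_time[pkt.flow_id] = pkt.arrival
    pkt.next_hop = state.saved_hop[pkt.flow_id]
    return pkt


state = FlowletState()
hops = [
    flowlet_transaction(state, Packet(flow_id=1, arrival=t)).next_hop
    for t in (0, 2, 4, 10)
]
print(hops)
```

    Packets arriving close together keep the stored hop, while a packet arriving after an idle period longer than `gap` triggers a new choice. The point of the sketch is that the read-modify-write on per-flow state is written as ordinary sequential code.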

    Power Reductions with Energy Recovery Using Resonant Topologies

    The problem of power densities in system-on-chips (SoCs) and processors has worsened recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low-skew clock distribution network (CDN), which drives a large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (>40% overall) on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops, show the reductions compared to non-resonant clocking. A DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set >3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative setup times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations are also derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR.
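    The link between a resonant frequency above 3× the clock rate and roughly one tenth of the inductance can be checked with the standard LC resonance relation. The short derivation below is a sketch under two assumptions that the abstract does not state explicitly: the load capacitance C is the same in both schemes, and the prior-art scheme resonates at the clock frequency. The symbols f_clk, f_PSR, L_prior, and L_PSR are introduced here for illustration.

```latex
f_{\mathrm{res}} = \frac{1}{2\pi\sqrt{LC}}
\quad\Longrightarrow\quad
L = \frac{1}{(2\pi f_{\mathrm{res}})^{2}\, C}

\frac{L_{\mathrm{PSR}}}{L_{\mathrm{prior}}}
  = \left(\frac{f_{\mathrm{clk}}}{f_{\mathrm{PSR}}}\right)^{2}
  < \left(\frac{1}{3}\right)^{2} = \frac{1}{9}
\qquad \text{for } f_{\mathrm{PSR}} > 3 f_{\mathrm{clk}}
```

    Under these assumptions the required inductance falls below one ninth of the prior-art value, which is consistent with the abstract's figure of about 1/10th.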

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, and it is therefore necessary to optimize application performance and to enhance the scalability of massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: optimization, parallelization and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the code scalability is studied, and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Technological evolution has significantly increased the number of transistors for a given die area and raised the switching speed from a few MHz to the GHz range. This inversely proportional decline in size and boost in performance consequently demands a shrinking supply voltage and effective power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques in almost every aspect of the chip, particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but also visits related domains such as buses and memories. There are various processor parameters and features, such as supply voltage, clock frequency, cache and pipelining, which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power-efficient processor architectures are overviewed and research activities are discussed, which should help the reader identify how these factors in a processor contribute to power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Processing, Characterization And Performance Of Carbon Nanopaper Based Multifunctional Nanocomposites

    Carbon nanofibers (CNFs) used as nano-scale reinforcement have been extensively studied since they are capable of improving the physical and mechanical properties of conventional fiber reinforced polymer composites. However, the properties of CNFs are far from being fully utilized in the composites due to processing challenges, including the dispersion of CNFs and the viscosity increase of the polymer matrix. To overcome these issues, a unique approach was developed by making carbon nanopaper sheets through the filtration of well-dispersed carbon nanofibers under controlled processing conditions, and integrating the carbon nanopaper sheets into composite laminates using the autoclave process and resin transfer molding (RTM). This research aims to fundamentally study the processing-structure-property-performance relationship of carbon nanopaper-based nanocomposites for multifunctional applications: a) Vibrational damping. Carbon nanofibers with extremely high aspect ratios and low density present an ideal candidate as vibrational damping material; specifically, the large specific area and aspect ratio of carbon nanofibers promote significant interfacial friction between carbon nanofiber and polymer matrix, causing higher energy dissipation in the matrix. Polymer composites with the reinforcement of carbon nanofibers in the form of a paper sheet have shown significant vibration damping improvement, with a damping ratio increase of 300% in the nanocomposites. b) Wear resistance. In response to the observed increase in toughness of the nanocomposites, tribological properties of the nanocomposite coated with carbon nanofiber/ceramic particles hybrid paper have been studied. Due to their high strength and toughness, carbon nanofibers can act as a microcrack reducer; additionally, the composites coated with such hybrid nanopaper of carbon nanofiber and ceramic particles showed a reduced coefficient of friction (COF) and wear rate. c) High electrical conductivity. A highly conductive coating material was developed and applied on the surface of the composites for electromagnetic interference shielding and lightning strike protection. To increase the conductivity of the carbon nanofiber paper, carbon nanofibers were modified with nickel nanostrands. d) Electrical actuation of SMP composites. Compared with other methods of SMP actuation, the use of electricity to induce the shape-memory effect of SMP is desirable due to its controllability and effectiveness. The electrical conductivity of carbon fiber reinforced SMP composites can be significantly improved by incorporating CNFs and CNF paper into them. A vision-based system was designed to control the deflection angle of SMP composites to desired values. The funding support from the National Science Foundation and the FAA Center of Excellence for Commercial Space Transportation (FAA COE CST) is acknowledged.

    Verification of delayed-reset domino circuits using ATACS

    Journal Article. This paper discusses the application of the timing analysis tool ATACS to the high performance, self-resetting and delayed-reset domino circuits being designed at IBM's Austin Research Laboratory. The tool, which was originally developed to deal with asynchronous circuits, is well suited to the self-resetting style since, internally, a block of self-resetting or delayed-reset domino logic is asynchronous. The circuits are represented using timed event/level (TEL) structures. These structures correspond very directly to gate-level circuits, making the translation from a transistor schematic to a TEL structure straightforward. The state-space explosion problem is mitigated using an algorithm based on partially ordered sets (POSETs). Results on a number of circuits from the recently published guTS (gigahertz unit Test Site) processor from IBM indicate that modules of significant size can be verified with ATACS using a level of abstraction that preserves the interesting timing properties of the circuit. Accurate circuit-level verification allows the designer to include less margin in the design, which can lead to increased performance.

    Multibeam Sparse Tiled Planar Array for Joint Communication and Sensing

    Multibeam analog arrays have been proposed for millimeter-wave joint communication and sensing (JCAS). We study multibeam planar arrays for JCAS, providing time division duplex communication and full-duplex sensing with steerable beams. In order to have a large aperture with a narrow beamwidth in the radiation pattern, we propose to design a sparse tiled planar array (STPA) aperture with an affordable number of phase shifters. The modular tiling and sparse design of the array are non-convex optimization problems; however, we exploit the fact that the more irregular the antenna array geometry, the lower the side lobe level. We propose to first solve the optimization by maximizing the entropy of the phase centers of tiles in the array; then we perform sparse subarray selection leveraging the geometry of the sunflower array. While maintaining the same spectral efficiency in the communication link as a conventional uniform planar array (CUPA), the STPA improves angle of arrival estimation when the line-of-sight path is dominant, e.g., the STPA with 125 elements distinguishes two adjacent targets with a 20° difference in the proximity of boresight whereas the CUPA cannot. Moreover, the STPA has a 40% shorter blockage time compared to the CUPA when a blocker moves in the elevation angles. Comment: Manuscript submitted to IEEE Trans. Wireless Communication on August 25, 2022. 27 pages, 16 figures
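    For readers unfamiliar with the sunflower geometry named in this abstract, it is commonly modeled as a Fermat spiral in which element n sits at radius proportional to the square root of n and at an angle of n times the golden angle. The sketch below generates such positions. It illustrates the geometry only and is not the paper's tiling or subarray selection procedure; the function name `sunflower_positions` and the `scale` parameter (element spacing factor, in arbitrary units) are assumptions made for this example.

```python
import math

# Golden angle in radians, about 137.5 degrees.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def sunflower_positions(num_elements: int, scale: float = 0.5):
    """Return (x, y) positions on a Fermat-spiral (sunflower) lattice.

    Element n (1-based) is placed at radius scale * sqrt(n) and
    angle n * GOLDEN_ANGLE, which spreads elements irregularly but
    with roughly uniform density over the aperture.
    """
    positions = []
    for n in range(1, num_elements + 1):
        radius = scale * math.sqrt(n)
        angle = n * GOLDEN_ANGLE
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


# 125 elements matches the array size quoted in the abstract.
elements = sunflower_positions(125)
aperture_radius = max(math.hypot(x, y) for x, y in elements)
print(len(elements), round(aperture_radius, 3))
```

    Because successive elements are rotated by an irrational fraction of a turn, no two lie on the same radial line, which is one way to obtain the kind of irregular layout the abstract associates with lower side lobes.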

    A Structured Design Methodology for High Performance VLSI Arrays

    With the geometric growth in integrated circuit technology due to transistor scaling, together with the system-on-chip design strategy, the complexity of integrated circuits has increased manifold. Achieving a short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over time to cope with this, but the high manual labor in custom design and the statistic design in ASIC are still causes for concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc., reduces the manual effort to a great extent, and makes the design regular and repetitive while still achieving high performance. The method proposes making the complete design a custom schematic but using standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once the schematic is finalized, the designer places these standard cells in a spreadsheet, placing the cells in the critical paths close together. A Perl script then generates a Cadence Encounter compatible placement file. The design is then routed in Encounter. Since the designer is the best judge of the circuit architecture, placement by the designer allows the most optimal design to be achieved. Several designs, including IPCAM, issue logic, TLB, RF and Cache, were carried out and their performance was compared against the fully custom and ASIC flows. The TLB, RF and Cache were part of the HEMES microprocessor. Dissertation/Thesis, Ph.D. Electrical Engineering 201