73 research outputs found

    Using AVX2 Instruction Set to Increase Performance of High Performance Computing Code

    In this paper we discuss the new Intel instruction extensions, Intel Advanced Vector Extensions 2 (AVX2), and what they bring to high performance computing (HPC). To illustrate this, new systems utilizing AVX2 are evaluated to demonstrate how to exploit AVX2 effectively for HPC types of code and to expose situations in which AVX2 might not be the most effective way to increase performance.

    Intel: Tick-Tock product development cadence

    Thesis (S.M.)--Massachusetts Institute of Technology, System Design and Management Program, 2008. Includes bibliographical references (p. 124-142). This thesis investigates changes in the semiconductor industry's product development methodology by following Intel's product development from the year 2000. Intel was challenged by changes in customer preferences, competitors' new and enhanced products, the economy after the internet bubble burst, and missteps in business strategy. The dynamics of these challenges drove Intel to develop a new product strategy: the Tick-Tock product cadence. The thesis discusses why Intel arrived at the Tick-Tock strategy and how strong a product portfolio Intel built as a result. It further discusses how the new "Global Product Development" (GPD) strategy evolves, which can take advantage of the Tick-Tock cadence and take it to the next level, helped by effective GPD and systems engineering deployment. By Cheolmin Park. S.M.

    VLSI design concepts for iterative algorithms

    Circuit design becomes more and more complicated, especially as the Very Large Scale Integration (VLSI) manufacturing technology node keeps shrinking down to the nanoscale level. New challenges arise, such as an increasing gap between design productivity and Moore’s Law. Leakage power becomes a major factor in power consumption, and traditional shared bus transmission is the critical bottleneck in billion-transistor Multi-Processor System–on–Chip (MPSoC) designs. These issues lead us to discuss their impact on the design of iterative algorithms. This thesis presents several strategies that satisfy various design constraints and can be used to explore superior solutions for the circuit design of iterative algorithms. Four selected examples of iterative algorithms are elaborated in this respect: hardware implementation of a COordinate Rotation DIgital Computer (CORDIC) processor for signal processing, configurable DCT and integer transformations based on the CORDIC algorithm for image/video compression, the parallel Jacobi Eigenvalue Decomposition (EVD) method with arbitrary iterations for communication, and acceleration of parallel Sparse Matrix–Vector Multiplication (SMVM) operations based on a Network–on–Chip (NoC) for solving systems of linear equations. These four applications of iterative methods have been chosen since they cover a wide area of current signal processing tasks. Each method has its own unique design criteria when it comes to direct implementation at the circuit level. Therefore, a balanced solution between various design tradeoffs is elaborated for each method. These tradeoffs are between throughput and power consumption, computational complexity and transformation accuracy, the number of inner/outer iterations and energy consumption, and data structure and network topology. It is shown that all of these algorithms can be implemented efficiently on FPGA devices or as ASICs.

    A tool for the automatic analysis of single events effects on electronic circuits

    Nowadays integrated circuit reliability is challenged by both variability and working conditions. Environmental radiation has become a major issue in ensuring correct circuit behavior. The required irradiation of circuit boards and the subsequent analysis are expensive in both money and time. The lack of tools that support pre-manufacturing radiation hardness analysis hinders circuit designers' work. This paper describes an extensively customizable simulation tool for the characterization of radiation effects on electronic systems. The proposed tool can produce an in-depth analysis of a complete circuit in almost any kind of radiation environment within affordable computation times.

    Simultaneous enlargement of SRAM read/write noise margin by controlling virtual ground lines

    金æČąć€§ć­Šç†ć·„ç ”ç©¶ćŸŸé›»ć­æƒ…ć ±ć­Šçł»The SRAM operating margin in 65nm technology is analyzed. The peak characteristic in the read margin versus the supply voltage was found to be caused by the channel length modulation effect. Controlling the memory cell virtual ground line proved to be effective in enlarging the operating margin simultaneously in the read and the write operations. A simple o ptimum circuit which does not require any dynamic voltage c ontrol is proposed, realizing an improvement in the operating m argin comparable to conventional circuits requiring dynamic voltage control. © 2010 IEEE

    Core Count vs Cache Size for Manycore Architectures in the Cloud

    The number of cores which fit on a single chip is growing at an exponential rate while off-chip main memory bandwidth is growing at a linear rate at best. This core count to off-chip bandwidth disparity causes per-core memory bandwidth to decrease as process technology advances. Continuing per-core off-chip bandwidth reduction will cause multicore and manycore chip architects to rethink the optimal grain size of a core and the on-chip cache configuration in order to save main memory bandwidth. This work introduces an analytic model to study the tradeoffs of utilizing increased chip area for larger caches versus more cores. We focus this study on constructing manycore architectures well suited for the emerging application space of cloud computing where many independent applications are consolidated onto a single chip. This cloud computing application mix favors small, power-efficient cores. The model is exhaustively evaluated across a large range of cache and core-count configurations utilizing SPEC Int 2000 miss rates and CACTI timing and area models to determine the optimal cache configurations and the number of cores across four process nodes. The model maximizes aggregate computational throughput and is applied to SRAM and logic process DRAM caches. As an example, our study demonstrates that the optimal manycore configuration in the 32nm node for a 200 mm^2 die uses on the order of 158 cores, with each core containing a 64KB L1I cache, a 16KB L1D cache, and a 1MB L2 embedded-DRAM cache. This study finds that the optimal cache size will continue to grow as process technology advances, but the tradeoff between more cores and larger caches is a complex tradeoff in the face of limited off-chip bandwidth and the non-linearities of cache miss rates and memory controller queuing delay
