61 research outputs found

    Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS

    Full text link
    Continuous scaling of CMOS has been the major catalyst in miniaturization of integrated circuits (ICs) and crucial for global socio-economic progress. However, scaling to sub-20nm technologies is proving to be challenging as MOSFETs are reaching their fundamental limits and interconnection bottleneck is dominating IC operational power and performance. Migrating to 3-D, as a way to advance scaling, has eluded us due to inherent customization and manufacturing requirements in CMOS that are incompatible with 3-D organization. Partial attempts with die-die and layer-layer stacking have their own limitations. We propose a 3-D IC fabric technology, Skybridge[TM], which offers paradigm shift in technology scaling as well as design. We co-architect Skybridge's core aspects, from device to circuit style, connectivity, thermal management, and manufacturing pathway in a 3-D fabric-centric manner, building on a uniform 3-D template. Our extensive bottom-up simulations, accounting for detailed material system structures, manufacturing process, device, and circuit parasitics, carried through for several designs including a designed microprocessor, reveal a 30-60x density, 3.5x performance per watt benefits, and 10X reduction in interconnect lengths vs. scaled 16-nm CMOS. Fabric-level heat extraction features are shown to successfully manage IC thermal profiles in 3-D. Skybridge can provide continuous scaling of integrated circuits beyond CMOS in the 21st century.Comment: 53 Page

    Design of ALU and Cache Memory for an 8 bit ALU

    Get PDF
    The design of an ALU and a Cache memory for use in a high performance processor was examined in this thesis. Advanced architectures employing increased parallelism were analyzed to minimize the number of execution cycles needed for 8 bit integer arithmetic operations. In addition to the arithmetic unit, an optimized SRAM memory cell was designed to be used as cache memory and as fast Look Up Table. The ALU consists of stand alone units for bit parallel computation of basic integer arithmetic operations. Addition and subtraction were performed using Kogge Stone parallel prefix hardware operating at 330MHz. A high performance multiplier was built using Radix 4 Modified Booth Encoder (MBE) and a Wallace Tree summation array. The multiplier requires single clock cycle for 8 bit integer multiplication and operates at a maximum frequency of 100MHz. Multiplicative division hardware was built for executing both integer division and square root. The division hardware computes 8-bit division and square root in 4 clock cycles. Multiplier forms the basic building block of all these functional units, making high level of resource sharing feasible with this architecture. The optimal operating frequency for the arithmetic unit is 70MHz. A 6T CMOS SRAM cell measuring 90 µm2 was designed using minimum size transistors. The layout allows for horizontal overlap resulting in effective area of 76 µm2 for an 8x8 array. By substituting equivalent bit line capacitance of P4 L1 Cache, the memory was simulated to have a read time of 3.27ns. An optimized set of test vectors were identified to enable high fault coverage without the need for any additional test circuitry. Sixteen test cases were identified that would toggle all the nodes and provide all possible inputs to the sub units of the multiplier. A correlation based semi automatic method was investigated to facilitate test case identification for large multipliers. This method of testability eliminates performance and area overhead associated with conventional testability hardware. Bottom up design methodology was employed for the design. The performance and area metrics are presented along with estimated power consumption. A set of Monte Carlo analysis was carried out to ensure the dependability of the design under process variations as well as fluctuations in operating conditions. The arithmetic unit was found to require a total die area of 2mm2 (approx.) in 0.35 micron process

    VLSI design of high-speed adders for digital signal processing applications.

    Get PDF

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    Implementation of arithmetic primitives using truly deep submicron technology (TDST)

    Get PDF
    The invention of the transistor in 1947 at Bell Laboratories revolutionised the electronics industry and created a powerful platform for emergence of new industries. The quest to increase the number of devices per chip over the last four decades has resulted in rapid transition from Small-Scale-Integration (SSI) and Large-Scale-lntegration (LSI), through to the Very-Large-Scale-Integration (VLSI) technologies, incorporating approximately 10 to 100 million devices per chip. The next phase in this evolution is the Ultra-Large-Scale-Integration (ULSI) aiming to realise new application domains currently not accessible to CMOS technology. Although technology is continuously evolving to produce smaller systems with minimised power dissipation, the IC industry is facing major challenges due to constraints on power density (W/cm2) and high dynamic (operating) and static (standby) power dissipation. Mobile multimedia communication and optical based technologies have rapidly become a significant area of research and development challenging a variety of technological fronts. The future emergence or 4G (4th Generation) wireless communications networks is further driving this development, requiring increasing levels of media rich content. The processing requirements for capture, conversion, compression, decompression, enhancement and display of higher quality multimedia, place heavy demands on current ULSI systems. This is also apparent for mobile applications and intelligent optical networks where silicon chip area and power dissipation become primary considerations. In addition to the requirements for very low power, compact size and real-time processing, the rapidly evolving nature of telecommunication networks means that flexible soft programmable systems capable of adaptation to support a number of different standards and/or roles become highly desirable. In order to fully realise the capabilities promised by the 4G and supporting intelligent networks, new enabling technologies arc needed to facilitate the next generation of personal communications devices. Most of the current solutions to meet these challenges are based on various implementations of conventional architectures. For decades, silicon has been the main platform of computing, however it is slow, bulky, runs too hot, and is too expensive. Thus, new approaches to architectures, driving multimedia and future telecommunications systems, are needed in order to extend the life cycle of silicon technology. The emergence of Truly Deep Submicron Technology (TDST) and related 3-D interconnection technologies have provided potential alternatives from conventional architectures to 3-D system solutions, through integration of IDST, Vertical Software Mapping and Intelligent Interconnect Technology (IIT). The concept of Soft-Chip Technology (SCT) entails integration of Soft• Processing Circuits with Soft-Configurable Circuits . This concept can effectively manipulate hardware primitives through vertical integration of control and data. Thus the notion of 3-D Soft-Chip emerges as a new design algorithm for content-rich multimedia, telecommunication and intelligent networking system applications. 3•D architectures (design algorithms used suitable for 3-D soft-chip technology), are driven by three factors. The first is development of new device technology (TDST) that can support new architectures with complexities of 100M to 1000M devices. The second is development of advanced wafer bonding techniques such as Indium bump and the more futuristic optical interconnects for 3-D soft-chip mapping. The third is related to improving the performance of silicon CMOS systems as devices continue to scale down in dimensions. One of the fundamental building blocks of any computer system is the arithmetic component. Optimum performance of the system is determined by the efficiency of each individual component, as well as the network as a whole entity. Development of configurable arithmetic primitives is the fundamental focus in 3-D architecture design where functionality can be implemented through soft configurable hardware elements. Therefore the ability to improve the performance capability of a system is of crucial importance for a successful design. Important factors that predict the efficiency of such arithmetic components are: • The propagation delay of the circuit, caused by the gate, diffusion and wire capacitances within !he circuit, minimised through transistor sizing. and • Power dissipation, which is generally based on node transition activity. [2] Although optimum performance of 3-D soft-chip systems is primarily established by the choice of basic primitives such as adders and multipliers, the interconnecting network also has significant degree of influence on !he efficiency of the system. 3-D superposition of devices can decrease interconnect delays by up to 60% compared to a similar planar architecture. This research is based on development and implementation of configurable arithmetic primitives, suitable to the 3-D architecture, and has these foci: • To develop a variety of arithmetic components such as adders and multipliers with particular emphasis on minimum area and compatible with 3-D soft-chip design paradigm. • To explore implementation of configurable distributed primitives for arithmetic processing. This entails optimisation of basic primitives, and using them as part of array processing. In this research the detailed designs of configurable arithmetic primitives are implemented using TDST O.l3µm (130nm) technology, utilising CAD software such as Mentor Graphics and Cadence in Custom design mode, carrying through design, simulation and verification steps

    An Ultra-Low-Power 75mV 64-Bit Current-Mode Majority-Function Adder

    Get PDF
    Ultra-low-power circuits are becoming more desirable due to growing portable device markets and they are also becoming more interesting and applicable today in biomedical, pharmacy and sensor networking applications because of the nano-metric scaling and CMOS reliability improvements. In this thesis, three main achievements are presented in ultra-low-power adders. First, a new majority function algorithm for carry and the sum generation is presented. Then with this algorithm and implied new architecture, we achieved a circuit with 75mV supply voltage operation. Last but not least, a 64 bit current-mode majority-function adder based on the new architecture and algorithm is successfully tested at 75mV supply voltage. The circuit consumed 4.5nW or 3.8pJ in one of the worst conditions

    A Reconfigurable Digital Multiplier and 4:2 Compressor Cells Design

    Get PDF
    With the continually growing use of portable computing devices and increasingly complex software applications, there is a constant push for low power high speed circuitry to support this technology. Because of the high usage and large complex circuitry required to carry out arithmetic operations used in applications such as digital signal processing, there has been a great focus on increasing the efficiency of computer arithmetic circuitry. A key player in the realm of computer arithmetic is the digital multiplier and because of its size and power consumption, it has moved to the forefront of today\u27s research. A digital reconfigurable multiplier architecture will be introduced. Regulated by a 2-bit control signal, the multiplier is capable of double and single precision multiplication, as well as fault tolerant and dual throughput single precision execution. The architecture proposed in this thesis is centered on a recursive multiplication algorithm, where a large multiplication is carried out using recursions of simpler submultiplier modules. Within each sub-multiplier module, instead of carry save adder arrays, 4:2 compressor rows are utilized for partial product reduction, which present greater efficiency, thus result in lower delay and power consumption of the whole multiplier. In addition, a study of various digital logic circuit styles are initially presented, and then three different designs of 4:2 compressor in Domino Logic are presented and simulation results confirm the property of proposed design in terms of delay, power consumption and operation frequenc
    • …
    corecore