459 research outputs found
A global wire planning scheme for Network-on-Chip.
As technology scales down, the interconnect for on-chip global communication becomes the delay bottleneck. To provide well-controlled global wire delay and efficient global communication, a packet-switched Network-on-Chip (NoC) architecture has been proposed by several authors. In this paper, the NoC system parameters constrained by the interconnections are studied. Predictions on scaled system parameters such as clock frequency, resource size, global communication bandwidth and inter-resource delay are made for future technologies. Based on these parameters, a global wire planning scheme is proposed.
Energy and performance models for clocked and asynchronous communication
Journal Article. Parameterized first-order models for throughput, energy, and bandwidth are presented in this paper. Models are developed for many common pipeline methodologies, including clocked flopped, clocked time-borrowing latch protocols, asynchronous two-cycle, four-cycle, delay-insensitive, and source synchronous. The paper focuses on communication costs which have the potential to throttle design performance as scaling continues. The models can also be applied to logic. The equations share common parameters to allow apples-to-apples comparisons against different design targets and pipeline methodologies. By applying the parameters to various design targets, one can determine when unclocked communication is superior at the physical level to clocked communication in terms of energy for a given bandwidth. Comparisons between protocols at fixed targets also allow designers to understand tradeoffs between implementations that have a varying degree of timing assumptions and design requirements.
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented, and compare loose and tight limits. We
also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits. Comment: 15 pages, 4 figures, 1 table
A Survey Addressing on High Performance On-Chip VLSI Interconnect
With the rapid increase in transmission speeds of communication systems, the demand for very high-speed, low-power VLSI circuits is on the rise. Although the performance of CMOS technologies improves notably with scaling, conventional CMOS circuits cannot simultaneously satisfy the speed and power requirements of these applications. In this paper we survey the state of the art in on-chip interconnect techniques for optimizing performance, power and delay, and we present a comparative analysis of various techniques for high-speed design.
Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS
Continuous scaling of CMOS has been the major catalyst in miniaturization of
integrated circuits (ICs) and crucial for global socio-economic progress.
However, scaling to sub-20nm technologies is proving to be challenging as
MOSFETs are reaching their fundamental limits and interconnection bottleneck is
dominating IC operational power and performance. Migrating to 3-D, as a way to
advance scaling, has eluded us due to inherent customization and manufacturing
requirements in CMOS that are incompatible with 3-D organization. Partial
attempts with die-die and layer-layer stacking have their own limitations. We
propose a 3-D IC fabric technology, Skybridge[TM], which offers a paradigm shift
in technology scaling as well as design. We co-architect Skybridge's core
aspects, from device to circuit style, connectivity, thermal management, and
manufacturing pathway in a 3-D fabric-centric manner, building on a uniform 3-D
template. Our extensive bottom-up simulations, accounting for detailed material
system structures, manufacturing process, device, and circuit parasitics,
carried through for several designs including a designed microprocessor, reveal
30-60x density and 3.5x performance-per-watt benefits, and a 10x reduction in
interconnect lengths vs. scaled 16-nm CMOS. Fabric-level heat extraction
features are shown to successfully manage IC thermal profiles in 3-D. Skybridge
can provide continuous scaling of integrated circuits beyond CMOS in the 21st
century. Comment: 53 pages
Synthesis of On-Chip Interconnection Structures: From Point-to-Point Links to Networks-on-Chip
Packet-switched networks-on-chip (NOC) have been advocated as the solution to the challenge of organizing efficient and reliable communication structures among the components of a system-on-chip (SOC). A critical issue in designing a NOC is to determine its topology given the set of point-to-point communication requirements among these components. We present a novel approach to on-chip communication synthesis that is based on the iterative combination of two efficient computational steps: (1) an application of the k-Median algorithm to coarsely determine the global communication structure (which may turn out not to be a network after all), and (2) a variation of the shortest-path algorithm to finely tune the data flows on the communication channels. The application of our method to case studies taken from the literature shows that we can automatically synthesize optimal NOC topologies for multi-core on-chip processors, and it offers new insights on why NOCs are not necessarily a value proposition for some classes of application-specific SOCs.
Constant-degree graph expansions that preserve the treewidth
Many hard algorithmic problems dealing with graphs, circuits, formulas and
constraints admit polynomial-time upper bounds if the underlying graph has
small treewidth. The same problems often encourage reducing the maximal degree
of vertices to simplify theoretical arguments or address practical concerns.
Such degree reduction can be performed through a sequence of splittings of
vertices, resulting in an _expansion_ of the original graph. We observe that
the treewidth of a graph may increase dramatically if the splittings are not
performed carefully. In this context we address the following natural question:
is it possible to reduce the maximum degree to a constant without substantially
increasing the treewidth?
Our work answers the above question affirmatively. We prove that any simple
undirected graph G=(V, E) admits an expansion G'=(V', E') with the maximum
degree <= 3 and treewidth(G') <= treewidth(G)+1. Furthermore, such an expansion
will have no more than 2|E|+|V| vertices and 3|E| edges; it can be computed
efficiently from a tree-decomposition of G. We also construct a family of
examples for which the increase by 1 in treewidth cannot be avoided. Comment: 12 pages, 6 figures, the main result used by quant-ph/051107
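To make the idea of a degree-reducing expansion concrete, here is a minimal sketch of naive vertex splitting. It is not the construction from the abstract above: it works directly on the edge list rather than on a tree-decomposition, so it gives no treewidth guarantee, which is exactly the pitfall the abstract warns about. The function name `split_to_max_degree_3` and the `(vertex, k)` labelling of copies are assumptions made for illustration.

```python
from collections import defaultdict

def split_to_max_degree_3(edges):
    """Naively expand a simple undirected graph so every vertex has degree <= 3.

    Each vertex v with d incident edges becomes a path of d copies
    (v, 0), ..., (v, d-1); copy (v, k) carries the k-th incident edge.
    Illustrative only: no bound on the treewidth of the result is claimed.
    """
    incident = defaultdict(list)  # vertex -> indices of its incident edges
    for i, (u, v) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)

    new_edges = []
    port = {}  # (vertex, edge index) -> the copy that carries that edge
    for v, idxs in incident.items():
        for k, i in enumerate(idxs):
            port[(v, i)] = (v, k)
            if k > 0:
                # path edge linking consecutive copies of the same vertex
                new_edges.append(((v, k - 1), (v, k)))
    for i, (u, v) in enumerate(edges):
        # each original edge is re-attached to its two dedicated copies
        new_edges.append((port[(u, i)], port[(v, i)]))
    return new_edges
```

Every copy touches at most one original edge and two path edges, so the maximum degree is 3. The expansion has 2|E| vertices and fewer than 3|E| edges, within the size bounds quoted in the abstract, but only the tree-decomposition-based construction described there controls the treewidth.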
Low-swing signaling for energy efficient on-chip networks
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 65-69). On-chip networks have emerged as a scalable and high-bandwidth communication fabric in many-core processor chips. However, the energy consumption of these networks is becoming comparable to that of computation cores, making further scaling of core counts difficult. This thesis makes several contributions to low-swing signaling circuit design for energy-efficient on-chip networks in two separate projects: on-chip networks optimized for one-to-many multicasts and broadcasts, and link designs that allow on-chip networks to approach an ideal interconnection fabric. A low-swing crossbar switch, which is based on tri-state Reduced-Swing Drivers (RSDs), is presented for the first project. Measurement results of its test chip fabricated in 45nm SOI CMOS show that the tri-state RSD-based crossbar enables 55% power savings as compared to an equivalent full-swing crossbar and link. The measurement results also show that the proposed crossbar allows broadcast-optimized on-chip networks that use a single pipeline stage for physical data transmission to operate at a 21% higher data rate than full-swing networks. For the second project, two clockless low-swing repeaters, a Self-Resetting Logic Repeater (SRLR) and a Voltage-Locked Repeater (VLR), have been proposed and analyzed in simulation only. Neither requires a reference clock, differential signaling, or a bias current. Such digital-intensive properties enable them to approach the energy and delay performance of a point-to-point interconnect of variable lengths. Simulated in 45nm SOI CMOS, the 10mm SRLR, which targets high energy efficiency, consumes 338fJ/b at 5.4Gb/s/ch, while the 10mm VLR raises its data rate up to 16.0Gb/s/ch at 427fJ/b. by Sunghyun Park. S.M
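As a quick worked check of what the quoted figures imply, per-channel link power is energy per bit multiplied by data rate. The sketch below applies that identity to the two simulated repeaters; the helper name `link_power_mw` is an assumption, and the resulting power values are derived here rather than stated in the abstract.

```python
def link_power_mw(energy_fj_per_bit, rate_gbps):
    """Per-channel link power in milliwatts.

    fJ/b * Gb/s = 1e-15 J/b * 1e9 b/s = 1e-6 W = 1e-3 mW
    """
    return energy_fj_per_bit * rate_gbps * 1e-3

# Figures quoted in the abstract (10mm links, simulated in 45nm SOI CMOS)
srlr_mw = link_power_mw(338, 5.4)   # Self-Resetting Logic Repeater
vlr_mw = link_power_mw(427, 16.0)   # Voltage-Locked Repeater

print(f"SRLR: {srlr_mw:.2f} mW per channel")  # about 1.83 mW
print(f"VLR:  {vlr_mw:.2f} mW per channel")   # about 6.83 mW
```

On these numbers the VLR draws roughly 3.7 times the power of the SRLR per channel while delivering about 3 times the data rate, which is consistent with its higher energy per bit.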
On-Chip Transparent Wire Pipelining (invited paper)
Wire pipelining has been proposed as a viable means to break the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications in order to insert it seamlessly in the current design flow. In this paper we briefly survey the methods presented by other researchers in the field and then we thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects.
Throughput-driven floorplanning with wire pipelining
The size of future high-performance SoCs is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. To keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by including the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires.
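The abstract does not say how throughput is computed, so the following is only a sketch under an assumed model that is common in the latency-insensitive design literature: a loop with m original sequential stages and n inserted pipeline flip-flops sustains a throughput of m / (m + n), and the system is limited by its slowest loop. The function name `system_throughput` and the `(m, n)` loop encoding are hypothetical.

```python
from fractions import Fraction

def system_throughput(loops):
    """Throughput of a design under the assumed m / (m + n) loop model.

    loops: iterable of (m, n) pairs, where m is the number of original
    sequential stages in a loop and n is the number of flip-flops inserted
    on its wires. A design without loops is taken to run at full rate.
    """
    return min((Fraction(m, m + n) for m, n in loops), default=Fraction(1))

# Two loops: one slowed to 3/4 by a single inserted flip-flop,
# one slowed to 1/2 by two inserted flip-flops; the slower loop dominates.
print(system_throughput([(3, 1), (2, 2)]))  # 1/2
```

A quantity of this kind is what a throughput-aware cost function would trade against area and wirelength: placing blocks that share a loop close together keeps n small for that loop.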