
    Energy Consumption of VLSI Decoders

    Thompson's model of VLSI computation relates the energy of a computation to the product of the circuit area and the number of clock cycles needed to carry out the computation. It is shown that for any family of circuits implemented according to this model, using any algorithm that decodes a codeword passed through a binary erasure channel, as the block length approaches infinity either (a) the probability of block error is asymptotically lower bounded by 1/2, or (b) the energy of the computation scales at least as Omega(n (log n)^(1/2)), and so the energy of successful decoding, per decoded bit, must scale at least as Omega((log n)^(1/2)). This implies that the average energy per decoded bit must approach infinity for any sequence of codes that approaches capacity. The analysis techniques are then extended to the case of serial computation, showing that if a circuit is restricted to serial computation, then as block length approaches infinity, either the block error probability is lower bounded by 1/2 or the energy scales at least as fast as Omega(n log n). In a very general case that allows the number of output pins to vary with block length, it is shown that the average energy per decoded bit must scale as Omega(n (log n)^(1/5)). A simple example is provided of a class of circuits performing low-density parity-check decoding whose energy complexity scales as O(n^2 log log n).
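The scaling results in the abstract can be collected in standard asymptotic notation (this simply restates the bounds above, with E the total decoding energy and n the block length; the variable-output-pin bound is copied verbatim from the abstract):

```latex
\begin{aligned}
&\text{general circuits:}            && E = \Omega\bigl(n\,(\log n)^{1/2}\bigr),
   \quad E/n = \Omega\bigl((\log n)^{1/2}\bigr) \\
&\text{serial computation:}          && E = \Omega(n \log n) \\
&\text{variable output pins:}        && E/n = \Omega\bigl(n\,(\log n)^{1/5}\bigr) \\
&\text{LDPC example (upper bound):}  && E = O\bigl(n^{2} \log\log n\bigr)
\end{aligned}
```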

    Off-chip Communications Architectures For High Throughput Network Processors

    In this work, we present off-chip communications architectures for line cards to increase the throughput of the currently used memory system. In recent years there has been a significant increase in memory bandwidth demand on line cards, driven by higher line rates, an increase in deep packet inspection operations, and the continuing growth of lookup tables. As line-rate data and NPU processing power increase, memory access time becomes the main system bottleneck during data store/retrieve operations. The growing demand for memory bandwidth runs counter to indirect interconnect methodologies, and solutions to the memory bandwidth bottleneck are limited by physical constraints such as area and NPU I/O pins. Therefore, indirect interconnects are replaced with direct, packet-based networks such as mesh, torus, or k-ary n-cubes. We investigate multiple k-ary n-cube based interconnects and propose two variations of the 2-ary 3-cube interconnect, called the 3D-bus and 3D-mesh. All of the k-ary n-cube interconnects include multiple, highly efficient techniques to route, switch, and control packet flows in order to minimize congestion spots and packet loss. We explore the tradeoffs between implementation constraints and performance. We also developed an event-driven interconnect simulation framework to evaluate the performance of packet-based off-chip k-ary n-cube interconnect architectures for line cards. The simulator uses state-of-the-art software design techniques to provide a flexible yet robust tool that can emulate multiple interconnect architectures under non-uniform traffic patterns. Moreover, the simulator offers the user full control over network parameters, performance-enhancing features, and simulation time frames, making the platform as close as possible to the real line card's physical and functional properties.
By using our network simulator, we identify, among multiple processor-memory configurations, the one that achieves the best performance. We also explore how network enhancement techniques such as virtual channels and sub-channeling improve network latency and throughput. Our performance results show that k-ary n-cube topologies, and especially our modified version of the 2-ary 3-cube interconnect, the 3D-mesh, significantly outperform existing line card interconnects and are able to sustain higher traffic loads. The flow control mechanism proved to substantially reduce hot spots, balance load in areas of high traffic, and achieve a low transmission-failure rate. Moreover, it can scale to accommodate more memories and/or processors and thereby increase the line card's processing power.
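For readers unfamiliar with the k-ary n-cube family, the topology's structure is easy to state: nodes are n-digit base-k addresses, and each node links to the nodes that differ by +/-1 (mod k) in exactly one digit. The sketch below is illustrative only (the function name and layout are not from the work above); it enumerates the neighbors of a node and checks the 2-ary 3-cube case discussed in the abstract:

```python
from itertools import product


def kary_ncube_neighbors(node, k, n):
    """Neighbors of `node` (an n-tuple of digits in 0..k-1) in a k-ary n-cube.

    Two nodes are adjacent iff their addresses differ by +/-1 modulo k
    in exactly one dimension. A set is used because for k = 2 the
    +1 and -1 steps reach the same node.
    """
    neighbors = set()
    for dim in range(n):
        for step in (1, -1):
            addr = list(node)
            addr[dim] = (addr[dim] + step) % k
            neighbors.add(tuple(addr))
    return neighbors


# The 2-ary 3-cube (the basis of the proposed 3D-bus/3D-mesh variants)
# has 2^3 = 8 nodes, each with exactly 3 neighbors.
nodes = list(product(range(2), repeat=3))
print(len(nodes))                                    # node count
print(len(kary_ncube_neighbors((0, 0, 0), 2, 3)))    # degree of one node
```

For general k > 2 every node has degree 2n, which is one reason these direct networks can sustain higher aggregate bandwidth than a single shared memory bus.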