97 research outputs found

    Optimal Memoryless Encoding for Low Power Off-Chip Data Buses

    Full text link
    Off-chip buses account for a significant portion of the total system power consumed in embedded systems. Bus encoding schemes have been proposed to minimize power dissipation, but none has been demonstrated to be optimal with respect to any measure. In this paper, we give the first provably optimal and explicit (polynomial-time constructible) families of memoryless codes for minimizing bit transitions in off-chip buses. Our results imply that having access to a clock does not make a memoryless encoding scheme that minimizes bit transitions more powerful. Comment: Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design (San Jose, California, November 05 - 09, 2006). ICCAD '06. ACM, New York, NY, 369-37
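    To make the optimization target concrete, the following minimal Python sketch counts bit transitions on a bus and shows what "memoryless" means here: a fixed word-to-codeword map applied independently of bus history. The 2-bit-to-3-bit toy code is an invented example tuned to the sample stream, not the paper's optimal construction.

    # Switching activity metric: bit transitions between successive bus words.
    def bit_transitions(prev: int, curr: int) -> int:
        """Number of bus lines that toggle between two consecutive words."""
        return bin(prev ^ curr).count("1")

    def total_transitions(words):
        """Total switching activity for a stream of words sent over the bus."""
        return sum(bit_transitions(a, b) for a, b in zip(words, words[1:]))

    # A memoryless encoder is a fixed map from data words to codewords; each
    # word is encoded without looking at what was previously on the bus.
    # Hypothetical 2-bit-to-3-bit code that places the frequently alternating
    # values 00 and 11 at Hamming distance 1 (illustration only).
    ENCODE = {0b00: 0b000, 0b11: 0b001, 0b01: 0b011, 0b10: 0b111}

    stream = [0b00, 0b11, 0b00, 0b11, 0b01]
    print(total_transitions(stream))                       # raw bus: 7 toggles
    print(total_transitions([ENCODE[w] for w in stream]))  # encoded bus: 4 toggles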

    Variable length pattern coding for power reduction in off-chip data buses

    Get PDF
    Off-chip buses consume a huge fraction (20%-40%) of the system power. Hence, techniques such as increasing bus widths and transition encoding have been used for power reduction on off-chip data buses. Since capacitances at the I/O pads and interwire capacitances contribute significantly to power consumption, encoding/decoding schemes have been developed to reduce the switching activity of the off-chip bus lines, thus reducing power. Frequent Value Encoding (FVE) [1], Frequent Value Encoding with Xor (FVExor) [1] and VALVE [2] are some of the better known encoding schemes, but they still leave room for improvement. This thesis addresses the problem of power reduction in off-chip data buses by encoding a variable number (1 to 4) of fixed-size (32-bit) data values (variable-length patterns) that exhibit temporal locality. This characteristic enables us to cache these patterns using a 64-entry CAM at the encoder and a 64-entry SRAM at the decoder. Whenever a pattern match occurs, a 2-bit code indicating the index of the match is sent. If a variable-length pattern match occurs, then the code and the unmatched portion of the data are sent. We implemented our scheme, Variable Length Pattern Coding (VLPC), for various integer and floating point benchmarks and observed that 6% to 49% of patterns in these benchmarks are encodable. Based on experiments in SimpleScalar and our analysis in MATLAB, we obtained 4.88% to 40.11% reduction in transition activity for SPEC2000 benchmarks such as crafty, swim, mcf, applu and ammp, relative to unencoded data. This is 0.3% to 38.9% higher than that obtained using the FVE, FVExor [1] and VALVE [2] encoding schemes. Finally, we designed a low-power custom CAM and SRAM using 45 nm BSIM4 technology models, which we used to verify the low latency of data matching and storage.
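    As a rough illustration of the pattern-cache idea, the sketch below implements a much simplified, single-value version in Python: recently seen words live in a small CAM-like table, a hit is replaced by its short index, and a miss sends the raw word. The round-robin replacement policy and the code format are assumptions made for the example, not the thesis' CAM/SRAM design.

    CACHE_SIZE = 64  # mirrors the 64-entry CAM/SRAM pair described above

    class PatternCacheEncoder:
        """Single-value sketch of a VLPC-style encoder (illustrative only)."""
        def __init__(self, size=CACHE_SIZE):
            self.slots = [None] * size  # models the encoder-side CAM
            self.victim = 0             # round-robin replacement (assumed)

        def encode(self, word):
            if word in self.slots:                      # CAM hit
                return ("HIT", self.slots.index(word))  # send a short index code
            self.slots[self.victim] = word              # install on miss
            self.victim = (self.victim + 1) % len(self.slots)
            return ("MISS", word)                       # send the raw 32-bit word

    # The decoder mirrors the same table in SRAM and applies the same
    # replacement policy, so an index alone is enough to recover the word.
    enc = PatternCacheEncoder()
    for w in [10, 20, 10, 30, 20]:
        print(enc.encode(w))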

    An investigation into a DSP implementation of partial response signaling for 4800 bits per second full-duplex data communications over M.1020 telephone lines

    Get PDF
    Includes bibliographical references. This thesis investigates high-speed digital transmission over a conditioned, voice-grade telephone circuit (M.1020), using a technique known as partial response signaling, or PRS. In particular, the case where 4800 bps, full-duplex transmission is required in a CCITT V.22 type format is investigated. The main V.22 criterion to be adhered to is that frequency-division multiplexing (FDM) is to be used as the means of separating the transmit and receive channels. The carrier frequencies should be 1200 Hz and 2400 Hz, respectively. The investigation concerns the modulation and demodulation sections only.
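    The core PRS mechanism can be illustrated at baseband in a few lines of Python: class-1 partial response (duobinary) deliberately introduces controlled intersymbol interference, y[n] = x[n] + x[n-1], and a modulo-2 precoder keeps detection symbol-by-symbol with no error propagation. This is a generic sketch of duobinary signaling, not the thesis' DSP implementation or its V.22-style passband modem.

    def duobinary_transmit(bits):
        """Precode, map to +/-1 levels, then apply the 1 + D response."""
        precoded, prev = [], 0
        for b in bits:
            prev ^= b                        # precoder: p[n] = b[n] XOR p[n-1]
            precoded.append(prev)
        levels = [2 * p - 1 for p in precoded]  # {0,1} -> {-1,+1}
        y, last = [], -1                     # p[-1] = 0 corresponds to level -1
        for v in levels:
            y.append(v + last)               # three-level line signal {-2, 0, +2}
            last = v
        return y

    def duobinary_detect(y):
        """Thanks to precoding, b[n] = 1 exactly when y[n] is the middle level."""
        return [1 if v == 0 else 0 for v in y]

    bits = [1, 0, 1, 1, 0, 0, 1]
    assert duobinary_detect(duobinary_transmit(bits)) == bits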

    The price of certainty: "waterslide curves" and the gap to capacity

    Full text link
    The classical problem of reliable point-to-point digital communication is to achieve a low probability of error while keeping the rate high and the total power consumption small. Traditional information-theoretic analysis uses `waterfall' curves to convey the revolutionary idea that unboundedly low probabilities of bit-error are attainable using only finite transmit power. However, practitioners have long observed that the decoder complexity, and hence the total power consumption, goes up when attempting to use sophisticated codes that operate close to the waterfall curve. This paper gives an explicit model for power consumption at an idealized decoder that allows for extreme parallelism in implementation. The decoder architecture is in the spirit of message passing and iterative decoding for sparse-graph codes. Generalized sphere-packing arguments are used to derive lower bounds on the decoding power needed for any possible code given only the gap from the Shannon limit and the desired probability of error. As the gap goes to zero, the energy per bit spent in decoding is shown to go to infinity. This suggests that to optimize total power, the transmitter should operate at a power that is strictly above the minimum demanded by the Shannon capacity. The lower bound is plotted to show an unavoidable tradeoff between the average bit-error probability and the total power used in transmission and decoding. In the spirit of conventional waterfall curves, we call these `waterslide' curves. Comment: 37 pages, 13 figures. Submitted to IEEE Transactions on Information Theory. This version corrects a subtle bug in the proofs of the original submission and improves the bounds significantly
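    The shape of this tradeoff can be reproduced with a toy model in Python: as transmit power approaches the Shannon minimum (gap -> 0), decoding power blows up, so the end-to-end optimum sits at a strictly positive gap. The constants and the 1/gap form of the decoding term below are illustrative assumptions, not the paper's actual lower bound.

    # Toy total-power model (illustrative only; not the paper's bound).
    def total_power(gap, p_min=1.0, c_dec=0.05):
        p_tx = p_min * (1.0 + gap)   # transmit power above the Shannon minimum
        p_dec = c_dec / gap          # decoding effort diverges as gap -> 0
        return p_tx + p_dec

    # Sweep the gap: the minimum total power occurs at a strictly positive gap,
    # echoing the paper's conclusion about operating above the Shannon limit.
    best_power, best_gap = min(
        (total_power(g / 1000), g / 1000) for g in range(1, 1000))
    print(f"optimum gap ~ {best_gap:.3f}, total power ~ {best_power:.3f}")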

    SIGNAL PROCESSING TECHNIQUES AND APPLICATIONS

    Get PDF
    As technology scales down, more transistors can be fabricated into the same area, which enables the integration of many components into the same substrate, referred to as a system-on-chip (SoC). The components on an SoC are connected by on-chip global interconnects. The recent International Technology Roadmap for Semiconductors (ITRS) has shown that, when scaling down, gate delay decreases but global interconnect delay increases due to crosstalk. Interconnect delay has become a bottleneck of overall system performance. Many techniques have been proposed to address crosstalk, such as shielding, buffer insertion, and crosstalk avoidance codes (CACs). CACs are a promising technique due to their good crosstalk reduction, lower power consumption and smaller area. In this dissertation, I will present analytical delay models for on-chip interconnects with improved accuracy. These models enable more accurate control of the delays of transition patterns and lead to a more efficient CAC, whose worst-case delay is 30-40% smaller than the best of previously proposed CACs. As the clock frequency approaches multi-gigahertz, the parasitic inductance of on-chip interconnects has become significant, and its detrimental effects, including increased delay, voltage overshoots and undershoots, and increased crosstalk noise, cannot be ignored. We introduce new CACs to address both capacitive and inductive couplings simultaneously.

    Quantum computers are more powerful than classical computers in solving some NP problems. However, quantum computers suffer greatly from unwanted interactions with the environment. Quantum error correction codes (QECCs) are needed to protect quantum information against noise and decoherence. Given their good error-correcting performance, it is desirable to adapt existing iterative decoding algorithms of LDPC codes to obtain LDPC-based QECCs. Several QECCs based on nonbinary LDPC codes have been proposed with much better error-correcting performance than existing quantum codes over a qubit channel. In this dissertation, I will present stabilizer codes based on nonbinary QC-LDPC codes for qubit channels. The results confirm the observation that QECCs based on nonbinary LDPC codes appear to achieve better performance than QECCs based on binary LDPC codes.

    As technology scales further into the nanoscale, CMOS devices suffer greatly from quantum mechanical effects. Some emerging nano devices, such as resonant tunneling diodes (RTDs), quantum cellular automata (QCA), and single electron transistors (SETs), have no such issues and are promising candidates to replace traditional CMOS devices. Threshold gates, which can implement complex Boolean functions within a single gate, can be easily realized with these devices. Several applications dealing with real-valued signals have already been realized using nanotechnology-based threshold gates. Unfortunately, applications using finite fields, such as error-correcting coding and cryptography, have not been realized using nanotechnology. The main obstacle is that they require a great number of exclusive-ORs (XORs), which cannot be realized in a single threshold gate. Besides, the fan-in of a threshold gate in RTD nanotechnology needs to be bounded for both reliability and performance purposes. In this dissertation, I will present a majority-class threshold architecture of XORs with bounded fan-in, and compare it with a Boolean-class architecture. I will show an application of the proposed XORs to finite field multiplication. The analysis results show that the majority-class architecture outperforms the Boolean-class architecture in terms of hardware complexity and latency. I will also introduce a sort-and-search algorithm, which can be used to implement any symmetric function. Since XOR is a special symmetric function, it can be implemented via the sort-and-search algorithm. To leverage the power of multi-input threshold functions, I generalize the previously proposed sort-and-search algorithm from a fan-in of two to arbitrary fan-ins, and propose an architecture of multi-input XORs with bounded fan-ins.
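    The flavor of a CAC is easy to see in a small Python sketch of one known family, forbidden pattern codes, which ban the patterns 010 and 101 on any three adjacent wires so that no wire ever switches against both of its neighbors. This illustrates the general CAC idea only; it is not the improved capacitive/inductive code proposed in the dissertation.

    from itertools import product

    def is_legal(word):
        """Reject codewords containing 010 or 101 on adjacent wires."""
        return all(word[i:i + 3] not in ((0, 1, 0), (1, 0, 1))
                   for i in range(len(word) - 2))

    n = 5  # bus width in wires
    codebook = [w for w in product((0, 1), repeat=n) if is_legal(w)]
    # 16 of the 32 raw patterns survive, so 5 wires carry 4 data bits: the
    # redundancy buys a tighter worst-case crosstalk delay on the bus.
    print(len(codebook), "legal codewords out of", 2 ** n)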

    The development of a novel modem structure for connection of rural to diginet

    Get PDF
    Includes bibliographical references. This thesis investigates the use of partial response signalling as a modulation scheme in a modem structure. The modem structure consists of transmitter modulation and receiver demodulation sections only. The modem is designed to operate at data rates of 2400, 4800 and 9600 bps. The signalling format replaces the CCITT Recommendation V.29 format. The transmitted signal is required to conform to the bandwidth limitations of CCITT Recommendation M.1020 leased telephone circuits.

    AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing

    Full text link
    The enormous success of advanced wireless devices is pushing the demand for higher wireless data rates. Denser spectrum reuse through the deployment of more access points per square mile has the potential to successfully meet the increasing demand for more bandwidth. In theory, the best approach to density increase is via distributed multiuser MIMO, where several access points are connected to a central server and operate as a large distributed multi-antenna access point, ensuring that all transmitted signal power serves the purpose of data transmission, rather than creating "interference." In practice, while enterprise networks offer a natural setup in which distributed MIMO might be possible, there are serious implementation difficulties, the primary one being the need to eliminate phase and timing offsets between the jointly coordinated access points. In this paper we propose AirSync, a novel scheme which provides not only time but also phase synchronization, thus enabling distributed MIMO with full spatial multiplexing gains. AirSync locks the phase of all access points using a common reference broadcast over the air in conjunction with a Kalman filter which closely tracks the phase drift. We have implemented AirSync as a digital circuit in the FPGA of the WARP radio platform. Our experimental testbed, comprising two access points and two clients, shows that AirSync is able to achieve phase synchronization within a few degrees, and allows the system to nearly achieve the theoretical optimal multiplexing gain. We also discuss MAC and higher layer aspects of a practical deployment. To the best of our knowledge, AirSync offers the first ever realization of the full multiuser MIMO gain, namely the ability to increase the number of wireless clients linearly with the number of jointly coordinated access points, without reducing the per client rate. Comment: Submitted to Transactions on Networking
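    The phase-tracking step can be sketched with a scalar Kalman filter in Python: model the offset as a random walk, predict, then correct toward each noisy measurement of the common reference. The noise constants and the random-walk model are assumptions for illustration, not AirSync's actual filter design.

    import random

    Q, R = 1e-4, 1e-2            # assumed process / measurement noise variances
    phase_est, p = 0.0, 1.0      # phase estimate (rad) and its variance

    true_phase = 0.0
    for _ in range(200):
        true_phase += random.gauss(0, Q ** 0.5)     # actual oscillator drift
        z = true_phase + random.gauss(0, R ** 0.5)  # noisy reference measurement

        p += Q                            # predict: uncertainty grows with drift
        k = p / (p + R)                   # Kalman gain
        phase_est += k * (z - phase_est)  # correct toward the measurement
        p *= 1 - k                        # shrink uncertainty after the update

    print(f"residual phase error: {abs(true_phase - phase_est):.4f} rad")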

    Design Techniques for Energy-Quality Scalable Digital Systems

    Get PDF
    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed on mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, on the order of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many such tasks are error resilient, meaning that they can tolerate some relaxation in the accuracy, precision or reliability of internal operations without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research naturally tend to reduce or eliminate errors.

    Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations against energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the pressing need for low-energy computing. Despite these high expectations, the current state of the art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in the literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become diffused in commercial devices, EQ scalable design needs to achieve industrial-level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing "dynamic" systems in which the precision or reliability of operations (and consequently their energy consumption) can be tuned at runtime, rather than "static" solutions in which the output quality is fixed at design time.

    This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of the benefits and overheads deriving from EQ scalability, using industrial-level models, and to integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption.

    More specifically, the contribution of this thesis is divided into three parts. In the first body of work, the design of EQ scalable modules for processing hardware datapaths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations.

    The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, which exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and of being compatible with standard serial bus protocols; a sketch of this general idea follows below.

    Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy-hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized for a balance between power consumption and similarity to the input. The second approach achieves concurrent power reduction and image enhancement by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display.

    For each of these three topics, the results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature enough to be integrated in commercial devices can be effectively achieved. Moreover, they show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy-versus-quality tradeoff.
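    As a hedged Python sketch of the flavor of such an encoding (the actual Approximate Differential Encoding and Serial-T0 designs differ), the code below exploits the fact that consecutive sensor samples tend to be close: it transmits a short, saturated difference instead of the full word, and the code width acts as a runtime knob trading error against bits sent on the serial line.

    DIFF_BITS = 4                        # short-code width: the quality knob
    MAX_DIFF = 2 ** (DIFF_BITS - 1) - 1  # largest step the short code can carry

    def encode(samples):
        """Send saturated differences; error per step is bounded by saturation."""
        prev, out = 0, []
        for s in samples:
            d = max(-MAX_DIFF, min(MAX_DIFF, s - prev))
            out.append(d)
            prev += d                    # track the decoder's approximate state
        return out

    def decode(diffs):
        prev, out = 0, []
        for d in diffs:
            prev += d
            out.append(prev)
        return out

    samples = [0, 3, 5, 30]
    print(decode(encode(samples)))  # [0, 3, 5, 12]: small steps are exact, the
                                    # jump to 30 is slew-limited (approximated)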

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF