    DIA: A complexity-effective decoding architecture

    Fast instruction decoding is a major challenge in the design of CISC microprocessors that implement variable-length instructions. A well-known solution is to cache decoded instructions in a hardware buffer. Fetching already decoded instructions avoids decoding them again, which improves processor performance. However, introducing such special-purpose storage significantly increases the complexity of the fetch architecture. In this paper, we propose a novel decoding architecture that reduces the implementation cost of the fetch engine. Instead of using a special-purpose hardware buffer, our proposal stores frequently decoded instructions in the memory hierarchy. The address where the decoded instructions are stored is kept in the branch prediction mechanism, enabling it to guide our decoding architecture. This makes it possible for the processor front end to fetch already decoded instructions from memory instead of the original nondecoded instructions. Our results show that, using our decoding architecture, a state-of-the-art superscalar processor achieves competitive performance improvements while requiring less chip area and energy in the fetch architecture than a hardware code caching mechanism. Peer reviewed. Postprint (published version).

    Control speculation for energy-efficient next-generation superscalar processors

    Conventional front-end designs attempt to maximize the number of "in-flight" instructions in the pipeline. However, branch mispredictions cause the processor to fetch useless instructions that are eventually squashed, increasing front-end energy and issue queue utilization and thus wasting around 30 percent of the power dissipated by a processor. Furthermore, processor design trends raise clock frequencies by lengthening the pipeline, which puts more pressure on the branch prediction engine because branches take longer to resolve. As next-generation high-performance processors become deeply pipelined, the amount of energy wasted on misspeculated instructions will go up. The aim of this work is to reduce the energy consumed by misspeculated instructions. We propose selective throttling, which triggers different power-aware techniques (fetch throttling, decode throttling, or disabling the selection logic) depending on the branch prediction confidence level. Results show that combining fetch-bandwidth reduction with select-logic disabling provides the best overall energy reduction and energy-delay product improvement (14 percent and 10 percent, respectively, for a processor with a 22-stage pipeline, and 16 percent and 13 percent, respectively, for a processor with a 42-stage pipeline). Peer reviewed. Postprint (published version).
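The idea of choosing a power-aware technique from the branch prediction confidence level can be sketched as a small lookup policy. This is only an illustration: the three confidence levels, the action names, and the mapping between them are assumptions made for this sketch, not the policy evaluated in the paper.

```python
from enum import Enum


class Confidence(Enum):
    """Hypothetical confidence levels reported by a branch confidence estimator."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def throttling_actions(confidence):
    """Return the set of power-aware actions to apply after a predicted branch.

    The mapping below is an assumed example: the less confident the
    prediction, the more aggressively the front end is throttled.
    """
    if confidence is Confidence.HIGH:
        # Trust the prediction: keep the pipeline running at full speed.
        return set()
    if confidence is Confidence.MEDIUM:
        # Mild response: only reduce fetch bandwidth.
        return {"fetch_throttling"}
    # Low confidence: reduce fetch bandwidth and stop selecting
    # instructions that follow the doubtful branch.
    return {"fetch_throttling", "disable_selection_logic"}


if __name__ == "__main__":
    for level in Confidence:
        print(level.name, sorted(throttling_actions(level)))
```

The low-confidence case pairs fetch-bandwidth reduction with select-logic disabling because the abstract reports that combination as the most effective; everything else about the policy is illustrative.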

    On-Demand Cooperation MAC Protocols with Optimal Diversity-Multiplexing Tradeoff

    This paper presents access protocols with optimal Diversity-Multiplexing Tradeoff (DMT) performance in the context of IEEE 802.11-based mesh networks. The protocols are characterized by two main features: on-demand cooperation and selection of the best relay terminal. On-demand cooperation refers to the ability of a destination terminal to ask for cooperation when it fails to decode the message transmitted by a source terminal. This approach maximizes the spatial multiplexing gain. Selecting the best relay terminal maximizes the diversity order. Hence, these protocols achieve the optimal DMT curve.
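The two features described above, on-demand cooperation and best-relay selection, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the best relay is chosen, so the sketch uses a max-min rule over the two hop gains, and the function names and the channel-gain dictionary are hypothetical.

```python
def select_best_relay(relays):
    """Pick the relay whose weaker hop is strongest (assumed max-min rule).

    relays maps a relay name to a pair
    (gain_source_to_relay, gain_relay_to_destination).
    """
    return max(relays, key=lambda name: min(relays[name]))


def deliver(direct_decoded, relays):
    """Model one transmission with on-demand cooperation.

    The destination asks for help only when it fails to decode the
    direct transmission, so no relay is used on a successful first try.
    """
    if direct_decoded:
        return ("direct", None)
    if not relays:
        return ("failed", None)
    return ("relayed", select_best_relay(relays))


if __name__ == "__main__":
    candidates = {"r1": (0.9, 0.2), "r2": (0.5, 0.6)}
    print(deliver(True, candidates))
    print(deliver(False, candidates))
```

Because the relay is involved only after a decoding failure, successful direct transmissions cost no extra channel use, which is the intuition behind the multiplexing-gain claim in the abstract.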

    Cross-layer optimization of unequal protected layered video over hierarchical modulation

    Unequal protection mechanisms have been proposed at several layers to improve the reliability of multimedia content, especially video data. This paper implements a multi-layer unequal protection scheme based on a Physical-Transport-Application cross-layer design. Hierarchical modulation, in the physical layer, has been shown to increase the overall user capacity of a wireless communication system. Unequal erasure protection codes at the transport layer, in turn, have proven to be an efficient way to protect video data generated by the application layer by exploiting its intrinsic properties. In this paper, the two techniques are jointly optimized so that data that would be lost if the protection were performed separately can be recovered. We show that the proposed cross-layer design outperforms hierarchical modulation and unequal erasure codes taken independently.

    A Detailed Analysis of Contemporary ARM and x86 Architectures

    RISC vs. CISC wars raged in the 1980s, when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement-based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and the Intel Atom and Sandy Bridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors' performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.

    Design Considerations for Low Power Internet Protocols

    Over the past 10 years, low-power wireless networks have transitioned to supporting IPv6 connectivity through 6LoWPAN, a set of standards that specify how to aggressively compress IPv6 packets over low-power wireless links such as 802.15.4. We find that different low-power IPv6 stacks are unable to communicate using 6LoWPAN, and therefore IP, due to design tradeoffs between code size and energy efficiency. We argue that applying traditional protocol design principles to low-power networks is responsible for these failures, in part because receivers must accommodate a wide range of senders. Based on these findings, we propose three design principles for Internet protocols on low-power networks. These principles center on providing flexible tradeoffs between code size and energy efficiency. We apply these principles to 6LoWPAN and show that the resulting protocol design gives developers a wide range of tradeoff points while allowing implementations with different choices to communicate seamlessly.