53 research outputs found

    Dynamic Information Flow Tracking on Multicores

    Get PDF
    Dynamic Information Flow Tracking (DIFT) is a promising technique for detecting software attacks. Due to the computationally intensive nature of the technique, prior efficient implementations [21, 6] rely on specialized hardware support whose only purpose is to enable DIFT. Alternatively, prior software implementations are either too slow [17, 15] resulting in execution time increases as much as four fold for SPEC integer programs or they are not transparent [31] requiring source code modifications. In this paper, we propose the use of chip multiprocessors (CMP) to perform DIFT transparently and efficiently. We spawn a helper thread that is scheduled on a separate core and is only responsible for performing information flow tracking operations. This entails the communication of registers and flags between the main and helper threads. We explore software (shared memory) and hardware (dedicated interconnect) approaches to enable this communication. Finally, we propose a novel application of the DIFT infrastructure where, in addition to the detection of the software attack, DIFT assists in the process of identifying the cause of the bug in the code that enabled the exploit in the first place. We conducted detailed simulations to evaluate the overhead for performing DIFT and found that to be 48 % for SPEC integer programs

    Tiled microprocessors

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (p. 251-258).Current-day microprocessors have reached the point of diminishing returns due to inherent scalability limitations. This thesis examines the tiled microprocessor, a class of microprocessor which is physically scalable but inherits many of the desirable properties of conventional microprocessors. Tiled microprocessors are composed of an array of replicated tiles connected by a special class of network, the Scalar Operand Network (SON), which is optimized for low-latency, low-occupancy communication between remote ALUs on different tiles. Tiled microprocessors can be constructed to scale to 100's or 1000's of functional units. This thesis identifies seven key criteria for achieving physical scalability in tiled microprocessors. It employs an archetypal tiled microprocessor to examine the challenges in achieving these criteria and to explore the properties of Scalar Operand Networks. The thesis develops the field of SONs in three major ways: it introduces the 5-tuple performance metric, it describes a complete, high-frequency SON implementation, and it proposes a taxonomy, called AsTrO, for categorizing them.(cont.) To develop these ideas, the thesis details the design, implementation and analysis of a tiled microprocessor prototype, the Raw Microprocessor, which was implemented at MIT in 180 nm technology. Overall, compared to Raw, recent commercial processors with half the transistors required 30x as many lines of code, occupied 100x as many designers, contained 50x as many pre-tapeout bugs, and resulted in 33x as many post-tapeout bugs. At the same time, the Raw microprocessor proves to be more versatile in exploiting ILP, stream, and server-farm workloads with modest to large amounts of parallelism.by Michael Bedford Taylor.Ph.D

    A Virtual Channel Network-on-Chip for GT and BE traffic

    Get PDF
    This paper presents an on-chip network for a run-time reconfigurable System-on-Chip. The network uses packet-switching with virtual channels. It can provide guaranteed services as well as best effort services. The guaranteed services are based on virtual channel allocation, in contrast to other on-chip networks where guarantees are provided by time-division multiplexing. The network is particularly suitable for systems in which the traffic is dominated by streams. We model the data traffic in the system and simulate the behaviour of the network with this model. The results show that the network is capable of handling the system traffic and can provide the required guarantees

    Recursive partitioning multicast: a bandwidth-efficient routing for networks-on-chip

    Get PDF
    Chip Multi-processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Networks-on-Chip (NOCs) provide a scalable communication method for CMP architectures. NOCs must be carefully designed to meet constraints of power consumption and area, and provide ultra low latencies. Existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multicast and broadcast traffic efficiently. In this paper, we propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversities, thus achieves more bandwidth-efficiency and finally improves the performance of the whole network. Our simulation results using a detailed cycle-accurate simulator show that compared with the most recent multicast scheme, RPM saves 25 % of crossbar and link power, and 33 % of link utilization with 50 % network performance improvement. Also RPM is more scalable to large networks than the recently proposed VCTM. 1

    HELIX-RC

    Get PDF
    Data dependences in sequential programs limit parallelization because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, communication overheads required to synchronize actual dependences counteract the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85x performance speedup for six SPEC CINT2000 benchmarksThis work was possible thanks to the sponsorship of the Royal Academy of Engineering, EPSRC and the National Science Foundation (award number IIS-0926148).This is the accepted manuscript. The final version is available from IEEE and ACM at http://dl.acm.org/citation.cfm?doid=2678373.2665705

    A Parallel Routing Algorithm for Torus NoCs

    Get PDF
    Abstract. This paper proposes a parallel routing algorithm for routing multiple data streams over disjoint paths in the torus Network-on-Chip (NoC) architecture. We show how to construct a maximal set of disjoint paths between any two nodes of a torus network topology and then make use of the constructed paths for the simultaneous routing of multiple data streams between these nodes. Analytical performance evaluation results are obtained showing the effectiveness of the proposed parallel routing algorithm in reducing communication delays and increasing throughput when transferring large amounts of data in an NoC-based multi-core system

    Microarchitectural wire management for performance and power in partitioned architectures

    Get PDF
    Journal ArticleFuture high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low power. In such architectures, inter-partition communication over global wires has a significant impact on overall processor performance and power consumption. VLSI techniques allow a variety of wire implementations, but these wire properties have previously never been exposed to the microarchitecture. This paper advocates global wire management at the microarchitecture level and proposes a heterogeneous interconnect that is comprised of wires with varying latency, bandwidth, and energy characteristics. We propose and evaluate microarchitectural techniques that can exploit such a heterogeneous interconnect to improve performance and reduce energy consumption. These techniques include a novel cache pipeline design, the identification of narrow bit-width operands, the classification of non-critical data, and the detection of interconnect load imbalance. For a dynamically scheduled partitioned architecture, our results demonstrate that the proposed innovations result in up to 11% reductions in overall processor ED2, compared to a baseline processor that employs a homogeneous interconnect

    LBDR: An efficient unicast routing support for CMPs

    Full text link
    LBDR is a routing distributed layer based on minimum logic that removes the need for routing tables at switches on network-on-chips (NoCs) in CMPs and enables the implementation of many routing algorithms on most of regular and irregular toplogies we may find in the near future in a multi-core system.Rodrigo Mocholí, S. (2008). LBDR: An efficient unicast routing support for CMPs. http://hdl.handle.net/10251/13476Archivo delegad

    Mapping and Configuration Methods for Multi-Use-Case Networks on Chips

    Get PDF
    To provide a scalable communication infrastructure for Systems on Chips (SoCs), Networks on Chips (NoCs), a communication centric design paradigm is needed. To be cost effective, SoCs are often programmable and integrate several different applications or use-cases on to the same chip. For the SoC platform to support the different use-cases, the NoC architecture should satisfy the performance constraints of each individual use-case. In this work we motivate the need to consider multiple use-cases during the NoC design process. We present a method to ef ciently map the applications on to the NoC architecture, satisfying the design constraints of each individual use-case. We also present novel ways to dynamically recon- gure the network across the different use-cases and explore the possibility of integrating Dynamic Voltage and Frequency Scaling (DVS/DFS) techniques with the use-case centric NoC design methodology. We validate the performance of the design methodology on several SoC applications. The dynamic recon guration of the NoC integrated with DVS/DFS schemes results in large power savings for the resulting NoC systems
    corecore