Search CORE

53 research outputs found

Dynamic Information Flow Tracking on Multicores

Author: Gupta Rajiv
Kim Ho-Seop
Nagarajan Vijay
Wu Youfeng
Publication venue
Publication date: 01/01/2008
Field of study

Dynamic Information Flow Tracking (DIFT) is a promising technique for detecting software attacks. Due to the computationally intensive nature of the technique, prior efficient implementations [21, 6] rely on specialized hardware support whose only purpose is to enable DIFT. Alternatively, prior software implementations are either too slow [17, 15] resulting in execution time increases as much as four fold for SPEC integer programs or they are not transparent [31] requiring source code modifications. In this paper, we propose the use of chip multiprocessors (CMP) to perform DIFT transparently and efficiently. We spawn a helper thread that is scheduled on a separate core and is only responsible for performing information flow tracking operations. This entails the communication of registers and flags between the main and helper threads. We explore software (shared memory) and hardware (dedicated interconnect) approaches to enable this communication. Finally, we propose a novel application of the DIFT infrastructure where, in addition to the detection of the software attack, DIFT assists in the process of identifying the cause of the bug in the code that enabled the exploit in the first place. We conducted detailed simulations to evaluate the overhead for performing DIFT and found that to be 48 % for SPEC integer programs

CiteSeerX

Edinburgh Research Explorer

Tiled microprocessors

Author: Taylor Michael Bedford, 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (p. 251-258).Current-day microprocessors have reached the point of diminishing returns due to inherent scalability limitations. This thesis examines the tiled microprocessor, a class of microprocessor which is physically scalable but inherits many of the desirable properties of conventional microprocessors. Tiled microprocessors are composed of an array of replicated tiles connected by a special class of network, the Scalar Operand Network (SON), which is optimized for low-latency, low-occupancy communication between remote ALUs on different tiles. Tiled microprocessors can be constructed to scale to 100's or 1000's of functional units. This thesis identifies seven key criteria for achieving physical scalability in tiled microprocessors. It employs an archetypal tiled microprocessor to examine the challenges in achieving these criteria and to explore the properties of Scalar Operand Networks. The thesis develops the field of SONs in three major ways: it introduces the 5-tuple performance metric, it describes a complete, high-frequency SON implementation, and it proposes a taxonomy, called AsTrO, for categorizing them.(cont.) To develop these ideas, the thesis details the design, implementation and analysis of a tiled microprocessor prototype, the Raw Microprocessor, which was implemented at MIT in 180 nm technology. Overall, compared to Raw, recent commercial processors with half the transistors required 30x as many lines of code, occupied 100x as many designers, contained 50x as many pre-tapeout bugs, and resulted in 33x as many post-tapeout bugs. At the same time, the Raw microprocessor proves to be more versatile in exploiting ILP, stream, and server-farm workloads with modest to large amounts of parallelism.by Michael Bedford Taylor.Ph.D

CiteSeerX

DSpace@MIT

A Virtual Channel Network-on-Chip for GT and BE traffic

Author: Jansen Pierre G.
Kavaldjiev Nikolay
Smit Gerard J.M.
Wolkotte Pascal T.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2005
Field of study

This paper presents an on-chip network for a run-time reconfigurable System-on-Chip. The network uses packet-switching with virtual channels. It can provide guaranteed services as well as best effort services. The guaranteed services are based on virtual channel allocation, in contrast to other on-chip networks where guarantees are provided by time-division multiplexing. The network is particularly suitable for systems in which the traffic is dominated by streams. We model the data traffic in the system and simulate the behaviour of the network with this model. The results show that the network is capable of handling the system traffic and can provide the required guarantees

University of Twente Research Information

Recursive partitioning multicast: a bandwidth-efficient routing for networks-on-chip

Author: Eun Jung Kim
Hyungjun Kim
Lei Wang
Yuho Jin
Publication venue
Publication date: 01/01/2009
Field of study

Chip Multi-processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Networks-on-Chip (NOCs) provide a scalable communication method for CMP architectures. NOCs must be carefully designed to meet constraints of power consumption and area, and provide ultra low latencies. Existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multicast and broadcast traffic efficiently. In this paper, we propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversities, thus achieves more bandwidth-efficiency and finally improves the performance of the whole network. Our simulation results using a detailed cycle-accurate simulator show that compared with the most recent multicast scheme, RPM saves 25 % of crossbar and link power, and 33 % of link utilization with 50 % network performance improvement. Also RPM is more scalable to large networks than the recently proposed VCTM. 1

CiteSeerX

Crossref

Recommended from our members

Automatically accelerating non-numerical programs by architecture-compiler co-design

Author: Brooks D
Brownell K
Campanoni S
Jones TM
Kanev S
Wei GY
Publication venue: Communications of the ACM
Publication date: 01/01/2017
Field of study

Because of the high cost of communication between processors, compilers that parallelize loops automatically have been forced to skip a large class of loops that are both critical to performance and rich in latent parallelism. HELIX-RC is a compiler/microprocessor co-design that opens those loops to parallelization by decoupling communication from thread execution in conventional multicore architecures. Simulations of HELIX-RC, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.</jats:p

Apollo (Cambridge)

HELIX-RC

Author: Brooks David
Brownell Kevin
Campanoni Simone
Jones Timothy M
Kanev Svilen
Wei Gu-Yeon
Publication venue: ACM SIGARCH Computer Architecture News
Publication date: 14/06/2014
Field of study

Data dependences in sequential programs limit parallelization because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, communication overheads required to synchronize actual dependences counteract the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85x performance speedup for six SPEC CINT2000 benchmarksThis work was possible thanks to the sponsorship of the Royal Academy of Engineering, EPSRC and the National Science Foundation (award number IIS-0926148).This is the accepted manuscript. The final version is available from IEEE and ACM at http://dl.acm.org/citation.cfm?doid=2678373.2665705

Crossref

Apollo (Cambridge)

A Parallel Routing Algorithm for Torus NoCs

Author: Abderezak Touzene
Bassel Arafeh
Khaled Day
Nasser Alzeidi
Publication venue
Publication date: 11/04/2020
Field of study

Abstract. This paper proposes a parallel routing algorithm for routing multiple data streams over disjoint paths in the torus Network-on-Chip (NoC) architecture. We show how to construct a maximal set of disjoint paths between any two nodes of a torus network topology and then make use of the constructed paths for the simultaneous routing of multiple data streams between these nodes. Analytical performance evaluation results are obtained showing the effectiveness of the proposed parallel routing algorithm in reducing communication delays and increasing throughput when transferring large amounts of data in an NoC-based multi-core system

CiteSeerX

Microarchitectural wire management for performance and power in partitioned architectures

Author: Balasubramonian Rajeev
Muralimanohar Naveen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Journal ArticleFuture high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low power. In such architectures, inter-partition communication over global wires has a significant impact on overall processor performance and power consumption. VLSI techniques allow a variety of wire implementations, but these wire properties have previously never been exposed to the microarchitecture. This paper advocates global wire management at the microarchitecture level and proposes a heterogeneous interconnect that is comprised of wires with varying latency, bandwidth, and energy characteristics. We propose and evaluate microarchitectural techniques that can exploit such a heterogeneous interconnect to improve performance and reduce energy consumption. These techniques include a novel cache pipeline design, the identification of narrow bit-width operands, the classification of non-critical data, and the detection of interconnect load imbalance. For a dynamically scheduled partitioned architecture, our results demonstrate that the proposed innovations result in up to 11% reductions in overall processor ED2, compared to a baseline processor that employs a homogeneous interconnect

The University of Utah: J. Willard Marriott Digital Library

LBDR: An efficient unicast routing support for CMPs

Author: Rodrigo Mocholí Samuel
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 23/11/2011
Field of study

LBDR is a routing distributed layer based on minimum logic that removes the need for routing tables at switches on network-on-chips (NoCs) in CMPs and enables the implementation of many routing algorithms on most of regular and irregular toplogies we may find in the near future in a multi-core system.Rodrigo Mocholí, S. (2008). LBDR: An efficient unicast routing support for CMPs. http://hdl.handle.net/10251/13476Archivo delegad

RiuNet

Mapping and Configuration Methods for Multi-Use-Case Networks on Chips

Author: Coenen Martijn
De Micheli Giovanni
Goossens Kees
Murali Srinivasan
Radulescu Andrei
Publication venue
Publication date: 01/01/2006
Field of study

To provide a scalable communication infrastructure for Systems on Chips (SoCs), Networks on Chips (NoCs), a communication centric design paradigm is needed. To be cost effective, SoCs are often programmable and integrate several different applications or use-cases on to the same chip. For the SoC platform to support the different use-cases, the NoC architecture should satisfy the performance constraints of each individual use-case. In this work we motivate the need to consider multiple use-cases during the NoC design process. We present a method to ef ciently map the applications on to the NoC architecture, satisfying the design constraints of each individual use-case. We also present novel ways to dynamically recon- gure the network across the different use-cases and explore the possibility of integrating Dynamic Voltage and Frequency Scaling (DVS/DFS) techniques with the use-case centric NoC design methodology. We validate the performance of the design methodology on several SoC applications. The dynamic recon guration of the NoC integrated with DVS/DFS schemes results in large power savings for the resulting NoC systems

Infoscience - École polytechnique fédérale de Lausanne

Crossref