160 research outputs found

    Data routing in multicore processors using dimension increment method

    Full text link
    A Deadlock-free routing algorithm can be generated for arbitrary interconnection network using the concept of virtual channels but the virtual channels will lead to more complex algorithms and more demands of NOC resource. In this thesis, we study a Torus topology for NOC application, design its structure and propose a routing algorithm exploiting the characteristics of NOC. We have chosen a typical 16 (4 by 4) routers Torus and propose the corresponding route algorithm. In our algorithm, all the channels are assigned 4 different dimensions (n0,n1,n2 & n3). By following the dimension increment method, we break the dependent route circles, and avoid dead lock and live-lock and avoid the overhead of virtual channels. Xilinx offers two soft core processors, namely Picoblaze and Microblaze. The Picoblaze processor is 8-bit configurable processor core. These soft processor cores offer designers tremendous flexibility during the design process, allowing the designers to configure the processor to meet the needs of their systems (e.g., adding custom instructions or including/excluding particular data path coprocessors) and to quickly integrate the processor within any FPGA. Unlike single chip Microprocessor/FPGA systems using hard-core processors, soft processor cores allow designers to incorporate varying numbers of processors within a single FPGA design depending on an application’s needs.Soft processor cores implemented using FPGAs typically have higher power consumption and decreased performance compared with hard-core processors. Key features of the Picoblaze processor, as well as other soft processor cores, include the user configurable options that allow a designer to tailor the processor’s functionality to their specific design. The proposed design implements sixteen instances of a soft processor, Picoblaze, connected in a torus topology. Data is passed from one processor to another employing a routing algorithm which is based on dimension increment method. Thus we design an NOC with multiple microcontrollers and related logic, synthesize the process and test its performance in a simulation environment

    Adding Executable Context to Executable Architectures: Enabling an Executable Context Simulation Framework (ECSF)

    Get PDF
    A system that does not stand alone is represented by a complex entity of component combinations that interact with each other to execute a function. In today\u27s interconnected world, systems integrate with other systems - called a system-of-systems infrastructure: a network of interrelated systems that can often exhibit both predictable and unpredictable behavior. The current state-of-the-art evaluation process of these system-of-systems and their community of practitioners in the academic community are limited to static methods focused on defining who is doing what and where. However, to answer the questions of why and how a system operates within complex systems-of-systems interrelationships, a system\u27s architecture and context must be observed over time, its executable architecture, to discern effective predictable and unpredictable behavior. The objective of this research is to determine a method for evaluating a system\u27s executable architecture and assess the contribution and efficiency of the specified system before it is built. This research led to the development of concrete steps that synthesize the observance of the executable architecture, assessment recommendations provided by the North Atlantic Treaty Organization (NATO) Code of Best Practice for Command and Control (C2) Assessment, and the metrics for operational efficiency provided by the Military Missions and Means Framework. Based on the research herein, this synthesis is designed to evaluate and assess system-of-systems architectures in their operational context to provide quantitative results

    Embedded dynamic programming networks for networks-on-chip

    Get PDF
    PhD ThesisRelentless technology downscaling and recent technological advancements in three dimensional integrated circuit (3D-IC) provide a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip multiprocessor (CMP) based on the networks-onchip (NoCs) paradigm with augmented scalability, modularity and performance. In many cases in such systems, scheduling and managing communication resources are the major design and implementation challenges instead of the computing resources. Past research efforts were mainly focused on complex design-time or simple heuristic run-time approaches to deal with the on-chip network resource management with only local or partial information about the network. This could yield poor communication resource utilizations and amortize the benefits of the emerging technologies and design methods. Thus, the provision for efficient run-time resource management in large-scale on-chip systems becomes critical. This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently using a distributed architecture, which closely couples with the distributed NoC infrastructure. The proposed infrastructure exploits the global information and status of the network to optimize and manage the on-chip communication resources at run-time. There are four major contributions in this thesis. First, it presents a novel deadlock detection method that utilizes run-time transitive closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme, TC-network, guarantees the discovery of all true-deadlocks without false alarms in contrast to state-of-the-art approximation and heuristic approaches. Second, it investigates the advantages of implementing future on-chip systems using three dimensional (3D) integration and presents the design, fabrication and testing results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide semiconductor (CMOS) technology. Testing results demonstrate the effectiveness of such a TC-network for deadlock detection with minimal computational delay in a large-scale network. Third, it introduces an adaptive strategy to effectively diffuse heat throughout the three dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction of data manoeuvre in NoC. It leads to a tool, which is based on the accurate HotSpot thermal model and SystemC cycle accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it presents a new dynamic programming-based run-time thermal management (DPRTM) system, including reactive and proactive schemes, to effectively diffuse heat throughout NoC-based CMPs by routing packets through the coolest paths, when the temperature does not exceed chip’s thermal limit. When the thermal limit is exceeded, throttling is employed to mitigate heat in the chip and DPRTM changes its course to avoid throttled paths and to minimize the impact of throttling on chip performance. This thesis enables a new avenue to explore a novel run-time resource management infrastructure for NoCs, in which new methodologies and concepts are proposed to enhance the on-chip networks for future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)

    Simulation of packet and cell-based communication networks

    Get PDF
    This thesis investigates, using simulation techniques, the practical aspects of implementing a novel mobility protocol on the emerging Broadband Integrated Services Digital Network standard. The increasing expansion of telecommunications networks has meant that the demand for simulation has increased rapidly in recent years; but conventional simulators are slow and developments in the communications field are outstripping the ability of sequential uni-processor simulators. Newer techniques using distributed simulation on a multi-processor network are investigated in an attempt to make a cell-level simulation of a non-trivial B.-I.S.D.N. network feasible. The current state of development of the Asynchronous Transfer Mode standard, which will be used to implement a B.-I.S.D.N., is reviewed and simulation studies of the Orwell Slotted Ring protocol were made in an attempt to devise a simpler model for use in the main simulator. The mobility protocol, which uses a footprinting technique to simplify hand- offs by distributing information about a connexion to surrounding base stations, was implemented on the simulator and found to be functional after a few 'special case' scenarios had been catered for

    Integrated shared-memory and message-passing communication in the Alewife multiprocessor

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 237-246) and index.by John David Kubiatowicz.Ph.D

    Language independent modelling of parallelism

    Get PDF
    To make programs work in parallel contexts without any hazards, programming languages require changes to their structures and compilers. One of the most complicated parts is memory models and how programming languages deal with memory interactions. Different processors provide a different level of safety guarantees (i.e. ARM provides relaxed whereas Intel provides strong guarantees). On the other hand, different programming languages provide different structures for parallel computation and have individual protocols for communicating with parallel processes. Unfortunately, no specific choice is best in all situations. This thesis focuses on memory models of various programming languages and processors highlighting some positive and negative features from the point of view of programmability, performance and portability. In order to give some evidence of problems and performance bottlenecks, some small programs have been developed. This thesis also concentrates on incorrect behaviors, especially on data race conditions in programs, providing suggestions on how to avoid them. Also, some litmus tests on systems featuring different vendors' processors were performed to observe data races on each system. Nowadays programming paradigms also became a big issue. Some of the programming styles support observable non-determinism which is the main reason for incorrect behavior in programs. In this thesis, different programming models are also discussed based on the current state of the available research. Also, the imperative and functional paradigms in different contexts are compared. Finally, a mathematical problem was solved using two different paradigms to provide some practical evidence of the theory

    Off-chip Communications Architectures For High Throughput Network Processors

    Get PDF
    In this work, we present off-chip communications architectures for line cards to increase the throughput of the currently used memory system. In recent years there is a significant increase in memory bandwidth demand on line cards as a result of higher line rates, an increase in deep packet inspection operations and an unstoppable expansion in lookup tables. As line-rate data and NPU processing power increase, memory access time becomes the main system bottleneck during data store/retrieve operations. The growing demand for memory bandwidth contrasts the notion of indirect interconnect methodologies. Moreover, solutions to the memory bandwidth bottleneck are limited by physical constraints such as area and NPU I/O pins. Therefore, indirect interconnects are replaced with direct, packet-based networks such as mesh, torus or k-ary n-cubes. We investigate multiple k-ary n-cube based interconnects and propose two variations of 2-ary 3-cube interconnect called the 3D-bus and 3D-mesh. All of the k-ary n-cube interconnects include multiple, highly efficient techniques to route, switch, and control packet flows in order to minimize congestion spots and packet loss. We explore the tradeoffs between implementation constraints and performance. We also developed an event-driven, interconnect simulation framework to evaluate the performance of packet-based off-chip k-ary n-cube interconnect architectures for line cards. The simulator uses the state-of-the-art software design techniques to provide the user with a flexible yet robust tool, that can emulate multiple interconnect architectures under non-uniform traffic patterns. Moreover, the simulator offers the user with full control over network parameters, performance enhancing features and simulation time frames that make the platform as identical as possible to the real line card physical and functional properties. By using our network simulator, we reveal the best processor-memory configuration, out of multiple configurations, that achieves optimal performance. Moreover, we explore how network enhancement techniques such as virtual channels and sub-channeling improve network latency and throughput. Our performance results show that k-ary n-cube topologies, and especially our modified version of 2-ary 3-cube interconnect - the 3D-mesh, significantly outperform existing line card interconnects and are able to sustain higher traffic loads. The flow control mechanism proved to extensively reduce hot-spots, load-balance areas of high traffic rate and achieve low transmission failure rate. Moreover, it can scale to adopt more memories and/or processors and as a result to increase the line card\u27s processing power

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast
    • …
    corecore