48 research outputs found

    Integrated shared-memory and message-passing communication in the Alewife multiprocessor

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 237-246) and index.by John David Kubiatowicz.Ph.D

    Comparison of multi-layer bus interconnection and a network on chip solution

    Get PDF
    Abstract. This thesis explains the basic subjects that are required to take in consideration when designing a network on chip solutions in the semiconductor world. For example, general topologies such as mesh, torus, octagon and fat tree are explained. In addition, discussion related to network interfaces, switches, arbitration, flow control, routing, error avoidance and error handling are provided. Furthermore, there is discussion related to design flow, a computer aided designing tools and a few comprehensive researches. However, several networks are designed for the minimum latency, although there are also versions which trade performance for decreased bus widths. These designed networks are compared with a corresponding multi-layer bus interconnection and both synthesis and register transfer level simulations are run. For example, results from throughput, latency, logic area and power consumptions are gathered and compared. It was discovered that overall throughput was well balanced with the network on chip solutions, although its maximum throughput was limited by protocol conversions. For example, the multi-layer bus interconnection was capable of providing a few times smaller latencies and higher throughputs when only a single interface was injected at the time. However, with parallel traffic and high-performance requirements a network on chip solution provided better results, even though the difference decreased when performance requirements were lower. Furthermore, it was discovered that the network on chip solutions required approximately 3–4 times higher total cell area than the multi-layer bus interconnection and that resources were mainly located at network interfaces and switches. In addition, power consumption was approximately 2–3 times higher and was mostly caused by dynamic consumption.Monitasoisen väyläarkkitehtuurin ja tietokoneverkkomaisen ratkaisun vertailua. Tiivistelmä. Tutkielmassa käsitellään tärkeimpiä aihealueita, jotka tulee huomioida suunniteltaessa tietokoneverkkomaisia väyläratkaisuja puolijohdemaailmassa. Esimerkiksi yleiset rakenteet, kuten verkko-, torus-, kahdeksankulmio- ja puutopologiat käsitellään lyhyesti. Lisäksi alustetaan verkon liitäntäkohdat, kytkimet, vuorottelu, vuon hallinta, reititys, virheiden välttely ja -käsittely. Lopuksi kerrotaan suunnitteluvuon oleellisimmat välivaiheet ja niihin soveltuvia kaupallisia työkaluja, sekä käsitellään lyhyesti muutaman aiemman julkaisun tuloksia. Tutkielmassa käytetään suunnittelutyökalua muutaman tietokoneverkkomaisen ratkaisun toteutukseen ja tavoitteena on saavuttaa pienin mahdollinen latenssi. Toisaalta myös hieman suuremman latenssin versioita suunnitellaan, mutta pienemmillä väylänleveyksillä. Lisäksi suunniteltuja tietokoneverkkomaisia ratkaisuja vertaillaan perinteisempään monitasoiseen väyläarkkitehtuuriin. Esimerkiksi synteesi- ja simulaatiotuloksia, kuten logiikan vaatimaa pinta-alaa, tehonkulutusta, latenssia ja suorituskykyä, vertaillaan keskenään. Tutkielmassa selvisi, että suunnittelutyökalulla toteutetut tietokoneverkkomaiset ratkaisut mahdollistivat tasaisemman suorituskyvyn, joskin niiden suurin saavutettu suorituskyky ja pienin latenssi määräytyivät protokollan käännöksen aiheuttamasta viiveestä. Tutkielmassa havaittiin, että perinteisemmillä menetelmillä saavutettiin noin kaksi kertaa suurempi suorituskyky ja pienempi latenssi, kun verkossa ei ollut muuta liikennettä. Rinnakkaisen liikenteen lisääntyessä tietokoneverkkomainen ratkaisu tarjosi keskimäärin paremman suorituskyvyn, kun sille asetetut tehokkuusvaateet olivat suuret, mutta suorituskykyvaatimuksien laskiessa erot kapenivat. Lisäksi huomattiin, että tietokoneverkkomaisten ratkaisujen käyttämä pinta-ala oli noin 3–4 kertaa suurempi kuin monitasoisella väyläarkkitehtuurilla ja että resurssit sijaitsivat enimmäkseen verkon liittymäkohdissa ja kytkimissä. Lisäksi tehonkulutuksen huomattiin olevan noin 2–3 kertaa suurempi, joskin sen havaittiin koostuvan pääosin dynaamisesta kulutuksesta

    Design for manufacturability : a feature-based agent-driven approach

    Get PDF

    The MANGO clockless network-on-chip: Concepts and implementation

    Get PDF

    Modeling, Design And Evaluation Of Networking Systems And Protocols Through Simulation

    Get PDF
    Computer modeling and simulation is a practical way to design and test a system without actually having to build it. Simulation has many benefits which apply to many different domains: it reduces costs creating different prototypes for mechanical engineers, increases the safety of chemical engineers exposed to dangerous chemicals, speeds up the time to model physical reactions, and trains soldiers to prepare for battle. The motivation behind this work is to build a common software framework that can be used to create new networking simulators on top of an HLA-based federation for distributed simulation. The goals are to model and simulate networking architectures and protocols by developing a common underlying simulation infrastructure and to reduce the time a developer has to learn the semantics of message passing and time management to free more time for experimentation and data collection and reporting. This is accomplished by evolving the simulation engine through three different applications that model three different types of network protocols. Computer networking is a good candidate for simulation because of the Internet\u27s rapid growth that has spawned off the need for new protocols and algorithms and the desire for a common infrastructure to model these protocols and algorithms. One simulation, the 3DInterconnect simulator, simulates data transmitting through a hardware k-array n-cube network interconnect. Performance results show that k-array n-cube topologies can sustain higher traffic load than the currently used interconnects. The second simulator, Cluster Leader Logic Algorithm Simulator, simulates an ad-hoc wireless routing protocol that uses a data distribution methodology based on the GPS-QHRA routing protocol. CLL algorithm can realize a maximum of 45% power savings and maximum 25% reduced queuing delay compared to GPS-QHRA. The third simulator simulates a grid resource discovery protocol for helping Virtual Organizations to find resource on a grid network to compute or store data on. Results show that worst-case 99.43% of the discovery messages are able to find a resource provider to use for computation. The simulation engine was then built to perform basic HLA operations. Results show successful HLA functions including creating, joining, and resigning from a federation, time management, and event publication and subscription

    2-wire time independent asynchronous communications

    Get PDF
    Communications both to and between low end microprocessors represents a real cost in a number of industrial and consumer products. This thesis starts by examining the properties of protocols that help to minimize these expenses and comes to the conclusion that the derived set of properties define a new category of communications protocol : Time Independent Asynchronous ( TIA) communications. To show the utility of the TIA category we develop a novel TIA protocol that uses only 2-wires and general IO pins on each host. The protocol is analyzed using the Petri net based STG ( Signal Transition Graph) which is widely use to model asynchronous logic. It is shown that STGs do not accurately model the behavior of software driven systems and so a modified form called STG-FT ( STG For Threads) is developed to better model software systems. A simulator is created to take an STG-FT model and perform a full reachability tree analysis to prove correctness and analyze livelock and deadlock properties. The simulator can also examine the full reachability tree for every possible system state ( the cross product of all sub-system states), and analyze deadlock and livelock issues related to unexpected inputs and unusual situations. Reachability pruning algorithms are developed which decrease the search tree by a factor of approximately 250 million. The 2-wire protocol is implemented between a PC and an Atmel Tiny26 microprocessor, there is also a variant that works between microprocessors. Testing verifies the simulation results including an avoidable livelock condition with data throughput peaking at a useful 50 kilobits/second in both directions. The first practical application of 2-wire TIA is part of a novel debugger for the Atmel Tiny26 microprocessor. The approach can be extended to any microprocessor with general IO pins. TIA communications, developed in this thesis, is a serious contender whenever low end microprocessors must communicate with other processors. Consumer and industrial products may be able to achieve cost saving by using this new protocol

    Atomic Transfer for Distributed Systems

    Get PDF
    Building applications and information systems increasingly means dealing with concurrency and faults stemming from distribution of system components. Atomic transactions are a well-known method for transferring the responsibility for handling concurrency and faults from developers to the software\u27s execution environment, but incur considerable execution overhead. This dissertation investigates methods that shift some of the burden of concurrency control into the network layer, to reduce response times and increase throughput. It anticipates future programmable network devices, enabling customized high-performance network protocols. We propose Atomic Transfer (AT), a distributed algorithm to prevent race conditions due to messages crossing on a path of network switches. Switches check request messages for conflicts with response messages traveling in the opposite direction. Conflicting requests are dropped, obviating the request\u27s receiving host from detecting and handling the conflict. AT is designed to perform well under high data contention, as concurrency control effort is balanced across a network instead of being handled by the contended endpoint hosts themselves. We use AT as the basis for a new optimistic transactional cache consistency algorithm, supporting execution of atomic applications caching shared data. We then present a scalable refinement, allowing hierarchical consistent caches with predictable performance despite high data update rates. We give detailed I/O Automata models of our algorithms along with correctness proofs. We begin with a simplified model, assuming static network paths and no message loss, and then refine it to support dynamic network paths and safe handling of message loss. We present a trie-based data structure for accelerating conflict-checking on switches, with benchmarks suggesting the feasibility of our approach from a performance stand-point

    Simulation Modelling of Distributed-Shared Memory Multiprocessors

    Get PDF
    Institute for Computing Systems ArchitectureDistributed shared memory (DSM) systems have been recognised as a compelling platform for parallel computing due to the programming advantages and scalability. DSM systems allow applications to access data in a logically shared address space by abstracting away the distinction of physical memory location. As the location of data is transparent, the sources of overhead caused by accessing the distant memories are difficult to analyse. This memory locality problem has been identified as crucial to DSM performance. Many researchers have investigated the problem using simulation as a tool for conducting experiments resulting in the progressive evolution of DSM systems. Nevertheless, both the diversity of architectural configurations and the rapid advance of DSM implementations impose constraints on simulation model designs in two issues: the limitation of the simulation framework on model extensibility and the lack of verification applicability during a simulation run causing the delay in verification process. This thesis studies simulation modelling techniques for memory locality analysis of various DSM systems implemented on top of a cluster of symmetric multiprocessors. The thesis presents a simulation technique to promote model extensibility and proposes a technique for verification applicability, called a Specification-based Parameter Model Interaction (SPMI). The proposed techniques have been implemented in a new interpretation-driven simulation called DSiMCLUSTER on top of a discrete event simulation (DES) engine known as HASE. Experiments have been conducted to determine which factors are most influential on the degree of locality and to determine the possibility to maximise the stability of performance. DSiMCLUSTER has been validated against a SunFire 15K server and has achieved similarity of cache miss results, an average of +-6% with the worst case less than 15% of difference. These results confirm that the techniques used in developing the DSiMCLUSTER can contribute ways to achieve both (a) a highly extensible simulation framework to keep up with the ongoing innovation of the DSM architecture, and (b) the verification applicability resulting in an efficient framework for memory analysis experiments on DSM architecture

    Lock Inference for Java

    No full text
    Atomicity is an important property for concurrent software, as it provides a stronger guarantee against errors caused by unanticipated thread interactions than race-freedom does. However, concurrency control in general is tricky to get right because current techniques are too low-level and error-prone. With the introduction of multicore processors, the problems are compounded. Consequently, a new software abstraction is gaining popularity to take care of concurrency control and the enforcing of atomicity properties, called atomic sections. One possible implementation of their semantics is to acquire a global lock upon entry to each atomic section, ensuring that they execute in mutual exclusion. However, this cripples concurrency, as non-interfering atomic sections cannot run in parallel. Transactional memory is another automated technique for providing atomicity, but relies on the ability to rollback conflicting atomic sections and thus places restrictions on the use of irreversible operations, such as I/O and system calls, or serialises all sections that use such features. Therefore, from a language designer's point of view, the challenge is to implement atomic sections without compromising performance or expressivity. This thesis explores the technique of lock inference, which infers a set of locks for each atomic section, while attempting to balance the requirements of maximal concurrency, minimal locking overhead and freedom from deadlock. We focus on lock-inference techniques for tackling large Java programs that make use of mature libraries. This improves upon existing work, which either (i) ignores libraries, (ii) requires library implementors to annotate which locks to take, or (iii) only considers accesses performed up to one-level deep in library call chains. As a result, each of these prior approaches may result in atomicity violations. This is a problem because even simple uses of I/O in Java programs can involve large amounts of library code. Our approach is the first to analyse library methods in full and thus able to soundly handle atomic sections involving complicated real-world side effects, while still permitting atomic sections to run concurrently in cases where their lock sets are disjoint. To validate our claims, we have implemented our techniques in Lockguard, a fully automatic tool that translates Java bytecode containing atomic sections to an equivalent program that uses locks instead. We show that our techniques scale well and despite protecting all library accesses, we obtain performance comparable to the original locking policy of our benchmarks

    Performance Optimization Strategies for Transactional Memory Applications

    Get PDF
    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications
    corecore