25 research outputs found

    Novel techniques in large scaleable ATM switches

    This dissertation explores the research area of large-scale ATM switches. The requirements for an ATM switch are determined by reviewing the ATM network architecture. These requirements lead to the discussion of an abstract ATM switch, which distinguishes the components of an ATM switch that automatically scale with increasing switch size (the Input Modules and Output Modules) from those that do not (the Connection Admission Control and Switch Management systems as well as the Cell Switch Fabric). An architecture is suggested which may result in a scalable Switch Management and Connection Admission Control function; however, the main thrust of the dissertation is confined to the cell switch fabric. The fundamental mathematical limits of ATM switches and buffer placement are presented next, emphasising the desirability of output buffering. This is followed by an overview of the possible routing strategies in a multi-stage interconnection network. A variety of space division switches are then considered, leading to a discussion of the hypercube fabric, a novel switching technique. The hypercube fabric achieves good performance with O(N(log₂N)²) scaling. The output module, resequencing, cell scheduling and output buffering techniques are then presented, leading to a complete description of the proposed ATM switch. Various traffic models are used to quantify the switch's performance; these include a simple exponential inter-arrival time model, a locality of reference model and a self-similar, bursty, multiplexed Variable Bit Rate (VBR) model. FIFO queueing is simple to implement in an ATM switch; however, more responsive queueing strategies can result in improved performance. An associative memory is presented which allows the separate queues in the ATM switch to be effectively combined into a single logical FIFO queue. The associative memory is described in detail and its feasibility is shown by laying out the integrated circuit masks and performing an analogue simulation of the IC's performance in SPICE3. Although optimisations to the original design were required, the feasibility of the approach is shown with a 15 ns write time and a 160 ns read time for a 32 row, 8 priority bit, 10 routing bit version of the memory. This is achieved with 2 µm technology; more advanced technologies may result in even better performance. The various traffic models and switch models are simulated in a number of runs. These show that the hypercube outperforms a Clos network of equivalent technology and approaches the performance of an ideal reference fabric. The associative memory yields a significant performance advantage in the hypercube network and a modest advantage in the Clos network. The performance of the switches is shown to degrade with increasing traffic density, increasing locality of reference, increasing variance in the cell rate and increasing burst length. Interestingly, the fabrics show no real degradation in response to increasing self-similarity in the traffic. Lastly, the appendices present suggestions on how redundancy, reliability and multicasting can be achieved in the hypercube fabric. An overview of integrated circuits is provided, along with a brief description of commercial ATM switching products. Finally, a road map to the simulation code is provided in the form of descriptions of the functionality found in all of the files within the source tree. This is intended to provide a starting point for anyone wishing to modify or extend the simulation system developed for this thesis.
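
    As an illustration of the queueing idea above, the sketch below models how per-route cell queues can behave as a single logical FIFO by tagging each cell with a global arrival sequence number. It is a minimal software analogy only: the class name CombinedFifo, the linear match loop and the omission of the priority bits are assumptions made for this example, not the hardware design described in the thesis.

        import itertools

        class CombinedFifo:
            """Software analogy of the associative-memory idea: several
            per-route cell queues act as one logical FIFO.  A pop for a given
            routing tag returns the oldest cell for that route, so the overall
            service order across routes remains first-in-first-out."""

            def __init__(self):
                self._seq = itertools.count()   # global arrival order
                self._cells = []                # (sequence, route, cell), kept in arrival order

            def push(self, route, cell):
                self._cells.append((next(self._seq), route, cell))

            def pop_for_route(self, route):
                # Linear scan stands in for the parallel match performed by the
                # associative memory hardware.
                for entry in self._cells:
                    if entry[1] == route:
                        self._cells.remove(entry)
                        return entry[2]
                return None

        # Example: cells for two output routes share one logical FIFO.
        q = CombinedFifo()
        q.push(route=3, cell="A"); q.push(route=7, cell="B"); q.push(route=3, cell="C")
        assert q.pop_for_route(3) == "A"        # oldest cell destined for route 3
        assert q.pop_for_route(7) == "B"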

    Automatic synthesis and optimization of chip multiprocessors

    Microprocessor technology has experienced enormous growth during the last decades. Rapid downscaling of CMOS technology has led to higher operating frequencies and performance densities, while facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.
    CMP architects are challenged to take many complex design decisions. Only a few of them are:
    - What should be the ratio between the core and cache areas on a chip?
    - Which core architectures should be selected?
    - How many cache levels should the memory subsystem have?
    - Which interconnect topologies provide efficient on-chip communication?
    These and many other aspects create a complex multidimensional space for architectural exploration. Design automation tools become essential to make the architectural exploration feasible under hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future-generation on-chip architectures with hundreds or thousands of cores.
    Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee full use of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of modern architectures, such as the availability of enhanced power-saving techniques and the presence of complex memory hierarchies.
    This thesis has several objectives. The first objective is to elaborate methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and by replacing exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.
    The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.
    Finally, the third objective of this thesis is to address the issues of on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.
    The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document, several possible directions for future research are proposed.
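
    To make the idea of simulation-free design space exploration concrete, the sketch below sweeps an area-constrained core/cache design space with a toy analytical throughput model. The area budget, core and cache parameters and the sub-linear scaling model are all invented for illustration, and the exhaustive sweep merely stands in for the intelligent search strategy mentioned in the abstract.

        import itertools
        import math

        # Toy analytical design-space exploration: every constant and the
        # performance model below are invented for this example.
        CHIP_AREA_MM2 = 400.0
        CORE_AREA_MM2 = {"small": 2.0, "big": 8.0}
        CORE_PERF = {"small": 1.0, "big": 2.5}      # relative per-core throughput
        CACHE_AREA_PER_MB = 4.0

        def throughput(core_type, n_cores, cache_mb):
            # Placeholder model: performance scales sub-linearly with core count
            # and gains logarithmically from cache capacity.
            parallel = CORE_PERF[core_type] * n_cores ** 0.85
            memory_factor = 1.0 + 0.2 * math.log2(1 + cache_mb)
            return parallel * memory_factor

        best = None
        for core_type, n_cores, cache_mb in itertools.product(
                CORE_PERF, range(4, 129, 4), range(2, 65, 2)):
            area = CORE_AREA_MM2[core_type] * n_cores + CACHE_AREA_PER_MB * cache_mb
            if area > CHIP_AREA_MM2:
                continue                            # violates the area budget
            score = throughput(core_type, n_cores, cache_mb)
            if best is None or score > best[0]:
                best = (score, core_type, n_cores, cache_mb)

        print("best configuration (score, core type, cores, cache MB):", best)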

    Contention and achieved performance in multicomputer wormhole routing networks


    On a Multiprocessor Computer Farm for Online Physics Data Processing

    The topic of this thesis is the design-phase performance evaluation of a large multiprocessor (MP) computer farm intended for the on-line data processing of the Compact Muon Solenoid (CMS) experiment. CMS is a high-energy physics experiment, planned to operate at CERN (Geneva, Switzerland) during the year 2005. The CMS computer farm consists of 1,000 MP computer systems and a 1,000 × 1,000 communications switch. The approach followed for the farm performance evaluation combines simulation studies with the evaluation of small prototype systems that form building blocks of the farm. For the purposes of the simulation studies, we have developed a discrete-event, event-driven simulator that is capable of describing the high-level architecture of the farm and giving estimates of the farm's performance. The simulator is designed in a modular way to facilitate the development of various modules that model the behavior of the farm building blocks at the desired level of detail. With the aid of this simulator, we make a particular study of the scheduling of the farm nodes, showing that preemptive scheduling can increase the farm's throughput. We have developed a prototype setup of a farm node (an event filter unit). The setup consists of a high-performance MP system (the farm node) connected to a second computer system (used to emulate the data sources) through an ATM network. The performance issues of interfacing a network interface controller (NIC) to the application running in the farm node are explored. It is shown with the aid of this setup that the switch-to-farm interface (SFI), a device used to assemble the incoming data fragments into a single entity, can be entirely avoided by emulating its function in software. We show that, in order to meet the required event assembly performance at the filter node inputs, the development effort has to concentrate on the NIC hardware, software and its interface to the application, rather than on building a custom-designed device specialized to perform the task of event assembly. Finally, the farm scaling issues are investigated. Our aim is to obtain an "operational region" inside the farm configuration space when the various networking speeds are taken into account. Analytically obtained results, which have been confirmed with the above-mentioned simulator, are discussed. We also present results showing the influence of parameters inherent to the farm (like the algorithm rejection factor) on the requirements for the farm building blocks (sustained I/O bandwidth).
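
    The kind of discrete-event, event-driven engine described above can be summarised by the small sketch below. The Simulator class, the fragment-processing handlers and the service-time constant are placeholders invented for this example; the actual CMS farm simulator models its building blocks in far more detail.

        import heapq
        import itertools

        class Simulator:
            """Minimal event-driven simulation core: a time-ordered heap of
            pending events, each executed by calling its handler."""

            def __init__(self):
                self.now = 0.0
                self._tie = itertools.count()       # breaks ties between equal times
                self._events = []                   # (time, tie, handler, args)

            def schedule(self, delay, handler, *args):
                heapq.heappush(self._events,
                               (self.now + delay, next(self._tie), handler, args))

            def run(self, until):
                while self._events and self._events[0][0] <= until:
                    self.now, _, handler, args = heapq.heappop(self._events)
                    handler(*args)

        # Placeholder farm-node model: a data fragment arrives and is processed.
        def fragment_arrival(sim, node_id, size_kb):
            processing_time = size_kb * 0.001       # invented service-time model
            sim.schedule(processing_time, processing_done, sim, node_id)

        def processing_done(sim, node_id):
            print(f"t={sim.now:.3f}s: node {node_id} finished assembling an event")

        sim = Simulator()
        for i in range(3):
            sim.schedule(i * 0.5, fragment_arrival, sim, i, 100)
        sim.run(until=10.0)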

    Optimizing Communication for Massively Parallel Processing

    The current trends in high performance computing show that large machines with tens of thousands of processors will soon be readily available. The IBM Bluegene-L machine with 128k processors (which is currently being deployed) is an important step in this direction. In this scenario, it is going to be a significant burden for the programmer to manually scale his applications. This task of scaling involves addressing issues like load imbalance and communication overhead. In this thesis, we explore several communication optimizations to help parallel applications scale easily on a large number of processors. We also present automatic runtime techniques to relieve the programmer from the burden of optimizing communication in his applications. This thesis explores processor virtualization to improve communication performance in applications. With processor virtualization, the computation is mapped to virtual processors (VPs). After one VP has finished computation and is waiting for responses to its messages, another VP can compute, thus overlapping communication with computation. This overlap is only effective if the processor overhead of the communication operation is a small fraction of the total communication time. Fortunately, with network interfaces having co-processors, this happens to be true, and processor virtualization has a natural advantage on such interconnects. The communication optimizations we present in this thesis are motivated by applications such as NAMD (a classical molecular dynamics application) and CPAIMD (a quantum chemistry application). Applications like NAMD and CPAIMD consume a fair share of the time available on supercomputers, so improving their performance would be of great value. We have successfully scaled NAMD to 1 TF of peak performance on 3000 processors of PSC Lemieux, using the techniques presented in this thesis. We study both point-to-point communication and collective communication (specifically all-to-all communication). On a large number of processors, all-to-all communication can take several milliseconds to finish. With the synchronous collectives defined in MPI, the processor idles while the collective messages are in flight. Therefore, we demonstrate an asynchronous collective communication framework that lets the CPU compute while the all-to-all messages are in flight. We also show that the best strategy for all-to-all communication depends on the message size, the number of processors and other dynamic parameters. This suggests that these parameters can be observed at runtime and used to choose the optimal strategy for all-to-all communication. In this thesis, we demonstrate adaptive strategy switching for all-to-all communication. The communication optimization framework presented in this thesis has been designed to optimize communication in the context of processor virtualization and dynamically migrating objects. We present the streaming strategy to optimize fine-grained object-to-object communication. In this thesis, we motivate the need for hardware collectives, as processor-based collectives can be delayed by intermediate processors that are busy with computation. We explore a next-generation interconnect that supports collectives in the switching hardware. We show the performance gains of hardware collectives through synthetic benchmarks.
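
    The runtime choice between all-to-all strategies can be illustrated with a simple latency/bandwidth cost comparison, sketched below. The alpha/beta constants, the two-dimensional virtual-mesh combining model and the function names are assumptions for this example rather than the framework's actual strategies; the point is only that small messages favour message combining while large messages favour direct exchange.

        # Illustrative runtime strategy selection for all-to-all communication.
        ALPHA_US = 10.0        # assumed per-message startup (latency) cost, microseconds
        BETA_US_PER_KB = 0.5   # assumed per-kilobyte transmission cost, microseconds

        def direct_cost(p, msg_kb):
            # Every processor sends p - 1 separate messages.
            return (p - 1) * (ALPHA_US + BETA_US_PER_KB * msg_kb)

        def combining_cost(p, msg_kb, dims=2):
            # Messages are combined along a virtual 2-D mesh: fewer but larger
            # messages, which pays off when the startup cost dominates.
            side = round(p ** (1.0 / dims))
            steps = dims * (side - 1)
            combined_kb = msg_kb * side
            return steps * (ALPHA_US + BETA_US_PER_KB * combined_kb)

        def choose_all_to_all_strategy(p, msg_kb):
            # A decision that could be made at runtime from observed parameters.
            return "combining" if combining_cost(p, msg_kb) < direct_cost(p, msg_kb) else "direct"

        print(choose_all_to_all_strategy(p=1024, msg_kb=0.1))    # small messages -> combining
        print(choose_all_to_all_strategy(p=1024, msg_kb=64.0))   # large messages -> direct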

    Design of complex integrated systems based on networks-on-chip: Trading off performance, power and reliability

    The steady advancement of microelectronics is associated with an escalating number of challenges for design engineers, owing to both the tiny dimensions and the enormous complexity of integrated systems. Against this background, this work deals with the Network-on-Chip (NoC) as the emerging design paradigm to cope with diverse issues of nanotechnology. The detailed investigations within the chapters focus on the communication-centric aspects of multi-core systems, with performance, power consumption and reliability considered alike as the essential design criteria.

    High Performance Network Evaluation and Testing
