Search CORE

479 research outputs found

Quarc: an architecture for efficient on-chip communication

Author: Moadeli Mahmoud
Publication venue
Publication date: 01/01/2010
Field of study

The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

A hybrid queueing model for fast broadband networking simulation

Author: Liu Enjie
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2002
Field of study

PhDThis research focuses on the investigation of a fast simulation method for broadband telecommunication networks, such as ATM networks and IP networks. As a result of this research, a hybrid simulation model is proposed, which combines the analytical modelling and event-driven simulation modelling to speeding up the overall simulation. The division between foreground and background traffic and the way of dealing with these different types of traffic to achieve improvement in simulation time is the major contribution reported in this thesis. Background traffic is present to ensure that proper buffering behaviour is included during the course of the simulation experiments, but only the foreground traffic of interest is simulated, unlike traditional simulation techniques. Foreground and background traffic are dealt with in a different way. To avoid the need for extra events on the event list, and the processing overhead, associated with the background traffic, the novel technique investigated in this research is to remove the background traffic completely, adjusting the service time of the queues for the background traffic to compensate (in most cases, the service time for the foreground traffic will increase). By removing the background traffic from the event-driven simulator the number of cell processing events dealt with is reduced drastically. Validation of this approach shows that, overall, the method works well, but the simulation using this method does have some differences compared with experimental results on a testbed. The reason for this is mainly because of the assumptions behind the analytical model that make the modelling tractable. Hence, the analytical model needs to be adjusted. This is done by having a neural network trained to learn the relationship between the input traffic parameters and the output difference between the proposed model and the testbed. Following this training, simulations can be run using the output of the neural network to adjust the analytical model for those particular traffic conditions. The approach is applied to cell scale and burst scale queueing to simulate an ATM switch, and it is also used to simulate an IP router. In all the applications, the method ensures a fast simulation as well as an accurate result

Queen Mary Research Online

Recommended from our members

Methods for Performance Evaluation of Parallel Computer Systems

Author: Eisenstadter Yoram
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1986
Field of study

Although parallel computers have existed for many years, recently there has been a surge of academic, industrial and governmental interest in parallel computing. Commercially manufactured parallel computers have started to become available. Many new experimental parallel architectures are reported in the literature every year. Software for many types of applications, from scientific number crunching to artificial intelligence, is being written to run on parallel machines. Performance is an essential consideration both in the design of new systems and the deployment of existing systems. Users of computers wish to utilize their hardware and software systems as efficiently as possible. Over the years, a field known as computer performance evaluation has arisen to address the problem of quantifying and predicting computer performance. Methods exist that can determine how efficiently a system's resources are being used. These can help track down the probable causes of performance problems

Columbia University Academic Commons

A formalism for describing and simulating systems with interacting components.

Author: Corr Glenn A.
Publication venue
Publication date: 31/05/1996
Field of study

This thesis addresses the problem of descriptive complexity presented by systems involving a high number of interacting components. It investigates the evaluation measure of performability and its application to such systems. A new description and simulation language, ICE and it's application to performability modelling is presented. ICE (Interacting ComponEnts) is based upon an earlier description language which was first proposed for defining reliability problems. ICE is declarative in style and has a limited number of keywords. The ethos in the development of the language has been to provide an intuitive formalism with a powerful descriptive space. The full syntax of the language is presented with discussion as to its philosophy. The implementation of a discrete event simulator using an ICE interface is described, with use being made of examples to illustrate the functionality of the code and the semantics of the language. Random numbers are used to provide the required stochastic behaviour within the simulator. The behaviour of an industry standard generator within the simulator and different methods of number allocation are shown. A new generator is proposed that is a development of a fast hardware shift register generator and is demonstrated to possess good statistical properties and operational speed. For the purpose of providing a rigorous description of the language and clarification of its semantics, a computational model is developed using the formalism of extended coloured Petri nets. This model also gives an indication of the language's descriptive power relative to that of a recognised and well developed technique. Some recognised temporal and structural problems of system event modelling are identified. and ICE solutions given. The growing research area of ATM communication networks is introduced and a sophisticated top down model of an ATM switch presented. This model is simulated and interesting results are given. A generic ICE framework for performability modelling is developed and demonstrated. This is considered as a positive contribution to the general field of performability research

Open Access Institutional Repository at Robert Gordon University

Formal Availability Analysis using Theorem Proving

Author: CE Ebeling
F Bistouni
HC Wu
J Harrison
J Wu
JP Signoret
JT Blake
KS Trivedi
M Bozzano
M Gordon
P Bailis
R Robidoux
RF Stapelberg
T Mhamdi
W Ahmed
W Ahmed
Publication venue
Publication date: 05/08/2016
Field of study

Availability analysis is used to assess the possible failures and their restoration process for a given system. This analysis involves the calculation of instantaneous and steady-state availabilities of the individual system components and the usage of this information along with the commonly used availability modeling techniques, such as Availability Block Diagrams (ABD) and Fault Trees (FTs) to determine the system-level availability. Traditionally, availability analyses are conducted using paper-and-pencil methods and simulation tools but they cannot ascertain absolute correctness due to their inaccuracy limitations. As a complementary approach, we propose to use the higher-order-logic theorem prover HOL4 to conduct the availability analysis of safety-critical systems. For this purpose, we present a higher-order-logic formalization of instantaneous and steady-state availability, ABD configurations and generic unavailability FT gates. For illustration purposes, these formalizations are utilized to conduct formal availability analysis of a satellite solar array, which is used as the main source of power for the Dong Fang Hong-3 (DFH-3) satellite.Comment: 16 pages. arXiv admin note: text overlap with arXiv:1505.0264

arXiv.org e-Print Archive

Crossref

Switching considerations in storage networks.

Author
Publication venue
Publication date: 01/01/2003
Field of study

by Leung Yiu Tong.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 96-98).Abstracts in English and Chinese.Chapter 1. --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Thesis Organization --- p.3Chapter 2. --- Storage Network Fundamentals --- p.4Chapter 2.1 --- Storage Network Topology --- p.4Chapter 2.1.1 --- Direct Attached Storage (DAS) --- p.5Chapter 2.1.2 --- Network Attached Storage (NAS) --- p.7Chapter 2.1.3 --- Storage Area Network (SAN) --- p.9Chapter 2.1.3.1 --- SAN and the Fibre Channel Protocol --- p.11Chapter 2.1.4 --- Summary on Storage Network Topology --- p.12Chapter 2.2 --- Storage Protocol --- p.15Chapter 2.2.1 --- Fibre Channel --- p.15Chapter 2.2.1.1 --- Fibre Channel over IP (FCIP) --- p.17Chapter 2.2.1.2 --- Internet Fibre Channel Protocol (iFCP) --- p.19Chapter 2.2.2 --- Internet SCSI (iSCSI) --- p.20Chapter 2.2.3 --- InfiniBand --- p.22Chapter 2.2.4 --- Review on Storage Network Protocol --- p.25Chapter 2.3 --- Standard Organization --- p.27Chapter 2.4 --- Summary --- p.28Chapter 3. --- Switching Design for Storage Networks --- p.30Chapter 3.1. --- Shared Bus Design --- p.32Chapter 3.2. --- Time Division Switch --- p.36Chapter 3.3. --- Share Buffer Memory Switch --- p.37Chapter 3.3.1 --- Parallel Memory Array --- p.40Chapter 3.3.2 --- Distributive Storage --- p.43Chapter 3.4. --- Crossbar Switch --- p.45Chapter 3.4.1 --- Arbitrated Crossbar vs. Buffered Crossbar --- p.46Chapter 3.4.1.1 --- Arbitrated Crossbar Switch --- p.47Chapter 3.4.1.2 --- Buffered Crossbar Switch --- p.48Chapter 3.4.2 --- Switch Scheduling --- p.49Chapter 3.4.2.1 --- Bipartite Matching --- p.50Chapter 3.4.2.2 --- Token-based Distributive Scheduling --- p.53Chapter 3.4.2.3 --- Resource Counting using Semaphore --- p.56Chapter 3.5. --- Algebraic Switches --- p.60Chapter 3.5.1 --- Switching by Conditionally Nonblocking Properties --- p.61Chapter 3.5.2 --- Self-Routing Mechanism with Zero-Bit Buffering --- p.64Chapter 3.5.3 --- Multistage Interconnection of Self-routing Concentrators --- p.69Chapter 3.6. --- Summary --- p.73Chapter 4. --- Investigating Switching Issue in Storage Networks --- p.74Chapter 4.1 --- Choosing a Suitable Switch --- p.74Chapter 4.2 --- Quality of Service (QoS) --- p.76Chapter 4.3 --- Multicasting --- p.77Chapter 4.3.1 --- Crossbar Switch --- p.78Chapter 4.3.2 --- Shared-Buffer Memory Switches --- p.80Chapter 4.3.3 --- Algebraic Switch --- p.82Chapter 4.3.4 --- Application on Multicast Transmission --- p.86Chapter 4.4 --- Load Balancing Mechanism --- p.87Chapter 4.5 --- Optimization on Storage Utilization --- p.91Chapter 4.6 --- Summary --- p.93Chapter 5. --- Conclusion and Summary of Original Contributions --- p.9

CUHK Digital Repository

Simulation models of shared-memory multiprocessor systems

Author: Coe Paul.
Publication venue: The University of Edinburgh
Publication date: 01/01/2000
Field of study

Edinburgh Research Archive

DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS

Author: Ferrer Pérez Joan Lluís
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 19/12/2012
Field of study

El crecimiento de los computadores paralelos basados en redes de altas prestaciones ha aumentado el interés y esfuerzo de la comunidad investigadora en desarrollar nuevas técnicas que permitan obtener el mejor rendimiento de estas redes. En particular, el desarrollo de nuevas técnicas que permitan un encaminamiento eficiente y que reduzcan la latencia de los paquetes, aumentando así la productividad de la red. Sin embargo, una alta tasa de utilización de la red podría conllevar el que se conoce como "congestión de red", el cual puede causar una degradación del rendimiento. El control de la congestión en redes multietapa es un problema importante que no está completamente resuelto. Con el fin de evitar la degradación del rendimiento de la red cuando aparece congestión, se han propuesto diferentes mecanismos para el control de la congestión. Muchos de estos mecanismos están basados en notificación explícita de la congestión. Para este propósito, los switches detectan congestión y dependiendo de la estrategia aplicada, los paquetes son marcados con la finalidad de advertir a los nodos origenes. Como respuesta, los nodos origenes aplican acciones correctivas para ajustar su tasa de inyección de paquetes. El propósito de esta tesis es analizar las diferentes estratégias de detección y corrección de la congestión en redes multietapa, y proponer nuevos mecanismos de control de la congestión encaminados a este tipo de redes sin descarte de paquetes. Las nuevas propuestas están basadas en una estrategia más refinada de marcaje de paquetes en combinación con un conjunto de acciones correctivas justas que harán al mecanismo capaz de controlar la congestión de manera efectiva con independencia del grado de congestión y de las condiciones de tráfico.Ferrer Pérez, JL. (2012). DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18197Palanci

Crossref

RiuNet

Efficient processor management strategies for multicomputer systems

Author: Chang Chung-Yen
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1997
Field of study

Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed

Digital Repository @ Iowa State University (ISU)

Multistage Packet-Switching Fabrics for Data Center Networks

Author: Hassen Fadoua
Publication venue: University of Leeds
Publication date: 05/04/2017
Field of study

Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery. For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges assets of the output queuing, and NoCs to provide high throughput, and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure

Biblioteca Digital de la Comunidad de Madrid

White Rose E-theses Online