479 research outputs found
Quarc: an architecture for efficient on-chip communication
The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems.
Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm.
By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission
commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence.
To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of
the building blocks of the architecture, including topology, router and network interface.
The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture.
Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication
A hybrid queueing model for fast broadband networking simulation
PhDThis research focuses on the investigation of a fast simulation method for broadband
telecommunication networks, such as ATM networks and IP networks. As a result of
this research, a hybrid simulation model is proposed, which combines the analytical
modelling and event-driven simulation modelling to speeding up the overall
simulation.
The division between foreground and background traffic and the way of dealing with
these different types of traffic to achieve improvement in simulation time is the major
contribution reported in this thesis. Background traffic is present to ensure that proper
buffering behaviour is included during the course of the simulation experiments, but
only the foreground traffic of interest is simulated, unlike traditional simulation
techniques. Foreground and background traffic are dealt with in a different way.
To avoid the need for extra events on the event list, and the processing overhead,
associated with the background traffic, the novel technique investigated in this
research is to remove the background traffic completely, adjusting the service time of
the queues for the background traffic to compensate (in most cases, the service time
for the foreground traffic will increase). By removing the background traffic from the
event-driven simulator the number of cell processing events dealt with is reduced
drastically.
Validation of this approach shows that, overall, the method works well, but the
simulation using this method does have some differences compared with experimental
results on a testbed. The reason for this is mainly because of the assumptions behind
the analytical model that make the modelling tractable.
Hence, the analytical model needs to be adjusted. This is done by having a neural
network trained to learn the relationship between the input traffic parameters and the
output difference between the proposed model and the testbed. Following this
training, simulations can be run using the output of the neural network to adjust the
analytical model for those particular traffic conditions.
The approach is applied to cell scale and burst scale queueing to simulate an ATM
switch, and it is also used to simulate an IP router. In all the applications, the method
ensures a fast simulation as well as an accurate result
Recommended from our members
Methods for Performance Evaluation of Parallel Computer Systems
Although parallel computers have existed for many years, recently there has been a surge of academic, industrial and governmental interest in parallel computing. Commercially manufactured parallel computers have started to become available. Many new experimental parallel architectures are reported in the literature every year. Software for many types of applications, from scientific number crunching to artificial intelligence, is being written to run on parallel machines. Performance is an essential consideration both in the design of new systems and the deployment of existing systems. Users of computers wish to utilize their hardware and software systems as efficiently as possible. Over the years, a field known as computer performance evaluation has arisen to address the problem of quantifying and predicting computer performance. Methods exist that can determine how efficiently a system's resources are being used. These can help track down the probable causes of performance problems
A formalism for describing and simulating systems with interacting components.
This thesis addresses the problem of descriptive complexity presented by systems involving a high number of interacting components. It investigates the evaluation measure of performability and its application to such systems. A new description and simulation language, ICE and it's application to performability modelling is presented. ICE (Interacting ComponEnts) is based upon an earlier description language which was first proposed for defining reliability problems. ICE is declarative in style and has a limited number of keywords. The ethos in the development of the language has been to provide an intuitive formalism with a powerful descriptive space. The full syntax of the language is presented with discussion as to its philosophy. The implementation of a discrete event simulator using an ICE interface is described, with use being made of examples to illustrate the functionality of the code and the semantics of the language. Random numbers are used to provide the required stochastic behaviour within the simulator. The behaviour of an industry standard generator within the simulator and different methods of number allocation are shown. A new generator is proposed that is a development of a fast hardware shift register generator and is demonstrated to possess good statistical properties and operational speed. For the purpose of providing a rigorous description of the language and clarification of its semantics, a computational model is developed using the formalism of extended coloured Petri nets. This model also gives an indication of the language's descriptive power relative to that of a recognised and well developed technique. Some recognised temporal and structural problems of system event modelling are identified. and ICE solutions given. The growing research area of ATM communication networks is introduced and a sophisticated top down model of an ATM switch presented. This model is simulated and interesting results are given. A generic ICE framework for performability modelling is developed and demonstrated. This is considered as a positive contribution to the general field of performability research
Formal Availability Analysis using Theorem Proving
Availability analysis is used to assess the possible failures and their
restoration process for a given system. This analysis involves the calculation
of instantaneous and steady-state availabilities of the individual system
components and the usage of this information along with the commonly used
availability modeling techniques, such as Availability Block Diagrams (ABD) and
Fault Trees (FTs) to determine the system-level availability. Traditionally,
availability analyses are conducted using paper-and-pencil methods and
simulation tools but they cannot ascertain absolute correctness due to their
inaccuracy limitations. As a complementary approach, we propose to use the
higher-order-logic theorem prover HOL4 to conduct the availability analysis of
safety-critical systems. For this purpose, we present a higher-order-logic
formalization of instantaneous and steady-state availability, ABD
configurations and generic unavailability FT gates. For illustration purposes,
these formalizations are utilized to conduct formal availability analysis of a
satellite solar array, which is used as the main source of power for the Dong
Fang Hong-3 (DFH-3) satellite.Comment: 16 pages. arXiv admin note: text overlap with arXiv:1505.0264
Switching considerations in storage networks.
by Leung Yiu Tong.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 96-98).Abstracts in English and Chinese.Chapter 1. --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Thesis Organization --- p.3Chapter 2. --- Storage Network Fundamentals --- p.4Chapter 2.1 --- Storage Network Topology --- p.4Chapter 2.1.1 --- Direct Attached Storage (DAS) --- p.5Chapter 2.1.2 --- Network Attached Storage (NAS) --- p.7Chapter 2.1.3 --- Storage Area Network (SAN) --- p.9Chapter 2.1.3.1 --- SAN and the Fibre Channel Protocol --- p.11Chapter 2.1.4 --- Summary on Storage Network Topology --- p.12Chapter 2.2 --- Storage Protocol --- p.15Chapter 2.2.1 --- Fibre Channel --- p.15Chapter 2.2.1.1 --- Fibre Channel over IP (FCIP) --- p.17Chapter 2.2.1.2 --- Internet Fibre Channel Protocol (iFCP) --- p.19Chapter 2.2.2 --- Internet SCSI (iSCSI) --- p.20Chapter 2.2.3 --- InfiniBand --- p.22Chapter 2.2.4 --- Review on Storage Network Protocol --- p.25Chapter 2.3 --- Standard Organization --- p.27Chapter 2.4 --- Summary --- p.28Chapter 3. --- Switching Design for Storage Networks --- p.30Chapter 3.1. --- Shared Bus Design --- p.32Chapter 3.2. --- Time Division Switch --- p.36Chapter 3.3. --- Share Buffer Memory Switch --- p.37Chapter 3.3.1 --- Parallel Memory Array --- p.40Chapter 3.3.2 --- Distributive Storage --- p.43Chapter 3.4. --- Crossbar Switch --- p.45Chapter 3.4.1 --- Arbitrated Crossbar vs. Buffered Crossbar --- p.46Chapter 3.4.1.1 --- Arbitrated Crossbar Switch --- p.47Chapter 3.4.1.2 --- Buffered Crossbar Switch --- p.48Chapter 3.4.2 --- Switch Scheduling --- p.49Chapter 3.4.2.1 --- Bipartite Matching --- p.50Chapter 3.4.2.2 --- Token-based Distributive Scheduling --- p.53Chapter 3.4.2.3 --- Resource Counting using Semaphore --- p.56Chapter 3.5. --- Algebraic Switches --- p.60Chapter 3.5.1 --- Switching by Conditionally Nonblocking Properties --- p.61Chapter 3.5.2 --- Self-Routing Mechanism with Zero-Bit Buffering --- p.64Chapter 3.5.3 --- Multistage Interconnection of Self-routing Concentrators --- p.69Chapter 3.6. --- Summary --- p.73Chapter 4. --- Investigating Switching Issue in Storage Networks --- p.74Chapter 4.1 --- Choosing a Suitable Switch --- p.74Chapter 4.2 --- Quality of Service (QoS) --- p.76Chapter 4.3 --- Multicasting --- p.77Chapter 4.3.1 --- Crossbar Switch --- p.78Chapter 4.3.2 --- Shared-Buffer Memory Switches --- p.80Chapter 4.3.3 --- Algebraic Switch --- p.82Chapter 4.3.4 --- Application on Multicast Transmission --- p.86Chapter 4.4 --- Load Balancing Mechanism --- p.87Chapter 4.5 --- Optimization on Storage Utilization --- p.91Chapter 4.6 --- Summary --- p.93Chapter 5. --- Conclusion and Summary of Original Contributions --- p.9
DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS
El crecimiento de los computadores paralelos basados en redes de altas prestaciones ha aumentado el interĂŠs y esfuerzo de la comunidad investigadora en desarrollar nuevas tĂŠcnicas que permitan obtener el mejor rendimiento de estas redes. En particular, el desarrollo de nuevas tĂŠcnicas que permitan un encaminamiento eficiente y que reduzcan la latencia de los paquetes, aumentando asĂ la productividad de la red. Sin embargo, una alta tasa de utilizaciĂłn de la red podrĂa conllevar el que se conoce como "congestiĂłn de red", el cual puede causar una degradaciĂłn del rendimiento.
El control de la congestiĂłn en redes multietapa es un problema importante que no estĂĄ completamente resuelto. Con el fin de evitar la degradaciĂłn del rendimiento de la red cuando aparece congestiĂłn, se han propuesto diferentes mecanismos para el control de la congestiĂłn. Muchos de estos mecanismos estĂĄn basados en notificaciĂłn explĂcita de la congestiĂłn. Para este propĂłsito, los switches detectan congestiĂłn y dependiendo de la estrategia aplicada, los paquetes son marcados con la finalidad de advertir a los nodos origenes. Como respuesta, los nodos origenes aplican acciones correctivas para ajustar su tasa de inyecciĂłn de paquetes.
El propósito de esta tesis es analizar las diferentes estratÊgias de detección y corrección de la congestión en redes multietapa, y proponer nuevos mecanismos de control de la congestión encaminados a este tipo de redes sin descarte de paquetes. Las nuevas propuestas estån basadas en una estrategia mås refinada de marcaje de paquetes en combinación con un conjunto de acciones correctivas justas que harån al mecanismo capaz de controlar la congestión de manera efectiva con independencia del grado de congestión y de las condiciones de tråfico.Ferrer PÊrez, JL. (2012). DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18197Palanci
Efficient processor management strategies for multicomputer systems
Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed
Multistage Packet-Switching Fabrics for Data Center Networks
Recent applications have imposed stringent requirements within the Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of the packet-switches is tackled in different ways to enable the expansion in both the number of connected endpoints and traffic volume.
A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee an ordered packet delivery.
For an improved scalability, the Clos switch is build using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs) simplifies the input line cards and obviates the need for the costly Virtual Output Queues (VOQs). It also avoids the need for complex, and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity.
Under skewed traffic, a reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules, and enhances the switch performance under critical packet arrivals.
The implementation of small on-chip buffers has been made perfectly feasible using the current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ)
NoC fabric. The design merges assets of the output queuing, and
NoCs to provide high throughput, and smooth latency variations.
An approximate analytical model of the switch performance is also proposed.
To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC
(MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability features of the proposed packet-switches which makes them promising candidates for the next-generation data center networking infrastructure
- âŚ