Search CORE

327 research outputs found

New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

Author: Shaheen Masoud Esmail Masoud
Publication venue: The Aquila Digital Community
Publication date: 01/12/2013
Field of study

Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms

Aquila Digital Community

Cost Effective Routing Implementations for On-chip Networks

Author: Rodrigo Mocholí Samuel
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 29/11/2010
Field of study

Arquitecturas de múltiples núcleos como multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) actuales se basan en la eficacia de las redes dentro del chip (NoC) para la comunicación entre los diversos núcleos. Un diseño eficiente de red dentro del chip debe ser escalable y al mismo tiempo obtener valores ajustados de área, latencia y consumo de energía. Para diseños de red dentro del chip de propósito general se suele usar topologías de malla 2D ya que se ajustan a la distribución del chip. Sin embargo, la aparición de nuevos retos debe ser abordada por los diseñadores. Una mayor probabilidad de defectos de fabricación, la necesidad de un uso optimizado de los recursos para aumentar el paralelismo a nivel de aplicación o la necesidad de técnicas eficaces de ahorro de energía, puede ocasionar patrones de irregularidad en las topologías. Además, el soporte para comunicación colectiva es una característica buscada para abordar con eficacia las necesidades de comunicación de los protocolos de coherencia de caché. En estas condiciones, un encaminamiento eficiente de los mensajes se convierte en un reto a superar. El objetivo de esta tesis es establecer las bases de una nueva arquitectura para encaminamiento distribuido basado en lógica que es capaz de adaptarse a cualquier topología irregular derivada de una estructura de malla 2D, proporcionando así una cobertura total para cualquier caso resultado de soportar los retos mencionados anteriormente. Para conseguirlo, en primer lugar, se parte desde una base, para luego analizar una evolución de varios mecanismos, y finalmente llegar a una implementación, que abarca varios módulos para alcanzar el objetivo mencionado anteriormente. De hecho, esta última implementación tiene por nombre eLBDR (effective Logic-Based Distributed Routing). Este trabajo cubre desde el primer mecanismo, LBDR, hasta el resto de mecanismos que han surgido progresivamente.Rodrigo Mocholí, S. (2010). Cost Effective Routing Implementations for On-chip Networks [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8962Palanci

RiuNet

Efficient Multicast Algorithms for Mesh and Torus Networks

Author: Malani Ankit
Publication venue
Publication date: 20/12/2012
Field of study

With the increasing popularity of multicomputers, efficient way of communication within its processors has become a popular area of research. Multicomputers refer to a computer system that has multiple processors, they have high computational power and they can perform multiple tasks concurrently. Mesh and Torus are some of the commonly used network topologies in building multicomputer systems. Their performance highly depends on the underlying network communication such as multicast. Multicast is a communication method in which a message is sent from a source node to a certain number of destinations. Two major parameters used to evaluate multicast are time that a multicast process takes to deliver the message to all destinations and traffic that indicates the number of links used for this process. Research indicates that in general, it is NP- complete to find an optimal multicasting algorithm which is efficient on both time and traffic. This thesis suggests two new algorithms to achieve multicast in mesh and torus networks. Extensive simulations of these algorithms show that in practice they perform better than existing ones

Concordia University Research Repository

Exploring Adaptive Implementation of On-Chip Networks

Author: Daneshtalab Masoud
Publication venue: Annales Universitatis Turkuensis A I 429
Publication date: 25/11/2011
Field of study

As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast

UTUPub

On the performance of broadcast algorithms in interconnection networks

Author: Al-Dubai A.Y.
Ould-Khaoua M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Broadcast Communication is among the most primitive collective capabilities of any message passing network. Broadcast algorithms for the mesh have been widely reported in the literature. However, most existing algorithms have been studied within limited conditions, such as light traffic load and fixed network sizes. In other words, most of these algorithms have not been studied at different Quality of Service (QoS) levels. In contrast, this study examines the broadcast operation, taking into account the scalability, parallelism, a wide range of traffic loads through the propagation of broadcast messages. To the best of our knowledge, this study is the first to consider the issue of broadcast latency at both the network and node levels across different traffic loads. Results are shown from a comparative analysis confirming that the coded-path based broadcast algorithms exhibit superior performance characteristics over some existing algorithms

Enlighten

Analysis domain model for shared virtual environments

Author: Jordan J.
Jorge J.
Oliveira M.
Pereira J.
Steed A.
Publication venue
Publication date: 01/12/2009
Field of study

The field of shared virtual environments, which also encompasses online games and social 3D environments, has a system landscape consisting of multiple solutions that share great functional overlap. However, there is little system interoperability between the different solutions. A shared virtual environment has an associated problem domain that is highly complex raising difficult challenges to the development process, starting with the architectural design of the underlying system. This paper has two main contributions. The first contribution is a broad domain analysis of shared virtual environments, which enables developers to have a better understanding of the whole rather than the part(s). The second contribution is a reference domain model for discussing and describing solutions - the Analysis Domain Model

UCL Discovery

A Dag Based Wormhole Routing Strategy

Author: Roy John
Publication venue: LSU Digital Commons
Publication date: 01/05/1994
Field of study

The wormhole routing (WR) technique is replacing the hitherto popular storeand- forward routing in message passing multicomputers. This is because the latter has speed and node size constraints. The wormhole routing is, on the other hand, susceptible to deadlock. A few WR schemes suggested recently in the literature, concentrate on avoiding deadlock. This thesis presents a Directed Acyclic Graph (DAG) based WR technique. At low traffic levels the proposed method follows a minimal path. But the routing is adaptive at higher traffic levels. We prove that the algorithm is deadlock-free. This method is compared for its performance with a deterministic algorithm which is a de facto standard. We also compare its implementation costs with other adaptive routing algorithms and the relative merits and demerits are highlighted in the text

Louisiana State University

Chip Multiprocessor Traffic Models Providing Consistent Multicast and Spatial Distributions

Author: Lüdtke Daniel
Tutsch Dietmar
Publication venue
Publication date: 01/01/2008
Field of study

Dieser Beitrag ist mit Zustimmung des Rechteinhabers aufgrund einer (DFG geförderten) Allianz- bzw. Nationallizenz frei zugänglich.This publication is with permission of the rights owner freely accessible due to an Alliance licence and a national licence (funded by the DFG, German Research Foundation) respectively.Chip multiprocessors (CMPs) have become the center of attention in recent years. They consist of multiple processor cores on a single chip. These cores are connected on-chip by a bus or, if many cores are involved, by an appropriate network. To investigate how a multicore processor behaves dependent on the chosen network-on-chip topology, a corresponding model must be established for performance evaluation. Modeling and simulating the entire system would lead to high model complexity. Thus, it is more reasonable to exclude the cores and to simply model stochastically the detached network. The cores are replaced by traffic generators which must provide reasonable CMP traffic. It usually consists of multicasts and a particular spatial distribution. Because the traffic is not known exactly, both multicasts and spatial traffic are described as stochastic distributions for model input. The easiest way is to specify the spatial distribution of the traffic and the kind of multicasts independently of each other. However, not all multicast distributions can be achieved with a particular desired spatial distribution and vice versa. It is therefore important to check for the compatibility of the spatial distribution and the multicasts that the modeler is willing to investigate. Such a compatibility check is provided by the algorithm presented in this paper. It prevents inconsistent traffic parameters while modeling

DepositOnce

Path-Based partitioning methods for 3D Networks-on-Chip with minimal adaptive routing

Author: Daneshtalab Masoud
Ebrahimi Masoumeh
Flich Cardo José
Liljeberg Pasi
Plosila Juha
Tenhunen Hannu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2014
Field of study

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Combining the benefits of 3D ICs and Networks-on-Chip (NoCs) schemes provides a significant performance gain in Chip Multiprocessors (CMPs) architectures. As multicast communication is commonly used in cache coherence protocols for CMPs and in various parallel applications, the performance of these systems can be significantly improved if multicast operations are supported at the hardware level. In this paper, we present several partitioning methods for the path-based multicast approach in 3D mesh-based NoCs, each with different levels of efficiency. In addition, we develop novel analytical models for unicast and multicast traffic to explore the efficiency of each approach. In order to distribute the unicast and multicast traffic more efficiently over the network, we propose the Minimal and Adaptive Routing (MAR) algorithm for the presented partitioning methods. The analytical and experimental results show that an advantageous method named Recursive Partitioning (RP) outperforms the other approaches. RP recursively partitions the network until all partitions contain a comparable number of switches and thus the multicast traffic is equally distributed among several subsets and the network latency is considerably decreased. The simulation results reveal that the RP method can achieve performance improvement across all workloads while performance can be further improved by utilizing the MAR algorithm. Nineteen percent average and 42 percent maximum latency reduction are obtained on SPLASH-2 and PARSEC benchmarks running on a 64-core CMP.Ebrahimi, M.; Daneshtalab, M.; Liljeberg, P.; Plosila, J.; Flich Cardo, J.; Tenhunen, H. (2014). Path-Based partitioning methods for 3D Networks-on-Chip with minimal adaptive routing. IEEE Transactions on Computers. 63(3):718-733. doi:10.1109/TC.2012.255S71873363

RiuNet

Network architecture for large-scale distributed virtual environments

Author: Balikhina Tatiana V.
Publication venue: Oxford Brookes University
Publication date: 01/01/2005
Field of study

Distributed Virtual Environments (DVEs) provide 3D graphical computer generated environments with stereo sound, supporting real-time collaboration between potentially large numbers of users distributed around the world. Early DVEs has been used over local area networks (LANs). Recently with the Internet's development into the most common embedding for DVEs these distributed applications have been moved towards an exploiting IP networks. This has brought the scalability challenges into the DVEs evolution. The network bandwidth resource is the more limited resource of the DVE system and to improve the DVE's scalability it is necessary to manage carefully this resource. To achieve the saving in the network bandwidth the different types of the network traffic that is produced by the DVEs have to be considered. DVE applications demand· exchange of the data that forms different types of traffic such as a computer data type, video and audio, and a 3D data type to keep the consistency of the application's state. The problem is that the meeting of the QoS requirements of both control and continuous media traffic already have been covered by the existing research. But QoS for transfer of the 3D information has not really been considered. The 3D DVE geometry traffic is very bursty in nature and places a high demands on the network for short intervals of time due to the quite large size of the 3D models and the DVE application requirements to transmit a 3D data as quick as possible. The main motivation in carrying out the work presented in this thesis is to find a solution to improve the scalability of the DVE applications by a consideration the QoS requirements of the 3D DVE geometrical data type. In this work we are investigating the possibility to decrease the network bandwidth utilization by the 3D DVE traffic using the level of detail (LOD) concept and the active networking approach. The background work of the thesis surveys the DVE applications and the scalability requirements of the DVE systems. It also discusses the active networks and multiresolution representation and progressive transmission of the 3D data. The new active networking approach to the transmission of the 3D geometry data within the DVE systems is proposed in this thesis. This approach enhances the currently applied peer-to-peer DVE architecture by adding to the peer-to-peer multicast neny_ork layer filtering of the 3D flows an application level filtering on the active intermediate nodes. The active router keeps the application level information about the placements of users. This information is used by active routers to prune more detailed 3D data flows (higher LODs) in the multicast tree arches that are linked to the distance DVE participants. The exploration of possible benefits of exploiting the proposed active approach through the comparison with the non-active approach is carried out using the simulationbased performance modelling approach. Complex interactions between participants in DVE application and a large number of analyzed variables indicate that flexible simulation is more appropriate than mathematical modelling. To build a test bed will not be feasible. Results from the evaluation demonstrate that the proposed active approach shows potential benefits to the improvement of the DVE's scalability but the degree of improvement depends on the users' movement pattern. Therefore, other active networking methods to support the 3D DVE geometry transmission may also be required

Oxford Brookes University: RADAR

OpenGrey Repository