Search CORE

1,612 research outputs found

Speeding-up the fault-tolerance analysis of interconnection networks

Author: Bermúdez Garzón Diego Fernando
Gómez Requena Crispín
Gómez Requena María Engracia
López Rodríguez Pedro Juan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/07/2015
Field of study

© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksAnalyzing the fault-tolerance of interconnection networks implies checking the connectivity of each sourcedestination pair. The size of the exploration space of such operation skyrockets with the network size and with the number of link faults. However, this problem is highly parallelizable since the exploration of each path between a source–destination pair is independent of the other paths. This paper presents an approach to analyze the fault-tolerance degree of multistage interconnection networks using GPUs in order to speed-up it. This approach uses CUDA as parallel programming tool on a GPU in order to take advantage of all available cores. Results show that the execution time of the fault-tolerance exploration can be significantly reduced.This work was supported by the Spanish Ministerio de Economía y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-C04-01.Bermúdez Garzón, DF.; Gómez Requena, C.; López Rodríguez, PJ.; Gómez Requena, ME. (2015). Speeding-up the fault-tolerance analysis of interconnection networks. IEEE. https://doi.org/10.1109/HPCSim.2015.7237035

Crossref

RiuNet

Fault tolerant clos network

Author: Ahluwalia Preet Mohan S.
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1991
Field of study

Multistage interconnection networks, or MINs, provide paths between functional modules in multiprocessor systems. The MINs are usually segmented into several stages. Each stage connects inputs to appropriate links of the next stage so that the cumulative effect of all the stages satisfies input-output connection requirements. This thesis deals with a fault tolerant Clos network. The fault tolerance technique involves addition of extra switches per stage to compensate for any switch failure The reliability analysis of both ordinary and fault tolerant Clos networks is presented. The optimal number of extra switches required to get the best reliability results has been analyzed

Digital Commons @ New Jersey Institute of Technology (NJIT)

Scalable parallel communications

Author: Foudriat E. C.
Khanna S.
Maly K.
Mukkamala R.
Overstreet C. M.
Sekhar Y. S.
Zubair M.
Publication venue
Publication date
Field of study

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups

NASA Technical Reports Server

Algorithms in fault-tolerant CLOS networks

Author: Lee Hyunyeop
Publication venue: Digital Commons @ NJIT
Publication date: 31/10/1994
Field of study

Digital Commons @ New Jersey Institute of Technology (NJIT)

Fault-tolerant interconnection networks for multiprocessor systems

Author: Nassar Hamed Mohamed
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1989
Field of study

Interconnection networks represent the backbone of multiprocessor systems. A failure in the network, therefore, could seriously degrade the system performance. For this reason, fault tolerance has been regarded as a major consideration in interconnection network design. This thesis presents two novel techniques to provide fault tolerance capabilities to three major networks: the Baseline network, the Benes network and the Clos network. First, the Simple Fault Tolerance Technique (SFT) is presented. The SFT technique is in fact the result of merging two widely known interconnection mechanisms: a normal interconnection network and a shared bus. This technique is most suitable for networks with small switches, such as the Baseline network and the Benes network. For the Clos network, whose switches may be large for the SFT, another technique is developed to produce the Fault-Tolerant Clos (FTC) network. In the FTC, one switch is added to each stage. The two techniques are described and thoroughly analyzed

Digital Commons @ New Jersey Institute of Technology (NJIT)

FireNN: Neural Networks Reliability Evaluation on Hybrid Platforms

Author: Azimi S.
De Sio C.
Sterpone L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

The growth of neural networks complexity has led to adopt of hardware-accelerators to cope with the computational power required by the new architectures. The possibility to adapt the network for different platforms enhanced the interests of safety-critical applications. The reliability evaluation of neural networks are still premature and requires platforms to measure the safety standards required by mission-critical applications. For this reason, the interest in studying the reliability of neural networks is growing. We propose a new approach for evaluating the resiliency of neural networks by using hybrid platforms. The approach relies on the reconfigurable hardware for emulating the target hardware platform and performing the fault injection process. The main advantage of the proposed approach is to involve the on-hardware execution of the neural network in the reliability analysis without any intrusiveness into the network algorithm and addressing specific fault models. The implementation of FireNN, the platform based on the proposed approach, is described in the paper. Experimental analyses are performed using fault injection on AlexNet. The analyses are carried out using the FireNN platform and the results are compared with the outcome of traditional software-level evaluations. Results are discussed considering the insight into the hardware level achieved using FireNN

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Low-Memory Techniques for Routing and Fault-Tolerance on the Fat-Tree Topology

Author: Gómez Requena Crispín
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 08/11/2010
Field of study

Actualmente, los clústeres de PCs están considerados como una alternativa eficiente a la hora de construir supercomputadores en los que miles de nodos de computación se conectan mediante una red de interconexión. La red de interconexión tiene que ser diseñada cuidadosamente, puesto que tiene una gran influencia sobre las prestaciones globales del sistema. Dos de los principales parámetros de diseño de las redes de interconexión son la topología y el encaminamiento. La topología define la interconexión de los elementos de la red entre sí, y entre éstos y los nodos de computación. Por su parte, el encaminamiento define los caminos que siguen los paquetes a través de la red. Las prestaciones han sido tradicionalmente la principal métrica a la hora de evaluar las redes de interconexión. Sin embargo, hoy en día hay que considerar dos métricas adicionales: el coste y la tolerancia a fallos. Las redes de interconexión además de escalar en prestaciones también deben hacerlo en coste. Es decir, no sólo tienen que mantener su productividad conforme aumenta el tamaño de la red, sino que tienen que hacerlo sin incrementar sobremanera su coste. Por otra parte, conforme se incrementa el número de nodos en las máquinas de tipo clúster, la red de interconexión debe crecer en concordancia. Este incremento en el número de elementos de la red de interconexión aumenta la probabilidad de aparición de fallos, y por lo tanto, la tolerancia a fallos es prácticamente obligatoria para las redes de interconexión actuales. Esta tesis se centra en la topología fat-tree, ya que es una de las topologías más comúnmente usadas en los clústeres. El objetivo de esta tesis es aprovechar sus características particulares para proporcionar tolerancia a fallos y un algoritmo de encaminamiento capaz de equilibrar la carga de la red proporcionando una buena solución de compromiso entre las prestaciones y el coste.Gómez Requena, C. (2010). Low-Memory Techniques for Routing and Fault-Tolerance on the Fat-Tree Topology [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8856Palanci

Crossref

RiuNet