
14 research outputs found

An Analysis of Failure Handling in Chameleon, A Framework for Supporting Cost-Effective Fault Tolerant Services

Author: Haakensen Erik Edward

The desire for low-cost reliable computing is increasing. Most current fault tolerant computing solutions are not very flexible, i.e., they cannot adapt to the reliability requirements of newly emerging applications in business, commerce, and manufacturing. It is important that users have a flexible, reliable platform to support both critical and noncritical applications. Chameleon, under development at the Center for Reliable and High-Performance Computing at the University of Illinois, is a software framework for supporting cost-effective, adaptable, networked fault tolerant services. This thesis details a simulation of fault injection, detection, and recovery in Chameleon. The simulation was written in C++ using the DEPEND simulation library. The results obtained from the simulation included the amount of overhead incurred by the fault detection and recovery mechanisms supported by Chameleon. The simulation also yielded information about fault scenarios from which Chameleon cannot recover. The results showed that both critical and noncritical applications can be executed in the Chameleon environment with a fairly small amount of overhead. No single point of failure from which Chameleon could not recover was found. Chameleon was also found to be capable of recovering from several multiple-failure scenarios.

NASA Technical Reports Server

The Holonic Production Unit: an Approach for an Architecture of Embedded Production Process

Author: Dulce M. Rivero
Edgar Chacó
Isabel Besembel
Juan Cardillo
Publication venue: IntechOpen
Publication date: 01/10/2008

IntechOpen

Designing Efficient Network Interfaces For System Area Networks

Author: Rzymianowicz Lars
Publication venue: Universität Mannheim
Publication date: 01/01/2002

The network is the key component of a Cluster of Workstations/PCs. Its performance, measured in terms of bandwidth and latency, has a great impact on overall system performance. It quickly became clear that traditional WAN/LAN technology is not well suited for interconnecting powerful nodes into a cluster. Its poor performance too often slows down communication-intensive applications. This observation led to the birth of a new class of networks called System Area Networks (SAN). The ATOLL network introduces a new optimized architecture for SANs. On a single chip, not one but four network interfaces (NI) have been implemented, together with an on-chip 4x4 full-duplex switch and four link interfaces. This unique "Network on a Chip" architecture is best suited for interconnecting SMP nodes, where multiple CPUs are each given an exclusive NI and do not have to share a single interface. It also removes the need for any additional switching hardware, since the four byte-wide full-duplex links can be connected by cables to neighbor nodes in an arbitrary network topology.

CiteSeerX

MAnnheim DOCument Server

Chameleon: A Software Infrastructure and Testbed for Reliable High-Speed Networked Computing

Author: Bagchi S.
Iyer R.K.
Kalbarczyk Z.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/07/1997

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. NASA / NAG 1-61

Illinois Digital Environment for Access to Learning and Scholarship Repository

Ytelseanalyse av FRoots og Dimension-Order [Performance Analysis of FRoots and Dimension-Order]

Author: Dybvik Bjørn Arne
Publication date: 01/01/2007

The purpose of this thesis is to carry out a performance analysis of two routing algorithms: FRoots and Dimension-Order. Dimension-Order is not actually the name of a specific routing algorithm, but rather the name of a category of routing algorithms that route in a particular way. FRoots is the more sophisticated routing algorithm, and when the thesis was assigned, the author believed he knew the outcome of the performance analysis. The algorithms were compared using a simulator built on J-Sim and developed at the institution where the thesis was written. A performance analysis requires a topology, which describes how a network of nodes and links is physically laid out. A 2D mesh was used in two sizes, 4x4 and 8x8, to see whether the size of the network matters for the algorithms. In addition, the simulations were run with different traffic patterns (uniform and pairwise). Under the uniform traffic pattern, a node can communicate with all other nodes during a simulation, while under the pairwise pattern a node can communicate with only one other node (it cannot switch partners). In short, the analysis found that FRoots performs best with the pairwise traffic pattern, while Dimension-Order routing is the better choice for uniform traffic. The size of the network has no bearing on the outcome.

NORA - Norwegian Open Research Archives

Microkernel mechanisms for improving the trustworthiness of commodity hardware

Author: Shen Yanyan
Publication venue: UNSW, Sydney
Publication date: 01/01/2019

The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware that can malfunction when the hardware is impacted by transient hardware faults. The hardware anomalies, if undetected, can cause data corruptions, system crashes, and security vulnerabilities, significantly undermining system dependability. Specifically, we adopt the single event upset (SEU) fault model and address transient CPU or memory faults. We take advantage of the functional correctness and isolation guarantee provided by the formally verified seL4 microkernel and hardware redundancy provided by multicore processors, design the redundant co-execution (RCoE) architecture that replicates a whole software system (including the microkernel) onto different CPU cores, and implement two variants, loosely-coupled redundant co-execution (LC-RCoE) and closely-coupled redundant co-execution (CC-RCoE), for the ARM and x86 architectures. RCoE treats each replica of the software system as a state machine and ensures that the replicas start from the same initial state, observe consistent inputs, perform equivalent state transitions, and thus produce consistent outputs during error-free executions. Compared with other software-based error detection approaches, the distinguishing feature of RCoE is that the microkernel and device drivers are also included in redundant co-execution, significantly extending the sphere of replication (SoR). Based on RCoE, we introduce two kernel mechanisms, fingerprint validation and kernel barrier timeout, detecting fault-induced execution divergences between the replicated systems, with the flexibility of tuning the error detection latency and coverage. The kernel error-masking mechanisms built on RCoE enable downgrading from triple modular redundancy (TMR) to dual modular redundancy (DMR) without service interruption. 
We run synthetic benchmarks and system benchmarks to evaluate the performance overhead of the approach, observe that the overhead varies with the characteristics of the workloads and the variant used (LC-RCoE or CC-RCoE), and conclude that the approach is applicable to real-world applications. The effectiveness of the error detection mechanisms is assessed by conducting fault injection campaigns on real hardware, and the results demonstrate compelling improvement.

UNSWorks

Diseño de mecanismos eficientes para la gestión de subredes infiniband [Design of Efficient Mechanisms for Managing InfiniBand Subnets]

Author: Bermúdez Marín Aurelio
Publication venue: Ediciones de la Universidad de Castilla-La Mancha
Publication date: 01/01/2004

The main objective of this doctoral thesis is to contribute to the development of mechanisms for assimilating topological changes in the InfiniBand network architecture. In a first phase, an initial prototype of the management mechanism was designed and evaluated. Its evaluation allowed us to identify the main bottlenecks in the process of adapting to a change. Optimized mechanisms were then proposed for each of the tasks involved in that process: detecting the topological change, acquiring the new network topology, computing new routes, and distributing updated routing tables to the network switches. The result is a management mechanism that is fully compliant with the InfiniBand specification, easy to implement in commercial systems, and almost transparent from the point of view of the applications served by the network.

Universidad de Castilla-La Mancha: Repositorio Universitario Institucional de Recursos Abiertos (RUIdeRA)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas


Performance analysis and improvement of InfiniBand networks. Modelling and effective Quality-of-Service mechanisms for interconnection networks in cluster computing systems.

Author: Yan Shihang
Publication venue: Department of Computing, School of Computing, Informatics and Media
Publication date: 01/01/2012

The InfiniBand Architecture (IBA) network has been proposed as a new industrial standard with high bandwidth and low latency, suitable for constructing high-performance interconnected cluster computing systems. This architecture replaces the traditional bus-based interconnection with a switch-based network for server Input-Output (I/O) and inter-processor communications. An efficient Quality-of-Service (QoS) mechanism is fundamental to ensuring the important QoS metrics, such as maximum throughput and minimum latency, leaving aside other aspects like guarantees to reduce the delay, blocking probability, and mean queue length. Performance modelling and analysis has been and continues to be of great theoretical and practical importance in the design and development of communication networks. This thesis aims to investigate efficient and cost-effective QoS mechanisms for performance analysis and improvement of InfiniBand networks in cluster-based computing systems. Firstly, a rate-based source-response link-by-link admission and congestion control function with an improved Explicit Congestion Notification (ECN) packet marking scheme is developed. This function adopts rate control to reduce congestion of multiple-class traffic. Secondly, a credit-based flow control scheme is presented to reduce the mean queue length, throughput and response time of the system. In order to evaluate the performance of this scheme, a new queueing network model is developed. Theoretical analysis and simulation experiments show that these two schemes are quite effective and suitable for InfiniBand networks.
Finally, to obtain a thorough and deep understanding of the performance attributes of the InfiniBand Architecture network, two efficient threshold function flow control mechanisms are proposed to enhance the QoS of InfiniBand networks: one is Entry Threshold, which sets the threshold for each entry in the arbitration table, and the other is Arrival Job Threshold, which sets the threshold based on the number of jobs in each Virtual Lane. Furthermore, the principle of Maximum Entropy is adopted to analyse these two new mechanisms, with the Generalized Exponential (GE)-Type distribution used to model the inter-arrival times and service times of the input traffic. Extensive simulation experiments are conducted to validate the accuracy of the analytical models.

Bradford Scholars

Conceptual Model and Architecture of MAFTIA

Author: Adelsbach A.
Cachin C.
Creese S.
Deswarte Y.
Kursawe K.
Laprie J.-C.
Powell David
Randell B.
Riordan J.
Ryan P.
Simmonds W.
Stroud Robert J.
Veríssimo Paulo
Waidner M.
Wespi A.
Publication venue: Department of Informatics, University of Lisbon
Publication date: 01/01/2003

This deliverable builds on the work reported in [MAFTIA 2000] and [Powell and Stroud 2001]. It contains a further refinement of the MAFTIA conceptual model and a revised discussion of the MAFTIA architecture. It also introduces the work done in MAFTIA on verification and assessment of security properties, which is reported on in more detail in [Adelsbach and Creese 2003].

Universidade de Lisboa: Repositório.UL