Search CORE

3,507 research outputs found

Optimal fault-tolerant placement of relay nodes in a mission critical wireless network

Author: Di Pompeo Silvia
Finzi Alberto
Guarneri Riccardo
Lanciotti Filiberto
Mancini Toni
Scialanca Agostino
Tronci Enrico
Publication venue
Publication date: 01/01/2018
Field of study

The operations of many critical infrastructures (e.g., airports) heavily depend on proper functioning of the radio communication network supporting operations. As a result, such a communication network is indeed a mission-critical communication network that needs adequate protection from external electromagnetic interferences. This is usually done through radiogoniometers. Basically, by using at least three suitably deployed radiogoniometers and a gateway gathering information from them, sources of electromagnetic emissions that are not supposed to be present in the monitored area can be localised. Typically, relay nodes are used to connect radiogoniometers to the gateway. As a result, some degree of fault-tolerance for the network of relay nodes is essential in order to offer a reliable monitoring. On the other hand, deployment of relay nodes is typically quite expensive. As a result, we have two conflicting requirements: minimise costs while guaranteeing a given fault-tolerance. In this paper address the problem of computing a deployment for relay nodes that minimises the relay node network cost while at the same time guaranteeing proper working of the network even when some of the relay nodes (up to a given maximum number) become faulty (fault-tolerance). We show that the above problem can be formulated as a Mixed Integer Linear Programming (MILP) as well as a Pseudo-Boolean Satisfiability (PB-SAT) optimisation problem and present experimental results com- paring the two approaches on realistic scenarios

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio della ricerca- Università di Roma La Sapienza

Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

Author: Baker R.
Frank G.
Gray G.
Scheper C.
Yalamanchili S.
Publication venue
Publication date
Field of study

Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified

NASA Technical Reports Server

Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

Author: Fernandez RC
Kalyvianaki E
Migliavacca M
Pietzuch P
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2013
Field of study

As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the check-pointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Kent Academic Repository

Adaptive Routing Approaches for Networked Many-Core Systems

Author: Ebrahimi Masoumeh
Publication venue: Annales Universitatis Turkuensis A I 465
Publication date: 28/06/2013
Field of study

Through advances in technology, System-on-Chip design is moving towards integrating tens to hundreds of intellectual property blocks into a single chip. In such a many-core system, on-chip communication becomes a performance bottleneck for high performance designs. Network-on-Chip (NoC) has emerged as a viable solution for the communication challenges in highly complex chips. The NoC architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication challenges such as wiring complexity, communication latency, and bandwidth. Furthermore, the combined benefits of 3D IC and NoC schemes provide the possibility of designing a high performance system in a limited chip area. The major advantages of 3D NoCs are the considerable reductions in average latency and power consumption. There are several factors degrading the performance of NoCs. In this thesis, we investigate three main performance-limiting factors: network congestion, faults, and the lack of efficient multicast support. We address these issues by the means of routing algorithms. Congestion of data packets may lead to increased network latency and power consumption. Thus, we propose three different approaches for alleviating such congestion in the network. The first approach is based on measuring the congestion information in different regions of the network, distributing the information over the network, and utilizing this information when making a routing decision. The second approach employs a learning method to dynamically find the less congested routes according to the underlying traffic. The third approach is based on a fuzzy-logic technique to perform better routing decisions when traffic information of different routes is available. Faults affect performance significantly, as then packets should take longer paths in order to be routed around the faults, which in turn increases congestion around the faulty regions. We propose four methods to tolerate faults at the link and switch level by using only the shortest paths as long as such path exists. The unique characteristic among these methods is the toleration of faults while also maintaining the performance of NoCs. To the best of our knowledge, these algorithms are the first approaches to bypassing faults prior to reaching them while avoiding unnecessary misrouting of packets. Current implementations of multicast communication result in a significant performance loss for unicast traffic. This is due to the fact that the routing rules of multicast packets limit the adaptivity of unicast packets. We present an approach in which both unicast and multicast packets can be efficiently routed within the network. While suggesting a more efficient multicast support, the proposed approach does not affect the performance of unicast routing at all. In addition, in order to reduce the overall path length of multicast packets, we present several partitioning methods along with their analytical models for latency measurement. This approach is discussed in the context of 3D mesh networks.Siirretty Doriast

UTUPub

Optimal routing in double loop networks

Author: Gutierrez Jaime
Gómez Domingo
Ibeas Álvar
Publication venue: Elsevier Ltd.
Publication date: 22/08/2007
Field of study

AbstractIn this paper, we study the problem of finding the shortest path in circulant graphs with an arbitrary number of jumps. We provide algorithms specifically tailored for weighted undirected and directed circulant graphs with two jumps which compute the shortest path. Our method only requires O(logN) arithmetic operations and the total bit complexity is O(log2NloglogNlogloglogN), where N is the number of the graph’s vertices. This elementary and efficient shortest path algorithm has been derived from the Closest Vector Problem (CVP) of lattices in dimension two and with an ℓ1 norm

Elsevier - Publisher Connector

Improving Scalability and Usability of Parallel Runtime Environments for High Availability and High Performance Systems

Author: Angskun Thara
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2007
Field of study

The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. Hence, parallel runtime environments have to support and adapt to the underlying platforms that require scalability and fault management in more and more dynamic environments. This dissertation aims to analyze, understand and improve the state of the art mechanisms for managing highly dynamic, large scale applications. This dissertation demonstrates that the use of new scalable and fault-tolerant topologies, combined with rerouting techniques, builds parallel runtime environments, which are able to efficiently and reliably deliver sets of information to a large number of processes. Several important graph properties are provided to illustrate the theoretical capability of these topologies in terms of both scalability and fault-tolerance, such as reasonable degree, regular graph, low diameter, symmetric graph, low cost factor, low message traffic density, optimal connectivity, low fault-diameter and strongly resilient. The dissertation builds a communication framework based on these topologies to support parallel runtime environments. Such a framework can handle multiple types of messages, e.g., unicast, multicast, broadcast and all-gather. Additionally, the communication framework has been formally verified to work in both normal and failure circumstances without creating any of the common problems such as broadcast storm, deadlock and non-progress cycle

University of Tennessee, Knoxville: Trace