PROMON: a profile monitor of software applications
Software techniques can be efficiently used to increase the dependability of safety-critical applications. Many approaches are based on information redundancy to prevent data and code corruption during software execution. This paper presents PROMON, a C++ library that exploits a new methodology based on the concept of "Programming by Contract" to detect system malfunctions. By resorting to assertions, pre- and post-conditions, and marginal programmer intervention, PROMON-based applications can reach a high level of dependability
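The idea of detecting malfunctions through pre- and post-conditions can be illustrated with a minimal sketch. It is written in Python rather than C++ and is not PROMON's API: the names `contract`, `ContractViolation`, and `square_root` are hypothetical and chosen only for this example.

```python
import functools


class ContractViolation(Exception):
    """Raised when a pre- or post-condition does not hold (hypothetical name)."""


def contract(pre=None, post=None):
    """Wrap a function with an optional precondition and postcondition.

    pre(*args, **kwargs) is checked before the call;
    post(result, *args, **kwargs) is checked after it.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractViolation(f"precondition failed in {func.__name__}")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractViolation(f"postcondition failed in {func.__name__}")
            return result
        return wrapper
    return decorate


# Illustrative use: the postcondition re-checks the result, so a corrupted
# computation would be reported instead of silently propagating.
@contract(pre=lambda x: x >= 0, post=lambda r, x: abs(r * r - x) < 1e-6)
def square_root(x):
    return x ** 0.5
```

In this sketch the programmer's intervention is limited to annotating a function; the checks themselves run on every call.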
Validation of a software dependability tool via fault injection experiments
Presents the validation of the strategies employed in the RECCO tool to analyze C/C++ software. The RECCO compiler scans C/C++ source code to extract information about the significance of the variables that populate the program and about the code structure itself. Experimental results gathered on an Open Source Router yield two sets of critical variables, one obtained by fault injection experiments and the other by applying the RECCO tool. The two sets are then compared and correlated to demonstrate the effectiveness of RECCO's methodology
Software dependability techniques validated via fault injection experiments
The present paper proposes a C/C++ source-to-source compiler able to increase the dependability properties of a given application. The adopted strategy is based on two main techniques: variable duplication/triplication and control flow checking. The validation of these techniques is based on the emulation of fault appearance by software fault injection. The chosen test case is a client-server application in charge of calculating and drawing a Mandelbrot fractal
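Variable triplication can be sketched as follows. This is an illustrative Python example, not the output of the compiler described above: `TriplicatedVar` and `inject_fault` are hypothetical names, and control flow checking is not covered. Each write stores three copies, each read takes a majority vote, and `inject_fault` emulates a software-injected fault by corrupting one copy.

```python
class TriplicatedVar:
    """Holds three copies of a value and votes on every read (hypothetical class)."""

    def __init__(self, value):
        self._copies = [value, value, value]

    def write(self, value):
        # Every write updates all three copies.
        self._copies = [value, value, value]

    def read(self):
        a, b, c = self._copies
        if a == b or a == c:
            voted = a
        elif b == c:
            voted = b
        else:
            raise RuntimeError("uncorrectable: all three copies disagree")
        # Repair the minority copy so a later single fault can still be masked.
        self._copies = [voted, voted, voted]
        return voted

    def inject_fault(self, index, value):
        # Emulates a fault by overwriting a single copy.
        self._copies[index] = value
```

With duplication instead of triplication, a mismatch between the two copies can be detected but not corrected, since no majority exists.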
Automated Synthesis of SEU Tolerant Architectures from OO Descriptions
Single event upset (SEU) faults are a well-known problem in aerospace environments, but their relevance has recently grown at ground level as well, in commodity applications that are also subject to strong economic constraints on cost. At the same time, the latest hardware description languages and synthesis tools narrow the boundary between the software and hardware domains, making high-level descriptions of hardware components very similar to software programs. Starting from these considerations, the present paper analyses the possibility of reusing Software Implemented Hardware Fault Tolerance (SIHFT) techniques, typically exploited in microprocessor-based systems, to design SEU tolerant architectures. The main characteristics of SIHFT techniques are examined, together with the modifications needed to make them compatible with the synthesis flow. A complete environment is provided to automate the design instrumentation using the proposed techniques, and to perform fault injection experiments at both behavioural and gate level. Preliminary results presented in this paper show the effectiveness of the approach in terms of reliability improvement and reduced design effort
CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications
This is the peer reviewed version of the following article: Rodríguez, G., Martín, M. J., González, P., Touriño, J. and Doallo, R. (2010), CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications. Concurrency Computat.: Pract. Exper., 22: 749-766. doi:10.1002/cpe.1541, which has been published in final form at https://doi.org/10.1002/cpe.1541. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
[Abstract] With the evolution of high-performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the execution or to a migration of the application processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples of these include opaque state (e.g. data structures for communications support) or diversity of interfaces for a single feature (e.g. communications, I/O). Directly manipulating the underlying ad hoc representations renders checkpointing tools unable to work on different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data need to be stored, where to store them, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency. It is made up of a library and a compiler. The CPPC library contains routines for variable level checkpointing, using portable code and protocols. The CPPC compiler helps to achieve transparency by relieving the user from time-consuming tasks, such as data flow and communications analyses and adding instrumentation code.
This paper covers both the operation of the CPPC library and its compiler support. Experimental results using benchmarks and large-scale real applications are included, demonstrating usability, efficiency, and portability.
Ministerio de Educación y Ciencia; TIN2007-67537-C03
Xunta de Galicia; 2006/
Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing
Abstract: Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The sources of the problems are node failures and the need for dynamic configuration over extensive runtime. This paper presents two fault-tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging. These are transparent protocols capable of overcoming problems associated with both benign faults, i.e., crash faults, and node or subnet volatility. Specifically, the protocols base the state of the execution on a dataflow graph, allowing for efficient recovery in dynamic heterogeneous systems as well as multithreaded applications. By allowing recovery even under different numbers of processors, the approaches are especially suitable for applications with a need for adaptive or reactionary configuration control. The low-cost protocols offer the capability of controlling or bounding the overhead. A formal cost model is presented, followed by an experimental evaluation. It is shown that the overhead of the protocol is very small, and the maximum work lost by a crashed process is small and bounded. Index Terms: Grid computing, rollback recovery, checkpointing, event logging.
Transiently Powered Computers
Demand for compact, easily deployable, energy-efficient computers has driven the development of general-purpose transiently powered computers (TPCs) that lack both batteries and wired power, operating exclusively on energy harvested from their surroundings.
TPCs' dependence solely on transient, harvested power offers several important design-time benefits. For example, omitting batteries saves board space and weight while obviating the need to make devices physically accessible for maintenance. However, transient power may provide an unpredictable supply of energy that makes operation difficult. A predictable energy supply is a key abstraction underlying most electronic designs. TPCs discard this abstraction in favor of opportunistic computation that takes advantage of available resources. A crucial question is how should a software-controlled computing device operate if it depends completely on external entities for power and other resources? The question poses challenges for computation, communication, storage, and other aspects of TPC design.
The main idea of this work is that software techniques can make energy harvesting a practicable form of power supply for electronic devices. Its overarching goal is to facilitate the design and operation of usable TPCs.
This thesis poses a set of challenges that are fundamental to TPCs, then pairs these challenges with approaches that use software techniques to address them. To address the challenge of computing steadily on harvested power, it describes Mementos, an energy-aware state-checkpointing system for TPCs. To address the dependence of opportunistic RF-harvesting TPCs on potentially untrustworthy RFID readers, it describes CCCP, a protocol and system for safely outsourcing data storage to RFID readers that may attempt to tamper with data. Additionally, it describes a simulator that facilitates experimentation with the TPC model, and a prototype computational RFID that implements the TPC model.
To show that TPCs can improve existing electronic devices, this thesis describes applications of TPCs to implantable medical devices (IMDs), a challenging design space in which some battery-constrained devices completely lack protection against radio-based attacks. TPCs can provide security and privacy benefits to IMDs by, for instance, cryptographically authenticating other devices that want to communicate with the IMD before allowing the IMD to use any of its battery power. This thesis describes a simplified IMD that lacks its own radio, saving precious battery energy and therefore size. The simplified IMD instead depends on an RFID-scale TPC for all of its communication functions.
TPCs are a natural area of exploration for future electronic design, given the parallel trends of energy harvesting and miniaturization. This work aims to establish and evaluate basic principles by which TPCs can operate
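The general idea of energy-aware state checkpointing can be sketched as follows. This is a simplified Python simulation and not the Mementos implementation: the threshold value, the `run_once` function, the one-voltage-sample-per-step model, and the dictionary standing in for nonvolatile memory are all assumptions made for illustration.

```python
CHECKPOINT_THRESHOLD_V = 2.4  # hypothetical voltage below which state is saved


def run_once(items, voltage_trace, nonvolatile):
    """Sum `items` during one power cycle, resuming from any saved checkpoint.

    `voltage_trace` supplies one simulated voltage sample per computation step.
    `nonvolatile` is a dict standing in for memory that survives power loss.
    Returns the total if the computation finished in this cycle, else None.
    """
    state = nonvolatile.get("checkpoint", {"index": 0, "total": 0})
    index, total = state["index"], state["total"]
    for voltage in voltage_trace:
        if index == len(items):
            break
        if voltage < CHECKPOINT_THRESHOLD_V:
            # Energy is running out: save progress and stop computing.
            nonvolatile["checkpoint"] = {"index": index, "total": total}
            return None
        total += items[index]
        index += 1
    if index == len(items):
        nonvolatile["result"] = total
        return total
    return None  # the trace ended before a checkpoint could be taken
```

Calling `run_once` again with the same `nonvolatile` dictionary models the next power cycle: the computation resumes from the saved index instead of starting over.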
Técnicas de ponto de controlo e adaptação em grelhas computacionais (Checkpointing and adaptation techniques in computational grids)
Master's dissertation in Informatics Engineering.
The recent popularity of grids introduced the need to support robust execution of applications on a wide range of computing resources. In computational grids, where the reliability and availability of resources are not guaranteed, applications must support two fundamental requirements: 1) fault tolerance; 2) adaptation to the available resources. Traditional techniques use a "black-box" approach, where the middleware alone is responsible for meeting these two requirements. Such approaches provide these services with minimal programmer intervention, but they limit the use of knowledge about the application's characteristics to optimize the services. This thesis presents aspect-oriented approaches to support fault tolerance and dynamic adaptation to resources in computational grids.
In the proposed approaches, applications are enhanced with fault tolerance and dynamic adaptation capabilities through the activation of additional modules. The fault tolerance approach uses a checkpoint and restore strategy, while dynamic adaptation uses a variation of the over-decomposition technique. Both are portable across operating systems and limit the changes required to the applications' base code. Moreover, applications can adapt from a sequential execution to a multi-cluster configuration. Adaptation can be performed by checkpointing the application and restoring it on different machines, or it can be performed at run time
Compiler assisted checkpointing of message-passing applications in heterogeneous environments
[Abstract]
With the evolution of high performance computing towards heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the execution or to a migration of the processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. However, some of the data manipulated by a parallel application are not truly portable. Examples of these include opaque state (e.g. data structures for communications support) or diversity of interfaces for a single feature (e.g. communications, I/O).
Directly manipulating the underlying ad-hoc representations renders checkpointing tools incapable of working on different environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data needs to be stored, where to store it, or where to checkpoint. CPPC (ComPiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency, while preserving the scalability of the executed applications. It is made up of a library and a compiler. The CPPC library contains routines for variable level checkpointing, using portable code and protocols. The CPPC compiler achieves transparency by relieving the user from time-consuming tasks, such as performing code analyses and adding instrumentation code
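Variable level checkpointing in a portable format can be sketched as follows. This is an illustrative Python example and not the CPPC library interface: `save_checkpoint` and `load_checkpoint` are hypothetical names, and JSON is assumed here as the machine-independent representation. Only the variables the program names are stored, rather than a raw memory image, which is what lets the checkpoint be read back on a different machine.

```python
import json
import os


def save_checkpoint(path, variables):
    """Write the named variables to `path` in a machine-independent format."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(variables, f)
    # Rename only after the write completes, so a crash during the write
    # cannot leave a truncated checkpoint under the final name.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved variables, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

An application would call `load_checkpoint` at start-up, resume from the returned values if any, and call `save_checkpoint` at the points where its state is consistent. Choosing those points and the variables to save is the kind of work the abstract above says the CPPC compiler takes over from the user.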