Engineering holistic fault tolerance
PhD Thesis
Fault-tolerant software should be engineered to be maintainable as well as efficient with
regard to performance and resources. These characteristics should be evaluated before
deployment of the software. However, the main focus is very often placed on the functional
features of the application, whereas fault tolerance mechanisms are neglected. As a result,
they are often neither maintainable nor efficient. The concept of Holistic Fault Tolerance
was introduced to deal with these issues. It is a novel crosscutting approach to the
design and implementation of fault tolerance mechanisms for developing reliable software
applications that meet non-functional requirements, such as performance and resource
utilisation.
The thesis starts with a description of the problems that motivated the idea of
Holistic Fault Tolerance. These problems are related to resource utilisation requirements
of modern computer-based systems, since more resources like hardware components and
energy are required to process modern computational tasks and ensure performance and
reliability of the computation. Moreover, the complexity of these systems grows, leading
to maintainability deterioration, especially of those system parts that are responsible
for satisfying non-functional requirements, such as reliability, performance and resource
usage.
After analysis of the problems and motivations, the engineering approach to Holistic Fault
Tolerance is introduced and its main engineering steps are defined. Next, an architectural
pattern for Holistic Fault Tolerance is presented. The method to refine the proposed architecture and ensure efficiency of a particular system under development is demonstrated
during the modelling step. Then the implementation of Holistic Fault Tolerance based on
the proposed architecture and modelling is described in detail.
Finally, the Holistic Fault Tolerance architecture is evaluated with regard to efficiency
and maintainability. The evaluation demonstrates that Holistic Fault Tolerance assists
in meeting the non-functional requirements, makes fault tolerance mechanisms easier to
maintain and ensures higher modularity of the source code.
Energy balance between voltage-frequency scaling and resilience for linear algebra routines on low-power multicore architectures
Near Threshold Voltage (NTV) computing has recently been proposed as a technique to save energy, at the cost of incurring higher error rates including, among others, Silent Data Corruption (SDC). In this paper, we evaluate the energy efficiency of dense linear algebra routines using several low-power multicore processors, and we analyze whether the potential energy reduction achieved when scaling the processor to operate at a low voltage compensates for the cost of integrating a fault tolerance mechanism that tackles SDC. Our study targets algorithmic-based fault-tolerant versions of the dense matrix-vector and matrix-matrix multiplication kernels (GEMV and GEMM, respectively), using the BLIS framework, as well as an implementation of the LU factorization with partial pivoting built on top of GEMM. Furthermore, we tailor the study for a number of representative 32-bit and 64-bit multicore processors from ARM that were specifically designed for energy efficiency.
(C) 2017 Elsevier B.V. All rights reserved.
The researchers from Universidad Jaume I were supported by project CICYT TIN2014-53495-R of MINECO and FEDER, and the FPU program of MECD. The researcher from Universitat Politecnica de Catalunya was supported by projects TIN2015-65316-P from the Spanish Ministry of Education and 2014 SGR 1051 from the Generalitat de Catalunya, Dep. d'Innovacio, Universitats i Empresa.
Catalán, S.; Herrero, JR.; Quintana Ortí, ES.; Rodríguez-Sánchez, R. (2018). Energy balance between voltage-frequency scaling and resilience for linear algebra routines on low-power multicore architectures. Parallel Computing. 73:28-39. https://doi.org/10.1016/j.parco.2017.05.004
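As a rough illustration of the kind of fault tolerance mechanism the abstract above refers to, the following is a minimal sketch of the classic checksum scheme for matrix-matrix multiplication: row and column checksums computed from the inputs are compared against those of the result to flag silent corruption. It is written in plain Python for readability and is an assumption-laden example, not the BLIS-based implementation evaluated in the paper; the function names `matmul` and `abft_check` and the tolerance `tol` are hypothetical.

```python
# Illustrative sketch only: checksum-based detection of silent data
# corruption in C = A * B. Names and tolerance are hypothetical.

def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_check(a, b, c, tol=1e-9):
    """Return (bad_rows, bad_cols): indices where checksums of c disagree."""
    n, k, m = len(a), len(b), len(b[0])

    # Column checksums: (e^T A) B must equal e^T C, where e is all ones.
    col_sum_a = [sum(a[i][p] for i in range(n)) for p in range(k)]
    expected_cols = [sum(col_sum_a[p] * b[p][j] for p in range(k))
                     for j in range(m)]

    # Row checksums: A (B e) must equal C e.
    row_sum_b = [sum(b[p]) for p in range(k)]
    expected_rows = [sum(a[i][p] * row_sum_b[p] for p in range(k))
                     for i in range(n)]

    bad_cols = [j for j in range(m)
                if abs(sum(c[i][j] for i in range(n)) - expected_cols[j]) > tol]
    bad_rows = [i for i in range(n)
                if abs(sum(c[i]) - expected_rows[i]) > tol]
    return bad_rows, bad_cols


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
clean = abft_check(a, b, c)      # no mismatches expected
c[1][0] += 1                     # inject a single corrupted entry
faulty = abft_check(a, b, c)     # mismatch in row 1 and column 0
```

A single corrupted entry shows up as one mismatching row and one mismatching column, which locates it. With floating-point data the fixed `tol` used here would need to be scaled to the magnitude of the operands.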