1,592 research outputs found
Quantifying fault recovery in multiprocessor systems
Various aspects of reliable computing are formalized and quantified with emphasis on efficient fault recovery. The mathematical model which proves to be most appropriate is provided by the theory of graphs. New measures for fault recovery are developed and the value of elements of the fault recovery vector are observed to depend not only on the computation graph H and the architecture graph G, but also on the specific location of a fault. In the examples, a hypercube is chosen as a representative of parallel computer architecture, and a pipeline as a typical configuration for program execution. Dependability qualities of such a system is defined with or without a fault. These qualities are determined by the resiliency triple defined by three parameters: multiplicity, robustness, and configurability. Parameters for measuring the recovery effectiveness are also introduced in terms of distance, time, and the number of new, used, and moved nodes and edges
Estimating Blood Pressure from Photoplethysmogram Signal and Demographic Features using Machine Learning Techniques
Hypertension is a potentially unsafe health ailment, which can be indicated
directly from the Blood pressure (BP). Hypertension always leads to other
health complications. Continuous monitoring of BP is very important; however,
cuff-based BP measurements are discrete and uncomfortable to the user. To
address this need, a cuff-less, continuous and a non-invasive BP measurement
system is proposed using Photoplethysmogram (PPG) signal and demographic
features using machine learning (ML) algorithms. PPG signals were acquired from
219 subjects, which undergo pre-processing and feature extraction steps. Time,
frequency and time-frequency domain features were extracted from the PPG and
their derivative signals. Feature selection techniques were used to reduce the
computational complexity and to decrease the chance of over-fitting the ML
algorithms. The features were then used to train and evaluate ML algorithms.
The best regression models were selected for Systolic BP (SBP) and Diastolic BP
(DBP) estimation individually. Gaussian Process Regression (GPR) along with
ReliefF feature selection algorithm outperforms other algorithms in estimating
SBP and DBP with a root-mean-square error (RMSE) of 6.74 and 3.59 respectively.
This ML model can be implemented in hardware systems to continuously monitor BP
and avoid any critical health conditions due to sudden changes.Comment: Accepted for publication in Sensor, 14 Figures, 14 Table
A survey of parallel algorithms for fractal image compression
This paper presents a short survey of the key research work that has been undertaken in the application of parallel algorithms for Fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios whilst maintaining a very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost that is associated with the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures in order to speed up this phase, whilst still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory upon
which it is based, and then reviews the different techniques that have been, and can be, applied in order to parallelize the compression algorithm
Review of Fault Mitigation Approaches for Deep Neural Networks for Computer Vision in Autonomous Driving
The aim of this work is to identify and present challenges and risks related to the employment of DNNs in Computer Vision for Autonomous Driving. Nowadays one of the major technological challenges is to choose the right technology among the abundance that is available on the market.
Specifically, in this thesis it is collected a synopsis of the state-of-the-art architectures, techniques and methodologies adopted for building fault-tolerant hardware and ensuring robustness in DNNs-based Computer Vision applications for Autonomous Driving
Periodic Application of Concurrent Error Detection in Processor Array Architectures
Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance
Fault tolerance issues in nanoelectronics
The astonishing success story of microelectronics cannot go on indefinitely. In fact, once
devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected
to impair their behaviour. Fault tolerant techniques will then be required. The aim of this
thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient
error rates for a selection of nanoelectronic gates, based upon quantum cellular automata
and single electron devices, in which the electrostatic interaction between electrons is used
to create Boolean circuits, are estimated. On the bases of such results, various fault tolerant
solutions are proposed, for both logic and memory nanochips. As for logic chips, traditional
techniques are found to be unsuitable. A new technique, in which the voting approach of
triple modular redundancy (TMR) is extended by cascading TMR units composed of
nanogate clusters, is proposed and generalised to other voting approaches. For memory
chips, an error correcting code approach is found to be suitable. Various codes are
considered and a lookup table approach is proposed for encoding and decoding. We are
then able to give estimations for the redundancy level to be provided on nanochips, so as to
make their mean time between failures acceptable. It is found that, for logic chips, space
redundancies up to a few tens are required, if mean times between failures have to be of the
order of a few years. Space redundancy can also be traded for time redundancy. As for
memory chips, mean times between failures of the order of a few years are found to imply
both space and time redundancies of the order of ten
Algorithm-Based Fault Tolerance for Two-Sided Dense Matrix Factorizations
The mean time between failure (MTBF) of large supercomputers is decreasing, and future exascale computers are expected to have a MTBF of around 30 minutes. Therefore, it is urgent to prepare important algorithms for future machines with such a short MTBF. Eigenvalue problems (EVP) and singular value problems (SVP) are common in engineering and scientific research. Solving EVP and SVP numerically involves two-sided matrix factorizations: the Hessenberg reduction, the tridiagonal reduction, and the bidiagonal reduction. These three factorizations are computation intensive, and have long running times. They are prone to suffer from computer failures.
We designed algorithm-based fault tolerant (ABFT) algorithms for the parallel Hessenberg reduction and the parallel tridiagonal reduction. The ABFT algorithms target fail-stop errors. These two fault tolerant algorithms use a combination of ABFT and diskless checkpointing. ABFT is used to protect frequently modified data . We carefully design the ABFT algorithm so the checksums are valid at the end of each iterative cycle. Diskless checkpointing is used for rarely modified data. These checkpoints are in the form of checksums, which are small in size, so the time and storage cost to store them in main memory is small. Also, there are intermediate results which need to be protected for a short time window. We store a copy of this data on the neighboring process in the process grid.
We also designed algorithm-based fault tolerant algorithms for the CPU-GPU hybrid Hessenberg reduction algorithm and the CPU-GPU hybrid bidiagonal reduction algorithm. These two fault tolerant algorithms target silent errors. Our design employs both ABFT and diskless checkpointing to provide data redundancy. The low cost error detection uses two dot products and an equality test. The recovery protocol uses reverse computation to roll back the state of the matrix to a point where it is easy to locate and correct errors.
We provided theoretical analysis and experimental verification on the correctness and efficiency of our fault tolerant algorithm design. We also provided mathematical proof on the numerical stability of the factorization results after fault recovery. Experimental results corroborate with the mathematical proof that the impact is mild
Recommended from our members
Algorithm Based Fault Tolerance in Massively Parallel Systems
An A complex computer system consists of billions of transistors, miles of wires, and many interactions with an unpredictable environment. Correct results must be produced despite faults that dynamically occur in some of these components. Many techniques have been developed for fault tolerant computation. General purpose methods are independent of the application, yet incur an overhead cost which may be unacceptable for massively parallel systems. Algorithm-specific methods, which can operate at lower cost, are a developing alternative [1, 72]. This paper first reviews the general-purpose approach and then focuses on the algorithm-specific method, with an eye toward massively parallel processors. Algorithm-based fault tolerance has the attraction of low overhead; furthermore it addresses both the detection and also the correction problems. The principle is to build low-cost checking and correcting mechanism based exclusively on the redundancies inherent in the system
- …