
    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here, an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The schemes developed are demonstrated by hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device. These include a Discrete Cosine Transform (DCT) core, a Motion Estimation (ME) engine, a Finite Impulse Response (FIR) filter, a Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks, in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low-motion-activity scenes to 12.5% for high-motion-activity video scenes in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within 4.02 dB to 6.67 dB of the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria
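
The abstract does not spell out the isolation flow itself, but its stated properties (no test vectors, a runtime health metric, a bounded number of reconfigurations) can be illustrated with a rough divide-and-conquer sketch in Python. All names here (`reconfigure_onto_slack`, `health_metric`, `threshold`) are hypothetical placeholders, not the FaDReS/PURE implementation:

```python
# Illustrative sketch only: divide-and-conquer fault isolation driven by a
# runtime health metric, converging in at most ceil(log2 N) reconfigurations.
# 'reconfigure_onto_slack' and 'health_metric' are hypothetical callbacks.

def isolate_faulty_pe(pe_ids, reconfigure_onto_slack, health_metric, threshold):
    """Return the id of the suspect processing element.

    pe_ids: processing-element identifiers under suspicion.
    reconfigure_onto_slack(group): remaps 'group' onto spare (slack) resources.
    health_metric(): scalar computed from runtime inputs (no test vectors).
    threshold: health value at or above which the configuration is deemed healthy.
    """
    suspects = list(pe_ids)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        reconfigure_onto_slack(half)          # one partial reconfiguration
        if health_metric() >= threshold:      # metric recovers -> fault was in 'half'
            suspects = half
        else:                                 # metric still degraded -> other half
            suspects = suspects[len(suspects) // 2 :]
    return suspects[0]
```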

    Literature Review For Networking And Communication Technology

    This report documents the results of a literature search performed in the area of networking and communication technology

    State-of-the-art Assessment For Simulated Forces

    Summary of the review of the state of the art in simulated forces conducted to support the research objectives of Research and Development for Intelligent Simulated Forces

    Fast Algorithm Development for SVD: Applications in Pattern Matching and Fault Diagnosis

    The project aims for fast detection and diagnosis of faults occurring in process plants by designing a low-cost FPGA module for the computation. Fast detection and diagnosis while the process is still operating in a controllable region helps avoid further advancement of the fault and reduces the loss of productivity. Model-based methods are not popular in the domain of process control, as obtaining an accurate model is expensive and requires expertise. Data-driven methods like Principal Component Analysis (PCA) are popular diagnostic methods for process plants as they do not require any model. PCA is a widely used tool for dimensionality reduction and thus for reducing the computational effort. The trends are captured in the principal components, as it is difficult to have the same amount of disturbance as simulated in the historical database. The historical database has multiple instances of various kinds of faults and disturbances along with normal operation. A moving-window approach has been employed to detect similar instances in the historical database based on the Standard PCA similarity factor. The measurements of the variables of interest over a certain period of time form the snapshot dataset, S. At each instant, a window of the same size as the snapshot dataset is picked from the historical database and forms the historical window, H. The two datasets are then compared using similarity factors such as the Standard PCA similarity factor, which signifies the angular difference between the principal components of the two datasets. Since many of the operating conditions are quite similar to each other and a significant number of misclassifications have been observed, a candidate pool is formed which orders the historical data windows by the value of the similarity factor. Based on the operation detected most often among the top-most windows, the operating personnel take the necessary action. The Tennessee Eastman Challenge process has been chosen as an initial case study for evaluating the performance. The measurements are sampled every minute and the fault having the smallest maximum duration is 8 hours. Hence the snapshot window size, m, has been chosen to consist of 500 samples, i.e. 8.33 hours of the most recent data for all 52 variables. Ideally, the moving window should replace the oldest sample with a new one, in which case it would take approximately as many comparisons as the size of the historical database. The size of the historical database is 4.32 million measurements (the past 8 years of data) for each of the 52 variables. In a software simulation in Matlab, it takes around 80-100 minutes to sweep through the entire 4.32-million-sample historical database. Since most of the computation is spent in finding the principal components of the two datasets using SVD, a hardware design has to be incorporated to accelerate the pattern matching approach. The thesis is organized as follows: Chapter 1 describes the moving-window approach and the various similarity factors and metrics used for pattern matching. The previous work proposed by Ashish Singhal is based on skipping a few samples to reduce the computational effort and also employs windows as large as 5761 samples, which is four days of snapshot. Instead, a new method which skips samples when the similarity factor is quite low has been proposed. A simplified form of the Standard PCA similarity factor has been proposed without any trade-off in accuracy.
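
As a minimal illustration of the Standard PCA similarity factor and the moving-window comparison described above, the numpy sketch below computes S_PCA = (1/k) * ||L_S^T L_H||_F^2 between a snapshot window and each historical window. The number of retained components k, the stride, and the synthetic data are placeholders rather than values from the thesis (only the 500-sample by 52-variable snapshot shape is taken from the text):

```python
import numpy as np

def pca_loadings(X, k):
    """Top-k principal-component loadings (right singular vectors) of a window."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                              # shape (n_vars, k)

def standard_pca_similarity(S, H, k):
    """S_PCA = (1/k) * ||L_S^T L_H||_F^2 (Krzanowski's PCA similarity factor)."""
    L_S, L_H = pca_loadings(S, k), pca_loadings(H, k)
    return np.sum((L_S.T @ L_H) ** 2) / k

# Moving-window comparison against a stand-in historical database (toy sizes):
rng = np.random.default_rng(0)
S = rng.normal(size=(500, 52))                   # snapshot: 500 samples x 52 variables
history = rng.normal(size=(5000, 52))            # placeholder for the historical database
scores = [standard_pca_similarity(S, history[t:t + 500, :], k=5)
          for t in range(0, history.shape[0] - 500, 500)]
best = int(np.argmax(scores))                    # index of the most similar window
```
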
Pre-computation over the historical database could also be done, as the data is available a priori, but this imposes a large memory requirement because most of the time is then spent in read/write operations. The large memory requirement is due to the fact that every sample gives rise to a 52×35 matrix, assuming the top 35 PCs are sufficient to capture the variance of the dataset. Chapter 2 describes various popular algorithms for SVD. Algorithms apart from the Jacobi methods, such as the Golub-Kahan and divide-and-conquer SVD algorithms, are briefly discussed. While bidiagonalization-based methods are very accurate, they suffer from large latency and are computationally intensive. On the other hand, Jacobi methods are computationally inexpensive and parallelizable, thus reducing the latency. We also evaluated the performance of the proposed hybrid Golub-Kahan Jacobi algorithm for our application. Chapter 3 describes the basic building block, CORDIC, which is used for performing the rotations required by the Jacobi methods or the n-D Householder reflections of Golub-Kahan SVD. CORDIC is widely employed in hardware design for computing trigonometric, exponential, or logarithmic functions as it makes use of simple shift and add/subtract operations. Two modes of CORDIC, namely rotation mode and vectoring mode, are discussed, which are used in the derivation of the two-sided Jacobi SVD. Chapter 4 describes the Jacobi methods of SVD, which are quite popular in hardware implementation as they are amenable to parallel computation. Two variants of Jacobi methods, namely the one-sided and two-sided Jacobi methods, are briefly discussed. The two-sided Jacobi method making use of CORDIC has been derived. The systolic array implementation, which has been popular in hardware implementations for the past three decades, is also discussed. Chapter 5 deals with the hardware implementation of pattern matching and reports a literature survey of the various architectures developed for computing the SVD. The Xilinx ZC7020 has been chosen as the target device for FPGA implementation as it is an inexpensive device with many built-in peripherals. Latency reports with both Vivado HLS and Vivado SDSoC are also given for the application of interest. Evaluation of other case studies and other data-driven methods similar to PCA, such as Correspondence Analysis (CA) and Independent Component Analysis (ICA), development of an efficient hybrid method for computing the SVD in hardware and a highly discriminating similarity factor, and extension of CORDIC to n dimensions for Householder reflections are considered directions for future research
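
For the CORDIC building block mentioned in Chapter 3, a floating-point sketch of rotation mode is given below. It is illustrative only: the hardware form uses fixed-point shift-and-add micro-rotations, and the iteration count here is an arbitrary choice:

```python
import math

def cordic_rotate(x, y, angle, n_iters=32):
    """Rotate (x, y) by 'angle' using CORDIC rotation-mode micro-rotations."""
    K = 1.0
    z = angle
    for i in range(n_iters):
        d = 1.0 if z >= 0 else -1.0            # drive the residual angle z toward 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        K *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x * K, y * K                        # compensate the accumulated CORDIC gain

# e.g. cordic_rotate(1.0, 0.0, math.pi / 3) is approximately (cos 60deg, sin 60deg)
```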

    Evaluation of advanced techniques for structural FPGA self-test

    This thesis presents a comprehensive test generation framework for FPGA logic elements and interconnects. It is based on and extends the current state of the art. The purpose of FPGA testing in this work is to achieve reliable reconfiguration for an FPGA-based runtime reconfigurable system. A pre-configuration test is performed on a portion of the FPGA before it is reconfigured as part of the system to ensure that the FPGA fabric is fault-free. The implementation platform is the Xilinx Virtex-5 FPGA family. Existing literature on FPGA testing is evaluated and reviewed thoroughly. The various approaches are compared against one another qualitatively and the approach most suitable to the target platform is chosen. The array testing method is employed in testing the FPGA logic for its low hardware overhead and optimal test time. All tests are additionally pipelined to reduce test application time and use a high test clock frequency. A hybrid fault model including both structural and functional faults is assumed. An algorithm for minimizing the number of required FPGA test configurations is developed and implemented in Java using a pseudo-random set-covering heuristic. Optimal solutions are obtained for Virtex-5 logic slices. The algorithm effort is parameterizable by the number of loop iterations, each of which takes approximately one second for a Virtex-5 SliceL circuit. A flexible test architecture for interconnects is developed. Arbitrary wire types can be tested in the same test configuration with no hardware overhead. Furthermore, a routing algorithm is integrated with the test template generation to select the wires under test and route them appropriately. Nine test configurations are required to achieve full test coverage for the FPGA logic. For interconnect testing, a local router based on depth-first graph traversal is implemented in Java as the basis for creating systematic interconnect test templates. Pent-wire testing is additionally implemented as a proof of concept. The test clock frequency for all tests exceeds 170 MHz and the hardware overhead is always lower than seven CLBs. All implemented tests are parameterizable such that they can be applied to any portion of the FPGA regardless of size or position
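
The configuration-minimization step is described as a pseudo-random set-covering heuristic implemented in Java. Purely as an illustration of that class of heuristic (not the thesis code), a small Python rendering might look like the following, where the `configs` mapping of configuration names to covered faults is a made-up input and the configurations are assumed to jointly cover every fault:

```python
import random

def random_greedy_cover(configs, universe, rng):
    """Randomized greedy set cover: pick configurations until all faults are covered."""
    uncovered, chosen = set(universe), []
    while uncovered:
        # Weight candidates by how many still-uncovered faults they would cover.
        candidates = [(name, len(covered & uncovered)) for name, covered in configs.items()]
        candidates = [c for c in candidates if c[1] > 0]
        names, weights = zip(*candidates)
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        uncovered -= configs[pick]
    return chosen

def best_of_n(configs, universe, iterations=1000):
    """Repeat the randomized construction and keep the smallest cover found."""
    best = None
    for seed in range(iterations):
        sol = random_greedy_cover(configs, universe, random.Random(seed))
        if best is None or len(sol) < len(best):
            best = sol
    return best
```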

    Programmiersprachen und Rechenkonzepte

    Since 1984, the GI special interest group "Programmiersprachen und Rechenkonzepte", which emerged from the former groups 2.1.3 "Implementierung von Programmiersprachen" and 2.1.4 "Alternative Konzepte für Sprachen und Rechner", has regularly held a spring workshop at the Physikzentrum Bad Honnef. The meeting serves primarily for getting to know one another, exchanging experience, discussion, and deepening mutual contacts

    A novel computational framework for fast, distributed computing and knowledge integration for microarray gene expression data analysis

    The healthcare burden and suffering due to life-threatening diseases such as cancer could be significantly reduced by the design and refinement of computational methods for interpreting the micro-molecular data collected by bioinformaticians. Rapid technological advancements in the field of microarray analysis, an important component in the design of in-silico molecular medicine methods, have generated enormous amounts of such data, a trend that has been increasing exponentially over the last few years. However, the analysis and handling of these data has become one of the major bottlenecks in the utilization of the technology. The rate of collection of these data has far surpassed our ability to analyze them for novel, non-trivial, and important knowledge. High-performance computing platforms, and algorithms that utilize their embedded computing capacity, have emerged as a leading technology for handling such data-intensive knowledge discovery applications. In this dissertation, we present a novel framework to achieve fast, robust, and accurate (biologically significant) multi-class classification of gene expression data using distributed knowledge discovery and integration computational routines, specifically for cancer genomics applications. The research presents a unique computational paradigm for the rapid, accurate, and efficient selection of relevant marker genes, while providing parametric controls to ensure flexibility of its application. The proposed paradigm consists of the following key computational steps: (a) preprocess and normalize the gene expression data; (b) discretize the data for the knowledge mining application; (c) partition the data using two proposed methods: partitioning with overlapped windows and adaptive selection; (d) perform knowledge discovery on the partitioned data-spaces for association rule discovery; (e) integrate association rules from partitioned data and knowledge spaces on distributed processor nodes using a novel knowledge integration algorithm; and (f) post-analyze and functionally elucidate the discovered gene rule sets. The framework is implemented on a shared-memory multiprocessor supercomputing environment, and several experimental results are presented to evaluate the algorithms. We conclude with a functional interpretation of the computational discovery routines for enhanced biological and physiological discovery from cancer genomics datasets, while suggesting some directions for future research
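
Step (c) of the paradigm, partitioning with overlapped windows, is not detailed in the abstract. One plausible reading (an assumption, not taken from the dissertation) is overlapping gene windows handed to separate processor nodes for rule mining, sketched below with placeholder window and overlap sizes:

```python
import numpy as np

def overlapped_windows(X, window, overlap):
    """Yield (start, end, submatrix) gene partitions with the given overlap."""
    n_genes = X.shape[1]
    step = window - overlap
    for start in range(0, max(n_genes - overlap, 1), step):
        end = min(start + window, n_genes)
        yield start, end, X[:, start:end]
        if end == n_genes:
            break

X = np.random.rand(100, 1000)                    # toy matrix: 100 samples x 1000 genes
parts = [(s, e) for s, e, _ in overlapped_windows(X, window=200, overlap=50)]
# parts -> [(0, 200), (150, 350), (300, 500), ...] ; each slice could be mined
# for association rules on its own node before knowledge integration.
```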

    From experiment to design – fault characterization and detection in parallel computer systems using computational accelerators

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long-latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor with a set of new instructions that support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation has gained importance as heterogeneous processors have become an essential component of state-of-the-art supercomputers: GPUs were used in three of the five fastest supercomputers operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program state, including dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors, which arise mainly from the lack of fine-grained protection and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing an understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that quickly detect and recover from hardware faults with low performance and hardware overheads
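
As a loose illustration of the kind of mechanism studied here (a fault-injection primitive plus a lightweight detector of the sort a source-to-source translator might insert around a critical variable), the following Python sketch is an assumption-laden toy, not the dissertation's tooling; the bit-flip injector, variable name, and range thresholds are all made up:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Inject a single-bit fault into the IEEE-754 representation of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]

def range_detector(name, value, lo, hi):
    """Inserted check: flag likely silent data corruption if value leaves [lo, hi]."""
    if not (lo <= value <= hi):
        raise RuntimeError(f"SDC suspected in {name}: {value} outside [{lo}, {hi}]")
    return value

pixel = 0.73                                  # e.g. a normalized pixel value
faulty = flip_bit(pixel, bit=62)              # flip a high exponent bit
range_detector("pixel", faulty, 0.0, 1.0)     # raises: corruption detected
```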