65 research outputs found

    On Integrating Failure Localization with Survivable Design

    In this thesis, I propose a novel framework for all-optical failure restoration which jointly determines the network monitoring plane and the spare capacity allocation in the presence of either static or dynamic traffic. The proposed framework aims to enable a general shared protection scheme to achieve near-optimal capacity efficiency, as in Failure Dependent Protection (FDP), while being subject to an ultra-fast, all-optical, and deterministic failure restoration process. Simply put, Local Unambiguous Failure Localization (L-UFL) and FDP are the two building blocks of the proposed restoration framework. Under L-UFL, by properly allocating a set of Monitoring Trails (m-trails), a set of nodes can unambiguously identify every possible Shared Risk Link Group (SRLG) failure based merely on their locally collected Loss of Light (LOL) signals. Two heuristics are proposed to solve L-UFL, one of which exclusively deploys Supervisory Lightpaths (S-LPs) while the other jointly considers S-LPs and Working Lightpaths (W-LPs) to suppress monitoring resource consumption. Thanks to the "Enhanced Min Wavelength Max Information" principle, an entropy-based utility function, m-trail global sharing, and other techniques, the proposed heuristics exhibit satisfactory performance in minimizing the number of m-trails, the Wavelength Channel (WL) consumption, and the running time of the algorithm. Based on the heuristics for L-UFL, two algorithms, namely MPJD and DJH, are proposed for the novel signaling-free restoration framework to deal with static and dynamic traffic, respectively. MPJD is developed to determine the Protection Lightpaths (P-LPs) and m-trails given the pre-computed W-LPs, while DJH jointly implements a generic dynamic survivable routing scheme based on FDP with an m-trail deployment scheme. For both algorithms, m-trail deployment is guided by the Necessary Monitoring Requirement (NMR) defined at each node for achieving signaling-free restoration.
Extensive simulation is conducted to verify the performance of the proposed heuristics in terms of WL consumption, number of m-trails, monitoring requirement, blocking probability, and running time. In conclusion, the proposed restoration framework can achieve all-optical and signaling-free restoration with the help of L-UFL, while maintaining high capacity efficiency as in FDP-based survivable routing. The proposed heuristics achieve satisfactory performance, as verified by the simulation results.
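
The idea of unambiguous failure localization with m-trails can be illustrated with a small sketch. It assumes that each m-trail is simply a set of links and that a failed link is identified by the set of trails reporting Loss of Light (its alarm code). The link names, trail names, and helper functions are hypothetical and not taken from the thesis; the sketch covers single-link failures seen by one monitor only, not general SRLGs, the per-node (local) variant, or the proposed heuristics.

```python
from typing import Dict, FrozenSet, Optional, Set


def alarm_code(failed_link: str, trails: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return the names of the trails that lose light when failed_link fails."""
    return frozenset(name for name, links in trails.items() if failed_link in links)


def is_unambiguous(links: Set[str], trails: Dict[str, Set[str]]) -> bool:
    """True if every link has a non-empty alarm code and no two codes coincide."""
    codes = [alarm_code(link, trails) for link in links]
    return all(codes) and len(set(codes)) == len(codes)


def localize(alarms: Set[str], links: Set[str],
             trails: Dict[str, Set[str]]) -> Optional[str]:
    """Map an observed set of alarming trails back to the single failed link."""
    matches = [link for link in links if alarm_code(link, trails) == frozenset(alarms)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical example: three links monitored by two trails.
LINKS = {"a", "b", "c"}
TRAILS = {"t1": {"a", "b"}, "t2": {"b", "c"}}
```

Here link `a` triggers only `t1`, link `c` only `t2`, and link `b` both, so the three alarm codes are distinct and two trails suffice for three links.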

    Autonomous Recovery Of Reconfigurable Logic Devices Using Priority Escalation Of Slack

    Field Programmable Gate Array (FPGA) devices offer a suitable platform for survivable hardware architectures in mission-critical systems. In this dissertation, active dynamic redundancy-based fault-handling techniques are proposed which exploit the dynamic partial reconfiguration capability of SRAM-based FPGAs. Self-adaptation is realized by employing reconfiguration in the detection, diagnosis, and recovery phases. To extend these concepts to semiconductor aging and process variation in the deep submicron era, resilient adaptable processing systems are sought to maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. A new approach to autonomous fault-handling which addresses these goals is developed using only a uniplex hardware arrangement. It operates by observing a health metric to achieve Fault Demotion using Reconfigurable Slack (FaDReS). Here an autonomous fault isolation scheme is employed which neither requires test vectors nor suspends the computational throughput, but instead observes the value of a health metric based on runtime input. The deterministic flow of the fault isolation scheme guarantees success in a bounded number of reconfigurations of the FPGA fabric. FaDReS is then extended to the Priority Using Resource Escalation (PURE) online redundancy scheme, which considers fault-isolation latency and throughput trade-offs under a dynamic spare arrangement. While deep-submicron designs introduce new challenges, the use of adaptive techniques is seen to provide several promising avenues for improving resilience. The scheme developed is demonstrated by the hardware design of various signal processing circuits and their implementation on a Xilinx Virtex-4 FPGA device.
These include a Discrete Cosine Transform (DCT) core, a Motion Estimation (ME) engine, a Finite Impulse Response (FIR) filter, a Support Vector Machine (SVM), and Advanced Encryption Standard (AES) blocks, in addition to MCNC benchmark circuits. A significant reduction in power consumption is achieved, ranging from 83% for low motion-activity scenes to 12.5% for high motion-activity video scenes, in a novel ME engine configuration. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32dB. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. Compared to previous alternatives, PURE maintains a PSNR within a difference of 4.02dB to 6.67dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The results indicate the benefits of priority-aware resiliency over conventional redundancy approaches in terms of fault recovery, power consumption, and resource-area requirements. Together, these provide a broad range of strategies to achieve autonomous recovery of reconfigurable logic devices under a variety of constraints, operating conditions, and optimization criteria.

    Aeronautical engineering: A continuing bibliography with indexes (supplement 202)

    This bibliography lists 447 reports, articles and other documents introduced into the NASA scientific and technical information system in June 1986

    TRIPLE MODULAR REDUNDANCY APPROACH FOR INTERNET CONNECTIVITY

    This paper discusses the issue of providing tolerance to hardware and software faults in an Internet system through triplicated application servers. A replication scheme based on Triple Modular Redundancy (TMR) is presented, and a detailed dependability analysis of this scheme is performed. The proposed model is designed mainly for fault-tolerant Internet connectivity systems in which faults do not impair the continuous services rendered by the Internet system, which exhibits highly varying and dynamic characteristics. A major feature of the model is that it adapts the connections of the existing TMR scheme when executing redundant modules, in order to reach a required level of fault tolerance.
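
The core of any TMR arrangement is a majority vote over three replica outputs. The sketch below shows that voting step only; the function name `tmr_vote` and the convention that a crashed or timed-out server is represented by `None` are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def tmr_vote(replies: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value that at least two of three replicas agree on.

    A reply of None stands for a replica that failed to answer (an assumption
    of this sketch). If no two replicas agree, None is returned.
    """
    if len(replies) != 3:
        raise ValueError("TMR expects exactly three replies")
    counts = Counter(reply for reply in replies if reply is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= 2 else None
```

With this convention a single faulty or silent server is masked, for example `tmr_vote(["ok", None, "ok"])` yields `"ok"`, while three mutually different replies are reported as a voting failure.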

    Aeronautical engineering: A continuing bibliography with indexes, supplement 139

    This bibliography lists 381 reports, articles, and other documents introduced into the NASA scientific and technical information system in July 1981

    MAGMA a liquid software approach to fault tolerance, computer network security, and survivable

    The Next Generation Internet (NGI) will address increased multi-media Internet service demands, requiring consistent Quality of Service (QoS), similar to the legacy phone system. Server Agent-based Active network Management (SAAM) acts like a rush-hour traffic reporting helicopter. Upon arrival of a routing request, the SAAM server determines the best, least traffic/resistance route and assembles the routing path, freeing up "light-weight" routers to provide faster, more reliable forwarding services. The SAAM server is a critical network node; therefore, it is imperative to make it extremely robust. With Margulis Agent-Based Mobile Application (MAGMA) liquid software, a SAAM server agent will remain inactive in the resident memory of each router until it is stimulated by a message from the departing server. Then the agent will begin running a new server at a starting point determined from the prior server's recent state information, or at a pre-determined point if that state information is not available. MAGMA will provide SAAM with increased fault tolerance and security against malicious attacks. Liquid software research has taken place since 1996 (University of Arizona/University of Pennsylvania); however, there is no known application currently providing fault tolerance and system security. In this thesis, the foundation for a mobile SAAM server was developed, with the researcher being able to manually move the server from one host to the next. Furthermore, this thesis designed a protocol that extracts critical state information from the current server and periodically transports a compressed form of it to potential next SAAM servers in the network, so that the next SAAM server can be configured efficiently with the state information from the departing server.
MAGMA will provide a revolution in today's computer fault tolerance and security paradigms, benefiting industry through more survivable networks with guaranteed QoS. http://archive.org/details/magmaliquidsoftw109455922 US Navy (USN) author. Approved for public release; distribution is unlimited.

    Design Disjunction for Resilient Reconfigurable Hardware

    Contemporary reconfigurable hardware devices have the capability to achieve the high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal oxide semiconductor (CMOS) technology, new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware devices and their ability to realize self-organization features are expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of the redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted. First, a graph and set theoretic approach, named Hypergraph-Cover Diversity (HCD), is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-free hypergraphs are exploited to partition the reconfigurable resource pool into highly separable subsets of resources, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases.
Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed FT method to achieve 37.5% area saving and up to 66% reduction in power consumption compared to the frequently used TMR scheme while providing superior fault tolerance. Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overhead fault-tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of trade-offs. Disjunct designs are created using the mosaic convergence algorithm developed such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with an average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and an area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for applications with a moderately high production capacity.
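
The library property stated above (at least one configuration evades any occurrence of up to d resource faults) can be checked by brute force on small examples. The sketch below assumes each configuration is described only by the set of resources it occupies; the function names and the toy library are hypothetical, and this exhaustive check is not the mosaic convergence algorithm from the dissertation.

```python
from itertools import combinations
from typing import List, Optional, Set


def evades_all_faults(configs: List[Set[int]], resources: Set[int], d: int) -> bool:
    """True if, for every set of 1..d faulty resources, some configuration avoids it."""
    for k in range(1, d + 1):
        for faults in combinations(sorted(resources), k):
            if not any(cfg.isdisjoint(faults) for cfg in configs):
                return False
    return True


def pick_config(configs: List[Set[int]], faulty: Set[int]) -> Optional[int]:
    """Return the index of the first configuration that uses no faulty resource."""
    for index, cfg in enumerate(configs):
        if cfg.isdisjoint(faulty):
            return index
    return None


# Hypothetical library: three configurations over six resources (e.g. slices).
RESOURCES = {0, 1, 2, 3, 4, 5}
LIBRARY = [{0, 1}, {2, 3}, {4, 5}]
```

In this toy library any two faults can disable at most two of the three pairwise disjoint configurations, so the property holds for d=2 but not for d=3.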

    The Federal Conference on Intelligent Processing Equipment

    Research and development projects involving intelligent processing equipment within the following U.S. agencies are addressed: Department of Agriculture, Department of Commerce, Department of Energy, Department of Defense, Environmental Protection Agency, Federal Emergency Management Agency, NASA, National Institutes of Health, and the National Science Foundation