
    Fault-tolerant networks-on-chip routing with coarse and fine-grained look-ahead

    Fault tolerance and adaptive capabilities are challenges for modern networks-on-chip (NoC) due to the increase in physical defects in advanced manufacturing processes. Two novel adaptive routing algorithms, namely coarse-grained and fine-grained (FG) look-ahead algorithms, are proposed in this paper to enhance the fault-tolerant capabilities of 2-D mesh/torus NoC systems. These strategies use fault flag codes from neighboring nodes to obtain the real-time traffic status of an NoC region, then calculate path weights and choose the route on which to forward packets. This approach enables the router to minimize congestion on the adjacent connected channels and to bypass paths with faulty channels by looking ahead at distant neighboring router paths. The novelty of the proposed routing algorithms lies in their weighted path selection strategies, which make near-optimal routing decisions to maintain NoC system performance under high fault rates. Results show that the proposed routing algorithms achieve performance improvements over other state-of-the-art work under various traffic loads and high fault rates. The routing algorithm with FG look-ahead capability achieves higher throughput than the coarse-grained approach under complex fault patterns. The hardware area/power overheads of both routing approaches are relatively low and do not prohibit scalability to large-scale NoC implementations.
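    The abstract does not give the paper's actual weighting function; the sketch below is only a minimal illustration of weight-based output-port selection with one-hop look-ahead, where the fault-flag and congestion fields and the penalty constants are illustrative assumptions.

    # Minimal sketch of weight-based adaptive route selection, assuming
    # hypothetical per-port fault flags and congestion counts reported by
    # neighboring routers (the paper's actual weighting is not given here).

    FAULT_PENALTY = 100   # large weight so faulty links are avoided
    HOP_WEIGHT = 1        # base cost per hop toward the destination

    def port_weight(local, lookahead):
        """Combine local and one-hop look-ahead status into a path weight."""
        w = HOP_WEIGHT + local["congestion"]
        if local["faulty"]:
            w += FAULT_PENALTY
        # Fine-grained look-ahead: also penalize faults/congestion reported
        # by the distant neighbor reachable through this port.
        w += 0.5 * lookahead["congestion"]
        if lookahead["faulty"]:
            w += FAULT_PENALTY / 2
        return w

    def select_output_port(candidate_ports):
        """Pick the admissible output port with the smallest weight."""
        return min(candidate_ports,
                   key=lambda p: port_weight(p["local"], p["lookahead"]))

    # Example: two minimal-path candidates toward the destination.
    ports = [
        {"name": "east",  "local": {"faulty": False, "congestion": 3},
         "lookahead": {"faulty": True,  "congestion": 1}},
        {"name": "north", "local": {"faulty": False, "congestion": 5},
         "lookahead": {"faulty": False, "congestion": 2}},
    ]
    print(select_output_port(ports)["name"])  # "north": avoids the distant fault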

    New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance

    Distributed-memory systems are key to achieving high-performance computing and are among the most favored architectures used in advanced research problems. Mesh-connected multicomputers are one of the most popular architectures implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique, in which a packet is divided into small flits, has been widely used in the design of distributed-memory systems. Multicast communication, in which one source node sends the same message to several destination nodes, is also widely used in these systems. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults, and the development of fault-tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents new fault-tolerant multicast routing algorithms for improving distributed-memory system performance using wormhole-routed 2D meshes. The algorithms are described for fault-tolerant routing in 2D mesh networks, but they can also be extended to other topologies. They combine a unicast-based multicast algorithm with tree-based multicast algorithms and work effectively for the most commonly encountered faults in mesh networks: f-rings, f-chains, and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large fault regions, and they are proved to be deadlock-free. The problem of overlapping fault regions is also solved. Four essential performance metrics in mesh networks are considered and calculated, and the algorithms are limited-global-information-based multicasting, a compromise between local-information-based and global-information-based approaches. Data mining is used to validate the results and to enlarge the sample. The proposed multicast routing techniques are used to enhance the performance of distributed-memory systems, and simulation results are presented to demonstrate the efficiency of the proposed algorithms.
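    The dissertation's combined unicast/tree-based scheme and its handling of fault regions are not detailed in the abstract; the following is only a minimal sketch of the unicast-based multicast idea on a 2D mesh, where send_unicast is a hypothetical transport primitive and dimension-order (XY) routing stands in for the actual fault-tolerant paths.

    # Minimal sketch of a unicast-based multicast on a 2D mesh, assuming a
    # hypothetical send_unicast(src, dst, msg, route) primitive and plain
    # dimension-order paths; the dissertation's fault-tolerant scheme is
    # more involved than this.

    def xy_path(src, dst):
        """Dimension-order route: move along X first, then along Y."""
        (sx, sy), (dx, dy) = src, dst
        path = [(sx, sy)]
        x, y = sx, sy
        while x != dx:                      # travel along the X dimension
            x += 1 if dx > x else -1
            path.append((x, y))
        while y != dy:                      # then travel along the Y dimension
            y += 1 if dy > y else -1
            path.append((x, y))
        return path

    def unicast_based_multicast(src, destinations, msg, send_unicast):
        """Deliver msg with one unicast worm per destination; sorting the
        destinations lets worms heading the same way share early channels."""
        for dst in sorted(destinations):
            send_unicast(src, dst, msg, xy_path(src, dst))

    # Example with a stub transport that records the routes taken.
    routes = []
    unicast_based_multicast((0, 0), [(2, 1), (1, 3)], "hello",
                            lambda s, d, m, route: routes.append(route))
    print(routes)   # two XY routes, one per destination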

    The importance of input variables to a neural network fault-diagnostic system for nuclear power plants

    This thesis explores safety enhancement for nuclear power plants. Emergency response systems currently in use depend mainly on automatic systems that engage when certain parameters go beyond a pre-specified safety limit. Oftentimes the operator has little or no opportunity to react, since a fast scram signal shuts down the reactor smoothly and efficiently. These accidents are of interest to technical support personnel, since examining the conditions that gave rise to them helps determine causality. In many other cases, an automated fault-diagnostic advisor would be a valuable tool in assisting technicians and operators to determine what has just happened and why.

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
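    The thesis only summarizes the anomaly-diagnosis framework at this level of detail; the sketch below shows the general pattern of training a classifier on per-node resource-usage statistics. The feature set, the synthetic labels, and the random-forest model are illustrative assumptions, not the thesis's exact pipeline.

    # Minimal sketch of anomaly diagnosis from monitoring data, assuming
    # per-node time series of resource-usage metrics; feature names and the
    # model choice are illustrative only.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def extract_features(window):
        """Summarize one monitoring window (rows = samples, cols = metrics)
        into simple statistics the classifier can consume."""
        return np.concatenate([window.mean(axis=0),
                               window.std(axis=0),
                               window.max(axis=0) - window.min(axis=0)])

    # Synthetic stand-in for labeled monitoring windows: 200 windows of
    # 60 samples x 4 metrics (e.g., CPU, memory, network, I/O), with labels
    # such as "healthy", "memleak", "cpu_contention".
    rng = np.random.default_rng(0)
    windows = rng.random((200, 60, 4))
    labels = rng.choice(["healthy", "memleak", "cpu_contention"], size=200)

    X = np.array([extract_features(w) for w in windows])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

    # At runtime, each new monitoring window is classified the same way.
    new_window = rng.random((60, 4))
    print(clf.predict([extract_features(new_window)])[0])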

    Density Preserving Sampling: Robust and Efficient Alternative to Cross-validation for Error Estimation

    Estimation of the generalization ability of a classification or regression model is an important issue, as it indicates the expected performance on previously unseen data and is also used for model selection. Currently used generalization error estimation procedures, such as cross-validation (CV) or bootstrap, are stochastic and thus require multiple repetitions to produce reliable results, which can be computationally expensive, if not prohibitive. The correntropy-inspired density-preserving sampling (DPS) procedure proposed in this paper eliminates the need for repeating the error estimation procedure by dividing the available data into subsets that are guaranteed to be representative of the input dataset. This allows the production of low-variance error estimates with an accuracy comparable to 10-times repeated CV at a fraction of the computations required by CV. The method can also be used for model ranking and selection. This paper derives the DPS procedure and investigates its usability and performance using a set of public benchmark datasets and standard classifiers.
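    The correntropy-based splitting algorithm itself is not given in the abstract and is not reproduced here; the sketch below only contrasts the usage pattern (one pass over representative subsets versus many repeated CV runs), using StratifiedKFold as a stand-in for a hypothetical density_preserving_split() routine.

    # Minimal sketch of the DPS usage pattern: a single pass over
    # representative splits instead of repeated CV. StratifiedKFold is only
    # a stand-in for the paper's density-preserving splitting.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    model = DecisionTreeClassifier(random_state=0)

    # Repeated CV: k folds repeated R times -> k * R model fits.
    repeated_scores = []
    for r in range(10):
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=r)
        repeated_scores.extend(cross_val_score(model, X, y, cv=cv))

    # DPS-style estimate: one pass over representative subsets -> k fits.
    single_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    single_pass_scores = cross_val_score(model, X, y, cv=single_cv)

    print(f"repeated CV estimate:  {sum(repeated_scores) / len(repeated_scores):.3f}")
    print(f"single-pass estimate:  {sum(single_pass_scores) / len(single_pass_scores):.3f}")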

    Nuclear plant diagnostics using neural networks with dynamic input selection

    The work presented in this dissertation explores the design and development of a large-scale nuclear power plant (NPP) fault diagnostic system based on artificial neural networks (ANNs). The viability of detecting a large number of transients in an NPP using ANNs is demonstrated. A new adviser design is subsequently presented in which the diagnostic task is divided into component parts and each part is solved by an individual ANN. This design allows the diagnostic capabilities of an existing adviser to be expanded by modifying the existing ANNs and adding new ANNs to the adviser. This dissertation also presents an architecture optimization scheme called dynamic input selection (DIS). DIS analyzes the training data for a problem and ranks the available input variables in order of their importance to the input-output relationship. Training is initiated with the most important input and one hidden node. As network training progresses, input and hidden nodes are added as required until the network has learned the problem. Any hidden or input nodes that were added during training but are unnecessary for subsequent recall are then removed from the network. The DIS scheme can be applied to any ANN learning paradigm. The DIS scheme is used to train the ANNs that form the NPP fault diagnostic adviser. DIS completely eliminates any guesswork related to architecture selection, thus decreasing the time taken to train each ANN. Each ANN uses only the small subset of the available input variables required to solve its particular task. This reduction in the dimensionality of the problem leads to a drastic reduction in training time. Data used in this work were collected during the simulation of transients on the operator training simulator at the Duane Arnold Energy Center, a boiling water reactor nuclear power plant. An adviser was developed to detect and classify 30 distinct transients based on the simulation of 47 scenarios at different severities. This adviser was then expanded to detect and classify a total of 36 transients based on the simulation of 58 transient scenarios. The noise-tolerant characteristics of the adviser are demonstrated.
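    The dissertation's exact DIS procedure (its ranking metric, growth schedule, and pruning rule) is not given in the abstract; the loop below is only a minimal sketch of the constructive idea, using mutual information as an illustrative ranking and a simple stop-when-no-improvement rule.

    # Minimal sketch of a DIS-like constructive loop: rank inputs, start
    # with the top-ranked one and a single hidden node, and grow until
    # validation accuracy stops improving. Ranking metric and stopping rule
    # are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_wine(return_X_y=True)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

    # Rank candidate inputs by relevance to the input-output relationship.
    ranking = np.argsort(mutual_info_classif(X_tr, y_tr, random_state=0))[::-1]

    best_acc, best_cfg = 0.0, None
    n_inputs, n_hidden = 1, 1
    while n_inputs <= X.shape[1]:
        cols = ranking[:n_inputs]
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                            random_state=0).fit(X_tr[:, cols], y_tr)
        acc = net.score(X_va[:, cols], y_va)
        if acc > best_acc + 1e-3:        # still learning: keep this config
            best_acc, best_cfg = acc, (n_inputs, n_hidden)
            n_hidden += 1                # grow capacity a little
            n_inputs += 1                # admit the next-ranked input
        else:                            # no improvement: stop growing
            break

    print(f"selected {best_cfg[0]} inputs, {best_cfg[1]} hidden nodes, "
          f"validation accuracy {best_acc:.3f}")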

    Heterogeneous volumetric data mapping and its medical applications

    With the advance of data acquisition techniques, massive solid geometries are being collected routinely in scientific tasks, and these complex and unstructured data need to be effectively correlated for various processing and analysis. Volumetric mapping solves for bijective, low-distortion correspondence between/among 3D geometric data and can serve as an important preprocessing step in many tasks in computer-aided design and analysis, industrial manufacturing, and medical image analysis, to name a few. This dissertation studied two important volumetric mapping problems: the mapping of heterogeneous volumes (with nonuniform inner structures/layers) and the mapping of sequential dynamic volumes. To effectively handle heterogeneous volumes, we first studied feature-aligned harmonic volumetric mapping. Compared to previous harmonic mappings, it supports point, curve, and iso-surface alignment, which are important low-dimensional structures in heterogeneous volumetric data. Second, we proposed a biharmonic model for volumetric mapping. Unlike conventional harmonic volumetric mapping, which only supports positional continuity on the boundary, this new model allows higher-order C^1 continuity along the boundary surface. This suggests a potential model for solving the volumetric mapping of complex and large geometries through divide-and-conquer. We also studied medical applications of our volumetric mapping in lung tumor respiratory motion modeling. We built an effective digital platform for lung tumor radiotherapy based on effective volumetric CT/MRI image matching and analysis. We developed and integrated into this platform a set of geometric/image processing techniques, including advanced image segmentation, finite element meshing, volumetric registration, and interpolation. The lung organ/tumor and surrounding tissues are treated as a heterogeneous region, and a dynamic 4D registration framework is developed for lung tumor motion modeling and tracking. Compared to the previous 3D pairwise registration, our new 4D parameterization model leads to significantly improved registration accuracy. The constructed deforming model can hence approximate the deformation of the tissues and tumor.
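    The abstract names the two models without giving their formulations. As a hedged illustration of the distinction (the dissertation's exact boundary conditions may differ), the two boundary value problems can be written as:

        Harmonic:    \Delta f = 0  in \Omega,    f|_{\partial\Omega} = g                                   (positional data only, C^0 on the boundary)
        Biharmonic:  \Delta^2 f = 0  in \Omega,  f|_{\partial\Omega} = g,  \partial f/\partial n|_{\partial\Omega} = h   (adds derivative data, enabling C^1 on the boundary)

    Prescribing the normal derivative as well as the position is what lets adjacent subvolumes meet with matching first derivatives, which in turn makes a divide-and-conquer treatment of large geometries plausible.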