
    CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

    To use future generations of supercomputers efficiently, the High Performance Computing (HPC) community anticipates fault tolerance and power consumption as two of the prime challenges. Checkpoint/Restart (CR) has been and still is the most widely used technique to deal with hard failures. Application-level CR is the most effective CR technique in terms of overhead, but it takes a lot of implementation effort. This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be used out of the box. The library can be easily extended to add more data types. As a means of overhead reduction, the library offers a built-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node-level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of the failure detection and communication recovery mechanisms. By utilizing both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks.

    LogBase: A Scalable Log-structured Database System in the Cloud

    Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads in write-heavy environments and hence adversely affects the write throughput and recovery time of the system. In this paper, we introduce LogBase, a scalable log-structured database system that adopts log-only storage to remove the write bottleneck and support fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes for efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery. Comment: VLDB201

    New Hampshire University Research and Industry Plan: A Roadmap for Collaboration and Innovation

    This University Research and Industry plan for New Hampshire is focused on accelerating innovation-led development in the state by partnering academia’s strengths with the state’s substantial base of existing and emerging advanced industries. These advanced industries are defined by their deep investment in and connections to research and development, and by the high-quality jobs they generate across production, new product development and administrative positions involving skills in science, technology, engineering and math (STEM).

    Edge vulnerability in neural and metabolic networks

    Biological networks, such as cellular metabolic pathways or networks of corticocortical connections in the brain, are intricately organized, yet remarkably robust toward structural damage. Whereas many studies have investigated specific aspects of robustness, such as molecular mechanisms of repair, this article focuses more generally on how local structural features in networks may give rise to their global stability. In many networks the failure of single connections may be more likely than the extinction of entire nodes, yet no analysis of edge importance (edge vulnerability) has been provided so far for biological networks. We tested several measures for identifying vulnerable edges and compared their prediction performance in biological and artificial networks. Among the tested measures, edge frequency in all shortest paths of a network yielded a particularly high correlation with vulnerability, and identified inter-cluster connections in biological but not in random and scale-free benchmark networks. We discuss different local and global network patterns and the edge vulnerability resulting from them. Comment: 8 pages, 4 figures, to appear in Biological Cybernetic

    Feedback-Aware Precoding for Millimeter Wave Massive MIMO Systems

    Millimeter wave (mmWave) communication is a promising solution for coping with the ever-increasing mobile data traffic because of its large bandwidth. To enable a sufficient link margin, a large antenna array employing directional beamforming, which is enabled by the availability of channel state information at the transmitter (CSIT), is required. However, CSIT acquisition for mmWave channels introduces a huge feedback overhead due to the typically large number of transmit and receive antennas. Leveraging properties of mmWave channels, this paper proposes a precoding strategy which enables a flexible adjustment of the feedback overhead. In particular, the optimal unconstrained precoder is approximated by selecting a variable number of elements from a basis that is constructed as a function of the transmitter array response, where the number of selected basis elements can be chosen according to the feedback constraint. Simulation results show that the proposed precoding scheme can provide a near-optimal solution if a higher feedback overhead can be afforded. For a low overhead, it can still provide a good approximation of the optimal precoder. Comment: 7 pages, 5 figures, to appear at the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 201

    A load-sharing architecture for high performance optimistic simulations on multi-core machines

    In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that indicates how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results assessing the validity of our proposal are presented as well.