6,092 research outputs found

    Impact of Compositional Grading and Component Lumping on Ultimate Recovery

    Imperial Users only

    An adaptive multilevel indexing method for disaster service discovery

    As natural disasters of varying scale strike around the globe, disaster recovery has become an active research area, and rescue and recovery services stand to benefit greatly from advances in information and communications technology (ICT). Rescue operations can be made more effective through the dynamic networking of people, systems and procedures. Seamless integration of these elements with service-oriented systems helps satisfy mission objectives with maximum effect. In disaster management systems, services from multiple sources are usually integrated and composed into a usable format to drive the decision-making process effectively. A service indexing method is therefore required to discover desirable services in large-scale disaster service repositories that contain a huge number of services. With this in mind, this paper presents a novel multilevel indexing algorithm based on equivalence theory to achieve effective service discovery in such repositories. The performance and efficiency of the proposed model have been evaluated by both theoretical analysis and practical experiments. The experimental results show that the proposed algorithm is more efficient for service discovery and composition than existing inverted index methods.
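For orientation, the inverted-index baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's multilevel algorithm: it assumes each service is described only by the set of input parameters it requires, and the service names, parameter names, and the helpers `build_inverted_index` and `discover` are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(services):
    """Map each input parameter to the set of services that require it."""
    index = defaultdict(set)
    for name, inputs in services.items():
        for param in inputs:
            index[param].add(name)
    return index


def discover(services, index, available):
    """Return services whose required inputs are all in `available` (a set).

    Counts how many of each service's inputs are matched through the index,
    then keeps services where every required input was matched.
    Services with no inputs never appear in the index and are not returned.
    """
    hits = defaultdict(int)
    for param in available:
        for name in index.get(param, ()):
            hits[name] += 1
    return sorted(n for n, count in hits.items() if count == len(services[n]))


# Hypothetical disaster-service repository: service name -> required inputs.
services = {
    "route_planner": {"location", "road_status"},
    "shelter_finder": {"location"},
    "flood_forecast": {"rainfall", "river_level"},
}
index = build_inverted_index(services)
invocable = discover(services, index, {"location", "road_status"})
```

Each lookup touches every service that shares a parameter with the request. The abstract claims that a multilevel index built on equivalence theory performs discovery and composition more efficiently than this kind of flat index.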

    Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining

    Extensive system scales (i.e., thousands of GPUs/TPUs) and prolonged training periods (i.e., months of pretraining) significantly increase the probability of failures when training large language models (LLMs). Efficient and reliable fault-tolerance methods are therefore urgently needed. Checkpointing is the primary fault-tolerance method: it periodically saves parameter snapshots from GPU memory to disk via CPU memory. In this paper, we find that the frequency of existing checkpoint-based fault tolerance is significantly limited by storage I/O overheads, which results in hefty retraining costs when restarting from the nearest checkpoint. In response to this gap, we introduce an in-memory fault-tolerance framework for large-scale LLM pretraining. The framework boosts the efficiency and reliability of fault tolerance in three respects: (1) Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e., sharded model parameters, optimizer states, and RNG states, to volatile CPU memory, our framework significantly reduces communication costs and bypasses checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances parameter protection with a two-layer hierarchy: snapshot management processes (SMPs) safeguard against software failures, while Erasure Coding (EC) protects against node failures. This double-layered protection greatly improves the survival probability of the parameters compared to existing checkpointing methods. (3) Improved Snapshotting Frequency: Our framework achieves more frequent snapshotting than asynchronous checkpointing optimizations under the same saving time budget, which improves fault-tolerance efficiency. Empirical results demonstrate that our framework minimizes the overhead of fault tolerance in LLM pretraining by effectively leveraging redundant CPU resources.
    Comment: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D parallelism
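The erasure-coding idea in point (2) can be illustrated with the simplest possible code, a single XOR parity shard that tolerates the loss of any one data shard. This is a sketch only: the abstract does not say which erasure code the framework uses, and the shard contents and the helpers `encode_parity` and `recover_shard` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(shards: list[bytes]) -> bytes:
    """Compute one parity shard as the XOR of all data shards."""
    if len({len(s) for s in shards}) != 1:
        raise ValueError("all shards must have the same length")
    return reduce(xor_bytes, shards)


def recover_shard(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing shard from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical in-memory snapshots held by three nodes (equal length).
shards = [b"node0-params", b"node1-params", b"node2-params"]
parity = encode_parity(shards)

# Simulate losing node 1 and rebuilding its snapshot from the rest.
lost = 1
survivors = [s for i, s in enumerate(shards) if i != lost]
recovered = recover_shard(survivors, parity)
```

A single parity shard survives only one lost shard per group. Tolerating several simultaneous node failures requires a stronger code such as Reed-Solomon, at the cost of more parity data and computation.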

    JPEG privacy and security framework for social networking and GLAM services

    Current image coding standards provide limited support for privacy and security features. An exception is the JPSEC standard, which defines security extensions for the JPEG 2000 specifications (Part 8). Notwithstanding this shortcoming, the JPEG committee is currently defining a new JPEG Systems standard, which envisages privacy and security support across the JPEG family of standards. In this manuscript, the main philosophy of this emerging specification is outlined, along with typical use cases, main requirements, and examples of potential technological solutions. The upcoming specification guarantees backward and forward compatibility with earlier standards and legacy implementations. Finally, we illustrate the introduced framework with two applications targeting secure photo sharing on social networks and IPR management in the GLAM sector.
    Peer Reviewed. Postprint (published version)