6,092 research outputs found
An adaptive multilevel indexing method for disaster service discovery
With natural disasters of varying scale occurring across the globe, disaster recovery is an active research area, and rescue and recovery services can benefit greatly from advances in information and communications technology (ICT). Rescue operations become more effective through the dynamic networking of people, systems and procedures. Seamless integration of these elements with service-oriented systems helps meet mission objectives. In disaster management systems, services from multiple sources are usually integrated and composed into a usable format to drive the decision-making process. A service indexing method is therefore required to discover suitable services in large-scale disaster service repositories that contain a huge number of services. With this in mind, this paper presents a novel multilevel indexing algorithm based on equivalence theory to achieve effective service discovery in such repositories. The performance and efficiency of the proposed model have been evaluated by both theoretical analysis and practical experiments. The experimental results showed that the proposed algorithm is more efficient for service discovery and composition than existing inverted index methods.
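For context, the inverted index baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the paper's multilevel algorithm: the abstract does not describe the data model, so the sketch assumes each service is described by a set of required input parameters, and the class name, method names and service names are all hypothetical.

```python
from collections import defaultdict


class InvertedServiceIndex:
    """Toy inverted index: input parameter -> services that require it."""

    def __init__(self):
        self._by_input = defaultdict(set)  # parameter -> service names
        self._inputs = {}                  # service name -> required inputs

    def add(self, name, inputs):
        inputs = frozenset(inputs)
        self._inputs[name] = inputs
        for param in inputs:
            self._by_input[param].add(name)

    def discover(self, available):
        """Return services whose required inputs are all available.

        Services registered with no inputs are never returned by this sketch.
        """
        hits = defaultdict(int)
        for param in set(available):
            for name in self._by_input.get(param, ()):
                hits[name] += 1
        # A service is invocable when every one of its inputs was matched.
        return sorted(n for n, c in hits.items() if c == len(self._inputs[n]))


idx = InvertedServiceIndex()
idx.add("flood_map", {"region", "rainfall"})
idx.add("shelter_locator", {"region"})
idx.add("route_planner", {"origin", "destination"})
print(idx.discover({"region", "rainfall"}))
```

Each lookup touches every service that shares at least one parameter with the query, which is the cost a multilevel index would aim to reduce.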
Survivor: An Approach for Adding Dependability to Legacy Workflow Systems
Although they often provide critical services, most workflow systems are not dependable. There has been much literature on dependable/survivable distributed systems, but most of it is concerned with developing new architectures, not adapting pre-existing ones. Additionally, the literature is focused on hardening, security-based defense, as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructures; a more pragmatic approach is to adapt existing distributed systems to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modified an existing workflow system provided by the Naval Research Lab.
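The idea of replicating partial workflow state so that enactment can resume after a failure can be sketched as below. This is an illustrative toy under stated assumptions, not the Survivor architecture: the abstract gives no implementation details, so the class `ReplicatedRun`, the in-process "replica" and the task signature (a function from a state dict to a new state dict) are all hypothetical.

```python
import copy


class ReplicatedRun:
    """Runs tasks in order and replicates state after each completed task."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        # Stand-in for a replica held elsewhere: index of the next task to
        # run, plus the workflow state at that point.
        self.replica = {"next": 0, "state": {}}

    def run(self):
        # Always start from the last replicated point, so a second call
        # after a crash resumes instead of restarting.
        step = self.replica["next"]
        state = copy.deepcopy(self.replica["state"])
        while step < len(self.tasks):
            state = self.tasks[step](state)  # may raise on failure
            step += 1
            self.replica = {"next": step, "state": copy.deepcopy(state)}
        return state
```

If a task raises, the replica still holds the state reached by the tasks that completed, so calling `run()` again skips the completed work and retries only from the failed task.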
Benchmarking tests on recovery oriented computing
Benchmarks have played a very important role in guiding the progress of computer
science systems in various ways. In autonomous environments in particular, they have a
major role to play. System crashes and software failures are a basic part of a software
system's life cycle, and the main purpose of recovery oriented computing is to overcome
them, or at least to make the system as little vulnerable to them as possible. This is
usually done by reducing downtime through automatic and efficient recovery from a broad
class of transient software failures without having to modify applications. Various types
of benchmarks exist for recovery from failure; in this paper we intend to create a
benchmark framework, called the warning benchmarks, to measure and evaluate
recovery oriented systems. It covers known and unknown failures and a few
benchmark techniques, which the warning benchmarks handle with the help of various
other techniques in software fault analysis.
Electrical and Computer Engineerin
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining
Extensive system scales (i.e., thousands of GPUs/TPUs) and prolonged training
periods (i.e., months of pretraining) significantly increase the probability of
failures when training large language models (LLMs). Efficient and
reliable fault-tolerance methods are therefore urgently needed. Checkpointing is the
primary fault-tolerance method: it periodically saves parameter snapshots from
GPU memory to disk via CPU memory. In this paper, we find that the frequency of
existing checkpoint-based fault tolerance is significantly limited by
storage I/O overheads, which results in hefty re-training costs on restarting
from the nearest checkpoint. In response to this gap, we introduce an in-memory
fault-tolerance framework for large-scale LLM pretraining. The framework boosts
the efficiency and reliability of fault tolerance in three respects: (1)
Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e.,
sharded model parameters, optimizer states, and RNG states, to CPU volatile
memory, our framework significantly reduces communication costs and bypasses
checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances
parameter protection with a two-layer hierarchy: snapshot management processes
(SMPs) safeguard against software failures, together with Erasure Coding (EC)
protecting against node failures. This double-layered protection greatly
improves the survival probability of the parameters compared to existing
checkpointing methods. (3) Improved Snapshotting Frequency: Our framework
achieves more frequent snapshotting than asynchronous checkpointing
optimizations under the same saving time budget, which improves fault-tolerance
efficiency. Empirical results demonstrate that our framework
minimizes the overhead of fault tolerance of LLM pretraining by effectively
leveraging redundant CPU resources.
Comment: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D
parallelis
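To illustrate how erasure coding can protect in-memory snapshots against the loss of a node, here is a minimal sketch using a single XOR parity shard, the simplest erasure code. It is an assumption for illustration only: the abstract does not state which EC scheme the framework uses, and the function names and the byte-string shards standing in for per-node snapshots are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards):
    """Compute one parity shard over equal-length data shards."""
    length = len(shards[0])
    if any(len(s) != length for s in shards):
        raise ValueError("all shards must have the same length")
    return reduce(xor_bytes, shards)


def recover(shards, parity):
    """Rebuild the single missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost shard")
    survivors = [s for s in shards if s is not None]
    # XOR of the parity with all surviving shards yields the lost shard.
    return reduce(xor_bytes, survivors, parity)


# Three nodes each hold one snapshot shard; a fourth holds the parity.
shards = [b"node0-st", b"node1-st", b"node2-st"]
parity = make_parity(shards)
rebuilt = recover([shards[0], None, shards[2]], parity)
assert rebuilt == shards[1]
```

A single parity shard tolerates one node failure per group; tolerating more simultaneous failures requires a stronger code such as Reed-Solomon.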
JPEG privacy and security framework for social networking and GLAM services
Current image coding standards provide limited support for privacy and security features. An exception is the JPSEC standard, which defines security extensions in the JPEG 2000 specifications (Part 8). To address this shortcoming, the JPEG committee is currently defining a new JPEG Systems standard, which envisages privacy and security support across the JPEG family of standards. In this manuscript, the main philosophy of this emerging specification is outlined, along with typical use cases, main requirements and examples of potential technological solutions. The upcoming specification guarantees backward and forward compatibility with earlier standards and legacy implementations. Finally, we illustrate the introduced framework with two applications targeting secure photo sharing on social networks and IPR management in the GLAM sector.
Peer Reviewed
Postprint (published version