6,092 research outputs found

Impact of Compositional Grading and Component Lumping on Ultimate Recovery

Author: El Hajbi Sophia
Publication venue: Department of Earth Science and Engineering, Imperial College London
Publication date: 01/01/2012

Imperial Users only

Spiral - Imperial College Digital Repository

An adaptive multilevel indexing method for disaster service discovery

Author: Ding Zhijun
Jiang Changjun
Liu Lu
Wu Yan
Yan Chun Gang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

With natural disasters of varying scale occurring around the globe, disaster recovery is one of the most active research areas, and rescue and recovery services can benefit greatly from advances in information and communications technology (ICT). Rescue efforts become more effective through the dynamic networking of people, systems and procedures. Seamless integration of these elements with service-oriented systems can satisfy mission objectives with maximum effect. In disaster management systems, services from multiple sources are usually integrated and composed into a usable format in order to drive the decision-making process effectively. Therefore, a novel service indexing method is required to discover desirable services in large-scale disaster service repositories that hold a huge number of services. With this in mind, this paper presents a novel multilevel indexing algorithm based on equivalence theory to achieve effective service discovery in such repositories. The performance and efficiency of the proposed model have been evaluated by both theoretical analysis and practical experiments. The experimental results showed that the proposed algorithm is more efficient for service discovery and composition than existing inverted index methods.
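As a rough illustration of the contrast the abstract draws, the sketch below builds a plain inverted index and then a single grouping level in which services with identical input sets share one entry. This is not the paper's algorithm: the grouping rule, the function names, and the sample services are all assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(services):
    """Baseline: map each input parameter to the services that require it."""
    index = defaultdict(set)
    for name, (inputs, _outputs) in services.items():
        for param in inputs:
            index[param].add(name)
    return index

def build_equivalence_index(services):
    """Assumed grouping: services with identical input sets form one class."""
    classes = defaultdict(set)
    for name, (inputs, _outputs) in services.items():
        classes[frozenset(inputs)].add(name)
    return classes

def discover(classes, available):
    """Return every service whose inputs are all available.

    One subset test is done per class rather than per service.
    """
    available = set(available)
    found = set()
    for required, members in classes.items():
        if required <= available:
            found |= members
    return found

# Hypothetical disaster services: name -> (inputs, outputs)
services = {
    "shelter_locator": ({"location"}, {"shelter"}),
    "route_planner": ({"location", "shelter"}, {"route"}),
    "alt_route_planner": ({"location", "shelter"}, {"route"}),
    "weather_feed": ({"region"}, {"forecast"}),
}
```

Calling `discover(build_equivalence_index(services), {"location", "shelter"})` checks three classes instead of four services; the saving grows with the number of services that share an input set.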

UDORA - University of Derby Online Research Archive

Leicester Research Archive

Survivor: An Approach for Adding Dependability to Legacy Workflow Systems

Author: Greze Jean-Denis
Kaiser Gail E.
Kc Gaurav S.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2002

Although they often provide critical services, most workflow systems are not dependable. There has been much literature on dependable/survivable distributed systems; most of it is concerned with developing new architectures, not adapting pre-existing ones. Additionally, the literature focuses on hardening and security-based defense, as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructure; it is more pragmatic to find ways in which existing distributed systems can be adapted to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modified an existing workflow system provided by the Naval Research Lab.
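The idea of replicating partial workflow state so that enactment can resume after a failure can be sketched as follows. This is a minimal single-process illustration, not Survivor's architecture; the class name, the in-memory replica, and the step format are assumptions.

```python
import copy

class ReplicatedWorkflow:
    """Runs steps in order and replicates state after each completed step."""

    def __init__(self, steps):
        self.steps = steps  # ordered list of (name, function) pairs
        # Last replicated partial state: index of the next step plus its data.
        self.replica = {"next": 0, "data": {}}

    def run(self):
        # Always start from the replica, so a failed step leaves no trace.
        state = copy.deepcopy(self.replica)
        while state["next"] < len(self.steps):
            _name, step = self.steps[state["next"]]
            step(state["data"])  # may raise; the replica stays untouched
            state["next"] += 1
            self.replica = copy.deepcopy(state)  # replicate partial state
        return state["data"]
```

If a step raises, calling `run()` again resumes from the last replicated step instead of re-executing the steps that already completed.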

Columbia University Academic Commons

Benchmarking tests on recovery oriented computing

Author: Raman Nandita
Publication date: 09/07/2012

Benchmarks have played a very important role in guiding the progress of computer systems in various ways, and they have a major role to play in autonomous environments in particular. System crashes and software failures are a basic part of a software system's life cycle, and the main purpose of recovery oriented computing is to overcome them, or at least to make the system as little vulnerable to them as possible. This is usually done by reducing downtime through automatic and efficient recovery from a broad class of transient software failures, without having to modify applications. There have been various types of benchmarks for recovering from a failure, but in this paper we intend to create a benchmark framework, called the warning benchmarks, to measure and evaluate recovery oriented systems. It consists of known and unknown failures and a few benchmark techniques, which the warning benchmarks handle with the help of various other techniques in software fault analysis.

Electrical and Computer Engineering

Texas ScholarWorks

Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining

Author: Chu Xiaowen
He Bingsheng
He Xin
Pan Xinglin
Shi Shaohuai
Tang Zhenheng
Wang Yuxin
Wu Xiaoyu
Zheng Yang
Zhou Amelie Chi
Publication date: 19/10/2023

Extensive system scales (i.e., thousands of GPUs/TPUs) and prolonged training periods (i.e., months of pretraining) significantly escalate the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are urgently needed. Checkpointing is the primary fault-tolerance method: it periodically saves parameter snapshots from GPU memory to disk via CPU memory. In this paper, we find that the frequency of existing checkpoint-based fault tolerance is significantly limited by storage I/O overheads, which results in hefty retraining costs when restarting from the nearest checkpoint. In response to this gap, we introduce an in-memory fault-tolerance framework for large-scale LLM pretraining. The framework boosts the efficiency and reliability of fault tolerance in three respects: (1) Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e., sharded model parameters, optimizer states, and RNG states, to volatile CPU memory, our framework significantly reduces communication costs and bypasses checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances parameter protection with a two-layer hierarchy: snapshot management processes (SMPs) safeguard against software failures, while Erasure Coding (EC) protects against node failures. This double-layered protection greatly improves the survival probability of the parameters compared to existing checkpointing methods. (3) Improved Snapshotting Frequency: Our framework achieves more frequent snapshotting than asynchronous checkpointing optimizations under the same saving time budget, which improves fault-tolerance efficiency. Empirical results demonstrate that our framework minimizes the overhead of fault tolerance in LLM pretraining by effectively leveraging redundant CPU resources.

Comment: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D parallelism
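To make the erasure-coding layer concrete, here is a minimal sketch of the simplest such code: a single XOR parity shard computed over equally sized in-memory snapshots, which can rebuild any one lost shard. The abstract does not say which erasure code the framework uses, so single-parity XOR, the function names, and the byte-string shards are assumptions for illustration only.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(shards):
    """Compute one parity shard over equally sized snapshot shards."""
    parity = bytes(len(shards[0]))  # all-zero buffer of the shard length
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return parity

def recover(shards, parity, lost):
    """Rebuild the shard at index `lost` from the parity and the survivors."""
    rebuilt = parity
    for i, shard in enumerate(shards):
        if i != lost:
            rebuilt = xor_bytes(rebuilt, shard)
    return rebuilt

# Hypothetical per-node snapshots held in CPU memory (equal length).
shards = [b"node0-params", b"node1-params", b"node2-params"]
parity = make_parity(shards)
```

With one parity shard, the snapshots survive the loss of any single node; tolerating more simultaneous node failures would need a stronger code such as Reed-Solomon.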

arXiv.org e-Print Archive

JPEG privacy and security framework for social networking and GLAM services

Author: Delgado Mercè Jaime
Ebrahimi Touradj
Foessel Sigfried
Ishikawa Takaaki
Natu Ambarish
Schelkens Peter
Temmermans Frederik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Current image coding standards provide limited support for privacy and security features. An exception is the JPSEC standard, which defines security extensions in the JPEG 2000 specifications (part 8). Notwithstanding this shortcoming, the JPEG committee is currently defining a new JPEG Systems standard, which envisages privacy and security support across the JPEG family of standards. In this manuscript, the main philosophy of this emerging specification is outlined, along with typical use cases, main requirements, and examples of potential technological solutions. The upcoming specification guarantees backward and forward compatibility with earlier standards and legacy implementations. Finally, we illustrate the introduced framework with two applications targeting secure photo sharing on social networks and IPR management in the GLAM sector.

Peer Reviewed. Postprint (published version)

Infoscience - École polytechnique fédérale de Lausanne

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Fraunhofer-ePrints

Directory of Open Access Journals