Search CORE

10,508 research outputs found

Software reliability through fault-avoidance and fault-tolerance

Author: Mcallister David F.
Vouk Mladen A.

The use of back-to-back, or comparison, testing for regression testing or porting is examined. The efficiency and the cost of the strategy are compared with those of manual and table-driven single-version testing. Some of the key parameters that influence the efficiency and cost of the approach are the failure identification effort during single-version program testing, the extent of implemented changes, the nature of the regression test data (e.g., random), and the nature of inter-version failure correlation and fault masking. The advantages and disadvantages of the technique are discussed, together with some suggestions concerning its practical use.

NASA Technical Reports Server
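A minimal sketch of back-to-back (comparison) testing as the abstract describes it, assuming two versions of a routine that should agree on every input. The function names, the absolute-value routine and its injected fault are hypothetical illustrations, not taken from the paper. The harness needs no expected outputs: any disagreement flags a failure in at least one version. It also shows the limitation the abstract points to: when both versions fail identically (correlated failures), nothing is reported.

```python
import random
from typing import Callable, Iterable, List, Tuple


def back_to_back(old: Callable[[int], int], new: Callable[[int], int],
                 inputs: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Run both versions on the same inputs and return (input, old_output,
    new_output) for every input on which they disagree. No oracle is needed:
    each disagreement means at least one of the two versions failed."""
    mismatches = []
    for x in inputs:
        a, b = old(x), new(x)
        if a != b:
            mismatches.append((x, a, b))
    return mismatches


def random_inputs(n: int, seed: int = 0) -> List[int]:
    """Hypothetical random regression test data, reproducible via the seed."""
    rng = random.Random(seed)
    return [rng.randint(-1000, 1000) for _ in range(n)]


# Hypothetical versions of an absolute-value routine.
def abs_reference(x: int) -> int:
    return abs(x)


def abs_ported(x: int) -> int:
    # Injected fault: returns -1 instead of 1 for x == -1.
    return x if x >= -1 else -x


if __name__ == "__main__":
    print(back_to_back(abs_reference, abs_ported, range(-3, 4)))
    # Correlated failure: two versions sharing the same fault never disagree.
    print(back_to_back(abs_ported, abs_ported, range(-3, 4)))
```

Random data such as `random_inputs(1000)` may or may not hit the single faulty input `x == -1`, which illustrates why the nature of the regression test data is listed as a key parameter.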

Beam Loss Monitors at LHC

Author: Dehning B.
Publication date: 10/08/2016

One of the main functions of the LHC beam loss measurement system is to protect equipment against damage caused by impacting particles, which create secondary showers and dissipate their energy in the material. Reliability requirements are scaled according to the acceptable consequences and the frequency of particle impact events on equipment. Increasing reliability often leads to more complex systems. The downside of complexity is reduced availability; therefore, an optimum has to be found between these conflicting requirements. A detailed review of selected concepts and solutions for the LHC system will be given to show the approaches used in various parts of the system, from the sensors, signal processing, and software implementations to the requirements for operation and documentation. Comment: 16 pages, contribution to the 2014 Joint International Accelerator School: Beam Loss and Accelerator Protection, Newport Beach, CA, USA, 5-14 Nov 2014

arXiv.org e-Print Archive

CERN Document Server
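The conflict between reliability and availability described above can be made concrete with a generic k-out-of-n voting model. This is a sketch under stated assumptions, not the LHC system's architecture: channels are assumed independent and identical, and the per-channel detection and spurious-trip probabilities are hypothetical. A "missed" event is a real dangerous loss that fails to collect k votes; a "false trip" is a spurious request (for example, an unnecessary beam dump) that does collect k votes and costs availability.

```python
from math import comb


def at_least_k_of_n(k: int, n: int, p: float) -> float:
    """P(at least k of n independent, identical channels trip), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def voting_tradeoff(k: int, n: int, p_detect: float, p_false: float) -> tuple:
    """Return (p_missed, p_false_trip) for a k-out-of-n trip vote.

    p_missed: a real dangerous loss does not reach k votes (safety risk).
    p_false_trip: spurious per-channel trips reach k votes (availability loss).
    """
    p_missed = 1.0 - at_least_k_of_n(k, n, p_detect)
    p_false_trip = at_least_k_of_n(k, n, p_false)
    return p_missed, p_false_trip


if __name__ == "__main__":
    # Hypothetical per-channel figures, chosen only for illustration.
    for k, n in [(1, 1), (1, 2), (2, 2), (2, 3)]:
        missed, false_trip = voting_tradeoff(k, n, p_detect=0.99, p_false=0.01)
        print(f"{k}oo{n}: missed={missed:.6f} false_trip={false_trip:.6f}")
```

With these numbers, 1-out-of-2 cuts the missed-detection probability from 0.01 to 0.0001 but nearly doubles spurious trips (0.0199), while 2-out-of-3 brings both down to about 0.0003 at the price of a third channel and a voter, which echoes the point that reliability gains come with added complexity.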

Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

Author: Estrada-Galiñanes Vero
Felber Pascal
Miller Ethan
Pâris Jehan-François
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/10/2018

Data centres that use consumer-grade disk drives, and distributed peer-to-peer systems, are unreliable environments for archiving data without sufficient redundancy. Most redundancy schemes are not completely effective at providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements, and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the number of possible paths to recover data exponentially. The two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN

arXiv.org e-Print Archive

Crossref
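To make "highly interconnected" redundancy concrete, here is a minimal sketch of a single entanglement strand, assuming the simplest chained construction in which each parity block is the XOR of a data block with the previous parity, p[i] = d[i] XOR p[i-1]. It models only one strand: alpha and the two other parameters are not modelled, and all function names are hypothetical. Even so, it shows the locality the abstract mentions: a lost data block is rebuilt from just its two neighbouring parities, and a lost parity has two independent repair paths.

```python
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def entangle(data: List[bytes]) -> List[bytes]:
    """Single strand: p[i] = d[i] XOR p[i-1], with p[-1] an all-zero block."""
    if not data:
        return []
    prev = bytes(len(data[0]))
    parities = []
    for d in data:
        prev = xor_bytes(d, prev)
        parities.append(prev)
    return parities


def repair_data(i: int, parities: List[bytes]) -> bytes:
    """Recover a lost d[i] from its two adjacent parities: d[i] = p[i-1] XOR p[i]."""
    left = parities[i - 1] if i > 0 else bytes(len(parities[i]))
    return xor_bytes(left, parities[i])


def repair_parity_left(i: int, data: List[bytes], parities: List[bytes]) -> bytes:
    """First path for a lost p[i]: p[i] = d[i] XOR p[i-1]."""
    left = parities[i - 1] if i > 0 else bytes(len(data[i]))
    return xor_bytes(data[i], left)


def repair_parity_right(i: int, data: List[bytes], parities: List[bytes]) -> bytes:
    """Second path for a lost p[i]: p[i] = d[i+1] XOR p[i+1] (needs i + 1 to exist)."""
    return xor_bytes(data[i + 1], parities[i + 1])


if __name__ == "__main__":
    blocks = [b"ab", b"cd", b"ef"]
    p = entangle(blocks)
    print(repair_data(1, p))  # rebuilds b"cd" without touching blocks[1]
```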

Flexible provisioning of Web service workflows

Author: Aggarwal R.
Aghdaie N.
Akkiraju R.
Baccelli F.
Eder J.
Friese T.
Jaeger M. C.
Klusch M.
Long D. D. E.
Mandell D.
Martin D.
McDermott D.
McIlraith S. A.
Nicholas R. Jennings
O'Brien A.
Paolucci M.
Raiffa H.
Russell S.
Sebastian Stein
Sirin E.
Smith T. M.
Stein S.
Szomszor M.
Terry R. Payne
Tillman F. A.
Weatherspoon H.
Yu T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2008

Web services promise to revolutionise the way computational resources and business processes are offered and invoked in open, distributed systems, such as the Internet. These services are described using machine-readable metadata, which enables consumer applications to automatically discover and provision suitable services for their workflows at run-time. However, current approaches have typically assumed that service descriptions are accurate and deterministic, and so have neglected the fact that services in these open systems are inherently unreliable and uncertain. Specifically, network failures, software bugs and competition for services may regularly lead to execution delays or even service failures. To address this problem, the process of provisioning services needs to be performed in a more flexible manner than has so far been considered, in order to proactively deal with failures and to recover workflows that have partially failed. To this end, we devise and present a heuristic strategy that varies the provisioning of services according to their predicted performance. Using simulation, we then benchmark our algorithm and show that it leads to a 700% improvement in average utility, while successfully completing up to eight times as many workflows as approaches that do not consider service failures.

CiteSeerX

Southampton (e-Prints Soton)

Crossref

Spiral - Imperial College Digital Repository
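One simple way to vary provisioning according to predicted performance is to invoke several functionally equivalent providers in parallel for tasks that are predicted to be unreliable. The sketch below assumes independent provider failures with a known per-provider success probability and a sequential workflow; it is a hypothetical illustration of redundant provisioning, not the authors' heuristic, which also weighs cost, time and utility.

```python
import math
from typing import List


def providers_needed(p_success: float, target: float, max_providers: int) -> int:
    """Smallest number n of independent providers, invoked in parallel, such that
    P(at least one succeeds) = 1 - (1 - p)^n reaches target; capped at max_providers."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    for n in range(1, max_providers + 1):
        if 1.0 - (1.0 - p_success) ** n >= target:
            return n
    return max_providers


def workflow_success(p_tasks: List[float], providers: List[int]) -> float:
    """Probability that every task of a sequential workflow succeeds, given the
    per-provider success probability and the number of parallel providers per task."""
    return math.prod(1.0 - (1.0 - p) ** n for p, n in zip(p_tasks, providers))


if __name__ == "__main__":
    # Hypothetical predicted success probabilities for a three-task workflow.
    p_tasks = [0.5, 0.95, 0.8]
    plan = [providers_needed(p, target=0.9, max_providers=5) for p in p_tasks]
    print(plan, workflow_success(p_tasks, plan))
    print(workflow_success(p_tasks, [1, 1, 1]))  # naive single-provider plan
```

Unreliable tasks receive more parallel providers while reliable ones keep one, so redundancy (and its cost) is spent only where failures are predicted.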

Practical issues for the implementation of survivability and recovery techniques in optical networks

Author: Ellinas Georgios
Papadimitriou Dimitri
Rak Jacek
Staessens Dimitri
Sterbenz James PG
Walkowiak Krzysztof
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Ghent University Academic Bibliography