ScotGrid: Providing an Effective Distributed Tier-2 in the LHC Era

Author: Brochu F
Burgess M
David Ambrose-Griffith
Gentzsch W
Graeme Stewart
Greig Cowan
Mike Kenyon
Orlando Richards
Phil Roffe
Sam Skipsey
Skipsey S
van der Ster D
Publication venue: 'IOP Publishing'
Publication date: 01/01/2009

ScotGrid is a distributed Tier-2 centre in the UK with sites in Durham, Edinburgh and Glasgow. ScotGrid has undergone a huge expansion in hardware in anticipation of the LHC and now provides more than 4MSI2K and 500TB to the LHC VOs. Scaling up to this level of provision has brought many challenges to the Tier-2 and we show in this paper how we have adopted new methods of organising the centres, from fabric management and monitoring to remote management of sites to management and operational procedures, to meet these challenges. We describe how we have coped with different operational models at the sites, where the Glasgow and Durham sites are managed "in house" but resources at Edinburgh are managed as a central university resource. This required the adoption of a different fabric management model at Edinburgh and a special engagement with the cluster managers. Challenges arose from the different job models of local and grid submission that required special attention to resolve. We show how ScotGrid has successfully provided an infrastructure for ATLAS and LHCb Monte Carlo production. Special attention has been paid to ensuring that user analysis functions efficiently, which has required optimisation of local storage and networking to cope with the demands of user analysis. Finally, although these Tier-2 resources are pledged to the whole VO, we have established close links with our local physics user communities as being the best way to ensure that the Tier-2 functions effectively as a part of the LHC grid computing framework. Comment: Preprint for 17th International Conference on Computing in High Energy and Nuclear Physics, 7 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Enlighten

ElasTraS: An Elastic Transactional Data Store in the Cloud

Author: Abbadi Amr El
Agrawal Divyakant
Das Sudipto
Publication date: 01/01/2009

Over the last couple of years, "Cloud Computing" or "Elastic Computing" has emerged as a compelling and successful paradigm for internet scale computing. One of the major contributing factors to this success is the elasticity of resources. In spite of the elasticity provided by the infrastructure and the scalable design of the applications, the elephant (or the underlying database), which drives most of these web-based applications, is not very elastic and scalable, and hence limits scalability. In this paper, we propose ElasTraS, which addresses this issue of scalability and elasticity of the data store in a cloud computing environment to leverage the elastic nature of the underlying infrastructure, while providing scalable transactional data access. This paper aims at providing the design of a system in progress, highlighting the major design choices, analyzing the different guarantees provided by the system, and identifying several important challenges for the research community striving for computing in the cloud. Comment: 5 pages, in Proc. of USENIX HotCloud 200

arXiv.org e-Print Archive

CiteSeerX

Application Aware for Byzantine Fault Tolerance

Author: Chai Hua
Publication venue: EngagedScholarship@CSU
Publication date: 01/01/2014

Driven by the need for higher reliability of many distributed systems, various replication-based fault tolerance technologies have been widely studied. A prominent technology is Byzantine fault tolerance (BFT). BFT can help achieve high availability and trustworthiness by ensuring replica consistency despite the presence of hardware failures and malicious faults on a small portion of the replicas. However, most state-of-the-art BFT algorithms are designed for generic stateful applications that require the total ordering of all incoming requests and the sequential execution of such requests. In this dissertation research, we recognize that a straightforward application of existing BFT algorithms is often inappropriate for many practical systems: (1) not all incoming requests must be executed sequentially according to some total order, and doing so would incur unnecessary (and often prohibitively high) runtime overhead, and (2) a sequential execution of all incoming requests might violate the application semantics and might result in deadlocks for some applications. In the past four and a half years of my dissertation research, I have focused on designing lightweight BFT solutions for a number of Web services applications (including a shopping cart application, an event stream processing application, Web service business activities (WS-BA), and Web service atomic transactions (WS-AT)) by exploiting application semantics. The main research challenge is to identify how to minimize the use of Byzantine agreement steps and enable concurrent execution of requests that are commutable or unrelated. We have shown that the runtime overhead can be significantly reduced by adopting our lightweight solutions. One limitation of our solutions is that they require intimate knowledge of the application design and implementation, which may make designing such BFT solutions for complex applications expensive and error-prone.
Recognizing this limitation, we investigated the use of Conflict-free Replicated Data Types (CRDTs) to

OhioLINK Electronic Thesis and Dissertation Center

Cleveland-Marshall College of Law

Robust and Resilient Services - How to design, build and operate them

Author: McCance G
Méndez-Lorenzo P
Shiers J
Publication date: 16/11/2007

Grid infrastructures require a high degree of fault tolerance and reliability. This can only be achieved by careful planning and detailed implementation. We describe on-going work within the WLCG project to build and run highly reliable services. Following the "a priori" analysis based on the services and service levels listed in the Memorandum of Understanding that sites participating in WLCG have signed[1], this paper provides an "a posteriori" analysis following over 2 years of production service. This work covers not only the services deployed at the Tier0 centre at CERN, which has the most stringent service requirements related to the acquisition of the raw data, the initial processing phase and the distribution of raw and processed data to Tier1 sites, but also a similar analysis for Tier1 and major Tier2 sites. The latter will be covered at a workshop that will take place shortly before the EELA conference and so will be very up-to-date.