670 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. We map the proposed taxonomy onto various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems in order to better understand their goals and methodology, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping give new practitioners an accessible way to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report
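
    As a rough illustration of how such a taxonomy mapping can support a "gap analysis", the sketch below classifies a few systems along two taxonomy dimensions and lists the combinations that no system covers. The dimension names and system entries are invented for illustration and are not taken from the paper.

        # Minimal sketch of taxonomy mapping and gap analysis (illustrative dimensions only).
        from itertools import product

        # Two hypothetical taxonomy dimensions and their categories.
        dimensions = {
            "replication": ["centralised", "decentralised"],
            "transport": ["ftp-based", "streaming"],
        }

        # Hypothetical mapping of systems onto the taxonomy.
        systems = {
            "SystemA": {"replication": "centralised", "transport": "ftp-based"},
            "SystemB": {"replication": "decentralised", "transport": "ftp-based"},
        }

        covered = {tuple(s[d] for d in dimensions) for s in systems.values()}
        gaps = [combo for combo in product(*dimensions.values()) if combo not in covered]
        print("Uncovered taxonomy combinations:", gaps)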

    Data Access for LIGO on the OSG

    During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory (LIGO) conducted a three-month observing campaign. These observations delivered the first direct detection of gravitational waves from binary black hole mergers. To search for these signals, the LIGO Scientific Collaboration uses the PyCBC search pipeline. To deliver science results in a timely manner, LIGO collaborated with the Open Science Grid (OSG) to distribute the required computation across a series of dedicated, opportunistic, and allocated resources. To deliver the petabytes necessary for such a large-scale computation, our team deployed a distributed data access infrastructure based on the XRootD server suite and the CernVM File System (CVMFS). This data access strategy grew from simply accessing remote storage to a POSIX-based interface underpinned by distributed, secure caches across the OSG. Comment: 6 pages, 3 figures, submitted to PEARC1
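
    A minimal sketch of the two access modes described above, from an application's point of view: a plain POSIX read through a CVMFS mount, with a fall-back copy over XRootD. The /cvmfs path, hostname and file name are illustrative placeholders rather than the actual LIGO namespace, and the xrdcp client is assumed to be installed.

        # Sketch: prefer POSIX/CVMFS access, fall back to an explicit XRootD copy.
        import os
        import subprocess

        CVMFS_PATH = "/cvmfs/example.osgstorage.org/frames/O1/H-H1_EXAMPLE.gwf"  # hypothetical path
        XROOTD_URL = "root://cache.example.org//frames/O1/H-H1_EXAMPLE.gwf"      # hypothetical URL

        def fetch_frame(local_copy="frame.gwf"):
            """Return a path to the frame file, preferring the POSIX/CVMFS mount."""
            if os.path.exists(CVMFS_PATH):
                return CVMFS_PATH                      # read in place through the cache hierarchy
            # Fall back to copying the file with the xrdcp client.
            subprocess.run(["xrdcp", XROOTD_URL, local_copy], check=True)
            return local_copy

        if __name__ == "__main__":
            path = fetch_frame()
            with open(path, "rb") as f:
                header = f.read(64)                    # application-level read, plain POSIX I/O
            print(f"Read {len(header)} bytes from {path}")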

    HEP Applications Evaluation of the EDG Testbed and Middleware

    Workpackage 8 of the European DataGrid project was formed in January 2001 with representatives from the four LHC experiments, and with experiment-independent people from five of the six main EDG partners. In September 2002 WP8 was strengthened by the addition of effort from BaBar and D0. The original mandate of WP8 was, following the definition of short- and long-term requirements, to port experiment software to the EDG middleware and testbed environment. A major additional activity has been testing the basic functionality and performance of this environment. This paper reviews experiences and evaluations in the areas of job submission, data management, mass storage handling, information systems and monitoring. It also comments on the problems of remote debugging, the portability of code, and scaling problems with increasing numbers of jobs, sites and nodes. Reference is made to the pioneering work of Atlas and CMS in integrating the use of the EDG Testbed into their data challenges. A forward look is made to essential software developments within EDG and to the necessary cooperation between EDG and LCG for the LCG prototype due in mid-2003. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics Conference (CHEP03), La Jolla, CA, USA, March 2003, 7 pages. PSN THCT00

    SciTokens: Capability-Based Secure Access to Remote Scientific Data

    The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US
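
    A minimal sketch of the capability-based idea: a service checks whether the decoded token payload's scope claim covers the requested operation and path, instead of relying on who the bearer is. The claim names follow the general pattern described above (issuer, audience, scoped read/write permissions), but the values, issuer and path layout are illustrative only, and token signature verification is omitted.

        # Sketch: capability-style authorization check against a decoded token payload.
        EXAMPLE_CLAIMS = {
            "iss": "https://tokens.example.org",      # hypothetical issuer
            "aud": "https://storage.example.org",     # hypothetical storage endpoint
            "scope": "read:/data/run1 write:/scratch/user",
            "exp": 4102444800,                        # far-future expiry for the example
        }

        def authorizes(claims: dict, operation: str, path: str) -> bool:
            """Return True if any scope entry grants `operation` on `path` or a parent prefix."""
            for entry in claims.get("scope", "").split():
                op, _, prefix = entry.partition(":")
                if op == operation and (path == prefix or path.startswith(prefix.rstrip("/") + "/")):
                    return True
            return False

        print(authorizes(EXAMPLE_CLAIMS, "read", "/data/run1/file.h5"))   # True
        print(authorizes(EXAMPLE_CLAIMS, "write", "/data/run1/file.h5"))  # False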

    AstroGrid-D: Enhancing Astronomic Science with Grid Technology

    We present AstroGrid-D, a project bringing together astronomers and experts in Grid technology to enhance astronomical science in many aspects. First, by sharing currently dispersed resources, scientists can calculate their models in more detail. Second, by developing new mechanisms to efficiently access and process existing datasets, scientific problems can be investigated that were previously impossible to solve. Third, by adopting Grid technology, large instruments such as robotic telescopes and complex scientific workflows from data acquisition to analysis can be managed in an integrated manner. In this paper, we present prominent astronomical use cases, discuss requirements on Grid middleware, and present our approach to extending and augmenting existing middleware to facilitate the improvements mentioned above.

    Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project

    In the first phase of the EU DataGrid (EDG) project, a Data Management System has been implemented and provided for deployment. The components of the current EDG Testbed are: a prototype of a Replica Manager Service built around the basic services provided by Globus, a centralised Replica Catalogue to store information about physical locations of files, and the Grid Data Mirroring Package (GDMP) that is widely used in various HEP collaborations in Europe and the US for data mirroring. During this year these services have been refined and made more robust so that they are fit to be used in a pre-production environment. Application users have been using this first release of the Data Management Services for more than a year. In this paper we present the components and their interaction, our implementation and experience, as well as the feedback received from our user communities. We have resolved not only issues regarding integration with other EDG service components but also many of the interoperability issues with components of our partner projects in Europe and the U.S. The paper concludes with the basic lessons learned during this operation. These conclusions provide the motivation for the architecture of the next generation of Data Management Services that will be deployed in EDG during 2003. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 9 pages, LaTeX, PSN: TUAT007; all figures are in the directory "figures"
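
    A conceptual sketch of what a replica catalogue does, mapping a logical file name (LFN) to the physical locations of its copies. This illustrates the idea described above only; it is not the EDG Replica Catalogue interface, and all names and storage endpoints are made up.

        # Sketch: a replica catalogue as an LFN -> set-of-PFNs mapping.
        class ReplicaCatalogue:
            def __init__(self):
                self._replicas: dict[str, set[str]] = {}

            def register(self, lfn: str, pfn: str) -> None:
                """Record that a physical copy of `lfn` exists at `pfn`."""
                self._replicas.setdefault(lfn, set()).add(pfn)

            def lookup(self, lfn: str) -> set[str]:
                """Return all known physical locations for a logical file name."""
                return self._replicas.get(lfn, set())

        catalogue = ReplicaCatalogue()
        catalogue.register("lfn:higgs/run42.dat", "gsiftp://se1.example.org/data/run42.dat")
        catalogue.register("lfn:higgs/run42.dat", "gsiftp://se2.example.org/data/run42.dat")
        print(catalogue.lookup("lfn:higgs/run42.dat"))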

    The NorduGrid architecture and tools

    The NorduGrid project designed a Grid architecture with the primary goal of meeting the requirements of production tasks of the LHC experiments. While it is meant to be a rather generic Grid system, it puts emphasis on batch processing suitable for problems encountered in High Energy Physics. The NorduGrid architecture implementation uses the Globus Toolkit as the foundation for the various components developed by the project. While introducing new services, NorduGrid does not modify the Globus tools, so that the two can co-exist. The NorduGrid topology is decentralized, avoiding a single point of failure. The NorduGrid architecture is thus light-weight, non-invasive and dynamic, while remaining robust and scalable, capable of meeting the most challenging tasks of High Energy Physics. Comment: Talk from the 2003 Computing in High Energy Physics and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 9 pages, LaTeX, 4 figures. PSN MOAT00
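
    A toy illustration of the decentralized idea: the client-side broker queries each site's own information service and chooses a target itself, so no central broker becomes a single point of failure. Site names, the load figures and the selection rule are invented for the example and do not reflect the NorduGrid components themselves.

        # Sketch: client-side site selection with no central broker.
        import random

        def query_site_info(site: str) -> dict:
            """Stand-in for contacting a site's own information service."""
            return {"site": site, "free_slots": random.randint(0, 100)}

        def choose_site(sites: list[str]) -> str:
            """Rank sites by advertised free slots; an unreachable site is simply skipped."""
            reports = []
            for site in sites:
                try:
                    reports.append(query_site_info(site))
                except OSError:
                    continue                      # a failed site does not block the others
            if not reports:
                raise RuntimeError("no site information available")
            return max(reports, key=lambda r: r["free_slots"])["site"]

        print(choose_site(["site-a.example.org", "site-b.example.org", "site-c.example.org"]))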

    The Design and Demonstration of the Ultralight Testbed

    In this paper we present the motivation, the design, and a recent demonstration of the UltraLight testbed at SC|05. The goal of the UltraLight testbed is to help meet the data-intensive computing challenges of the next generation of particle physics experiments with a comprehensive, network-focused approach. UltraLight adopts a new approach to networking: instead of treating it traditionally, as a static, unchanging and unmanaged set of inter-computer links, we are developing and using it as a dynamic, configurable, and closely monitored resource that is managed end-to-end. To achieve this goal we are constructing a next-generation global system able to meet the data processing, distribution, access and analysis needs of the particle physics community. We first present early results in the various working areas of the project. We then describe our experiences with the network architecture, kernel setup, application tuning and configuration used during the bandwidth challenge event at SC|05. During this challenge, we achieved a record-breaking aggregate data rate in excess of 150 Gbps while moving physics datasets between many Grid computing sites.
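
    For a sense of scale, the short sketch below shows the back-of-the-envelope arithmetic behind an aggregate data rate: summing concurrent per-link throughput and converting it to data volume over a time window. The per-link figures are invented purely for illustration and are not the SC|05 measurements.

        # Sketch: aggregate throughput and data volume over a transfer window.
        link_rates_gbps = [8.0, 9.5, 20.0, 35.0, 40.0]   # hypothetical concurrent flows
        aggregate_gbps = sum(link_rates_gbps)

        window_hours = 1.0
        terabytes_moved = aggregate_gbps / 8 * 3600 * window_hours / 1000  # Gb/s -> TB over the window

        print(f"Aggregate rate: {aggregate_gbps:.1f} Gbps")
        print(f"Data moved in {window_hours:.0f} h: {terabytes_moved:.1f} TB")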