
    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. We map the proposed taxonomy to various Data Grid systems, both to validate the taxonomy and to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so that their goals and methodology are better understood, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping give new practitioners an accessible way into this complex area of research.
    Comment: 46 pages, 16 figures, Technical Report
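    Since the abstract above describes a taxonomy along four dimensions (architecture, data transport, replication, and scheduling), a small data-structure sketch can make the classification idea concrete. The enum members and the example system below are hypothetical placeholders, not the sub-categories or systems defined in the paper.

```python
# Illustrative sketch only: encodes the four taxonomy dimensions as Python enums
# so that a Data Grid system can be classified programmatically. The concrete
# category names are hypothetical placeholders, not the paper's sub-taxonomies.
from dataclasses import dataclass
from enum import Enum

class Architecture(Enum):          # organisational model of the Data Grid
    HIERARCHICAL = "hierarchical"
    FEDERATED = "federated"
    HYBRID = "hybrid"

class DataTransport(Enum):         # how bulk data moves between sites
    PARALLEL_STREAMS = "parallel-stream transfer"
    OVERLAY = "overlay network"

class Replication(Enum):           # placement strategy for data copies
    STATIC = "static"
    DYNAMIC = "dynamic"

class Scheduling(Enum):            # resource allocation strategy
    DATA_AWARE = "data-aware"
    COMPUTE_CENTRIC = "compute-centric"

@dataclass
class DataGridSystem:
    """Maps one system onto the four taxonomy dimensions."""
    name: str
    architecture: Architecture
    transport: DataTransport
    replication: Replication
    scheduling: Scheduling

# Example classification (values chosen for illustration only).
example = DataGridSystem("ExampleGrid", Architecture.HIERARCHICAL,
                         DataTransport.PARALLEL_STREAMS, Replication.DYNAMIC,
                         Scheduling.DATA_AWARE)
print(example)
```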

    Comparing FutureGrid, Amazon EC2, and Open Science Grid for Scientific Workflows

    Scientists have a number of computing infrastructures available to conduct their research, including grids and public or private clouds. This paper explores the use of these cyberinfrastructures to execute scientific workflows, an important class of scientific applications. It examines the benefits and drawbacks of cloud and grid systems using the case study of an astronomy application. The application analyzes data from the NASA Kepler mission in order to compute periodograms, which help astronomers detect the periodic dips in the intensity of starlight caused by exoplanets as they transit their host star. In this paper, we describe our experiences modeling the periodogram application as a scientific workflow using Pegasus, and deploying it on the FutureGrid scientific cloud testbed, the Amazon EC2 commercial cloud, and the Open Science Grid. We compare and contrast the infrastructures in terms of setup, usability, cost, resource availability and performance.
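    The periodogram application described above is a natural fan-out/fan-in workflow: one independent periodogram task per light curve, followed by a step that gathers the results. The sketch below mimics that shape locally with a toy periodogram and synthetic light curves; it is purely illustrative and is not the authors' Pegasus workflow (Pegasus would express the same structure as a DAG of jobs submitted to grid or cloud resources).

```python
# Illustrative sketch only: fan-out of independent periodogram tasks over a set of
# synthetic light curves, followed by a fan-in that collects the best-fit periods.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def periodogram(light_curve, periods):
    """Toy least-squares periodogram: power of a sinusoid fit at each trial period."""
    t, flux = light_curve
    flux = flux - flux.mean()
    power = []
    for p in periods:
        phase = 2.0 * np.pi * t / p
        design = np.column_stack([np.sin(phase), np.cos(phase)])
        coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
        power.append(float(np.sum((design @ coeffs) ** 2)))
    return np.array(power)

def make_fake_light_curve(seed):
    """Stand-in for a Kepler light curve: 90 days of noisy flux with a 3.5-day signal."""
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, 90.0, 500))
    flux = 1.0 + 0.01 * np.sin(2 * np.pi * t / 3.5) + 0.005 * rng.normal(size=t.size)
    return t, flux

if __name__ == "__main__":
    periods = np.linspace(0.5, 10.0, 200)                    # trial periods in days
    curves = [make_fake_light_curve(s) for s in range(8)]    # one "task" per curve
    with ProcessPoolExecutor() as pool:                      # fan-out
        results = list(pool.map(periodogram, curves, [periods] * len(curves)))
    best = [periods[np.argmax(p)] for p in results]          # fan-in
    print("best-fit periods (days):", np.round(best, 2))
```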

    Towards In-Transit Analytics for Industry 4.0

    Industry 4.0, or Digital Manufacturing, is a vision of inter-connected services to facilitate innovation in the manufacturing sector. A fundamental requirement for innovation is the ability to visualise manufacturing data in order to discover new insights for increased competitive advantage. This article describes the enabling technologies that facilitate In-Transit Analytics, which is a necessary precursor for Industrial Internet of Things (IIoT) visualisation.
    Comment: 8 pages, 10th IEEE International Conference on Internet of Things (iThings-2017), Exeter, UK, 2017
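    As a rough illustration of the in-transit idea, the sketch below aggregates a simulated sensor stream at an intermediate stage before it would reach a visualisation back end, so that far fewer records travel the last hop. All names, the window size, and the aggregation choice are assumptions for the sketch, not details from the paper.

```python
# Illustrative sketch only: "in-transit" analytics reduces or enriches a data stream
# at an intermediate node, between shop-floor sensors and the visualisation back end,
# rather than at either endpoint.
import random
import statistics
from typing import Iterator

def sensor_stream(n: int) -> Iterator[float]:
    """Stand-in for raw IIoT sensor readings (e.g. a temperature channel)."""
    for _ in range(n):
        yield 60.0 + random.gauss(0.0, 2.0)

def in_transit_aggregate(stream: Iterator[float], window: int) -> Iterator[dict]:
    """Runs on an intermediate node: collapses each window of raw readings into
    one small summary record before it reaches the visualisation service."""
    buf = []
    for reading in stream:
        buf.append(reading)
        if len(buf) == window:
            yield {"mean": statistics.fmean(buf),
                   "max": max(buf),
                   "min": min(buf),
                   "count": len(buf)}
            buf = []

if __name__ == "__main__":
    for summary in in_transit_aggregate(sensor_stream(1000), window=100):
        print(summary)   # 10 summaries instead of 1000 raw points reach the dashboard
```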

    Digital curation and the cloud

    Digital curation involves a wide range of activities, many of which could benefit from cloud deployment to a greater or lesser extent. These range from infrequent, resource-intensive tasks which benefit from the ability to rapidly provision resources, to day-to-day collaborative activities which can be facilitated by networked cloud services. Associated benefits are offset by risks such as loss of data or service level, legal and governance incompatibilities, and transfer bottlenecks. There is considerable variability across both risks and benefits according to the service and deployment models being adopted and the context in which activities are performed. Some risks, such as legal liabilities, are mitigated by the use of alternative models, e.g. private clouds, but this is typically at the expense of benefits such as resource elasticity and economies of scale. The Infrastructure as a Service model may provide a basis on which more specialised software services can be built. There is considerable work to be done in helping institutions understand the cloud and its associated costs, risks and benefits, and how these compare to their current working methods, so that the most beneficial uses of cloud technologies can be identified. Specific proposals, echoing recent work coordinated by EPSRC and JISC, are the development of advisory, costing and brokering services to facilitate appropriate cloud deployments; the exploration of opportunities for certifying or accrediting cloud preservation providers; and the targeted publicity of outputs from pilot studies to the full range of stakeholders within the curation lifecycle, including data creators and owners, repositories, institutional IT support professionals and senior managers.

    Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

    Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability that such ensembles pose. We describe the design and implementation of EnTK, characterize its performance, and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss the novel computational capabilities that EnTK enables and the scientific advantages arising therefrom. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.
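    To make the ensemble abstraction concrete, the sketch below declares a toy ensemble as pipelines of stages of concurrent tasks and runs it with a thread pool. It is a hypothetical plain-Python illustration of the kind of description-and-execution abstraction the abstract refers to, not the EnTK API; all class and function names are invented.

```python
# Hypothetical illustration of an ensemble abstraction: pipelines run their stages in
# order, and the tasks inside a stage run concurrently. NOT the EnTK / radical.entk API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    name: str
    run: Callable[[], str]          # the unit of work (here: a Python callable)

@dataclass
class Stage:
    tasks: List[Task] = field(default_factory=list)    # tasks in a stage run concurrently

@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)  # stages run in order

def execute(pipelines: List[Pipeline], workers: int = 4) -> None:
    """Very small 'ensemble manager': runs each pipeline's stages in order,
    dispatching the tasks inside a stage concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for p in pipelines:
            for s in p.stages:
                for result in pool.map(lambda t: t.run(), s.tasks):
                    print(result)

if __name__ == "__main__":
    # A toy two-stage ensemble: eight independent "simulations", then one "analysis".
    sims = Stage([Task(f"sim-{i}", lambda i=i: f"sim-{i} done") for i in range(8)])
    analysis = Stage([Task("analysis", lambda: "analysis done")])
    execute([Pipeline([sims, analysis])])
```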

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. To help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers: 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document and are presented along with introductory material.
    Comment: 72 pages

    A Flexible Framework for Asynchronous In Situ and In Transit Analytics for Scientific Simulations

    High performance computing systems are today composed of tens of thousands of processors and deep memory hierarchies. The next generation of machines will further increase the imbalance between I/O capabilities and processing power. To reduce the pressure on I/O, the in situ analytics paradigm proposes to process the data as closely as possible to where and when the data are produced. Processing can be embedded in the simulation code, executed asynchronously on helper cores on the same nodes, or performed in transit on staging nodes dedicated to analytics. Today, software environments as well as usage scenarios still need to be investigated before in situ analytics becomes a standard practice. In this paper we introduce a framework for designing, deploying and executing in situ scenarios. Based on a component model, the scientist designs analytics workflows by first developing processing components that are then assembled into a dataflow graph through a Python script. At runtime the graph is instantiated according to the execution context, with the framework taking care of deploying the application on the target architecture and coordinating the analytics workflows with the simulation execution. Component coordination, zero-copy intra-node communications and inter-node data transfers rely on per-node distributed daemons. We evaluate various scenarios performing in situ and in transit analytics on large molecular dynamics systems simulated with Gromacs using up to 1664 cores. We show in particular that analytics processing can be performed on the fraction of resources the simulation does not use well, resulting in a limited impact on simulation performance (less than 6%). Our more advanced scenario combines in situ and in transit processing to compute a molecular surface based on the Quicksurf algorithm.
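    The usage pattern described above, wiring pre-built processing components into a dataflow graph from a Python script, can be sketched as follows. The classes, component names and placement labels are hypothetical stand-ins, not the framework's actual API.

```python
# Hypothetical sketch of assembling a dataflow graph of analytics components from a
# Python script; deployment and coordination would be handled by the runtime.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Component:
    name: str                       # e.g. a compiled analytics module
    placement: str                  # "in-situ" (simulation node) or "in-transit" (staging node)

@dataclass
class Graph:
    components: List[Component] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, comp: Component) -> Component:
        self.components.append(comp)
        return comp

    def connect(self, src: Component, dst: Component) -> None:
        self.edges.append((src.name, dst.name))         # data flows src -> dst

# Assemble a scenario resembling the molecular-surface example in the abstract.
g = Graph()
sim     = g.add(Component("gromacs_sim", "in-situ"))     # simulation data source
filt    = g.add(Component("atom_filter", "in-situ"))     # lightweight step on helper cores
surface = g.add(Component("quicksurf", "in-transit"))    # heavier step on staging nodes
g.connect(sim, filt)
g.connect(filt, surface)
print(g.edges)   # the runtime would deploy and coordinate this graph with the simulation
```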