6,235 research outputs found

    CERN openlab Whitepaper on Future IT Challenges in Scientific Research

    Get PDF
    This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates

    A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing

    Full text link
    Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters from the cloud computing environment, resulting in serious data transmission delays. Edge computing reduces the data transmission delays and supports the fixed storing manner for scientific workflow private datasets, but there is a bottleneck in its storage capacity. It is a challenge to combine the advantages of both edge computing and cloud computing to rationalize the data placement of scientific workflow, and optimize the data transmission time across different datacenters. Traditional data placement strategies maintain load balancing with a given number of datacenters, which results in a large data transmission time. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize the data transmission time when placing data for a scientific workflow. This approach considered the characteristics of data placement combining edge computing and cloud computing. In addition, it considered the impact factors impacting transmission delay, such as the band-width between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover operator and mutation operator of the genetic algorithm were adopted to avoid the premature convergence of the traditional particle swarm optimization algorithm, which enhanced the diversity of population evolution and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce the data transmission time during workflow execution combining edge computing and cloud computing

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    Full text link
    The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from less than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph; and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and Computin

    Towards Interoperable Research Infrastructures for Environmental and Earth Sciences

    Get PDF
    This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies, part two discusses the reference model guided engineering approach, the third part presents the software and tools developed for common data management challenges, the fourth part demonstrates the software via several use cases, and the last part discusses the sustainability and future directions

    Big data analytics:Computational intelligence techniques and application areas

    Get PDF
    Big Data has significant impact in developing functional smart cities and supporting modern societies. In this paper, we investigate the importance of Big Data in modern life and economy, and discuss challenges arising from Big Data utilization. Different computational intelligence techniques have been considered as tools for Big Data analytics. We also explore the powerful combination of Big Data and Computational Intelligence (CI) and identify a number of areas, where novel applications in real world smart city problems can be developed by utilizing these powerful tools and techniques. We present a case study for intelligent transportation in the context of a smart city, and a novel data modelling methodology based on a biologically inspired universal generative modelling approach called Hierarchical Spatial-Temporal State Machine (HSTSM). We further discuss various implications of policy, protection, valuation and commercialization related to Big Data, its applications and deployment

    Review of the environmental and organisational implications of cloud computing: final report.

    Get PDF
    Cloud computing – where elastic computing resources are delivered over the Internet by external service providers – is generating significant interest within HE and FE. In the cloud computing business model, organisations or individuals contract with a cloud computing service provider on a pay-per-use basis to access data centres, application software or web services from any location. This provides an elasticity of provision which the customer can scale up or down to meet demand. This form of utility computing potentially opens up a new paradigm in the provision of IT to support administrative and educational functions within HE and FE. Further, the economies of scale and increasingly energy efficient data centre technologies which underpin cloud services means that cloud solutions may also have a positive impact on carbon footprints. In response to the growing interest in cloud computing within UK HE and FE, JISC commissioned the University of Strathclyde to undertake a Review of the Environmental and Organisational Implications of Cloud Computing in Higher and Further Education [19]

    Partitioning workflow applications over federated clouds to meet non-functional requirements

    Get PDF
    PhD ThesisWith cloud computing, users can acquire computer resources when they need them on a pay-as-you-go business model. Because of this, many applications are now being deployed in the cloud, and there are many di erent cloud providers worldwide. Importantly, all these various infrastructure providers o er services with di erent levels of quality. For example, cloud data centres are governed by the privacy and security policies of the country where the centre is located, while many organisations have created their own internal \private cloud" to meet security needs. With all this varieties and uncertainties, application developers who decide to host their system in the cloud face the issue of which cloud to choose to get the best operational conditions in terms of price, reliability and security. And the decision becomes even more complicated if their application consists of a number of distributed components, each with slightly di erent requirements. Rather than trying to identify the single best cloud for an application, this thesis considers an alternative approach, that is, combining di erent clouds to meet users' non-functional requirements. Cloud federation o ers the ability to distribute a single application across two or more clouds, so that the application can bene t from the advantages of each one of them. The key challenge for this approach is how to nd the distribution (or deployment) of application components, which can yield the greatest bene ts. In this thesis, we tackle this problem and propose a set of algorithms, and a framework, to partition a work ow-based application over federated clouds in order to exploit the strengths of each cloud. The speci c goal is to split a distributed application structured as a work ow such that the security and reliability requirements of each component are met, whilst the overall cost of execution is minimised. To achieve this, we propose and evaluate a cloud broker for partitioning a work ow application over federated clouds. The broker integrates with the e-Science Central cloud platform to automatically deploy a work ow over public and private clouds. We developed a deployment planning algorithm to partition a large work ow appli- - i - cation across federated clouds so as to meet security requirements and minimise the monetary cost. A more generic framework is then proposed to model, quantify and guide the partitioning and deployment of work ows over federated clouds. This framework considers the situation where changes in cloud availability (including cloud failure) arise during work ow execution

    16th SC@RUG 2019 proceedings 2018-2019

    Get PDF

    16th SC@RUG 2019 proceedings 2018-2019

    Get PDF
    • …
    corecore