85 research outputs found

    Plasma Edge Kinetic-MHD Modeling in Tokamaks Using Kepler Workflow for Code Coupling, Data Management and Visualization

    A new predictive computer simulation tool targeting the development of the H-mode pedestal at the plasma edge in tokamaks and the triggering and dynamics of edge localized modes (ELMs) is presented in this report. This tool brings together, in a coordinated and effective manner, several first-principles physics simulation codes, stability analysis packages, and data processing and visualization tools. A Kepler workflow is used to carry out an edge plasma simulation that loosely couples the kinetic code, XGC0, with an ideal MHD linear stability analysis code, ELITE, and an extended MHD initial-value code such as M3D or NIMROD. XGC0 includes the neoclassical ion-electron-neutral dynamics needed to simulate pedestal growth near the separatrix. The Kepler workflow processes the XGC0 simulation results into simple images that can be selected and displayed via the Dashboard, a monitoring tool implemented in AJAX that allows the scientist to track computational resources, examine running and archived jobs, and view key physics data, all within a standard Web browser. The XGC0 simulation is monitored for the conditions needed to trigger an ELM crash by periodically assessing the edge plasma pressure and current density profiles using the ELITE code. If an ELM crash is triggered, the Kepler workflow launches the M3D code on a moderate-size Opteron cluster to simulate the nonlinear ELM crash and to compute the relaxation of plasma profiles after the crash. This process is monitored through periodic outputs of plasma fluid quantities that are automatically visualized with AVS/Express and may be displayed on the Dashboard. Finally, the Kepler workflow archives all data outputs and processed images using HPSS, as well as provenance information about the software and hardware used to create the simulation. The complete process of preparing, executing and monitoring a coupled-code simulation of the edge pressure pedestal buildup and the ELM cycle using the Kepler scientific workflow system is described in this paper.
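
    The coupling logic amounts to a monitor-and-dispatch loop: run the kinetic code, periodically check edge stability, and hand off to the MHD code when an ELM is triggered. The sketch below illustrates that loop in plain Python rather than in Kepler; the executable names (xgc0, elite, m3d), file names, check interval, and the "UNSTABLE" output convention are hypothetical stand-ins for the components named in the abstract.

        # Illustrative sketch only: the paper orchestrates this loop with the Kepler
        # workflow system, not a Python script. The executable names (xgc0, elite, m3d),
        # file names, check interval, and the "UNSTABLE" output convention are hypothetical.
        import shutil
        import subprocess
        import time

        CHECK_INTERVAL = 600  # seconds between ELITE stability checks (assumed value)

        def elm_triggered(profile_file):
            """Run a linear stability check on the current edge profiles."""
            result = subprocess.run(["elite", "--profiles", profile_file],
                                    capture_output=True, text=True)
            return "UNSTABLE" in result.stdout  # hypothetical output convention

        def main():
            # 1. Launch the kinetic edge simulation (pedestal buildup).
            xgc0 = subprocess.Popen(["xgc0", "--input", "pedestal.in"])

            # 2. Periodically assess the edge pressure and current density profiles.
            while xgc0.poll() is None:
                time.sleep(CHECK_INTERVAL)
                if elm_triggered("edge_profiles.dat"):
                    xgc0.terminate()
                    # 3. Hand the unstable profiles to the extended-MHD code for the crash.
                    subprocess.run(["m3d", "--restart", "edge_profiles.dat"], check=True)
                    break

            # 4. Archive run outputs (the paper uses HPSS; a plain tarball stands in here).
            shutil.make_archive("elm_cycle_outputs", "gztar", root_dir="run_dir")

        if __name__ == "__main__":
            main()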

    Finding and sharing GIS methods based on the questions they answer

    Geographic information has become central for data scientists of many disciplines to put their analyses into a spatio-temporal perspective. However, just as the volume and variety of data sources on the Web grow, it becomes increasingly harder for analysts to be familiar with all the available geospatial tools, including toolboxes in Geographic Information Systems (GIS), R packages, and Python modules. Even though the semantics of the questions answered by these tools can be broadly shared, tools and data sources are still divided by syntax and platform-specific technicalities. It would, therefore, be hugely beneficial for information science if analysts could simply ask questions in generic and familiar terms to obtain the tools and data necessary to answer them. In this article, we systematically investigate the analytic questions that lie behind a range of common GIS tools, and we propose a semantic framework for matching analytic questions with the tools capable of answering them. To support the matching process, we define a tractable subset of SPARQL, the query language of the Semantic Web, and we propose and test an algorithm for computing query containment. We illustrate the identification of tools to answer user questions on a set of common user requests.
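
    At its core, matching a question to a tool reduces to query containment between conjunctive graph patterns: a tool can answer a question if the tool's query pattern maps homomorphically into the question's. The following sketch checks containment by brute-force homomorphism search over triple patterns; the pattern vocabulary and the example question/tool annotations are invented for illustration and are not the article's SPARQL subset.

        # Minimal sketch of conjunctive-query containment, the operation used to decide
        # whether a tool's annotated query pattern answers a user question. Patterns are
        # plain (subject, predicate, object) triples; variables start with '?'. The
        # vocabulary and example annotations below are invented for illustration.
        from itertools import product

        def is_var(term):
            return isinstance(term, str) and term.startswith("?")

        def contained_in(question, tool_query):
            """True if every answer to `question` is also an answer to `tool_query`,
            i.e. tool_query's patterns can be homomorphically mapped into question."""
            variables = sorted({t for triple in tool_query for t in triple if is_var(t)})
            terms = {t for triple in question for t in triple}
            for assignment in product(terms, repeat=len(variables)):
                mapping = dict(zip(variables, assignment))
                mapped = {tuple(mapping.get(t, t) for t in triple) for triple in tool_query}
                if mapped <= set(question):
                    return True
            return False

        # Hypothetical question ("mean observation value within a 500 m buffer") matched
        # against a tool described as answering any "value within a region" question.
        question = [("?obs", "hasValue", "?v"), ("?obs", "within", "Buffer_500m"),
                    ("?v", "aggregatedBy", "mean")]
        tool = [("?x", "hasValue", "?y"), ("?x", "within", "?region")]
        print(contained_in(question, tool))  # True: the tool's pattern maps onto the question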

    What the Web Has Done for Scientific Data – and What It Hasn’t

    The web, together with database technology, has radically changed the way scientific research is conducted. Scientists now have access to an unprecedented quantity and range of data, and the speed and ease of communication of all forms of scientific data have increased hugely. This change has come at a price: web and database technology no longer support some of the desirable properties of paper publication, and they have introduced new problems in maintaining the scientific record. This brief paper is an examination of some of these issues.

    Building Science Gateways for Analysing Molecular Docking Results Using a Generic Framework and Methodology

    Molecular docking and virtual screening experiments require large computational and data resources and high-level user interfaces in the form of science gateways. While science gateways supporting such experiments are relatively common, there is a clearly identified need to design and implement more complex environments for further analysis of docking results. This paper describes a generic framework and a related methodology that support the efficient development of such environments. The framework is modular, enabling the reuse of existing components. The methodology, which proposes three techniques that the development team can use, is agile and encourages the active participation of end-users. Based on the framework and methodology, two prototype implementations of science-gateway-based docking environments are presented and evaluated. The first system recommends a receptor-ligand pair for the next docking experiment, and the second filters docking results based on ligand properties.
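
    As a concrete illustration of the second prototype's task, filtering docking results by ligand properties can be as simple as applying thresholds to a results table. The snippet below is a hedged sketch, not the gateway's implementation; the CSV column names, the binding-energy cut-off, and the Lipinski-style molecular-weight limit are assumptions.

        # Hedged sketch of the second prototype's idea (filtering docking results by
        # ligand properties). The CSV column names, the binding-energy cut-off, and the
        # Lipinski-style molecular-weight limit are assumptions, not the gateway's code.
        import csv

        def filter_docking_results(path, max_energy=-7.0, max_mol_weight=500.0):
            """Keep ligands whose binding energy (kcal/mol) is at or below `max_energy`
            and whose molecular weight (Da) is at or below `max_mol_weight`."""
            hits = []
            with open(path, newline="") as fh:
                for row in csv.DictReader(fh):
                    if (float(row["binding_energy"]) <= max_energy
                            and float(row["mol_weight"]) <= max_mol_weight):
                        hits.append(row["ligand_id"])
            return hits

        # Usage: hits = filter_docking_results("docking_scores.csv")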

    High Speed Simulation Analytics

    Simulation, especially discrete-event simulation (DES) and agent-based simulation (ABS), is widely used in industry to support decision making. It is used to create predictive models or Digital Twins of systems, which are used to analyse what-if scenarios, perform sensitivity analytics on data and decisions, and even optimise the impact of decisions. Simulation-based Analytics, or just Simulation Analytics, therefore has a major role to play in Industry 4.0. However, a major issue in Simulation Analytics is speed. The extensive, continuous experimentation demanded by Industry 4.0 can take a significant time, especially if many replications are required. This is compounded by detailed models, which can take a long time to simulate. Distributed Simulation (DS) techniques use multiple computers to speed up the simulation of a single model by splitting it across the computers and/or to speed up experimentation by running experiments across multiple computers in parallel. This chapter discusses how DS and Simulation Analytics, as well as concepts from contemporary e-Science, can be combined to address the speed problem through a new approach called High Speed Simulation Analytics. We present a vision of High Speed Simulation Analytics to show how this might be integrated with the future of Industry 4.0.
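
    The experimentation side of the speed problem is the easiest to picture: independent replications are embarrassingly parallel and can be farmed out to however many cores or machines are available. The toy sketch below runs replications of a single-server queue across a local process pool; the model and its parameters are illustrative only and are not taken from the chapter, and a real deployment would distribute the work across machines rather than one node.

        # Sketch of the "speed up experimentation" half of Distributed Simulation:
        # independent replications farmed out to a pool of workers. The single-server
        # queue model and its parameters are toy examples, not taken from the chapter.
        import random
        from multiprocessing import Pool

        def run_replication(seed, n_customers=10_000, arrival_rate=0.9, service_rate=1.0):
            """One replication of an M/M/1 queue; returns the mean waiting time."""
            rng = random.Random(seed)
            clock = departure = total_wait = 0.0
            for _ in range(n_customers):
                clock += rng.expovariate(arrival_rate)   # next arrival time
                start = max(clock, departure)            # wait if the server is busy
                total_wait += start - clock
                departure = start + rng.expovariate(service_rate)
            return total_wait / n_customers

        if __name__ == "__main__":
            with Pool() as pool:                                 # one worker per core
                results = pool.map(run_replication, range(20))   # 20 replications in parallel
            print(f"mean waiting time across replications: {sum(results) / len(results):.3f}")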

    A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
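
    The dataflow idea is easy to demonstrate without PaPy itself: user-written functions are connected into a pipeline and each stage is evaluated as a parallel map over batches of items. The sketch below uses only the standard library and does not reproduce PaPy's actual API; the FASTA-like records and the GC-content stage are made-up examples of data-coupled components.

        # Not PaPy's actual API: a bare-bones illustration of the dataflow idea, with
        # user-written functions connected into a pipeline and each stage evaluated as a
        # parallel map over batches of items. The FASTA-like records and the GC-content
        # stage are made-up examples of data-coupled components.
        from multiprocessing import Pool

        def parse(record):
            """Stage 1: raw FASTA-like string -> (header, sequence)."""
            header, seq = record.split("\n", 1)
            return header.lstrip(">"), seq.replace("\n", "")

        def gc_content(item):
            """Stage 2: (header, sequence) -> (header, GC fraction)."""
            header, seq = item
            gc = sum(seq.count(base) for base in "GCgc")
            return header, (gc / len(seq) if seq else 0.0)

        def pipeline(records, stages, workers=4, batch_size=8):
            """Apply the stages in order, each as a parallel map over adjustable batches."""
            data = list(records)
            with Pool(workers) as pool:
                for stage in stages:
                    data = pool.map(stage, data, chunksize=batch_size)
            return data

        if __name__ == "__main__":
            records = [">seq1\nGATTACA", ">seq2\nGGGCCC"]
            print(pipeline(records, [parse, gc_content]))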

    Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    Big data is prevalent in HPC. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. Analyzing the performance of such workflows is challenging, because it involves terabytes or petabytes of workflow data and execution measurements gathered from complex workflows running over large numbers of nodes with many parallel tasks. To help identify performance bottlenecks and debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply sophisticated statistical and data mining methods to the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze large volumes of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and on the job logs from the genome analysis scientific cluster.
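
    In miniature, the framework's ingest-extract-analyze pattern looks like the sketch below: load job logs, derive a key performance feature, and flag statistical outliers per application. This is a stand-in only; the column names and CSV input are assumptions, and the actual tool operates at cluster scale on a distributed data processing engine rather than with pandas on a single node.

        # Simplified stand-in for the framework described above: ingest job logs, derive a
        # key performance feature (elapsed time), and flag statistical outliers per
        # application. Column names and the CSV input format are assumptions; the actual
        # tool runs on a distributed data processing engine, not pandas on one node.
        import pandas as pd

        def load_job_logs(path):
            """Read job logs with start/end timestamps and compute elapsed seconds."""
            df = pd.read_csv(path, parse_dates=["start_time", "end_time"])
            df["elapsed_s"] = (df["end_time"] - df["start_time"]).dt.total_seconds()
            return df

        def flag_slow_jobs(df, k=3.0):
            """Mark jobs more than k standard deviations slower than the mean
            elapsed time for their application."""
            stats = df.groupby("app_name")["elapsed_s"].agg(["mean", "std"])
            df = df.join(stats, on="app_name")
            df["is_outlier"] = df["elapsed_s"] > df["mean"] + k * df["std"].fillna(0.0)
            return df

        # Usage: jobs = flag_slow_jobs(load_job_logs("cluster_job_logs.csv"))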