22,447 research outputs found

    Predicting Intermediate Storage Performance for Workflow Applications

    Full text link
    Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of space exploration (e.g., by running the application with different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turn-around time of an application using storage system under a given configuration. This approach focuses on a generic object-based storage system design, supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to other, more traditional, configuration knobs (e.g., stripe size or replication level), and models the system operation at data-chunk and control message level. This paper presents our experience to date with designing and using this prediction mechanism. We evaluate this mechanism using micro- as well as synthetic benchmarks mimicking real workflow applications, and a real application.. A preliminary evaluation shows that we are on a good track to meet our objectives: it can scale to model a workflow application run on an entire cluster while offering an over 200x speedup factor (normalized by resource) compared to running the actual application, and can achieve, in the limited number of scenarios we study, a prediction accuracy that enables identifying the best storage system configuration

    A Taxonomy of Workflow Management Systems for Grid Computing

    Full text link
    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure

    Enhancing Energy Production with Exascale HPC Methods

    Get PDF
    High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results.The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imagingPostprint (author's final draft

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Get PDF
    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high-performance, reducing up to 98% of the case studies execution time. We also show how the application of machine learning techniques can enrich the analysis process

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

    Get PDF
    • …
    corecore