6,285 research outputs found

    The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

    Get PDF
    Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.Comment: 20 page

    Data Workflow - A Workflow Model for Continuous Data Processing

    Get PDF
    Online data or streaming data are getting more and more important for enterprise information systems, e.g. by integrating sensor data and workflows. The continuous flow of data provided e.g. by sensors requires new workflow models addressing the data perspective of these applications, since continuous data is potentially infinite while business process instances are always finite.\ud In this paper a formal workflow model is proposed with data driven coordination and explicating properties of the continuous data processing. These properties can be used to optimize data workflows, i.e., reducing the computational power for processing the workflows in an engine by reusing intermediate processing results in several workflows

    AiiDA: Automated Interactive Infrastructure and Database for Computational Science

    Full text link
    Computational science has seen in the last decades a spectacular rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop, under the guidance of an established practitioner. Great benefits could follow instead from adopting concepts and tools coming from computer science to manage, preserve, and share these computational efforts. We illustrate here our paradigm sustaining such vision, based around the four pillars of Automation, Data, Environment, and Sharing. We then discuss its implementation in the open-source AiiDA platform (http://www.aiida.net), that has been tuned first to the demands of computational materials science. AiiDA's design is based on directed acyclic graphs to track the provenance of data and calculations, and ensure preservation and searchability. Remote computational resources are managed transparently, and automation is coupled with data storage to ensure reproducibility. Last, complex sequences of calculations can be encoded into scientific workflows. We believe that AiiDA's design and its sharing capabilities will encourage the creation of social ecosystems to disseminate codes, data, and scientific workflows.Comment: 30 pages, 7 figure

    MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME

    Full text link
    Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools, a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments

    Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

    Full text link
    Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing

    Generating eScience Workflows from Statistical Analysis of Prior Data

    Get PDF
    A number of workflow design tools have been developed specifically to enable easy graphical specification of workflows that ensure systematic scientific data capture and analysis and precise provenance information. We believe that an important component that is missing from these existing workflow specification and enactment systems is integration with tools that enable prior detailed analysis of the existing data - and in particular statistical analysis. By thoroughly analyzing the existing relevant datasets first, it is possible to determine precisely where the existing data is sparse or insufficient and what further experimentation is required. Introducing statistical analysis to experimental design will reduce duplication and costs associated with fruitless experimentation and maximize opportunities for scientific breakthroughs. In this paper we describe a workflow specification system that we have developed for a particular eScience application (fuel cell optimization). Experimental workflow instances are generated as a result of detailed statistical analysis and interactive exploration of the existing datasets. This is carried out through a graphical data exploration interface that integrates the widely-used open source statistical analysis software package, R, as a web service
    • 

    corecore