A new approach for publishing workflows: abstractions, standards, and linked data
In recent years, a variety of systems have been developed that export the workflows used to analyze data and make them part of published articles. We argue that the workflows published in current approaches remain dependent on the specific codes used for execution, the specific workflow system used, and the specific workflow catalogs where they are published. In this paper, we describe a new approach that addresses these shortcomings and makes workflows more reusable through: 1) the use of abstract workflows to complement executable workflows, so that they remain reusable when the execution environment differs; 2) the publication of both abstract and executable workflows using standards such as the Open Provenance Model, so they can be imported by other workflow systems; and 3) the publication of workflows as Linked Data, which results in open, web-accessible workflow repositories. We illustrate this approach using a complex workflow that we re-created from an influential publication that describes the generation of 'drugomes'.
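To make the third point concrete, the following is a minimal sketch of how an abstract workflow step might be published as Linked Data. It is an illustration rather than the paper's actual serialization: the OPMW class and property names, as well as the example URIs, are assumptions.

```python
# Illustrative sketch: publishing an abstract workflow description as Linked Data.
# The OPMW namespace and class/property names below are assumptions, as are the
# example URIs; this is not the paper's actual serialization.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

OPMW = Namespace("http://www.opmw.org/ontology/")  # assumed OPMW namespace
EX = Namespace("http://example.org/drugome/")      # hypothetical workflow URIs

g = Graph()
g.bind("opmw", OPMW)

template = EX["DrugomeWorkflowTemplate"]   # the abstract (reusable) workflow
step = EX["CompareProteinStructures"]      # one abstract step within it

g.add((template, RDF.type, OPMW.WorkflowTemplate))     # assumed class
g.add((step, RDF.type, OPMW.WorkflowTemplateProcess))  # assumed class
g.add((step, OPMW.isStepOfTemplate, template))         # assumed property
g.add((step, RDFS.label, Literal("Compare protein structures")))

print(g.serialize(format="turtle"))
```

Serialized as Turtle and exposed with dereferenceable URIs, descriptions of this kind form the open, web-accessible workflow repositories the abstract refers to.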
A Provenance-Enabled Service for News and Blog Aggregation
A large amount of information is currently published on the Web in the form of online news and blogs. Users who need this information for different purposes (from personal interests to professional decision making) need to know its provenance and its evolution in order to determine its quality and assign it an appropriate degree of trust.
Several provenance models have been developed to represent and manage the history of content across many different domains. However, applying them in real-world scenarios raises methodological and technological challenges that remain unresolved, as the W3C Provenance Incubator Group has pointed out. In this thesis we define a comparison framework to analyze the different proposals and justify the selection of the most appropriate one, the Open Provenance Model (OPM), as the reference for modeling a real scenario of online news and blogs, in a tourism context, for one of the most important communications and publishing companies in our country: PRISACOM. In addition, this document defines a provenance annotation and retrieval service that uses the previous model as a reference, describing the modeling and design decisions made in the context of provenance use and management for a set of platforms belonging to the company. Finally, we present the evaluation carried out, with promising results that address the challenges posed in the project objectives. Our use case is also an additional contribution, because it shows how OPM can be used outside scientific domains, where it has most commonly been applied so far.
During the period in which this work was carried out, the author was a member of the W3C Provenance Incubator Group (making multiple contributions to its final report) and participated in the discussion of mappings between the provenance models most widely accepted in the scientific community.
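As a rough illustration of what OPM-style annotation of news content could look like (this sketch is mine, not the service described in the thesis; the vocabulary URI and term names are assumptions):

```python
# Hypothetical sketch of OPM-style provenance annotation for a news item.
# The OPMV namespace and term names are assumptions for illustration.
from rdflib import Graph, Namespace, RDF

OPMV = Namespace("http://purl.org/net/opmv/ns#")  # assumed OPM Vocabulary URI
EX = Namespace("http://example.org/news/")        # hypothetical content URIs

g = Graph()
g.bind("opmv", OPMV)

original = EX["article-123"]        # the originally published news item
revision = EX["article-123-rev2"]   # a later edited version
editing = EX["edit-process-7"]      # the editing activity

g.add((original, RDF.type, OPMV.Artifact))
g.add((revision, RDF.type, OPMV.Artifact))
g.add((editing, RDF.type, OPMV.Process))
g.add((editing, OPMV.used, original))
g.add((revision, OPMV.wasGeneratedBy, editing))
g.add((revision, OPMV.wasDerivedFrom, original))

print(g.serialize(format="turtle"))
```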
Extending DCAM for Metadata Provenance
The Metadata Provenance Task Group aims to define a data model that allows for making assertions about description sets. A shared model of the data elements required to describe an aggregation of metadata statements makes it possible to collectively import, access, use, and publish facts about the quality, rights, timeliness, data source type, trust situation, etc., of the described statements. In this paper we outline the preliminary model created by the task group, together with first examples that demonstrate how the model is to be used.
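A rough sketch of the general idea, not the task group's actual model: treating a description set as a named graph makes it possible to attach assertions about its source, rights, or timeliness to the set as a whole. The graph URIs and the use of Dublin Core terms below are illustrative assumptions.

```python
# Sketch only: attaching provenance assertions to a set of metadata statements
# by treating the description set as a named graph. All URIs are hypothetical.
from rdflib import Dataset, Namespace, Literal, URIRef
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/")

ds = Dataset()
descset = URIRef("http://example.org/descriptionsets/books-2011")

# The described statements live in their own named graph.
g = ds.graph(descset)
g.add((EX["book1"], DCTERMS.title, Literal("Linked Data Basics")))
g.add((EX["book1"], DCTERMS.creator, EX["author1"]))

# Assertions *about* the description set go in the default graph.
meta = ds.default_context
meta.add((descset, DCTERMS.creator, EX["library-A"]))
meta.add((descset, DCTERMS.modified, Literal("2011-06-01")))
meta.add((descset, DCTERMS.source, EX["catalogue-dump"]))

print(ds.serialize(format="trig"))
```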
Towards workflow ecosystems through standard representations
Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute, and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as common workflow interlinguas, but interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem in which it is possible to create a workflow with one tool, execute it with another, visualize it with yet another, and use a further tool to mine a repository of such workflows or their executions. In this paper, we describe our approach to creating a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. The ecosystem integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of workflow systems and functions has been integrated.
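As a rough illustration of the kind of standard representation involved (a sketch of mine, not the paper's actual encoding), the snippet below describes one workflow step and its output variable using PROV together with P-Plan-style terms; the P-Plan namespace and term names are assumptions.

```python
# Sketch: one workflow step expressed with PROV and P-Plan style terms.
# The P-Plan namespace and term names are assumptions for illustration.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

PROV = Namespace("http://www.w3.org/ns/prov#")
PPLAN = Namespace("http://purl.org/net/p-plan#")  # assumed namespace
EX = Namespace("http://example.org/wf/")          # hypothetical workflow URIs

g = Graph()
g.bind("prov", PROV)
g.bind("pplan", PPLAN)

plan, step, out_var = EX["AnalysisPlan"], EX["NormalizeStep"], EX["NormalizedData"]

g.add((plan, RDF.type, PROV.Plan))
g.add((step, RDF.type, PPLAN.Step))        # assumed class
g.add((step, PPLAN.isStepOfPlan, plan))    # assumed property
g.add((out_var, RDF.type, PPLAN.Variable))
g.add((out_var, PPLAN.isOutputVarOf, step))
g.add((step, RDFS.label, Literal("Normalize input data")))

print(g.serialize(format="turtle"))
```

Because both the plan and its execution traces can be expressed against such shared vocabularies, one tool can generate the workflow, another can execute it, and yet another can mine or visualize the resulting descriptions.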
Provenance and Trust
Interest in data provenance and trust has been growing in recent years, and the community is now devoting considerable effort to finding a standard model representation. The W3C Provenance Incubator Group focuses on this area, analyzing different provenance models and defining mappings between them and the Open Provenance Model (OPM) [1], which is the model it intends to promote as the standard.
We want to develop a provenance system based on OPM and a trust algorithm that operates on that provenance information. Our aim is a platform that does not store the content generated by users, but instead stores references to it, together with users' opinions, information from social networks, and so on, in order to obtain semantic information from the Web. In this context, being able to predict the trust of a source, or to track the content we have generated, is a great asset for any such use.
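A toy sketch of how a trust score could be derived from provenance information, for example by combining the trust of sources along derivation edges. The combination rule and all values below are illustrative assumptions, not the algorithm proposed in this work.

```python
# Toy sketch: propagate trust scores along a provenance-derivation chain.
# The decay-weighted average rule and all values are illustrative assumptions.

# Hypothetical provenance: content item -> items it was derived from.
derived_from = {
    "blog-post": ["news-article", "tweet"],
    "news-article": ["agency-wire"],
}

# Prior trust in original sources (hypothetical values in [0, 1]).
source_trust = {"agency-wire": 0.9, "tweet": 0.4}

DECAY = 0.9  # each derivation step slightly reduces confidence

def trust(item: str) -> float:
    """Trust of an item: its prior if known, else the decayed average
    of the trust of the items it was derived from."""
    if item in source_trust:
        return source_trust[item]
    parents = derived_from.get(item, [])
    if not parents:
        return 0.5  # unknown provenance -> neutral trust
    return DECAY * sum(trust(p) for p in parents) / len(parents)

print(round(trust("blog-post"), 3))  # ~0.545 under these assumptions
```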
Common motifs in scientific workflows: An empirical analysis
While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is to provide abstractions that give a high-level view of the activities undertaken within workflows. As a first step towards such abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from the Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data-intensive activities observed in workflows (data-oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs can be useful for informing workflow designers about good and bad practices for workflow development, for informing the design of automated tools that generate workflow abstractions, and so on.
Bernstein polynomials in element-free Galerkin method
In recent decades, meshless methods (MMs), such as the element-free Galerkin method (EFGM), have been widely studied, and interesting results have been obtained when solving partial differential equations. However, such solutions show a problem around boundary conditions, where adequate accuracy is not achieved. This is caused by the use of moving least squares or reproducing kernel particle methods to obtain the shape functions needed in MMs, since such methods are accurate enough in the interior of the integration domains but not at their boundaries. Bernstein curves, which themselves form a partition of unity, can solve this problem with the same accuracy in the interior of the domain and at its boundaries.
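For reference, the Bernstein basis polynomials of degree n on [0, 1] and their partition-of-unity property (a standard definition, not a contribution specific to this paper) are:

```latex
% Bernstein basis polynomials of degree n on [0,1]
B_{i,n}(t) = \binom{n}{i}\, t^{i} (1 - t)^{n - i},
  \qquad i = 0, \dots, n, \qquad 0 \le t \le 1 .

% Partition of unity, by the binomial theorem:
\sum_{i=0}^{n} B_{i,n}(t)
  = \sum_{i=0}^{n} \binom{n}{i} t^{i} (1 - t)^{n - i}
  = \bigl(t + (1 - t)\bigr)^{n} = 1 .
```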
Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome
How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition, the reader will already know that the answer is “with difficulty” or “not at all”. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and to make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results. The quantification leads to “reproducibility maps” that reveal that novice researchers would only be able to reproduce a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a set of desiderata with our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one's own laboratory.
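As a toy illustration of how per-step effort estimates could be turned into a simple reproducibility map for different user classes (the step names, numbers, and thresholding rule below are made up, not the paper's data or method):

```python
# Toy sketch: which steps each user class could reproduce, given hypothetical
# per-step effort estimates (hours) and an effort budget per class.
# All names and numbers below are made up for illustration.

step_effort = {            # estimated hours to reproduce each step
    "data retrieval": 2,
    "structure comparison": 40,
    "docking": 120,
    "result validation": 200,
}

effort_budget = {          # hypothetical effort each user class can invest
    "novice": 10,
    "bioinformatician": 100,
    "domain expert": 500,
}

def reproducibility_map(steps, budgets):
    """For each user class, list the steps whose individual effort fits the budget."""
    return {
        user: [s for s, hours in steps.items() if hours <= limit]
        for user, limit in budgets.items()
    }

for user, doable in reproducibility_map(step_effort, effort_budget).items():
    print(f"{user}: {doable}")
```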
Transforming meteorological data into linked data
This paper describes the process followed in order to make some of the public meteorological data from the Agencia Estatal de Meteorología (AEMET, Spanish Meteorological Office) available as Linked Data. The method followed has already been used to publish geographical, statistical, and leisure data. The data selected for publication are generated every ten minutes by the 250 automatic stations that belong to AEMET and that are deployed across Spain. These data are available as spreadsheets in the AEMET data catalog, and contain more than twenty types of measurements per station. Spreadsheets are retrieved from the website, processed with Python scripts, transformed to RDF according to an ontology network about meteorology that reuses the W3C SSN Ontology, published in a triple store, and visualized on maps with Map4rdf.
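A minimal sketch of the kind of transformation described, not AEMET's actual pipeline or ontology network: one spreadsheet row of station measurements converted to RDF with SSN-style observation terms. The SSN namespace, property names, and resource URIs below are assumptions.

```python
# Sketch only: turning one row of station measurements into RDF with SSN-style
# terms. The SSN namespace, property names, and resource URIs are assumptions.
import csv, io
from rdflib import Graph, Namespace, Literal, RDF, XSD

SSN = Namespace("http://purl.org/NET/ssnx/ssn#")  # assumed SSN ontology URI
AEMET = Namespace("http://example.org/aemet/")    # hypothetical resource URIs

# Hypothetical spreadsheet excerpt: station id, timestamp, air temperature (C).
rows = csv.DictReader(io.StringIO(
    "station,timestamp,temperature\n3195,2011-10-05T10:20:00,21.4\n"))

g = Graph()
g.bind("ssn", SSN)

for row in rows:
    obs = AEMET[f"obs/{row['station']}/{row['timestamp']}"]
    station = AEMET[f"station/{row['station']}"]
    g.add((obs, RDF.type, SSN.Observation))   # assumed class
    g.add((obs, SSN.observedBy, station))     # assumed property
    g.add((obs, AEMET.airTemperature,         # hypothetical measurement property
           Literal(row["temperature"], datatype=XSD.decimal)))
    g.add((obs, AEMET.observationTime,
           Literal(row["timestamp"], datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```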
Workflow reuse in practice: a study of neuroimaging pipeline users
Workflow reuse is a major benefit of workflow systems and shared workflow repositories, but there are barely any studies that quantify the degree of reuse of workflows or the practical barriers that may stand in the way of successful reuse. In our own work, we hypothesize that defining workflow fragments improves reuse, since end-to-end workflows may be very specific and only partially reusable by others. This paper reports on a study of the current use of workflows and workflow fragments in labs that use the LONI Pipeline, a popular workflow system used mainly for neuroimaging research that enables users to define and reuse workflow fragments. We present an overview of the benefits of workflows and workflow fragments reported by users in informal discussions. We also report on a survey of researchers in a lab that has the LONI Pipeline installed, asking them about their experiences with reuse of workflow fragments and the actual benefits they perceive. This leads to quantifiable indicators of the reuse of workflows and workflow fragments in practice. Finally, we discuss barriers to further adoption of workflow fragments and workflow reuse that motivate further work