Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments
We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications will be consumed either directly by users of the Whole Tale environment or integrated into existing or future domain Science Gateways.
Structured Composition of Dataflow and Control-Flow for Reusable and Robust Scientific Workflows
Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, some workflow tasks are particularly "control-flow intensive", e.g., procedures to make workflows more fault-tolerant and adaptive in an unreliable, distributed computing environment. Modeling complex control-flow directly within a dataflow framework often leads to overly complicated workflows that are hard to comprehend, reuse, schedule, and maintain. In this paper, we develop a framework that allows a structured embedding of control-flow intensive subtasks within dataflow process networks. In this way, we can seamlessly handle complex control-flows without sacrificing the benefits of dataflow. We build upon a flexible actor-oriented modeling and design approach and extend it with (actor) frames and (workflow) templates. A frame is a placeholder for an (existing or planned) collection of components with similar function and signature. A template partially specifies the behavior of a subworkflow by leaving "holes" (i.e., frames) in the subworkflow definition. Taken together, these abstraction mechanisms facilitate the separation and structured re-combination of control-flow and dataflow in scientific workflow applications. We illustrate our approach with a real-world scientific workflow from the astrophysics domain. This data-intensive workflow requires remote execution and file transfer in a semi-reliable environment. For such workflows, we propose a 3-layered architecture: the top level, typically a dataflow process network, includes Generic Data Transfer (GDT) frames and Generic remote eXecution (GX) frames. At the second level, the user can specialize the behavior of these generic components by embedding a suitable template (here: transducer templates for control-flow intensive tasks).
At the third level, frames inside the transducer template are specialized by embedding the desired implementation. Our approach yields workflows that are more robust (fault-tolerance strategies can be defined by control-flow driven transducer templates) and at the same time more reusable, since the embedding of frames and templates yields more structured and modular workflows.
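The frame/template mechanism described above can be sketched in plain Python. This is an illustrative approximation, not the paper's actual actor framework: `Frame`, `Template`, and the retry policy are hypothetical names standing in for the GDT/GX frames and a fault-tolerant transducer template.

```python
from typing import Callable, Dict, Optional

class Frame:
    """Placeholder for a family of components sharing a signature (hypothetical sketch)."""
    def __init__(self, name: str):
        self.name = name
        self.impl: Optional[Callable[[str], str]] = None

    def embed(self, impl: Callable[[str], str]) -> None:
        # Specialize the frame by embedding a concrete implementation.
        self.impl = impl

    def run(self, data: str) -> str:
        if self.impl is None:
            raise RuntimeError(f"frame {self.name!r} not specialized")
        return self.impl(data)

class Template:
    """Subworkflow that fixes the control flow but leaves 'holes' (frames)."""
    def __init__(self, frames: Dict[str, Frame]):
        self.frames = frames

    def run(self, data: str) -> str:
        # Control-flow intensive part lives here: retry the transfer frame,
        # then hand the staged data to the execution frame.
        for _attempt in range(3):
            try:
                staged = self.frames["transfer"].run(data)
                break
            except IOError:
                continue
        else:
            raise RuntimeError("transfer failed after retries")
        return self.frames["execute"].run(staged)

# Specialize the generic frames with concrete (toy) implementations.
gdt = Frame("transfer")
gdt.embed(lambda d: d + " [staged]")
gx = Frame("execute")
gx.embed(lambda d: d + " [result]")

wf = Template({"transfer": gdt, "execute": gx})
print(wf.run("input.fits"))  # → input.fits [staged] [result]
```

The point of the separation is visible here: the fault-tolerance strategy (retry loop) is fixed once in the template, while the dataflow-level components plug into the frames without knowing about it.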
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms provide flexibility to users and user applications by allowing them to extract and elaborate data from stored data sources. Meanwhile, since its introduction, Extensible Markup Language (XML) has rapidly emerged as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so supporting views over semi-structured data models proves difficult. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is threefold: first, we propose an approach that separates the implementation and conceptual aspects of views, providing a clear separation of concerns and allowing the analysis and design of views to be carried out independently of their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which automatically derives a view schema and a view query expression in an appropriate query language. Finally, to validate and apply the LVM concepts, methods, and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.
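The core LVM idea, declaring a view at the conceptual level and transforming it automatically into a view query, can be sketched as follows. This is a hedged toy illustration: the `view_spec` shape and the `to_query`/`materialize` helpers are invented for this sketch, and the target language here is the XPath subset supported by Python's `xml.etree.ElementTree` rather than a full XML query language as in the paper.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<library>
  <book year="2003"><title>XML Views</title><author>Chen</author></book>
  <book year="1999"><title>Dataflow</title><author>Lee</author></book>
</library>
"""

# Conceptual-level view definition: only element/attribute names, no query syntax.
view_spec = {"root": "library", "expose": "book", "where": ("year", "2003")}

def to_query(spec):
    """Transform the conceptual view into a concrete view query expression (XPath)."""
    attr, value = spec["where"]
    return f".//{spec['expose']}[@{attr}='{value}']"

def materialize(spec, xml_text):
    """Evaluate the generated query and wrap the result in a view document."""
    source = ET.fromstring(xml_text)
    view = ET.Element(spec["root"] + "_view")
    for node in source.findall(to_query(spec)):
        view.append(node)
    return view

v = materialize(view_spec, SOURCE)
print([b.findtext("title") for b in v])  # → ['XML Views']
```

The design point mirrors the abstract: the user reasons only about `view_spec` (the conceptual layer), while the query expression and view schema are derived mechanically.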
TOLKIN – Tree of Life Knowledge and Information Network: Filling a Gap for Collaborative Research in Biological Systematics
The development of biological informatics infrastructure capable of supporting growing data management and analysis environments is an increasing need within the systematics biology community. Although significant progress has been made in recent years on developing new algorithms and tools for analyzing and visualizing large phylogenetic data and trees, implementation of these resources is often carried out by bioinformatics experts using one-off scripts. Therefore, a gap exists in providing data management support for a large set of non-technical users. The TOLKIN project (Tree of Life Knowledge and Information Network) addresses this need by supporting capabilities to manage, integrate, and provide public access to molecular, morphological, and biocollections data and research outcomes through a collaborative web application. This data management framework allows aggregation and import of sequences and underlying documentation about their source, including vouchers, tissues, and DNA extraction. It combines features of LIMS and workflow environments by supporting management at the level of individual observations, sequences, and specimens, as well as assembly and versioning of data sets used in phylogenetic inference. As a web application, the system provides multi-user support that obviates current practices of sharing data sets as files or spreadsheets via email.
BBQ: A Visual Interface for Integrated Browsing and Querying of XML
In this paper we present BBQ (Blended Browsing and Querying), a graphical user interface for seamlessly browsing and querying XML data sources. BBQ displays the structure of multiple data sources using a paradigm that resembles drilling down through Windows directory structures. BBQ allows queries that incorporate one or more of the sources. Queries are constructed in a query-by-example (QBE) manner, where DTDs play the role of schema. The queries are arbitrary conjunctive queries with GROUPBY, and their results can be subsequently used and refined. To support query refinement, BBQ introduces virtual result views: standalone virtual data sources that (i) are constructed by user queries from elements in other data sources, and (ii) can be used in subsequent queries as first-class data sources themselves. Furthermore, BBQ allows users to query data sources with loose or incomplete schema, and can augment such schema with a DTD inference mechanism.
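The "virtual result view" idea, where a query result is itself queryable as a first-class source, can be sketched in a few lines. This is a loose analogy, not BBQ's actual machinery: BBQ operates on XML with DTD schemas through a GUI, whereas this hypothetical `qbe` helper filters plain Python dicts.

```python
def qbe(source, example):
    """Query-by-example: keep records matching every filled-in field (toy sketch)."""
    return [r for r in source if all(r.get(k) == v for k, v in example.items())]

papers = [
    {"venue": "VLDB", "year": 2000, "topic": "XML"},
    {"venue": "VLDB", "year": 2000, "topic": "OLAP"},
    {"venue": "SSDBM", "year": 2005, "topic": "XML"},
]

# The first query constructs a virtual view; the second query then treats
# that view as an ordinary data source and refines it further.
vldb_view = qbe(papers, {"venue": "VLDB"})
refined = qbe(vldb_view, {"topic": "XML"})
print(refined)  # → [{'venue': 'VLDB', 'year': 2000, 'topic': 'XML'}]
```

The key property being illustrated is closure: the output of `qbe` has the same shape as its input, so views compose into refinement chains without any special casing.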