Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments
We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications will be consumed either directly by users of the Whole Tale environment or integrated into existing or future domain Science Gateways.
Structured Composition of Dataflow and Control-Flow for Reusable and Robust Scientific Workflows
Data-centric scientific workflows are often modeled as dataflow process networks. The simplicity of the dataflow framework facilitates workflow design, analysis, and optimization. However, some workflow tasks are particularly "control-flow intensive", e.g., procedures to make workflows more fault-tolerant and adaptive in an unreliable, distributed computing environment. Modeling complex control-flow directly within a dataflow framework often leads to overly complicated workflows that are hard to comprehend, reuse, schedule, and maintain. In this paper, we develop a framework that allows a structured embedding of control-flow intensive subtasks within dataflow process networks. In this way, we can seamlessly handle complex control-flows without sacrificing the benefits of dataflow. We build upon a flexible actor-oriented modeling and design approach and extend it with (actor) frames and (workflow) templates. A frame is a placeholder for an (existing or planned) collection of components with similar function and signature. A template partially specifies the behavior of a subworkflow by leaving "holes" (i.e., frames) in the subworkflow definition. Taken together, these abstraction mechanisms facilitate the separation and structured re-combination of control-flow and dataflow in scientific workflow applications. We illustrate our approach with a real-world scientific workflow from the astrophysics domain. This data-intensive workflow requires remote execution and file transfer in a semi-reliable environment. For such workflows, we propose a 3-layered architecture: the top level, typically a dataflow process network, includes Generic Data Transfer (GDT) frames and Generic remote eXecution (GX) frames. At the second level, the user can specialize the behavior of these generic components by embedding a suitable template (here: transducer templates for control-flow intensive tasks).
At the third level, frames inside the transducer template are specialized by embedding the desired implementation. Our approach yields workflows that are more robust (fault-tolerance strategies can be defined by control-flow driven transducer templates) and at the same time more reusable, since the embedding of frames and templates yields more structured and modular workflows.
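The frame/template mechanism described above can be sketched in plain Python. This is an illustrative approximation, not the paper's actual actor framework: `Frame`, `Template`, and the retry policy are hypothetical names standing in for the GDT/GX frames and a fault-tolerant transducer template.

```python
from typing import Callable, Dict, Optional

class Frame:
    """Placeholder for a family of components sharing a signature (hypothetical sketch)."""
    def __init__(self, name: str):
        self.name = name
        self.impl: Optional[Callable[[str], str]] = None

    def embed(self, impl: Callable[[str], str]) -> None:
        # Specialize the frame by embedding a concrete implementation.
        self.impl = impl

    def run(self, data: str) -> str:
        if self.impl is None:
            raise RuntimeError(f"frame {self.name!r} not specialized")
        return self.impl(data)

class Template:
    """Subworkflow that fixes the control flow but leaves 'holes' (frames)."""
    def __init__(self, frames: Dict[str, Frame]):
        self.frames = frames

    def run(self, data: str) -> str:
        # Control-flow intensive part lives here: retry the transfer frame,
        # then hand the staged data to the execution frame.
        for _attempt in range(3):
            try:
                staged = self.frames["transfer"].run(data)
                break
            except IOError:
                continue
        else:
            raise RuntimeError("transfer failed after retries")
        return self.frames["execute"].run(staged)

# Specialize the generic frames with concrete (toy) implementations.
gdt = Frame("transfer")
gdt.embed(lambda d: d + " [staged]")
gx = Frame("execute")
gx.embed(lambda d: d + " [result]")

wf = Template({"transfer": gdt, "execute": gx})
print(wf.run("input.fits"))  # → input.fits [staged] [result]
```

The point of the separation is visible here: the fault-tolerance strategy (retry loop) is fixed once in the template, while the dataflow-level components plug into the frames without knowing about it.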
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms provide flexibility to users and user applications by allowing them to extract and elaborate data from stored data sources. Meanwhile, since its introduction, Extensible Markup Language (XML) has rapidly emerged as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so supporting views over semi-structured data models proves difficult. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is threefold: first, we propose an approach that separates the implementation and conceptual aspects of views, providing a clear separation of concerns and allowing the analysis and design of views to be carried out independently of their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which automatically derives a view schema and a view query expression in an appropriate query language. Finally, to validate and apply the LVM concepts, methods, and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.
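The core LVM idea, declaring a view at the conceptual level and transforming it automatically into a view query, can be sketched as follows. This is a hedged toy illustration: the `view_spec` shape and the `to_query`/`materialize` helpers are invented for this sketch, and the target language here is the XPath subset supported by Python's `xml.etree.ElementTree` rather than a full XML query language as in the paper.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<library>
  <book year="2003"><title>XML Views</title><author>Chen</author></book>
  <book year="1999"><title>Dataflow</title><author>Lee</author></book>
</library>
"""

# Conceptual-level view definition: only element/attribute names, no query syntax.
view_spec = {"root": "library", "expose": "book", "where": ("year", "2003")}

def to_query(spec):
    """Transform the conceptual view into a concrete view query expression (XPath)."""
    attr, value = spec["where"]
    return f".//{spec['expose']}[@{attr}='{value}']"

def materialize(spec, xml_text):
    """Evaluate the generated query and wrap the result in a view document."""
    source = ET.fromstring(xml_text)
    view = ET.Element(spec["root"] + "_view")
    for node in source.findall(to_query(spec)):
        view.append(node)
    return view

v = materialize(view_spec, SOURCE)
print([b.findtext("title") for b in v])  # → ['XML Views']
```

The design point mirrors the abstract: the user reasons only about `view_spec` (the conceptual layer), while the query expression and view schema are derived mechanically.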
TOLKIN – Tree of Life Knowledge and Information Network: Filling a Gap for Collaborative Research in Biological Systematics
The development of biological informatics infrastructure capable of supporting growing data management and analysis environments is an increasing need within the systematics biology community. Although significant progress has been made in recent years on developing new algorithms and tools for analyzing and visualizing large phylogenetic data and trees, implementation of these resources is often carried out by bioinformatics experts using one-off scripts. Therefore, a gap exists in providing data management support for a large set of non-technical users. The TOLKIN project (Tree of Life Knowledge and Information Network) addresses this need by supporting capabilities to manage, integrate, and provide public access to molecular, morphological, and biocollections data and research outcomes through a collaborative web application. This data management framework allows aggregation and import of sequences and underlying documentation about their source, including vouchers, tissues, and DNA extraction. It combines features of LIMS and workflow environments by supporting management at the level of individual observations, sequences, and specimens, as well as assembly and versioning of data sets used in phylogenetic inference. As a web application, the system provides multi-user support that obviates current practices of sharing data sets as files or spreadsheets via email.
BBQ: A Visual Interface for Integrated Browsing and Querying of XML
In this paper we present BBQ (Blended Browsing and Querying), a graphical user interface for seamlessly browsing and querying XML data sources. BBQ displays the structure of multiple data sources using a paradigm that resembles drilling down through Windows directory structures. BBQ allows queries that incorporate one or more of the sources. Queries are constructed in a query-by-example (QBE) manner, where DTDs play the role of schema. The queries are arbitrary conjunctive queries with GROUPBY, and their results can be subsequently used and refined. To support query refinement, BBQ introduces virtual result views: standalone virtual data sources that (i) are constructed by user queries from elements in other data sources, and (ii) can be used in subsequent queries as first-class data sources themselves. Furthermore, BBQ allows users to query data sources with loose or incomplete schema, and can augment such schema with a DTD inference mechanism.
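The "virtual result view" idea, where a query result is itself queryable as a first-class source, can be sketched in a few lines. This is a loose analogy, not BBQ's actual machinery: BBQ operates on XML with DTD schemas through a GUI, whereas this hypothetical `qbe` helper filters plain Python dicts.

```python
def qbe(source, example):
    """Query-by-example: keep records matching every filled-in field (toy sketch)."""
    return [r for r in source if all(r.get(k) == v for k, v in example.items())]

papers = [
    {"venue": "VLDB", "year": 2000, "topic": "XML"},
    {"venue": "VLDB", "year": 2000, "topic": "OLAP"},
    {"venue": "SSDBM", "year": 2005, "topic": "XML"},
]

# The first query constructs a virtual view; the second query then treats
# that view as an ordinary data source and refines it further.
vldb_view = qbe(papers, {"venue": "VLDB"})
refined = qbe(vldb_view, {"topic": "XML"})
print(refined)  # → [{'venue': 'VLDB', 'year': 2000, 'topic': 'XML'}]
```

The key property being illustrated is closure: the output of `qbe` has the same shape as its input, so views compose into refinement chains without any special casing.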