746 research outputs found

BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Author: Barbosa Helio J. C.
Foster Ian
Gadelha Jr Luiz M. R.
Katz Daniel S.
Loss Guilherme
Magalhães Thiago
Mattoso Marta
Mondelli Maria Luiza
Ocaña Kary
Vasconcelos Ana Tereza R.
Wilde Michael
Publication venue: 'PeerJ'
Publication date: 11/01/2018

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

arXiv.org e-Print Archive

Directory of Open Access Journals

Provenance-based trust for grid computing: Position Paper

Author: Chapman Syd
Cortes Ulises
Hempel Rolf
Moreau Luc
Rana Omer
Schreiber Andreas
Varga Laszlo
Willmott Steven
Publication venue: 'University of Southampton'
Publication date: 01/01/2004

Current evolutions of Internet technology such as Web Services, ebXML, peer-to-peer and Grid computing all point to the development of large-scale open networks of diverse computing systems interacting with one another to perform tasks. Grid systems (and Web Services) are exemplary in this respect and are perhaps some of the first large-scale open computing systems to see widespread use - making them an important testing ground for problems in trust management which are likely to arise. From this perspective, today's grid architectures suffer from limitations, such as lack of a mechanism to trace results and lack of infrastructure to build up trust networks. These are important concerns in open grids, in which "community resources" are owned and managed by multiple stakeholders, and are dynamically organised in virtual organisations. Provenance enables users to trace how a particular result has been arrived at by identifying the individual services and the aggregation of services that produced such a particular output. Against this background, we present a research agenda to design, conceive and implement an industrial-strength open provenance architecture for grid systems. We motivate its use with three complex grid applications, namely aerospace engineering, organ transplant management and bioinformatics. Industrial-strength provenance support includes a scalable and secure architecture, an open proposal for standardising the protocols and data structures, a set of tools for configuring and using the provenance architecture, an open source reference implementation, and a deployment and validation in industrial context. The provision of such facilities will enrich grid capabilities by including new functionalities required for solving complex problems such as provenance data to provide complete audit trails of process execution and third-party analysis and auditing. 
As a result, we anticipate that a larger uptake of grid technology is likely to occur, since unprecedented possibilities will be offered to users and will give them a competitive edge.

Institute of Transport Research:Publications

Southampton (e-Prints Soton)

The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

Author: Bechhofer Sean
Belhajjame Khalid
Corcho Óscar
Garijo Daniel
Goble Carole
Gómez-Pérez José-Manuel
Hettne Kristina
Klyne Graham
Palma Raul
Zhao Jun
Publication date: 03/02/2014

Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Enhancing Workflow with a Semantic Description of Scientific Intent

Author: Edwards Peter
Gotts Nick
Pignotti Edoardo
Polhill Gary
Publication venue: 'Elsevier BV'
Publication date: 10/05/2011

Peer reviewed. Preprint.

Aberdeen University Research

Data integration in myGrid with Taverna

Author: Duncan Hull
Publication date: 03/07/2007

Many areas of life sciences research involve integrating terabytes of heterogeneous, distributed and autonomous data available on the Web. The myGrid project has addressed these challenging problems by developing and applying novel grid and semantic web services technology to life science data integration, particularly genome annotation and microarray analysis. This presentation outlines the past, present and future of the Taverna workflow system.

Crossref

Nature Precedings

EGI user forum 2011 : book of abstracts

Publication date: 01/01/2011

Hochschulschriftenserver - Universität Frankfurt am Main

Requirements for Provenance on the Web

Author: Cheney J
Gil Y
Groth P.T.
Miles S
Publication date: 01/01/2012

From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web for a number of dimensions focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.

CiteSeerX

Crossref

VU Research Portal

Directory of Open Access Journals

Edinburgh Research Explorer

King's Research Portal

International Journal of Digital Curation