Search CORE

852 research outputs found

Fine-grained provenance of users' interpretations in a collaborative visualization architecture

Author: Al-Naser Aqeel
Brooke John
Irving Duncan
Rasheed Masroor
Publication venue: Science and Technology Publications Lda
Publication date: 01/01/2014
Field of study

The University of Manchester - Institutional Repository

Managing uncertainty in integrated environmental modelling:the UncertWeb framework

Author: Bastin Lucy
Cornford Dan
Heuvelink Gerard B.M.
Jones Richard
Mazzetti Paolo
Nativi Stefano
Pebesma Edzer
Stasch Christoph
Williams Matthew
Publication venue: 'Elsevier BV'
Publication date: 12/05/2011
Field of study

Web-based distributed modelling architectures are gaining increasing recognition as potentially useful tools to build holistic environmental models, combining individual components in complex workflows. However, existing web-based modelling frameworks currently offer no support for managing uncertainty. On the other hand, the rich array of modelling frameworks and simulation tools which support uncertainty propagation in complex and chained models typically lack the benefits of web based solutions such as ready publication, discoverability and easy access. In this article we describe the developments within the UncertWeb project which are designed to provide uncertainty support in the context of the proposed ‘Model Web’. We give an overview of uncertainty in modelling, review uncertainty management in existing modelling frameworks and consider the semantic and interoperability issues raised by integrated modelling. We describe the scope and architecture required to support uncertainty management as developed in UncertWeb. This includes tools which support elicitation, aggregation/disaggregation, visualisation and uncertainty/sensitivity analysis. We conclude by highlighting areas that require further research and development in UncertWeb, such as model calibration and inference within complex environmental models

JRC Publications Repository

Aston Publications Explorer

Wageningen University & Research Publications

Computing environments for reproducibility: Capturing the 'Whole Tale'

Author: Brinckman Adam
Chard Kyle
Gaffney Niall
Hategan Mihael
Jones Matthew B.
Kowalik Kacper
Kulasekaran Sivakumar
Ludäscher Bertram
Mecum Bryce D.
Nabrzyski Jarek
Stodden Victoria
Taylor Ian J.
Turk Matthew J.
Turner Kandace
Publication venue: 'Elsevier BV'
Publication date: 01/02/2018
Field of study

The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process—transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create “living publications” or tales. The Whole Tale focuses on the full spectrum of science, empowering users in the long tail of science, and power users with demands for access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment

arXiv.org e-Print Archive

Online Research @ Cardiff

Polyflow: a Polystore-compliant mechanism to provide interoperability to heterogeneous provenance graphs

Author: Ferreira Yan Mendes
Publication venue: 'Revista Educacao em Foco (UFJF)'
Publication date: 13/11/2020
Field of study

Many scientific experiments are modeled as workflows. Workflows usually output massive amounts of data. To guarantee the reproducibility of workflows, they are usually orchestrated by Workflow Management Systems (WfMS), that capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by activities in a workflow. Provenance traces are usually represented as graphs. These graphs allows scientists to analyze and evaluate results produced by a workflow. However, each WfMS has a proprietary format for provenance and do it in different granularity levels. Therefore, in more complex scenarios in which the scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conduct a Systematic Literature Mapping, assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems, integrating several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real experiments that generate phylogenetic trees through workflows. The experiment results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and performance standpoint.Muitos experimentos científicos são modelados como workflows (fluxos de trabalho). Workflows produzem comumente um grande volume de dados. De forma a garantir a reprodutibilidade desses workflows, estes geralmente são orquestrados por Sistemas de Gerência de Workflows (SGWfs), garantindo que dados de proveniência sejam capturados. Dados de proveniência representam o histórico de derivação de um dado ao longo da execução do workflow. Assim, o histórico de derivação dos dados pode ser representado por meio de um grafo de proveniência. Este grafo possibilita aos cientistas analisarem e avaliarem resultados produzidos por um workflow. Todavia, cada SGWf tem seu formato proprietário de representação para dados de proveniência, e os armazenam em diferentes granularidades. Consequentemente, em cenários mais complexos em que um cientista precisa analisar de forma integrada grafos de proveniência gerados por múltiplos workflows, isso se torna desafiador. Primeiramente, para entender o campo de pesquisa, realizamos um Mapeamento Sistemático da Literatura, avaliando soluções existentes sob diferentes lentes. Com uma compreensão mais clara do atual estado da arte, propomos uma ferramenta chamada Polyflow, inspirada em conceitos de sistemas Polystore, possibilitando a integração de várias bases de dados heterogêneas por meio de uma interface de consulta única que utiliza o ProvONE como schema global. Polyflow permite que cientistas submetam consultas em múltiplos grafos de proveniência de maneira integrada. Polyflow foi avaliado em conjunto com especialistas usando dados de proveniência coletados de workflows reais que apoiam o estudo de geração de árvores filogenéticas. O resultado da avaliação mostrou a viabilidade do Polyflow para interoperar semanticamente dados de proveniência gerado por distintos SGWfs, tanto do ponto de vista de desempenho quanto de usabilidade

Repositório Institucional - UFJF

Adaptive Big Data Pipeline

Author: Orozco-GómezSerrano Aldo
Publication venue: 'ITESO, A.C.'
Publication date: 01/09/2020
Field of study

Over the past three decades, data has exponentially evolved from being a simple software by-product to one of the most important companies’ assets used to understand their customers and foresee trends. Deep learning has demonstrated that big volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entail new challenges: the lack of expertise to select the appropriate big data tools for the processing pipelines, as well as the speed at which engineers can take such pipelines into production reliably, leveraging the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform to automate data pipelines creation. It provides an interface to capture the data sources, transformations, destinations and execution schedule. The system builds up the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. This system has been tested on data sets of 50 gigabytes, processing them in just a few minutes without user intervention.ITESO, A. C

Repositorio Institucional del ITESO

TOWARDS HARNESSING COMPUTATIONAL WORKFLOW PROVENANCE FOR EXPERIMENT REPORTING

Author: Alper Pinar
Publication venue
Publication date: 01/08/2016
Field of study

The University of Manchester - Institutional Repository

Big Data Analytics in Static and Streaming Provenance

Author: Chen Peng
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/04/2016
Field of study

Thesis (Ph.D.) - Indiana University, Informatics and Computing,, 2016With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rules mining; and the temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making

IUScholarWorks (University of Indiana)

Evaluation of Storage Systems for Big Data Analytics

Author
Publication venue
Publication date: 01/01/2017
Field of study

abstract: Recent trends in big data storage systems show a shift from disk centric models to memory centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance. It is interesting to investigate the performance of these two models with respect to some big data applications. This thesis studies the performance of Ceph (a disk centric model) and Alluxio (a memory centric model) and evaluates whether a hybrid model provides any performance benefits with respect to big data applications. To this end, an application TechTalk is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides knowledge to construct the index of an online stream. The indexed metadata enables the students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model.Dissertation/ThesisMasters Thesis Computer Science 201

ASU Digital Repository