Local clustering in provenance graphs
Systems that capture and store data provenance, the record of how an object has arrived at its current state, accumulate historical metadata over time, forming a large graph. Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningful tasks in an object's history. However, generic graph clustering algorithms are not effective at these tasks. We identify three key properties of provenance graphs and exploit them to justify two new centrality metrics we developed for use in performing local clustering on provenance graphs.
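The seed-and-grow procedure this abstract describes can be sketched as a greedy expansion. Conductance is used here as the cluster-quality metric purely for illustration; the abstract's own centrality metrics are not specified, so treat this as a generic local-clustering sketch rather than the paper's method:

```python
from collections import defaultdict

def conductance(adj, cluster):
    """Edges leaving `cluster` divided by the cluster's edge volume."""
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol = sum(len(adj[u]) for u in cluster)
    return cut / vol if vol else 1.0

def grow_cluster(adj, seed):
    """Greedily absorb the boundary vertex that lowers conductance most."""
    cluster = {seed}
    while True:
        boundary = {v for u in cluster for v in adj[u]} - cluster
        best = min(boundary,
                   key=lambda v: conductance(adj, cluster | {v}),
                   default=None)
        if best is None or conductance(adj, cluster | {best}) >= conductance(adj, cluster):
            return cluster
        cluster.add(best)

# Toy graph: a triangle {0,1,2} and a 4-clique {3,4,5,6} joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
         (2, 3)]  # bridge
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(grow_cluster(adj, 0))  # the seed's triangle: {0, 1, 2}
```

Growth stops at the bridge because absorbing vertex 3 would raise the conductance again, which is exactly the behavior a local clustering on a provenance graph relies on.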
Sciunits: Reusable Research Objects
Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits that is gaining adoption in the domain of geosciences.
NeuroProv: Provenance data visualisation for neuroimaging analyses
© 2019 Elsevier Ltd. Visualisation underpins the understanding of scientific data, both through exploration and explanation of analysed data. Provenance strengthens the understanding of data by showing the process of how a result has been achieved. With the significant increase in data volumes and algorithm complexity, clinical researchers are struggling with information tracking, analysis reproducibility and the verification of scientific output. In addition, data coming from various heterogeneous sources with varying levels of trust in a collaborative environment adds to the uncertainty of the scientific outputs. This provides the motivation for provenance data capture and visualisation support for analyses. In this paper, a system, NeuroProv, is presented to visualise provenance data in order to aid in the verification of scientific outputs, the comparison of analyses, and the progression and evolution of results for neuroimaging analyses. The experimental results show the effectiveness of visualising provenance data for neuroimaging analyses.
Towards Specificationless Monitoring of Provenance-Emitting Systems
Monitoring often requires insight into the monitored system as well as concrete specifications of expected behavior. More and more systems, however, provide information about their inner procedures by emitting provenance information in a W3C-standardized graph format.
In this work, we present an approach to monitor such provenance data for anomalous behavior by performing spectral graph analysis on slices of the constructed provenance graph and by comparing the characteristics of each slice with those of a sliding window over recently seen slices. We argue that this approach not only simplifies the monitoring of heterogeneous distributed systems, but also enables applying a host of well-studied techniques to monitor such systems.
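A minimal sketch of the sliding-window spectral comparison described above, assuming the smallest Laplacian eigenvalues as the per-slice characteristic (the paper's exact spectral features and distance threshold are not specified here, so these are illustrative choices):

```python
import numpy as np
from collections import deque

def spectral_signature(a, k=3):
    """Smallest k Laplacian eigenvalues of an adjacency matrix `a`."""
    lap = np.diag(a.sum(axis=1)) - a
    return np.linalg.eigvalsh(lap)[:k]  # eigvalsh returns ascending order

def monitor(slices, window=3, threshold=1.0):
    """Flag slices whose signature deviates from the sliding-window mean."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, a in enumerate(slices):
        sig = spectral_signature(a)
        if len(recent) == window and \
           np.linalg.norm(sig - np.mean(recent, axis=0)) > threshold:
            anomalies.append(i)
        recent.append(sig)
    return anomalies

# Four "normal" path-shaped slices, then one densely connected anomaly.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
clique = np.ones((4, 4)) - np.eye(4)
print(monitor([path] * 4 + [clique]))  # [4]
```

The dense slice's eigenvalue signature jumps away from the window mean, so it is flagged without any behavioral specification, which is the "specificationless" point of the approach.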
Utilizing Provenance in Reusable Research Objects
Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enable such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges, with the goal of obtaining a graph view similar to the application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms.
Graph Analysis and Applications in Clustering and Content-based Image Retrieval
About 300 years ago, when studying the Seven Bridges of Königsberg problem, a famous problem concerning paths on graphs, the great mathematician Leonhard Euler said, "This question is very banal, but seems to me worthy of attention". Since then, graph theory and graph analysis have not only become one of the most important branches of mathematics, but have also found an enormous range of important applications in many other areas. A graph is a mathematical model that abstracts entities and the relationships between them as nodes and edges. Many types of interactions between the entities can be modeled by graphs, for example, social interactions between people, the communications between entities in computer networks, and relations between biological species. Although not appearing to be a graph, many other types of data can be converted into graphs by certain operations, for example, the k-nearest neighborhood graph built from pixels in an image.
Cluster structure is a common phenomenon in many real-world graphs, for example, social networks. Finding the clusters in a large graph is important to understand the underlying relationships between the nodes. Graph clustering is a technique that partitions nodes into clusters such that connections among nodes in a cluster are dense and connections between nodes in different clusters are sparse. Various approaches have been proposed to solve graph clustering problems. A common approach is to optimize a predefined clustering metric using different optimization methods. However, most of these optimization problems are NP-hard due to the discrete set-up of the hard clustering. These optimization problems can be relaxed, and a sub-optimal solution can be found. A different approach is to apply data clustering algorithms in solving graph clustering problems. With this approach, one must first find appropriate features for each node that represent the local structure of the graph. The Limited Random Walk algorithm uses the random walk procedure to explore the graph and extracts efficient features for the nodes. It incorporates the embarrassingly parallel paradigm, and thus can process large graph data efficiently using modern high-performance computing facilities. This thesis gives the details of this algorithm and analyzes its stability issues.
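As an illustration of random-walk-based node features, the sketch below propagates the exact occupancy probabilities of a short walk instead of sampling walks; the actual Limited Random Walk algorithm differs in its details (walk limiting, feature construction, parallelization), so this is only a stand-in for the idea that walk mass concentrates inside the seed's cluster:

```python
def walk_probabilities(adj, seed, steps=5):
    """Exact occupancy distribution of a simple random walk from `seed`."""
    prob = {u: 0.0 for u in adj}
    prob[seed] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in adj}
        for u, p in prob.items():
            if p:
                share = p / len(adj[u])  # uniform step to each neighbor
                for v in adj[u]:
                    nxt[v] += share
        prob = nxt
    return prob

# Triangle {0,1,2} and 4-clique {3,4,5,6} joined by the bridge edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5}}
p = walk_probabilities(adj, seed=0)
print(p[1] > p[6])  # True: walk mass stays in the seed's own cluster
```

The resulting occupancy vector is exactly the kind of per-node feature a data clustering algorithm can then operate on.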
Based on the study of the cluster structures in a graph, we define the authenticity score of an edge as the difference between the actual and the expected number of edges that connect the two groups of neighboring nodes of its two end nodes. The authenticity score can be used in many important applications, such as graph clustering, outlier detection, and graph data preprocessing. In particular, a data clustering algorithm that uses the authenticity scores on a mutual k-nearest neighborhood graph achieves more reliable and superior performance compared to other popular algorithms. This thesis also theoretically proves that this algorithm can asymptotically achieve complete recovery of the ground truth for graphs that were generated by a stochastic r-block model.
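The authenticity score defined above can be sketched as follows, with a uniform Erdős–Rényi edge density standing in for the expected edge count. That null model is an assumption on my part; the thesis's exact expectation may differ, so the sketch only illustrates the "actual minus expected" idea:

```python
def authenticity(adj, u, v, n_nodes, n_edges):
    """Actual minus expected edges between the neighbourhoods of u and v.
    NOTE: the expected count assumes a uniform edge density (illustrative
    null model, not necessarily the thesis's)."""
    a, b = adj[u] - {v}, adj[v] - {u}
    # Count each edge between the two neighbour groups once.
    actual = len({frozenset((x, y)) for x in a for y in b
                  if x != y and y in adj[x]})
    density = 2 * n_edges / (n_nodes * (n_nodes - 1))
    return actual - len(a) * len(b) * density

# Toy graph: triangle {0,1,2}, 4-clique {3,4,5,6}, bridge edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 5, 6} - {5}, 6: {3, 4, 5}}
adj[5] = {3, 4, 6}
print(authenticity(adj, 2, 3, n_nodes=7, n_edges=10))  # strongly negative: bridge
print(authenticity(adj, 4, 5, n_nodes=7, n_edges=10))  # nearer zero: intra-cluster edge
```

The bridge edge scores well below the intra-cluster edge, which is what makes the score useful for outlier detection and for pruning spurious edges before clustering.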
Content-based image retrieval (CBIR) is an important application in computer vision, media information retrieval, and data mining. Given a query image, a CBIR system ranks the images in a large image database by their "similarities" to the query image. However, because of the ambiguity of the definition of "similarity", it is very difficult for a CBIR system to select the optimal feature set and ranking algorithm to satisfy the purpose of the query. Graph technologies have been used to improve the performance of CBIR systems in various ways. In this thesis, a novel method is proposed to construct a visual-semantic graph: a graph where nodes represent semantic concepts and edges represent visual associations between concepts. The constructed visual-semantic graph not only helps the user to locate the target images quickly but also helps answer questions related to the query image. Experiments show that the efforts of locating the target image are reduced by 25% with the help of visual-semantic graphs.
Graph analysis will continue to play an important role in future data analysis. In particular, the visual-semantic graph, which captures important and interesting visual associations between concepts, is worthy of further attention.