When Provenance Aids and Complicates Reproducibility Judgments
It is well established that the provenance of a scientific result is
important, sometimes more important than the actual result. For computational
analyses that involve visualization, this provenance information may contain
the steps involved in generating visualizations from raw data. Specifically,
data provenance tracks the lineage of data and process provenance tracks the
steps executed. In this paper, we argue that the utility of computational
provenance may not be as clear-cut as we might like. One common use case for
provenance is that the information can be used to reproduce the original
result. However, in visualization, the goal is often to communicate results to
a user or viewer, and thus the insights obtained are ultimately most important.
Viewers can miss important changes or react to unimportant ones. Here,
interaction provenance, which tracks a user's actions with a visualization, or
insight provenance, which tracks the decision-making process, can help capture
what happened but don't remove the issues. In this paper, we present scenarios
where provenance impacts reproducibility in different ways. We also explore how
provenance and visualizations can be better related.
Doctor of Philosophy
dissertation
Serving as a record of what happened during a scientific process, often computational, provenance has become an important piece of computing. The importance of archiving not only data and results but also the lineage of these entities has led to a variety of systems that capture provenance as well as models and schemas for this information. Despite significant work focused on obtaining and modeling provenance, there has been little work on managing and using this information. Using the provenance from past work, it is possible to mine common computational structure or determine differences between executions. Such information can be used to suggest possible completions for partial workflows, summarize a set of approaches, or extend past work in new directions. These applications require infrastructure to support efficient queries and accessible reuse. In order to support knowledge discovery and reuse from provenance information, the management of those data is important. One component of provenance is the specification of the computations; workflows provide structured abstractions of code and are commonly used for complex tasks. Using change-based provenance, it is possible to store large numbers of similar workflows compactly. This storage also allows efficient computation of differences between specifications. However, querying for specific structure across a large collection of workflows is difficult because comparing graphs depends on computing subgraph isomorphism, which is NP-complete. Graph indexing methods identify features that help distinguish graphs of a collection to filter results for a subgraph containment query and reduce the number of subgraph isomorphism computations. For provenance, this work extends these methods to work for more exploratory queries and collections with significant overlap.
However, comparing workflow or provenance graphs may not require exact equality; a match between two graphs may allow paired nodes to be similar yet not equivalent. This work presents techniques to better correlate graphs to help summarize collections. Using this infrastructure, provenance can be reused so that users can learn from their own and others' history. Just as textual search has been augmented with suggested completions based on past or common queries, provenance can be used to suggest how computations can be completed or which steps might connect to a given subworkflow. In addition, provenance can help further science by accelerating publication and reuse. By incorporating provenance into publications, authors can more easily integrate their results, and readers can more easily verify and repeat results. However, reusing past computations requires maintaining stronger associations with any input data and underlying code as well as providing paths for migrating old work to new hardware or algorithms. This work presents a framework for maintaining data and code as well as supporting upgrades for workflow computations.
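The filter-then-verify idea behind graph indexing, as described in the abstract above, can be illustrated with a minimal sketch. This is not the dissertation's method: the workflow representation, the choice of labeled edges as index features, the function names, and the sample data are all assumptions made for illustration. The verification step is a brute-force, non-induced subgraph check that is only practical for tiny graphs.

```python
from itertools import permutations

# Hypothetical representation: a workflow is (labels, edges), where labels
# maps node id -> module type and edges is a set of (source, target) ids.

def edge_features(labels, edges):
    """Pairs of connected module types, used as cheap index features."""
    return {(labels[u], labels[v]) for u, v in edges}

def build_index(collection):
    """Inverted index: feature -> names of workflows that contain it."""
    index = {}
    for name, (labels, edges) in collection.items():
        for feat in edge_features(labels, edges):
            index.setdefault(feat, set()).add(name)
    return index

def contains(query, target):
    """Brute-force subgraph containment check (exponential time)."""
    q_labels, q_edges = query
    t_labels, t_edges = target
    q_nodes = list(q_labels)
    for image in permutations(t_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[n] == t_labels[mapping[n]] for n in q_nodes)
        edges_ok = all((mapping[u], mapping[v]) in t_edges for u, v in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def search(query, collection, index):
    """Filter with the index, then verify only the surviving candidates."""
    candidates = set(collection)
    for feat in edge_features(*query):
        candidates &= index.get(feat, set())
    matches = sorted(n for n in candidates if contains(query, collection[n]))
    return matches, len(candidates)

# Made-up example collection of three small workflows.
collection = {
    "wf1": ({1: "Reader", 2: "Filter", 3: "Renderer"}, {(1, 2), (2, 3)}),
    "wf2": ({1: "Reader", 2: "Renderer"}, {(1, 2)}),
    "wf3": ({1: "Reader", 2: "Filter", 3: "Filter", 4: "Renderer"},
            {(1, 2), (3, 4)}),
}
query = ({"a": "Reader", "b": "Filter", "c": "Renderer"},
         {("a", "b"), ("b", "c")})

index = build_index(collection)
matches, verified = search(query, collection, index)
print(matches, verified)
```

In this toy collection the index rules out wf2 without any isomorphism test, while wf3 passes the filter but fails verification, which shows why the features only narrow the candidate set rather than answer the query.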
Provenance for computational tasks: a survey
Journal Article
The problem of systematically capturing and managing provenance for computational tasks has recently received significant attention because of its relevance to a wide range of domains and applications. The authors give an overview of important concepts related to provenance management, so that potential users can make informed decisions when selecting or designing a provenance solution.
VisComplete: automating suggestions for visualization pipelines
Journal Article
Building visualization and analysis pipelines is a large hurdle in the adoption of visualization and workflow systems by domain scientists. In this paper, we propose techniques to help users construct pipelines by consensus, automatically suggesting completions based on a database of previously created pipelines. In particular, we compute correspondences between existing pipeline subgraphs from the database, and use these to predict sets of likely pipeline additions to a given partial pipeline. By presenting these predictions in a carefully designed interface, users can create visualizations and other data products more efficiently because they can augment their normal work patterns with the suggested completions. We present an implementation of our technique in a publicly available, open-source scientific workflow system and demonstrate efficiency gains in real-world situations.
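As a rough illustration of suggesting completions from a database of earlier pipelines, the sketch below counts which module most often follows a given module. This is a deliberately simplified stand-in: the abstract describes computing correspondences between pipeline subgraphs, whereas this sketch only looks at single edges. The module names, function names, and sample pipelines are hypothetical.

```python
from collections import Counter, defaultdict

def build_successor_counts(pipelines):
    """Count, for each module type, how often each other type follows it."""
    counts = defaultdict(Counter)
    for edges in pipelines:
        for src, dst in edges:
            counts[src][dst] += 1
    return counts

def suggest(counts, module, k=3):
    """Return up to k module types that most often follow `module`."""
    return [name for name, _ in counts.get(module, Counter()).most_common(k)]

# Made-up database of previously created pipelines, each a list of edges.
pipelines = [
    [("Reader", "Contour"), ("Contour", "Renderer")],
    [("Reader", "Contour"), ("Contour", "Smooth"), ("Smooth", "Renderer")],
    [("Reader", "Slice"), ("Slice", "Renderer")],
    [("Reader", "Contour"), ("Contour", "Renderer")],
]

counts = build_successor_counts(pipelines)
print(suggest(counts, "Reader", k=2))
print(suggest(counts, "Contour", k=2))
```

A frequency table like this ignores the surrounding pipeline structure, so it cannot distinguish contexts in which the same module is used differently; matching larger subgraphs, as the abstract describes, is what addresses that.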
Functions that are the Directed X-Ray of a Planar Convex Body
We characterize functions that are the directed X-ray
of a planar convex body from a source that is a positive distance
from the body. In addition to a concavity condition, the necessary
and sufficient conditions involve the structure of points of zero
curvature and a priori estimates for derivatives of the directed
X-ray near supporting rays and points of zero curvature. The
techniques employed also lead to explicit methods for constructing
families of planar convex bodies with a common directed X-ray.
Toward Systematic Design Considerations of Organizing Multiple Views
Multiple-view visualization (MV) has been used for visual analytics in
various fields (e.g., bioinformatics, cybersecurity, and intelligence
analysis). Because each view encodes data from a particular perspective,
analysts often use a set of views laid out in 2D space to link and synthesize
information. The difficulty of this process is impacted by the spatial
organization of these views. For instance, connecting information from views
far from each other can be more challenging than neighboring ones. However,
most visual analysis tools currently either fix the positions of the views or
completely delegate this organization of views to users (who must manually drag
and move views). This either limits user involvement in managing the layout of
MV or is overly flexible without much guidance. A key design challenge in
MV layout is determining the factors in a spatial organization that impact
understanding. To address this, we review a set of MV-based systems and
identify considerations for MV layout rooted in two key concerns: perception,
which considers how users perceive view relationships, and content, which
considers the relationships in the data. We show how these allow us to study
and analyze the design of MV layout systematically.
Comment: Short paper with 4 pages + 1 reference page, 2 figures, 1 table,
accepted at the IEEE VIS 2022 conference.