A two-stage framework for designing visual analytics systems to augment organizational analytical processes
A perennially interesting research topic in visual analytics is how to effectively develop systems that support organizational knowledge workers' decision-making and reasoning processes. The primary objective of a visual analytics system is to facilitate analytical reasoning and the discovery of insights through interactive visual interfaces. It also enables the transfer of capability and expertise from where it resides to where it is needed, across individuals and organizations as necessary.
Most domain analytical practices, however, vary from organization to organization. This leads to diverse designs of visual analytics systems for incorporating domain analytical processes, making it difficult to generalize success from one domain to another. Exacerbating this problem is the dearth of general models of analytical workflows available to enable such timely and effective designs.
To alleviate these problems, this dissertation presents a two-stage framework for informing the design of a visual analytics system. This framework builds upon and extends current practices pertaining to analytical workflows and focuses, in particular, on investigating their effect on the design of visual analytics systems for organizational environments. It aims to empower organizations with more systematic and purposeful information analyses by modeling the domain users' reasoning processes.
The first stage in this framework is an Observation and Designing stage, in which a visual analytics system is designed and implemented to abstract and encapsulate general organizational analytical processes through extensive collaboration with domain users. The second stage is the User-centric Refinement stage, which aims at interactively enriching and refining the already encapsulated domain analysis process based on understanding users' intentions through analyzing their task behavior. To implement this framework in the process of designing a visual analytics system, this dissertation proposes four general design recommendations that, when followed, empower such systems to bring users closer to the center of their analytical processes.
This dissertation makes three primary contributions. First, it presents a general characterization of the analytical workflow in organizational environments. This characterization addresses the current lack of such an analytical model and further represents a set of domain analytical tasks that are commonly applicable across organizations. Second, this dissertation describes a two-stage framework for facilitating domain users' workflows by integrating their analytical models into interactive visual analytics systems. Finally, it presents recommendations and suggestions on enriching and refining domain analysis by capturing and analyzing knowledge workers' analysis processes.
To exemplify the generalizability of these design recommendations, this dissertation presents three visual analytics systems developed following the proposed recommendations: Taste for Xerox Corporation, OpsVis for Microsoft, and IRSV for the U.S. Department of Transportation. All of these systems are deployed to domain knowledge workers and adopted in their analytical practices. Extensive empirical evaluations are further conducted to demonstrate the efficacy of these systems in facilitating domain analytical processes.
Data integration and FAIR data management in Solid Earth Science
Integrated use of multidisciplinary data is nowadays a recognized trend in scientific research, in particular in the domain of solid Earth science, where the understanding of a physical process is improved and made complete by different types of measurements describing a physical phenomenon (for instance, ground acceleration, SAR imaging, and crustal deformation). FAIR principles are recognized as a means to foster data integration by providing a common set of criteria for building data stewardship systems for Open Science. However, the implementation of FAIR principles raises issues along dimensions such as governance and legal matters, beyond, of course, the technical one. On the technical side in particular, the development of FAIR data provision systems is often delegated to research infrastructures or data providers, with support in terms of metrics and best practices offered by cluster projects or dedicated initiatives. In the current work, we describe the approach to FAIR data management in the European Plate Observing System (EPOS), a distributed research infrastructure in the solid Earth science domain that includes more than 250 individual research infrastructures across 25 countries in Europe. We focus in particular on the technical aspects, while also covering governance, policies, and organizational elements, by describing the architecture of the EPOS delivery framework from both the organizational and technical points of view and by outlining the key principles used in the technical design. We describe how a combination of approaches, namely rich metadata and service-based systems design, is required to achieve data integration. We show the system architecture and the basic features of the EPOS data portal, which integrates data from more than 220 services in a FAIR way.
The construction of this portal was driven by the EPOS FAIR data management approach, which, by defining a clear roadmap for compliance with the FAIR principles, produced a number of best practices and technical approaches.
This work, which spans over a decade but concentrates its key efforts in the last five years with the EPOS Implementation Phase project and the establishment of EPOS-ERIC, was carried out in synergy with other EU initiatives dealing with FAIR data. On the basis of the EPOS experience, future directions are outlined, emphasizing the need to provide i) FAIR reference architectures that help data practitioners and engineers from the domain communities adopt FAIR principles and build FAIR data systems; ii) a FAIR data management framework addressing FAIR through the entire data lifecycle, including reproducibility and provenance; and iii) the extension of the FAIR principles to the policy and governance dimensions.
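The combination of rich metadata and service-based design described above can be illustrated with a minimal sketch. The record fields, service names, and endpoints below are hypothetical assumptions, not the actual EPOS-DCAT-AP schema or EPOS services; the sketch only shows how filtering a metadata catalogue supports findability across heterogeneous providers.

```python
# Hypothetical catalogue of service metadata records (fields are illustrative).
services = [
    {"id": "seismic-waveforms", "domain": "seismology",
     "formats": ["miniSEED"], "license": "CC-BY-4.0",
     "endpoint": "https://example.org/fdsnws/dataselect"},
    {"id": "gnss-positions", "domain": "geodesy",
     "formats": ["RINEX", "csv"], "license": "CC-BY-4.0",
     "endpoint": "https://example.org/gnss/query"},
]

def discover(catalog, domain=None, fmt=None):
    """Findability: filter one metadata catalogue instead of querying
    each underlying service individually."""
    hits = []
    for rec in catalog:
        if domain and rec["domain"] != domain:
            continue
        if fmt and fmt not in rec["formats"]:
            continue
        hits.append(rec)
    return hits

# A portal-style query: every csv-capable service, regardless of discipline.
matches = discover(services, fmt="csv")
```

In a real delivery framework the catalogue would be harvested from the providers and the endpoints invoked through a common API layer; the point here is only that integration is driven by the metadata, not by the individual data formats.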
Physical Plan Instrumentation in Databases: Mechanisms and Applications
Database management systems (DBMSs) are designed to compile SQL queries into physical plans that, when executed, produce the results of those queries. Building on this functionality, an ever-increasing number of application domains (e.g., provenance management, online query optimization, physical database design, interactive data profiling, monitoring, and interactive data visualization) seek to operate on how queries are executed by the DBMS, for purposes ranging from debugging and data explanation to optimization and monitoring. Unfortunately, DBMSs provide little, if any, support to facilitate the development of this class of important application domains. As a result, database application developers and database system architects either rewrite the database internals in ad-hoc ways; work around the SQL interface, if possible, with inevitable performance penalties; or even build new databases from scratch only to express and optimize their domain-specific application logic over how queries are executed.
To address this problem in a principled manner, this dissertation introduces a prototype DBMS, Smoke, that exposes instrumentation mechanisms in the form of a framework that allows external applications to manipulate physical plans. Intuitively, a physical plan is the underlying representation a DBMS uses to encode how a SQL query will be executed, and providing instrumentation mechanisms at this representation level allows applications to express and optimize their logic over how queries are executed.
With such an instrumentation-enabled DBMS in place, we then consider how to express and optimize applications that base their logic on how queries are executed. To demonstrate the expressive and optimization power of instrumentation-enabled DBMSs, we express and optimize applications across several important domains, including provenance management, interactive data visualization, interactive data profiling, physical database design, online query optimization, and query discovery. Expressivity-wise, we show that Smoke can express known techniques, introduce novel semantics on known techniques, and introduce new techniques across domains. Performance-wise, we show, case by case, that Smoke is on par with or up to several orders of magnitude faster than state-of-the-art imperative and declarative implementations of important applications across domains.
As such, we believe our contributions provide evidence for, and form the basis of, a class of instrumentation-enabled DBMSs aimed at expressing and optimizing applications whose core logic operates over how queries are executed by DBMSs.
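The idea of instrumenting a physical plan can be sketched in miniature. This is not the actual Smoke API; the operator classes and the provenance "tap" below are illustrative assumptions, showing only how an external application could splice instrumentation into an operator tree to observe row-level execution.

```python
# Toy physical plan: operators are composable iterators over (row_id, row).
class Scan:
    def __init__(self, rows):
        self.rows = rows
    def execute(self):
        yield from enumerate(self.rows)  # assign a row id to each input row

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def execute(self):
        for rid, row in self.child.execute():
            if self.pred(row):
                yield rid, row

class ProvenanceTap:
    """Instrumentation inserted into the plan by an external application:
    records which input rows survive the operators beneath it."""
    def __init__(self, child, log):
        self.child, self.log = child, log
    def execute(self):
        for rid, row in self.child.execute():
            self.log.append(rid)
            yield rid, row

log = []
plan = ProvenanceTap(Filter(Scan([3, 8, 5, 12]), lambda r: r > 4), log)
results = [row for _, row in plan.execute()]
# results == [8, 5, 12]; log == [1, 2, 3] (ids of the surviving input rows)
```

Because the tap lives inside the plan rather than behind the SQL interface, the lineage is captured during normal execution, which is the kind of opportunity the instrumentation mechanisms above are meant to expose.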
PerCon: A Personal Digital Library for Heterogeneous Data Management and Analysis
Systems are needed to support access to and analysis of larger and more heterogeneous scientific datasets. Users need support in the location, organization, analysis, and interpretation of data to support their current activities with appropriate services and tools. We developed PerCon, a data management and analysis environment, to support such use.
PerCon processes and integrates data gathered via queries to existing data providers to create a personal or small-group digital library of data. Users may then search, browse, visualize, annotate, and organize the data as they proceed with analysis and interpretation. Analysis and interpretation in PerCon take place in a visual workspace in which multiple data visualizations and annotations are placed into spatial arrangements based on the current task. The system watches for patterns in the user's data selection, exploration, and organization, and then assists users through mixed-initiative interaction by suggesting potentially relevant data from unexplored data sources. To identify relevant data, PerCon builds up precomputed feature tables of data objects and their metadata (e.g., similarities, distances) and a user interest model to infer the user's interests or specific information needs. In particular, probabilistic networks in PerCon model user interactions (i.e., event features) and, through network training, predict the data type of greatest interest. In turn, the most relevant data objects of the inferred data type are identified through a weighted feature computation and then recommended to the user.
PerCon's data location and analysis capabilities were evaluated in a controlled study with 24 users. The study participants were asked to locate and analyze heterogeneous weather and river data with and without the visual workspace and mixed-initiative interaction, respectively. Results indicate that the visual workspace facilitated information representation and aided in the identification of relationships between datasets. The system's suggestions encouraged data exploration, leading participants to identify more evidence of correlation among data streams and more potential interactions between weather and river data.
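The two-step recommendation pipeline described above (infer the data type of greatest interest, then rank candidates by a weighted feature score) can be sketched as follows. The feature names, weights, and the use of simple interaction counts in place of a trained probabilistic network are all illustrative assumptions, not PerCon's actual model.

```python
from collections import Counter

# Step 1: infer the preferred data type from observed interaction events
# (a stand-in for PerCon's trained probabilistic network).
interactions = ["river", "river", "weather", "river"]
preferred_type = Counter(interactions).most_common(1)[0][0]  # "river"

# Step 2: rank unexplored items of that type by a weighted feature score.
candidates = [
    {"id": "gauge-42", "type": "river",   "similarity": 0.9, "distance": 0.2},
    {"id": "stn-7",    "type": "weather", "similarity": 0.8, "distance": 0.1},
    {"id": "gauge-99", "type": "river",   "similarity": 0.4, "distance": 0.7},
]
weights = {"similarity": 0.7, "distance": -0.3}  # nearer and more similar wins

def score(item):
    return sum(w * item[f] for f, w in weights.items())

suggestions = sorted(
    (c for c in candidates if c["type"] == preferred_type),
    key=score, reverse=True,
)
# suggestions: gauge-42 (0.57) before gauge-99 (0.07); stn-7 filtered out
```

In the real system the weights would come from the precomputed feature tables and the type prediction from network training, but the overall shape of the computation is the same.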
Active provenance for data intensive research
The role of provenance information in data-intensive research is a significant topic of discussion among technical experts and scientists. Typical use cases addressing traceability, versioning, and reproducibility of research findings are extended with more interactive scenarios in support, for instance, of computational steering and results management. In this thesis we investigate the impact that lineage records can have on the early phases of the analysis, for instance performed through near-real-time systems and Virtual Research Environments (VREs) tailored to the requirements of a specific community. By positioning provenance at the centre of the computational research cycle, we highlight the importance of having mechanisms at the data scientists' side that, by integrating with the abstractions offered by the processing technologies, such as scientific workflows and data-intensive tools, facilitate the experts' contribution to the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for rapid feedback, the thesis aims at improving the synergy between different user groups to increase productivity and understanding of their processes.
We present a model of provenance, called S-PROV, that uses and further extends PROV and ProvONE. The relationships and properties characterising the workflow's abstractions and their concrete executions are re-elaborated to include aspects related to delegation, distribution, and steering of stateful streaming operators. The model is supported by the Active framework for tuneable and actionable lineage, ensuring the user's engagement by fostering rapid exploitation. Here, concepts such as provenance types, configuration, and explicit state management allow users to capture complex provenance scenarios and activate selective controls based on domain and user-defined metadata. We outline how the traces are recorded in a new comprehensive system, called S-ProvFlow, enabling different classes of consumers to explore the provenance data with services and tools for monitoring, in-depth validation, and comprehensive visual analytics. The work of this thesis is discussed in the context of an existing computational framework and the experience gained in implementing provenance-aware tools for seismology and climate VREs. It will continue to evolve through newly funded projects, thereby providing generic and user-centred solutions for data-intensive research.
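The notion of tuneable, selective lineage capture can be sketched in a few lines. This is not the actual S-PROV or Active API; the rule format, metadata fields, and seismology example below are illustrative assumptions showing only how user-defined controls over domain metadata decide which trace entries are recorded at runtime.

```python
# Illustrative sketch: a streaming operator records a provenance entry only
# when a user-defined selectivity rule on the domain metadata matches,
# keeping traces compact and focused on what the expert cares about.
trace = []

def capture(operator, inputs, output, metadata, rules):
    """Record a lineage entry at runtime if any activation rule matches."""
    if any(rule(metadata) for rule in rules):
        trace.append({"op": operator, "used": inputs,
                      "generated": output, "meta": metadata})
    return output

# User-defined control: only trace events above magnitude 5.0 (hypothetical).
rules = [lambda m: m.get("magnitude", 0) >= 5.0]

for event in [{"magnitude": 3.2}, {"magnitude": 6.1}]:
    capture("pick-detector", ["waveform-stream"], "pick", event, rules)
# trace now holds a single entry, for the magnitude-6.1 event
```

A trace repository in the spirit of S-ProvFlow would then index such entries so that monitoring and visual-analytics tools can query them by operator, metadata, or derivation chain.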
Making Social Dynamics and Content Evolution Transparent in Collaboratively Written Text
This dissertation presents models and algorithms for accurately and efficiently extracting data from revisioned content in Collaborative Writing Systems about (i) the provenance and history of specific sequences of text, as well as (ii) interactions between editors via the content changes they perform, especially disagreement. Visualization tools are presented to gain further insights into the extracted data, and collaboration mechanisms that can be researched with these new data and tools are discussed.
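The core of text provenance in revisioned content can be sketched as attributing each token of the latest revision to the revision that first introduced it. The sketch below is a deliberately simplified assumption: real algorithms of this kind also handle moves, deletions, and reverts, which this version ignores.

```python
def attribute(revisions):
    """Map each token of the latest revision to the index of the revision
    that first introduced it (simplified: additions only)."""
    origin = {}
    for idx, text in enumerate(revisions):
        for tok in text.split():
            origin.setdefault(tok, idx)  # credit the earliest appearance
    latest = revisions[-1].split()
    return [(tok, origin[tok]) for tok in latest]

history = ["the quick fox", "the quick brown fox"]
prov = attribute(history)
# prov == [("the", 0), ("quick", 0), ("brown", 1), ("fox", 0)]
```

Disagreement between editors would then surface as tokens whose attributed revisions are repeatedly removed and restored across the history, which is the kind of signal the visualization tools above are meant to expose.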
Linked Data Supported Information Retrieval
Search engines have become indispensable for locating content on the World Wide Web. Semantic Web and Linked Data technologies enable a more detailed and unambiguous structuring of content and allow entirely new approaches to solving information retrieval problems. This thesis examines how information retrieval applications can benefit from the incorporation of Linked Data. New methods for computer-supported semantic text analysis, semantic search, information prioritization, and information visualization are presented and comprehensively evaluated. Linked Data resources and their relationships are integrated into these methods in order to increase their effectiveness and their usability. First, an introduction to the foundations of information retrieval and Linked Data is given. Then, new manual and automated methods for semantically annotating documents by linking them to Linked Data resources are presented (entity linking). A comprehensive evaluation of these methods is carried out, and the underlying evaluation system is substantially improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. These models are based on the generalized vector space model and incorporate semantic similarity, derived from taxonomy-based relationships of the Linked Data resources in documents and queries, into the ranking of search results. With the goal of further refining the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented that aim to improve the explorability and navigability of a semantically annotated document corpus.
Two applications are presented for this purpose: a Linked Data based explorative extension complementing a traditional keyword-based search engine, and a Linked Data based recommender system.
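The generalized vector space model mentioned above can be sketched as follows. The toy concepts and the taxonomy-derived similarity values are illustrative assumptions; the point is only that document/query similarity uses pairwise concept similarities instead of treating terms as orthogonal.

```python
import math

concepts = ["dog", "cat", "car"]
# Taxonomy-based concept similarity matrix S (hypothetical values):
# "dog" and "cat" share a parent class, so they are partially similar.
S = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def gvsm_sim(q, d):
    """sim(q, d) = (q^T S d) / (sqrt(q^T S q) * sqrt(d^T S d))"""
    def form(a, b):
        return sum(a[i] * S[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return form(q, d) / math.sqrt(form(q, q) * form(d, d))

query = [1, 0, 0]   # annotated with "dog"
doc   = [0, 1, 0]   # annotated with "cat"
sim = gvsm_sim(query, doc)
# a classic orthogonal VSM would score 0 here; the taxonomy-aware
# model gives partial credit (0.6 with these illustrative values)
```

In the thesis's setting the concept vectors would come from the entity-linking annotations and the matrix entries from taxonomy-based relationships between the Linked Data resources.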