Clear Visual Separation of Temporal Event Sequences
Extracting and visualizing informative insights from temporal event sequences
becomes increasingly difficult when data volume and variety increase. Besides
dealing with high event type cardinality and many distinct sequences, it can be
difficult to tell whether it is appropriate to combine multiple events into one
or utilize additional information about event attributes. Existing approaches
often make use of frequent sequential patterns extracted from the dataset;
however, these patterns are limited in terms of interpretability and utility.
In addition, it is difficult to assess the role of absolute and relative time
when using pattern mining techniques.
In this paper, we present methods that address these challenges by
automatically learning composite events, which enables better aggregation of
multiple event sequences. By leveraging event sequence outcomes, we present
appropriate linked visualizations that allow domain experts to identify
critical flows, to assess validity and to understand the role of time.
Furthermore, we explore information gain and visual complexity metrics to
identify the most relevant visual patterns. We compare composite event learning
with two approaches for extracting event patterns using real world company
event data from an ongoing project with the Danish Business Authority.
Comment: In Proceedings of the 3rd IEEE Symposium on Visualization in Data
Science (VDS), 201
Outliagnostics: Visualizing Temporal Discrepancy in Outlying Signatures of Data Entries
This paper presents an approach to analyzing two-dimensional temporal
datasets focusing on identifying observations that are significant in
calculating the outliers of a scatterplot. We also propose a prototype, called
Outliagnostics, to guide users when interactively exploring abnormalities in
large time series. Instead of focusing on detecting outliers at each time
point, we monitor and display the discrepant temporal signatures of each data
entry concerning the overall distributions. Our prototype is designed to handle
these tasks in parallel to improve performance. To highlight the benefits and
performance of our approach, we illustrate and validate the use of
Outliagnostics on real-world datasets of various sizes in different parallelism
configurations. This work also discusses how to extend these ideas to handle
time series with a higher number of dimensions and provides a prototype for
this type of dataset.
Comment: In IEEE Visualization in Data Science (IEEE VDS) (2019)
NeighViz: Towards Better Understanding of Neighborhood Effects on Social Groups with Spatial Data
Understanding how local environments influence individual behaviors, such as
voting patterns or suicidal tendencies, is crucial in social science to reveal
and reduce spatial disparities and promote social well-being. With the
increasing availability of large-scale individual-level census data, new
analytical opportunities arise for social scientists to explore human behaviors
(e.g., political engagement) among social groups at a fine-grained level.
However, traditional statistical methods mostly focus on global, aggregated
spatial correlations, which are of limited use for understanding and comparing the
impact of local environments (e.g., neighborhoods) on human behaviors among
social groups. In this study, we introduce a new analytical framework for
analyzing multi-variate neighborhood effects between social groups. We then
propose NeighViz, an interactive visual analytics system that helps social
scientists explore, understand, and verify the influence of neighborhood
effects on human behaviors. Finally, we use a case study to illustrate the
effectiveness and usability of our system.
Comment: Symposium on Visualization in Data Science (VDS) at IEEE VIS 202
How Do Data Science Workers Communicate Intermediate Results?
Data science workers increasingly collaborate on large-scale projects before
communicating insights to a broader audience in the form of visualization.
While prior work has modeled how data science teams, oftentimes with distinct
roles and work processes, communicate knowledge to outside stakeholders, we
have little knowledge of how data science workers communicate intermediately
before delivering the final products. In this work, we contribute a nuanced
description of the intermediate communication process within data science
teams. By analyzing interview data with 8 self-identified data science workers,
we characterized the data science intermediate communication process with four
factors, including the types of audience, communication goals, shared
artifacts, and mode of communication. We also identified overarching challenges
in the current communication process and discussed design implications
that might inform better tools for facilitating intermediate communication
within data science teams.
Comment: This paper was accepted for presentation as part of the eighth
Symposium on Visualization in Data Science (VDS) at ACM KDD 2022 as well as
IEEE VIS 2022. http://www.visualdatascience.org/2022/index.htm
Prediction Scores as a Window into Classifier Behavior
Most multi-class classifiers make their prediction for a test sample by
scoring the classes and selecting the one with the highest score. Analyzing
these prediction scores is useful to understand the classifier behavior and to
assess its reliability. We present an interactive visualization that
facilitates per-class analysis of these scores. Our system, called Classilist,
enables relating these scores to the classification correctness and to the
underlying samples and their features. We illustrate how such analysis reveals
varying behavior of different classifiers. Classilist is available for use
online, along with source code, video tutorials, and plugins for R, RapidMiner,
and KNIME at https://katehara.github.io/classilist-site/.
Comment: Presented at NIPS 2017 Symposium on Interpretable Machine Learning
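The scoring step described in this abstract (score every class, then pick the highest) can be sketched as follows. This is a minimal illustration only, not Classilist code: the function names `predict` and `margin`, the class labels, and the score values are all hypothetical. The margin between the top two scores is shown as one simple, assumed proxy for how reliable a prediction might be.

```python
def predict(scores):
    """Return (label, score) for the class with the highest prediction score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def margin(scores):
    """Gap between the two highest scores; a small gap hints at an uncertain prediction."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second


# Hypothetical prediction scores for one test sample.
example = {"cat": 0.70, "dog": 0.25, "bird": 0.05}
label, score = predict(example)
print(label, score, round(margin(example), 2))
```

Grouping such scores by class and by correctness is the kind of per-class analysis the abstract refers to.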
DPVis: Visual Analytics with Hidden Markov Models for Disease Progression Pathways
Clinical researchers use disease progression models to understand patient
status and characterize progression patterns from longitudinal health records.
One approach for disease progression modeling is to describe patient status
using a small number of states that represent distinctive distributions over a
set of observed measures. Hidden Markov models (HMMs) and their variants are a
class of models that both discover these states and make inferences of health
states for patients. Despite the advantages of using the algorithms for
discovering interesting patterns, it still remains challenging for medical
experts to interpret model outputs, understand complex modeling parameters, and
clinically make sense of the patterns. To tackle these problems, we conducted a
design study with clinical scientists, statisticians, and visualization
experts, with the goal of investigating disease progression pathways of chronic
diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's
disease, and chronic obstructive pulmonary disease (COPD). As a result, we
introduce DPVis which seamlessly integrates model parameters and outcomes of
HMMs into interpretable and interactive visualizations. In this study, we
demonstrate that DPVis is successful in evaluating disease progression models,
visually summarizing disease states, interactively exploring disease
progression patterns, and building, analyzing, and comparing clinically
relevant patient subgroups.
Comment: To appear in IEEE Transactions on Visualization and Computer Graphics
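To illustrate how an HMM can infer a sequence of hidden health states from observed measures, here is a minimal sketch of Viterbi decoding. It is not the DPVis implementation: the two state names, the observation symbols, and every probability below are made-up assumptions, and the model parameters are given by hand rather than learned from data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[t][s]: probability of the most likely path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            p, arg = max((prev[r] * trans_p[r][s], r) for r in states)
            scores[s] = p * emit_p[s][obs]
            pointers[s] = arg
        best.append(scores)
        back.append(pointers)
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state progression model with a single discretized measure.
states = ["early", "late"]
start_p = {"early": 0.8, "late": 0.2}
trans_p = {"early": {"early": 0.7, "late": 0.3},
           "late": {"early": 0.1, "late": 0.9}}
emit_p = {"early": {"normal": 0.8, "high": 0.2},
          "late": {"normal": 0.2, "high": 0.8}}
visits = ["normal", "normal", "high", "high"]
print(viterbi(visits, states, start_p, trans_p, emit_p))
# -> ['early', 'early', 'late', 'late']
```

In practice the parameters would be estimated from longitudinal records, and observations would typically be multivariate distributions rather than two symbols.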