Capturing Evolution Genes for Time Series Data
The modeling of time series is becoming increasingly critical in a wide
variety of applications. Overall, data evolves by following different patterns,
which are generally caused by different user behaviors. Given a time series, we
define the evolution gene to capture the latent user behaviors and to describe
how the behaviors lead to the generation of time series. In particular, we
propose a uniform framework that recognizes different evolution genes of
segments by learning a classifier, and adopt an adversarial generator to
implement the evolution gene by estimating the segments' distribution.
Experimental results based on a synthetic dataset and five real-world datasets
show that our approach not only achieves good prediction results (e.g.,
+10.56% on average in terms of F1), but is also able to provide explanations of
the results.
Comment: a preprint version. arXiv admin note: text overlap with
arXiv:1703.10155 by other author
Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.
Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18)
Unsupervised learning with contrastive latent variable models
In unsupervised learning, dimensionality reduction is an important tool for
data exploration and visualization. Because these aims are typically
open-ended, it can be useful to frame the problem as looking for patterns that
are enriched in one dataset relative to another. Such pairs of datasets occur
commonly, for instance a population of interest vs. control or signal vs.
signal-free recordings. However, there are few methods that work on sets of data
as opposed to data points or sequences. Here, we present a probabilistic model
for dimensionality reduction to discover signal that is enriched in the target
dataset relative to the background dataset. The data in these sets do not need
to be paired or grouped beyond set membership. By using a probabilistic model
where some structure is shared between the two datasets and some is unique to
the target dataset, we are able to recover interesting structure in the latent
space of the target dataset. The method also has the advantages of a
probabilistic model, namely that it allows for the incorporation of prior
information, handles missing data, and can be generalized to different
distributional assumptions. We describe several possible variations of the
model and demonstrate the application of the technique to de-noising, feature
selection, and subgroup discovery settings.
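The idea of finding structure that is enriched in a target dataset relative to a background dataset can be illustrated with a much simpler, non-probabilistic relative of the model described above: contrastive PCA, which takes the leading eigenvectors of the target covariance minus a scaled background covariance. The sketch below is not the authors' latent variable model; the function name, the `alpha` weight, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def contrastive_directions(target, background, alpha=1.0, k=1):
    """Top-k eigenvectors of cov(target) - alpha * cov(background).

    Illustrative contrastive PCA, not the probabilistic model from the abstract.
    """
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    cov_t = t.T @ t / (len(t) - 1)
    cov_b = b.T @ b / (len(b) - 1)
    # eigh is appropriate because the difference of covariances is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov_t - alpha * cov_b)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    return eigvecs[:, order]

# Synthetic data (assumption): both sets share large variance along axis 0,
# but only the target has extra variance along axis 1.
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 1.0])
target = rng.normal(size=(2000, 3)) * np.array([5.0, 3.0, 1.0])

direction = contrastive_directions(target, background)[:, 0]
```

Ordinary PCA on the target alone would favor axis 0, where the shared variance is largest; subtracting the background covariance makes the target-specific axis 1 stand out instead.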
Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation
Temporal event data are collected across a broad range of domains, and a
variety of visual analytics techniques have been developed to empower analysts
working with this form of data. These techniques generally display aggregate
statistics computed over sets of event sequences that share common patterns.
Such techniques are often hindered, however, by the high dimensionality of many
real-world event sequence datasets because the large number of distinct event
types within such data prevents effective aggregation. A common coping strategy
for this challenge is to group event types together as a pre-process, prior to
visualization, so that each group can be represented within an analysis as a
single event type. However, computing these event groupings as a pre-process
also places significant constraints on the analysis. This paper presents a
dynamic hierarchical aggregation technique that leverages a predefined
hierarchy of dimensions to computationally quantify the informativeness of
alternative levels of grouping within the hierarchy at runtime. This allows
users to dynamically explore the hierarchy to select the most appropriate level
of grouping to use at any individual step within an analysis. Key contributions
include an algorithm for interactively determining the most informative set of
event groupings from within a large-scale hierarchy of event types, and a
scatter-plus-focus visualization that supports interactive hierarchical
exploration. While these contributions are generalizable to other types of
problems, we apply them to high-dimensional event sequence analysis using
large-scale event type hierarchies from the medical domain. We describe their
use within a medical cohort analysis tool called Cadence, demonstrate an
example in which the proposed technique supports better views of event sequence
data, and report findings from domain expert interviews.
Comment: To Appear in IEEE Transactions on Visualization and Computer Graphics
(TVCG), Volume 26 Issue 1, 2020. Also part of proceedings for IEEE VAST 201
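To make the notion of "alternative levels of grouping within the hierarchy" concrete, the following minimal sketch rolls event types up to a chosen depth of a predefined hierarchy and counts the resulting groups. It is not the algorithm used in Cadence: the abstract does not specify the informativeness measure, so none is implemented here, and the toy hierarchy, its labels, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "cardio": "root", "endocrine": "root",
    "I10": "cardio", "I20": "cardio",
    "E10": "endocrine", "E11": "endocrine",
}

def ancestors(node):
    """Return the path from the root down to node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]

def roll_up(events, depth):
    """Replace each event type by its ancestor at the given depth (root = 0)."""
    rolled = []
    for event in events:
        path = ancestors(event)
        # Events shallower than `depth` are kept at their own level.
        rolled.append(path[min(depth, len(path) - 1)])
    return rolled

events = ["I10", "E11", "I20", "E10", "I10"]
by_level = {depth: Counter(roll_up(events, depth)) for depth in range(3)}
```

Comparing `by_level[1]` with `by_level[2]` shows the trade-off the abstract describes: coarser levels yield fewer, larger groups, while finer levels keep more distinct event types. A runtime scoring function would choose between such levels.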
Implementation of an interactive pattern mining framework on electronic health record datasets
Large collections of electronic patient records contain a broad range of clinical information highly relevant for data analysis. However, they are maintained primarily for patient administration, and automated methods are required to extract valuable knowledge for predictive, preventive, personalized and participatory medicine. Sequential pattern mining is a fundamental task in data mining which can be used to find statistically relevant, non-trivial temporal dependencies of events such as disease comorbidities. The objective of this work is to use this mining technique to identify disease associations based on ICD-9-CM code data of the entire Taiwanese population obtained from Taiwan’s National Health Insurance Research Database.
This thesis reports the development and implementation of the Disease Pattern Miner – a pattern mining framework in a medical domain. The framework was designed as a Web application which can be used to run several state-of-the-art sequence mining algorithms on electronic health records, collect and filter the results to reduce the number of patterns to a meaningful size, and visualize the disease associations as an interactive model in a specific population group. This may be crucial to discover new disease associations and offer novel insights to explain disease pathogenesis. A structured evaluation of the data and models is required before medical data scientists can use this application as a tool for further research to get a better understanding of disease comorbidities.
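As a rough illustration of what sequential pattern mining computes, the sketch below counts how often one diagnosis code precedes another across patient histories and keeps the pairs whose support meets a threshold. It is a brute-force toy, not one of the state-of-the-art algorithms the thesis runs; the function name, the threshold, and the four patient records are invented for the example, and the codes are used only as placeholder labels.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_support):
    """Return {(a, b): support} for pairs where a occurs before b.

    Support is the fraction of sequences containing the ordered pair.
    """
    counts = Counter()
    for seq in sequences:
        pairs_in_seq = set()  # count each pair at most once per patient
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                pairs_in_seq.add((seq[i], seq[j]))
        counts.update(pairs_in_seq)
    total = len(sequences)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }

# Invented toy histories: one chronologically ordered code list per patient.
patients = [
    ["250", "401", "585"],
    ["401", "250", "585"],
    ["250", "585"],
    ["401", "272"],
]
result = frequent_ordered_pairs(patients, min_support=0.5)
# result == {("250", "585"): 0.75, ("401", "585"): 0.5}
```

Real sequence miners avoid enumerating every pair and handle longer patterns, but the output has the same shape: ordered patterns with a support value that can then be filtered and visualized.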