A Survey on ML4VIS: Applying Machine Learning Advances to Data Visualization
Inspired by the great success of machine learning (ML), researchers have
applied ML techniques to visualizations to achieve a better design,
development, and evaluation of visualizations. This branch of studies, known as
ML4VIS, is gaining increasing research attention in recent years. To
successfully adapt ML techniques for visualizations, a structured understanding
of the integration of ML4VIS is needed. In this paper, we systematically survey
88 ML4VIS studies, aiming to answer two motivating questions: "what
visualization processes can be assisted by ML?" and "how can ML techniques be
used to solve visualization problems?" This survey reveals seven main processes
where the employment of ML techniques can benefit visualizations: Data
Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS
Interaction, VIS Reading, and User Profiling. The seven processes are related
to existing visualization theoretical models in an ML4VIS pipeline, aiming to
illuminate the role of ML-assisted visualization in general
visualizations. Meanwhile, the seven processes are mapped to the main learning
tasks in ML to align the capabilities of ML with the needs in visualization.
Current practices and future opportunities of ML4VIS are discussed in the
context of the ML4VIS pipeline and the ML-VIS mapping. While more studies are
still needed in the area of ML4VIS, we hope this paper can provide a
stepping-stone for future exploration. A web-based interactive browser of this
survey is available at https://ml4vis.github.io
Comment: 19 pages, 12 figures, 4 tables
You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems
Visual query systems (VQSs) empower users to interactively search for line
charts with desired visual patterns, typically specified using intuitive
sketch-based interfaces. Despite decades of past work on VQSs, these efforts
have not translated to adoption in practice, possibly because VQSs are largely
evaluated in unrealistic lab-based settings. To remedy this gap in adoption, we
collaborated with experts from three diverse domains (astronomy, genetics, and
material science) via a year-long user-centered design process to develop a
VQS that supports their workflow and analytical needs, and evaluate how VQSs
can be used in practice. Our study results reveal that ad-hoc sketch-only
querying is not as commonly used as prior work suggests, since analysts are
often unable to precisely express their patterns of interest. In addition, we
characterize three essential sensemaking processes supported by our enhanced
VQS. We discover that participants employ all three processes, but in different
proportions, depending on the analytical needs in each domain. Our findings
suggest that all three sensemaking processes must be integrated in order to
make future VQSs useful for a wide range of analytical inquiries.
Comment: Accepted for presentation at IEEE VAST 2019, to be held October 20-25
in Vancouver, Canada. The paper will also be published in a special issue of
IEEE Transactions on Visualization and Computer Graphics (TVCG), IEEE VIS
(InfoVis/VAST/SciVis) 2019. ACM 2012 CCS: Human-centered computing,
Visualization, Visualization design and evaluation methods
Exploratory multivariate longitudinal data analysis and models for multivariate longitudinal binary data
Longitudinal data arise when repeated measurements from the same subject are observed over time. In this thesis, exploratory data analysis and models are used jointly to analyze longitudinal data, leading to stronger and better-justified conclusions. The complex structure of longitudinal data with covariates requires new visual methods that enable interactive exploration. Here we catalog the general principles of exploratory data analysis for multivariate longitudinal data and illustrate the use of the linked-brushing approach for studying the mean structure over time. These methods make it possible to reveal the unexpected, explore the interaction between responses and covariates, observe individual variation, understand structure in multiple dimensions, and diagnose and fix models. We also propose models for multivariate longitudinal binary data that directly model marginal covariate effects while accounting for dependence across time via a transition structure, and for dependence across responses within a subject at a given time via random effects. Markov chain Monte Carlo methods, specifically Gibbs sampling with hybrid steps, are used to sample from the posterior distribution of the parameters. Graphical and quantitative checks are used to assess model fit. The methods are illustrated on several real datasets, primarily the Iowa Youth and Families Project. (This dissertation is a compound document, containing both a paper copy and a CD.)
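The dependence structure described above (a covariate effect, a transition term across time, and a subject-level random effect shared across responses) can be made concrete with a small simulation. Everything below is an illustrative assumption — the coefficients, the logistic link, and the function name are invented, not the thesis's fitted model:

```python
# Hedged sketch: simulate multivariate longitudinal binary data with
# (1) a marginal covariate effect beta_x, (2) dependence across time via
# a transition term on the previous response, and (3) dependence across
# responses within a subject via a shared random intercept b_i.
# All parameter values are illustrative assumptions.
import math
import random

def simulate(n_subjects=50, n_times=4, n_resp=2, seed=0):
    rng = random.Random(seed)
    beta_x, gamma_prev, sigma_b = 0.8, 1.2, 0.5
    data = []  # rows of (subject, time, [y_1, ..., y_R])
    for i in range(n_subjects):
        b_i = rng.gauss(0.0, sigma_b)   # subject-level random effect
        x_i = rng.gauss(0.0, 1.0)       # time-constant covariate
        prev = [0] * n_resp             # transition state, starts at 0
        for t in range(n_times):
            row = []
            for r in range(n_resp):
                eta = beta_x * x_i + gamma_prev * prev[r] + b_i
                p = 1.0 / (1.0 + math.exp(-eta))  # logistic link
                row.append(1 if rng.random() < p else 0)
            prev = row
            data.append((i, t, row))
    return data

data = simulate()
```

In a full analysis the interest runs the other way: given observed data of this shape, Gibbs sampling draws the regression coefficients, transition parameter, and random-effect variance from their posterior distribution.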
Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data
Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. To analyze these data effectively and efficiently, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to developing these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulty of accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight into the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm re-aligns ambiguous reads by integrating motif prediction from metabolic pathways to establish co-regulatory gene modules, using a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses to investigate gene expression data generated from RNA-Sequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results, assisting users in the non-trivial task of comprehensively interpreting their results.
These four tools cover a variety of aspects of modern RNA-Seq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as to provide more efficient and effective implementation methods.
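The probabilistic re-assignment idea behind ARM can be illustrated with a toy calculation. This is a hedged sketch, not the actual ARM algorithm: it weights each candidate gene for an ambiguous read by its (hypothetical) module-predicted abundance times a negative binomial likelihood that the gene's observed unique-read count is consistent with that abundance. The function names, parameterization, and numbers are all illustrative assumptions.

```python
# Hedged sketch of negative binomial-based read re-assignment (illustrative,
# not ARM itself). Each candidate gene g has an observed unique-read count
# k_g and a module-predicted mean abundance mu_g; its weight combines mu_g
# with the NB likelihood of k_g, and weights are normalized to probabilities.
import math

def nb_logpmf(k, mu, r):
    """Negative binomial log-pmf with mean mu and dispersion r (p = mu/(mu+r))."""
    p = mu / (mu + r)
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(1.0 - p) + k * math.log(p))

def assign_probs(counts, mus, r=10.0):
    """P(read came from gene g) ∝ mu_g * NB(k_g; mu_g, r), normalized."""
    logw = [math.log(mu) + nb_logpmf(k, mu, r)
            for k, mu in zip(counts, mus)]
    m = max(logw)                         # log-sum-exp for stability
    w = [math.exp(x - m) for x in logw]
    s = sum(w)
    return [x / s for x in w]

# Three candidate genes: unique-read counts and module-predicted means.
probs = assign_probs(counts=[40, 5, 2], mus=[38.0, 6.0, 2.5])
```

The more abundant gene attracts the read with the highest probability, but the assignment stays soft, which is what lets downstream expression estimates account for residual mapping uncertainty.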