
    Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data

    The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge, as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics for visualizing large gene expression datasets, they remain a severely underutilized visualization tool in modern data analysis. In this paper we introduce superheat, a new R package that provides an extremely flexible and customizable platform for visualizing large datasets using extendable heatmaps. Superheat enhances the traditional heatmap by providing a platform to visualize a wide range of data types simultaneously, adding to the heatmap a response variable as a scatterplot, model results as boxplots, correlation information as barplots, text information, and more. Superheat allows the user to explore their data in greater depth and to take advantage of the heterogeneity present in the data to inform analysis decisions. The goal of this paper is two-fold: (1) to demonstrate the potential of the heatmap as a default visualization method for a wide range of data types using reproducible examples, and (2) to highlight the customizability and ease of implementation of the superheat package in R for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three case studies, each based on publicly available data sources and accompanied by a file outlining the step-by-step analytic pipeline (with code). Comment: 26 pages, 10 figures
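The core idea of an "extended" heatmap, a main heatmap with summary panels attached along its margins, can be sketched in a few lines. The snippet below is an illustrative matplotlib reconstruction, not superheat's actual R API; the function name and layout choices are invented for the example, which pairs a heatmap with a barplot of column means.

```python
# Illustrative sketch of an augmented heatmap (heatmap + marginal barplot).
# This mimics the superheat concept in matplotlib; it is NOT superheat's API.
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def extended_heatmap(data):
    """Heatmap of `data` with column means shown as a barplot above it."""
    fig = plt.figure(figsize=(6, 6))
    gs = fig.add_gridspec(2, 1, height_ratios=[1, 4], hspace=0.05)
    ax_bar = fig.add_subplot(gs[0])
    ax_heat = fig.add_subplot(gs[1], sharex=ax_bar)
    ax_heat.imshow(data, aspect="auto", cmap="viridis")
    # one bar per column, aligned with the heatmap columns below
    ax_bar.bar(np.arange(data.shape[1]), data.mean(axis=0))
    ax_bar.tick_params(labelbottom=False)
    return fig

rng = np.random.default_rng(0)
fig = extended_heatmap(rng.normal(size=(20, 8)))
```

Further panels (scatterplots of a response variable, boxplots of model results) would be added the same way, as extra gridspec cells sharing an axis with the heatmap.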

    Evaluation of a Bundling Technique for Parallel Coordinates

    We describe a technique for bundled curve representations in parallel-coordinates plots and present a controlled user study evaluating their effectiveness. Replacing the traditional C^0 polygonal lines by C^1 continuous piecewise Bezier curves makes it easier to visually trace data points through each coordinate axis. The resulting Bezier curves can then be bundled to visualize data with given cluster structures. Curve bundles are efficient to compute, provide visual separation between data clusters, reduce visual clutter, and present a clearer overview of the dataset. A controlled user study with 14 participants confirmed the effectiveness of curve bundling for parallel-coordinates visualization: 1) compared to polygonal lines, it is equally capable of revealing correlations between neighboring data attributes; 2) its geometric cues can be effective in displaying cluster information. For some datasets, curve bundling allows the color perceptual channel to be applied to other data attributes, while for complex cluster patterns, bundling and color can represent clustering far more clearly than either alone.
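The key construction, replacing each polyline by piecewise cubic Bezier segments that meet at every axis with a shared slope, can be sketched as follows. This is a reconstruction under the simplifying assumption of horizontal tangents at each axis (which trivially gives C^1 continuity at the joins), not necessarily the authors' exact formulation; all names are invented.

```python
# Sketch: C^1 piecewise cubic Bezier curve through parallel-coordinate points.
# Assumption (not from the paper): tangents are horizontal at every axis,
# so adjacent segments share a zero slope and the curve is C^1.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier with control points p0..p3 at parameters t."""
    t = np.asarray(t)[:, None]
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * p1
            + 3 * (1 - t) * t**2 * p2 + t**3 * p3)

def pcp_curve(values, smoothness=0.3):
    """Curve through (axis_index, value) points, one Bezier per axis pair.
    Inner control points sit horizontally beside each axis, so every
    segment starts and ends with zero slope."""
    segments = []
    for i in range(len(values) - 1):
        x0, x1 = float(i), float(i + 1)
        d = smoothness * (x1 - x0)
        p0 = np.array([x0, values[i]])
        p1 = np.array([x0 + d, values[i]])      # horizontal tangent out
        p2 = np.array([x1 - d, values[i + 1]])  # horizontal tangent in
        p3 = np.array([x1, values[i + 1]])
        segments.append(cubic_bezier(p0, p1, p2, p3, np.linspace(0, 1, 50)))
    return np.vstack(segments)

curve = pcp_curve([0.2, 0.9, 0.4, 0.7])
```

Bundling would then pull the inner control points of curves in the same cluster toward a shared centroid between the axes, which this sketch does not attempt.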

    SimpliFly: A Methodology for Simplification and Thematic Enhancement of Trajectories

    Movement data sets collected using today's advanced tracking devices consist of complex trajectories in terms of length, shape, and number of recorded positions. Multiple additional attributes characterizing the movement and its environment are often also included, making the level of complexity even higher. Simplification of trajectories can improve the visibility of relevant information by reducing less relevant details while maintaining important movement patterns. We propose a systematic stepwise methodology for simplifying and thematically enhancing trajectories in order to support their visual analysis. The methodology is applied iteratively and is composed of: (a) a simplification step applied to reduce the morphological complexity of the trajectories, (b) a thematic enhancement step which aims at accentuating patterns of movement, and (c) the representation and interactive exploration of the results in order to interpret the findings and further refine the simplification and enhancement process. We illustrate our methodology through an analysis example of two different types of tracks: aircraft and pedestrian movement.
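The abstract does not name a specific simplification algorithm for step (a); a common stand-in for reducing the morphological complexity of a trajectory is the Ramer-Douglas-Peucker algorithm, sketched below under that assumption.

```python
# Ramer-Douglas-Peucker line simplification: an assumed, generic choice for
# the trajectory-simplification step; the paper's own method may differ.
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / norm

def rdp(points, eps):
    """Keep the farthest point if it deviates more than eps from the
    chord, then recurse on both halves; otherwise keep only the chord."""
    if len(points) < 3:
        return list(points)
    dists = [perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] > eps:
        # farthest point survives; merge the two simplified halves
        return rdp(points[:i + 1], eps)[:-1] + rdp(points[i:], eps)
    return [points[0], points[-1]]

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
simplified = rdp(track, 0.5)
```

The tolerance `eps` plays the role of the analyst-controlled trade-off between detail and clarity; thematic enhancement (step b) would then operate on the reduced point set.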

    Data analytics enhanced data visualization and interrogation with parallel coordinates plots

    © 2018 IEEE. Parallel coordinates plots (PCPs) suffer from the curse of dimensionality when used with larger multidimensional datasets. The curse of dimensionality results in clutter, which hides important visual trends among the coordinates. A number of solutions to this problem have been proposed, including filtering, aggregation, and dimension reordering. These solutions, however, have their own limitations with regard to exploring relationships and trends among the coordinates in PCPs. Correlation-based coordinate-reordering techniques are among the most popular and have been widely used in PCPs to reduce clutter, though based on the conducted experiments, this research has identified some of their limitations. To achieve better visualization with reduced clutter, we propose and evaluate a dimension-reordering approach based on minimizing the number of crossing pairs. In the last step, k-means clustering is combined with the reordered coordinates to highlight key trends and patterns. The comparative analysis shows that the minimum-crossing-pairs approach performs much better than the other coordinate-reordering techniques and, when combined with k-means clustering, results in better visualization with significantly reduced clutter.
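The crossing-pairs criterion is easy to state concretely: two polylines cross between adjacent axes exactly when their vertical order flips from one axis to the next. The sketch below counts crossings that way and brute-forces the best axis order; the exhaustive search is only feasible for a handful of dimensions and stands in for whatever heuristic the paper actually uses. All names are illustrative.

```python
# Sketch: count PCP line crossings between adjacent axes and pick the axis
# permutation minimizing the total. Brute force is an assumed stand-in for
# the paper's method and only scales to a few dimensions.
from itertools import combinations, permutations

def crossings(col_a, col_b):
    """Lines for rows a and b cross between two adjacent axes iff their
    order flips, i.e. (a_i - b_i) and (a_j - b_j) have opposite signs."""
    return sum((x1 - x2) * (y1 - y2) < 0
               for (x1, y1), (x2, y2) in combinations(zip(col_a, col_b), 2))

def best_order(data):
    """Exhaustively search axis permutations for the fewest crossings."""
    dims = list(range(len(data[0])))

    def total(order):
        cols = [[row[d] for row in data] for d in order]
        return sum(crossings(cols[i], cols[i + 1])
                   for i in range(len(cols) - 1))

    return min(permutations(dims), key=total)

data = [[1, 3, 1], [2, 1, 2], [3, 2, 3]]
order = best_order(data)
```

A subsequent k-means pass over the rows would then assign one color per cluster to the reordered polylines.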

    QoS: Quality Driven Data Abstraction for Large Databases

    Data abstraction is the process of reducing a large dataset into one of moderate size while maintaining the dominant characteristics of the original dataset. Data abstraction quality refers to the degree to which the abstraction represents the original data. Clearly, the quality of an abstraction directly affects the confidence an analyst can have in results derived from such abstracted views of the actual data. While some initial measures to quantify the quality of an abstraction have been proposed, they currently can only be used as an afterthought. While an analyst can be made aware of the quality of the data he works with, he cannot control the desired quality or the trade-off between the size of the abstraction and its quality. While some analysts require at least a certain minimal level of quality, others must be able to work with an abstraction of a certain size due to resource limitations. To tackle these problems, we propose a new data abstraction generation model, called the QoS model, that presents the performance-quality trade-off to the analyst and considers the quality of the data while generating an abstraction. As the next step, it generates an abstraction based on the desired level of quality versus time as indicated by the analyst. The framework has been integrated into XmdvTool, a freeware multivariate data visualization tool developed at WPI. Our experimental results show that our approach provides better quality for the same resource usage compared to existing abstraction techniques.
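One way to make the quality-driven loop concrete, sketched under assumptions since the paper's actual quality measure and abstraction generator are not given here: define quality as histogram similarity between the full data and a sample, then grow the sample until the analyst's minimum quality is met. All names and the specific metric are invented for illustration.

```python
# Sketch of quality-driven abstraction: grow a random sample until a
# histogram-similarity quality score meets the analyst's threshold.
# The metric and growth schedule are assumptions, not the QoS model itself.
import random

def histogram(values, bins, lo, hi):
    """Normalized histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[i] += 1
    n = len(values)
    return [c / n for c in counts]

def abstraction_quality(full, sample, bins=10):
    """Quality in [0, 1]: 1 minus half the L1 distance between the
    normalized histograms of the full data and the abstraction."""
    lo, hi = min(full), max(full)
    hf = histogram(full, bins, lo, hi)
    hs = histogram(sample, bins, lo, hi)
    return 1 - 0.5 * sum(abs(a - b) for a, b in zip(hf, hs))

def abstract_to_quality(data, min_quality, rng):
    """Double the sample size until the requested quality is reached."""
    size = max(1, len(data) // 100)
    while True:
        sample = rng.sample(data, min(size, len(data)))
        q = abstraction_quality(data, sample)
        if q >= min_quality or size >= len(data):
            return sample, q
        size *= 2

rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(5000)]
sample, q = abstract_to_quality(data, 0.95, rng)
```

Exposing `min_quality` (or, dually, a size cap) to the analyst is exactly the performance-quality trade-off the abstract describes.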

    Enhanced interactive parallel coordinates using machine learning and uncertainty propagation for engineering design

    © 2019 IEEE. The design process of an engineering system requires thorough consideration of varied specifications, each with a potentially large number of dimensions. The sheer volume of data, as well as its complexity, can overwhelm the designer and obscure vital information. Visualisation of big data can mitigate the issue of information overload, but static displays can suffer from overplotting. To tackle the issues of overplotting and cluttered data, we present an interactive, touch-screen-capable visualisation toolkit that combines Parallel Coordinates and Scatter Plot approaches for managing multidimensional engineering design data. As engineering projects require a multitude of varied software tools to handle the various aspects of the design process, the combined datasets often do not have an underlying mathematical model. We address this issue by enhancing our visualisation software with Machine Learning methods, which also facilitate further insights into the data. Furthermore, the various software tools within the engineering design cycle produce information at different levels of fidelity (accuracy and trustworthiness) and at different speeds. The induced uncertainty is also modelled in the synthetic dataset and presented in an interactive way. This paper describes a new visualisation software package and demonstrates its functionality on a complex aircraft systems design dataset.
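The abstract does not detail how the induced uncertainty is propagated; a generic Monte Carlo propagation through a surrogate model illustrates the general idea. The surrogate here is a toy linear function standing in for a fitted ML model, and all names are invented.

```python
# Sketch: Monte Carlo uncertainty propagation through a surrogate model.
# Assumed approach for illustration; the paper's actual method is not given.
import random
import statistics

def propagate(model, mean, std, n=20000, seed=0):
    """Sample the uncertain input, push each sample through the surrogate,
    and summarize the output distribution by its mean and spread."""
    rng = random.Random(seed)
    outs = [model(rng.gauss(mean, std)) for _ in range(n)]
    return statistics.fmean(outs), statistics.stdev(outs)

# toy surrogate standing in for a fitted ML model of a design quantity
surrogate = lambda x: 2.0 * x + 1.0

# input specification known with mean 3.0 and standard deviation 0.5
m, s = propagate(surrogate, mean=3.0, std=0.5)
```

The output spread `s` is what an interactive view would encode, for example as a band around each polyline in the Parallel Coordinates display.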