
    Doctor of Philosophy

    Correlation is a powerful relationship measure used in many fields to estimate trends and make forecasts. When the data are complex, large, and high-dimensional, identifying correlations is challenging. Several visualization methods have been proposed for this problem, but all have limitations in accuracy, speed, or scalability. In this dissertation, we propose a methodology that provides new visual designs, showing details when possible and aggregating when necessary, along with robust interactive mechanisms that together enable quick identification and investigation of meaningful relationships in large and high-dimensional data. We propose four techniques using this methodology; depending on data size and dimensionality, the most appropriate technique can be chosen to optimize analysis performance. First, to improve correlation identification between two dimensions, we propose a correlation task-specific visualization method called the correlation coordinate plot (CCP). CCP transforms data into a coordinate system well suited to estimating the direction and strength of correlations among dimensions. Next, we propose three visualization designs that optimize correlation identification in large, multidimensional data. The first is snowflake visualization (Snowflake), a focus+context layout for exploring all pairwise correlations. The second is a new interactive design for representing and exploring data relationships in parallel coordinate plots (PCPs) for large data, called the data scalable parallel coordinate plot (DSPCP). Finally, we propose a novel technique for storing and accessing multiway dependencies through visualization (MultiDepViz). We evaluate these approaches on various use cases, compare them to prior work, and conduct user studies to demonstrate how they help users explore correlation in large data efficiently. Our results confirm that CCP/Snowflake, DSPCP, and MultiDepViz outperform current visualization techniques such as scatterplots (SCPs), PCPs, the SCP matrix, Corrgram, Angular Histogram, and UntangleMap in both accuracy and timing. Finally, these approaches are applied to real-world problems such as a debugging tool, large-scale code performance data, and large-scale climate data.
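    The pairwise-correlation task these designs target reduces, at its core, to computing and ranking all d*(d-1)/2 Pearson coefficients among d dimensions. A minimal sketch of that underlying computation follows (the synthetic data and ranking step are illustrative assumptions, not the dissertation's CCP or Snowflake algorithms):

```python
import numpy as np

# Illustrative only: the quantity CCP/Snowflake-style overviews visualize,
# not the dissertation's own algorithms.
rng = np.random.default_rng(0)
n, d = 10_000, 50                          # observations x dimensions
X = rng.normal(size=(n, d))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]    # plant one strong correlation

# All pairwise Pearson correlations at once: a d x d matrix.
R = np.corrcoef(X, rowvar=False)

# Rank dimension pairs by |r| so the strongest relationships surface
# first, the ordering an overview visualization would want to emphasize.
iu = np.triu_indices(d, k=1)
order = np.argsort(-np.abs(R[iu]))
for idx in order[:5]:
    i, j = iu[0][idx], iu[1][idx]
    print(f"dims ({i:2d}, {j:2d}): r = {R[i, j]:+.3f}")
```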

    Computational Statistics and Data Visualization

    This book is the third volume of the Handbook of Computational Statistics and covers the field of Data Visualization. In line with the companion volumes, it contains a collection of chapters by experts in the field to present readers with an up-to-date and comprehensive overview of the state of the art. Data Visualization is an active area of application and research, and this is a good time to gather together a summary of current knowledge. Graphic displays are often very effective at communicating information. They are also very often not effective at communicating information. Two important reasons for this state of affairs are that graphics can be produced with a few clicks of the mouse without any thought, and that the design of graphics is not taken seriously in many scientific textbooks. Some people seem to think that preparing good graphics is just a matter of common sense (in which case their common sense cannot be in good shape), while others believe that preparing graphics is a low-level task, not appropriate for scientific attention. This volume of the Handbook of Computational Statistics takes graphics for Data Visualization seriously.

    Keywords: Data Visualization; Exploratory Graphics.

    Angular Histograms: Frequency-Based Visualizations for Large, High Dimensional Data


    The visual uncertainty paradigm for controlling screen-space information in visualization

    The information visualization pipeline serves as a lossy communication channel for presenting data on a screen space of limited resolution. The loss is not just a machine-side phenomenon caused by translating the data, but also a reflection of the degree to which the human user can comprehend visual information. The common entity in both aspects is the uncertainty associated with the visual representation. However, in the current linear model of the visualization pipeline, the visual representation is mostly treated as the end rather than the means of facilitating the analysis process. While the perceptual side of visualization is also being studied, little attention is paid to the way the visualization appears on the display. We therefore believe there is a need to study the appearance of a visualization on a limited-resolution screen in order to understand its properties and how they influence the way the data are represented. I argue that the visual uncertainty paradigm for controlling screen-space information enables user-centric optimization of a visualization in different application scenarios. Conceptualizing visual uncertainty lets us integrate the encoding and decoding aspects of visual representation into a holistic framework, facilitating the definition of metrics that bridge the last stages of the visualization pipeline and the user's perceptual system. The goal of this dissertation is threefold: i) conceptualize a visual uncertainty taxonomy in the context of pixel-based, multi-dimensional visualization techniques that supports the systematic definition of screen-space metrics; ii) apply the taxonomy to identify sources of useful visual uncertainty that help protect the privacy of sensitive data, and to identify the types of uncertainty that can be reduced through interaction techniques; and iii) apply the metrics to design information-assisted models that help in visualizing high-dimensional, temporal data.
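    One concrete way to make the screen-space loss tangible is to measure overplotting: how many data points collapse onto pixels already occupied by other points at a given resolution. The sketch below is a hypothetical illustration of such a metric, not the dissertation's taxonomy or its actual measures:

```python
import numpy as np

# Hypothetical screen-space metric in the spirit of the abstract: the
# fraction of points hidden by overplotting on a fixed-resolution canvas.
# The metric and its name are illustrative assumptions.
def overplot_loss(x, y, width=320, height=240):
    """Fraction of points sharing a pixel with an earlier point."""
    # Map data coordinates onto integer pixel bins of the target screen.
    px = np.floor((x - x.min()) / (np.ptp(x) + 1e-12) * (width - 1)).astype(int)
    py = np.floor((y - y.min()) / (np.ptp(y) + 1e-12) * (height - 1)).astype(int)
    occupied = len(np.unique(px * height + py))   # distinct pixels hit
    return 1.0 - occupied / len(x)                # share of hidden points

rng = np.random.default_rng(1)
x, y = rng.normal(size=100_000), rng.normal(size=100_000)
print(f"overplotting loss at 320x240: {overplot_loss(x, y):.1%}")
```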

    A Distance-preserving Matrix Sketch

    Visualizing very large matrices involves many formidable problems. Popular solutions involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how well they preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower-dimensional space. This aspect matters because conclusions based on faulty visual reasoning can be harmful: judging dissimilar points as similar, or similar points as dissimilar, on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that respectively select a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.

    Comment: 38 pages, 13 figures.
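    The selection algorithms are the paper's contribution; as a hedged illustration of the general idea (picking a row subset whose points stay well spread so relative distances survive), here is plain farthest-point (maximin) sampling, a common baseline and not the authors' method:

```python
import numpy as np

# Baseline row selection by farthest-point (maximin) sampling.
# NOT the paper's algorithm; just a simple distance-aware alternative.
def farthest_point_rows(X, k, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    # Distance from every row to its nearest already-chosen row.
    d = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))          # row farthest from current sketch
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

X = np.random.default_rng(2).normal(size=(5_000, 20))
rows = farthest_point_rows(X, k=100)
sketch = X[rows]                         # 100 x 20 row sketch of X
```

    Applying the same procedure to the transpose selects columns; a stress-style statistic comparing pairwise distances in the sketch against the original would quantify how much distance structure survives.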

    Empirically measuring soft knowledge in visualization

    In this paper, we present an empirical study designed to evaluate the hypothesis that humans' soft knowledge can enhance the cost-benefit ratio of a visualization process by reducing the potential distortion. In particular, we focused on the impact of three classes of soft knowledge: (i) knowledge about application contexts, (ii) knowledge about the patterns to be observed (i.e., in relation to the visualization task), and (iii) knowledge about statistical measures. We mapped these classes onto three control variables and used real-world time series data to construct the stimuli. The results confirmed the positive contribution of each class of knowledge toward reducing the potential distortion, with knowledge about the patterns preventing distortion more effectively than the other two classes.

    Effective Visualization Approaches For Ultra-High Dimensional Datasets

    Multivariate informational data, which are abstract as well as complex, are becoming increasingly common in areas such as science, medicine, social research, and business. Displaying and analyzing large amounts of multivariate data with more than three variables of different types is quite challenging. Visualization of such data suffers from a high degree of clutter when the number of dimensions/variables and data observations becomes too large. We propose multiple approaches to effectively visualize datasets with an ultra-high number of dimensions by generalizing two standard multivariate visualization methods, namely the star plot and the parallel coordinates plot. We refine three variants of the star plot, the overlapped star plot, the shifted origin plot, and the multilevel star plot, by embedding distribution plots, displaying the dataset in groups, and supporting adjustable positioning of the star axes. We introduce a bifocal parallel coordinates plot (BPCP) based on the focus+context approach. BPCP splits the overall rendering area vertically into focus and context regions. The focus area maps a few selected dimensions of interest at sufficiently wide spacing. The remaining dimensions are represented in the context area in a compact way that retains useful information and preserves data continuity. The focus display can be further enriched with various options, such as axes overlays, scatterplots, and nested PCPs. To accommodate an arbitrarily large number of dimensions, the context display supports a multi-level stacked view. Finally, we present two ways of enhancing parallel coordinates axes to better convey all variables and their interrelationships in high-dimensional datasets. Histogram and circle/ellipse plots based on uniform and non-uniform frequency/density mappings are adopted to visualize distributions of numerical and categorical data values. Color-mapped axis stripes are designed into the parallel coordinates layout so that correlations can be read in the same display plot irrespective of axis locations. These colors are also propagated to histograms as stacked bars, and to categorical values as pie charts, to further facilitate data exploration. Using datasets consisting of 25 to 130 variables of different data types, we demonstrate the effectiveness of the proposed multivariate visualization enhancements.
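    The core layout idea behind BPCP, wide spacing for a few focus axes and tight packing for the rest, can be sketched in a few lines. The following is a rough illustration with synthetic data, assuming matplotlib; it is not the authors' implementation, which adds axes overlays, nested PCPs, and multi-level stacking:

```python
import numpy as np
import matplotlib.pyplot as plt

# Rough sketch of a bifocal parallel coordinates layout (illustrative,
# not the authors' BPCP code): focus axes spread wide on the left,
# context axes packed tightly on the right.
rng = np.random.default_rng(3)
data = rng.random((200, 12))                 # 200 rows, 12 dims in [0, 1]
focus = [0, 1, 2]                            # dimensions of interest
context = [i for i in range(data.shape[1]) if i not in focus]

# Wide spacing for focus axes, compact spacing for context axes.
xs = np.concatenate([np.arange(len(focus)) * 3.0,
                     len(focus) * 3.0 + 1.0 + np.arange(len(context)) * 0.4])
cols = focus + context

fig, ax = plt.subplots(figsize=(8, 3))
for row in data:
    ax.plot(xs, row[cols], lw=0.3, alpha=0.4, color="tab:blue")
for x in xs:
    ax.axvline(x, color="k", lw=0.8)         # draw each vertical axis
ax.set_xticks(xs, [f"d{c}" for c in cols], fontsize=7)
ax.set_yticks([])
plt.show()
```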

    Learning trajectories related to bivariate data in contemporary high school mathematics textbook series in the United States

    Bivariate relationships play a critical role in school statistics, and textbooks are significant in determining student learning. In recent years, researchers have emphasized the importance of learning trajectories (LTs) in mathematics education. In this study, I examined LTs for bivariate data in relation to the development of covariational reasoning in three high school textbook series: Holt McDougal Larson (HML), The University of Chicago School of Mathematics Project (UCSMP), and Core-Plus Mathematics Project (CPMP). The LTs were generated by coding for the presence of variable combinations, learning goals, and techniques and theories. Task features were analyzed in relation to the GAISE Framework, NAEP mathematical complexity, purpose and utility, and the CCSSM Standards for Mathematical Practice. The LTs varied in the presence, development, and emphases of bivariate content and in alignment with the GAISE Framework and CCSSM. Across the three series, about 80% to 90% of the 582 bivariate instances addressed two numerical variables. The CPMP series followed the GAISE developmental progression for all variable combinations, whereas UCSMP deviated for two categorical variables. All CCSSM learning expectations were found in HML and CPMP but not in UCSMP; at the same time, several bivariate learning expectations present in the textbooks were not found in CCSSM. Among task features, few instances were at a high level of mathematical complexity, and tasks rarely included a Collect Data component. Analyses revealed agreement between the GAISE and mathematical complexity frameworks. The findings provide implications for curriculum development, content analysis, and teacher education, and challenge the notion of CCSSM-aligned curricula.

    Includes bibliographical references (pages 235-246).