
    An Introduction to Data Visualization with Tableau

    Tony Nguyen, MLIS, AHIP, is Technology & Communications Coordinator, National Network of Libraries of Medicine (NNLM), Southeastern/Atlantic Region (SEA), University of Maryland, Baltimore. This presentation introduces the concepts of visually representing data with Tableau.

    Teaching Stats for Data Science

    “Data science” is a useful catchword for methods and concepts original to the field of statistics, but typically being applied to large, multivariate, observational records. Such datasets call for techniques not often part of an introduction to statistics: modeling, consideration of covariates, sophisticated visualization, and causal reasoning. This article re-imagines introductory statistics as an introduction to data science and proposes a sequence of 10 blocks that together compose a suitable course for extracting information from contemporary data. Recent extensions to the mosaic packages for R together with tools from the “tidyverse” provide a concise and readable notation for wrangling, visualization, model-building, and model interpretation: the fundamental computational tasks of data science

    Introduction to Data Visualization

    This presentation is an introduction to the domain of data visualization. Its core is a series of examples organized in three chronological sections: classic visualizations (18th and 19th centuries), modern visualizations (20th century), and contemporary visualizations (21st century). It is not intended to be historically exhaustive; rather, the intent is to introduce, in temporal sequence, the subjects of discussion that make these examples interesting.

    Introduction to Data Visualization with Tableau


    Image processing mini manual

    The intent is to provide an introduction to the image processing capabilities available at the Langley Research Center (LaRC) Central Scientific Computing Complex (CSCC). Various image processing software components are described. Information is given concerning the use of these components in the Data Visualization and Animation Laboratory at LaRC

    Teaching Data Science

    We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language with an emphasis on data preparation, processing, and presentation. The course had no prerequisites, and students were not expected to have any programming experience. This introductory course was designed to cover a wide range of topics, from the nature of data, to storage, to visualization, to probability and statistical analysis, to cloud and high performance computing, without becoming overly focused on any one subject. We conclude this article with a discussion of lessons learned and our plans to develop new data science courses.
    Comment: 10 pages, 4 figures, International Conference on Computational Science (ICCS 2016)

    PROPHECY—a database for high-resolution phenomics

    The rapid recent evolution of the field of phenomics (the genome-wide study of gene dispensability by quantitative analysis of phenotypes) has resulted in an increasing demand for new data analysis and visualization tools. Following the introduction of a novel approach for precise, genome-wide quantification of gene dispensability in Saccharomyces cerevisiae, we here announce a public resource for mining, filtering and visualizing phenotypic data: the PROPHECY database. PROPHECY is designed to allow easy and flexible access to physiologically relevant quantitative data on the growth behaviour of mutant strains in the yeast deletion collection under conditions of environmental challenge. PROPHECY is publicly accessible at http://prophecy.lundberg.gu.se

    Parallel Hierarchies: Interactive Visualization of Multidimensional Hierarchical Aggregates

    Exploring multi-dimensional hierarchical data is a long-standing problem present in a wide range of fields such as bioinformatics, software systems, social sciences and business intelligence. While each hierarchical dimension within these data structures can be explored in isolation, critical information lies in the relationships between dimensions. Existing approaches can either simultaneously visualize multiple non-hierarchical dimensions, or only one or two hierarchical dimensions. The challenge of visualizing multi-dimensional hierarchical data therefore remains open. To address this problem, we developed a novel data visualization approach, Parallel Hierarchies, that we demonstrate on a real-life SAP SE product called SAP Product Lifecycle Costing. The starting point of the research is a thorough customer-driven requirement engineering phase including an iterative design process. To avoid restricting ourselves to a domain-specific solution, we abstract the data and tasks gathered from users, and demonstrate the generality of the approach by applying Parallel Hierarchies to datasets from bioinformatics and social sciences. Moreover, we report on a qualitative user study conducted in an industrial scenario with 15 experts from 9 different companies. As a result of this co-innovation experience, several SAP customers requested a product feature based on our solution, and the integration of Parallel Hierarchies as a standard diagram type into the SAP Analytics Cloud platform is in progress. This thesis further introduces different uncertainty representation methods applicable to Parallel Hierarchies and to flow diagrams in general. We also present a visual comparison taxonomy for time-series of hierarchically structured data with one or multiple dimensions. Moreover, we propose several visual solutions for comparing hierarchies employing flow diagrams.
    Finally, after presenting two application examples of Parallel Hierarchies on industrial datasets, we detail two validation methods to examine the effectiveness of the visualization solution. Particularly, we introduce a novel design validation table to assess the perceptual aspects of eight different visualization solutions including Parallel Hierarchies.

    Table of contents:
    1 Introduction
        1.1 Motivation and Problem Statement
        1.2 Research Goals
        1.3 Outline and Contributions
    2 Foundations of Visualization
        2.1 Information Visualization
            2.1.1 Terms and Definition
            2.1.2 What: Data Structures
            2.1.3 Why: Visualization Tasks
            2.1.4 How: Visualization Techniques
            2.1.5 How: Interaction Techniques
        2.2 Visual Perception
            2.2.1 Visual Variables
            2.2.2 Attributes of Preattentive and Attentive Processing
            2.2.3 Gestalt Principles
        2.3 Flow Diagrams
            2.3.1 Classifications of Flow Diagrams
            2.3.2 Main Visual Features
        2.4 Summary
    3 Related Work
        3.1 Cross-tabulating Hierarchical Categories
            3.1.1 Visualizing Categorical Aggregates of Item Sets
            3.1.2 Hierarchical Visualization of Categorical Aggregates
            3.1.3 Visualizing Item Sets and Their Hierarchical Properties
            3.1.4 Hierarchical Visualization of Categorical Set Aggregates
        3.2 Uncertainty Visualization
            3.2.1 Uncertainty Taxonomies
            3.2.2 Uncertainty in Flow Diagrams
        3.3 Time-Series Data Visualization
            3.3.1 Time & Data
            3.3.2 User Tasks
            3.3.3 Visual Representation
        3.4 Summary
    4 Requirement Engineering Phase
        4.1 Introduction
        4.2 Environment
            4.2.1 The Product
            4.2.2 The Customers and Development Methodology
            4.2.3 Lessons Learned
        4.3 Visualization Requirements for Product Costing
            4.3.1 Current Visualization Practice
            4.3.2 Visualization Tasks
            4.3.3 Data Structure and Size
            4.3.4 Early Visualization Prototypes
            4.3.5 Challenges and Lessons Learned
        4.4 Data and Task Abstraction
            4.4.1 Data Abstraction
            4.4.2 Task Abstraction
        4.5 Summary and Outlook
    5 Parallel Hierarchies
        5.1 Introduction
        5.2 The Parallel Hierarchies Technique
            5.2.1 The Individual Axis: Showing Hierarchical Categories
            5.2.2 Two Interlinked Axes: Showing Pairwise Frequencies
            5.2.3 Multiple Linked Axes: Propagating Frequencies
            5.2.4 Fine-tuning Parallel Hierarchies through Reordering
        5.3 Design Choices
        5.4 Applying Parallel Hierarchies
            5.4.1 US Census Data
            5.4.2 Yeast Gene Ontology Annotations
        5.5 Evaluation
            5.5.1 Setup of the Evaluation
            5.5.2 Procedure of the Evaluation
            5.5.3 Results from the Evaluation
            5.5.4 Validity of the Evaluation
        5.6 Summary and Outlook
    6 Visualizing Uncertainty in Flow Diagrams
        6.1 Introduction
        6.2 Uncertainty in Product Costing
            6.2.1 Background
            6.2.2 Main Causes of Bad Quality in Costing Data
        6.3 Visualization Concepts
        6.4 Uncertainty Visualization using Ribbons
            6.4.1 Selected Visualization Techniques
            6.4.2 Study Design and Procedure
            6.4.3 Results
            6.4.4 Discussion
        6.5 Revised Visualization Approach using Ribbons
            6.5.1 Application to Sankey Diagram
            6.5.2 Application to Parallel Sets
            6.5.3 Application to Parallel Hierarchies
        6.6 Uncertainty Visualization using Nodes
            6.6.1 Visual Design of Nodes
            6.6.2 Expert Evaluation
        6.7 Summary and Outlook
    7 Visual Comparison Task
        7.1 Introduction
        7.2 Comparing Two One-dimensional Time Steps
            7.2.1 Problem Statement
            7.2.2 Visualization Design
        7.3 Comparing Two N-dimensional Time Steps
        7.4 Comparing Several One-dimensional Time Steps
        7.5 Summary and Outlook
    8 Parallel Hierarchies in Practice
        8.1 Application to Plausibility Check Task
            8.1.1 Plausibility Check Process
            8.1.2 Visual Exploration of Machine Learning Results
        8.2 Integration into SAP Analytics Cloud
            8.2.1 SAP Analytics Cloud
            8.2.2 Ocean to Table Project
        8.3 Summary and Outlook
    9 Validation
        9.1 Introduction
        9.2 Nested Model Validation Approach
        9.3 Perceptual Validation of Visualization Techniques
            9.3.1 Design Validation Table
            9.3.2 Discussion
        9.4 Summary and Outlook
    10 Conclusion and Outlook
        10.1 Summary of Findings
        10.2 Discussion
        10.3 Outlook
    A Questionnaires of the Evaluation
    B Survey of the Quality of Product Costing Data
    C Questionnaire of Current Practice
    Bibliography

    Topological Data Analysis with Mapper

    This project is an introduction to and overview of Mapper, a method for visualizing high-dimensional data. Data visualization is an important part of data analysis because it allows for further interpretation and exploration of data. Visualizing high-dimensional data sets can be challenging, as each variable is a new dimension that must be represented on a 2D, or at most 3D, graph. Mapper makes high-dimensional visualization possible by using topological methods to study the relationships between points. This project works through two data sets, the Iris data set and a high-dimensional data set called Fetal Health, and shows how to tune and interpret a Mapper graph.