Viewpoints: A high-performance high-dimensional exploratory data analysis tool

Author: C. Levit
Høg E.
M. J. Way
P. R. Gazis
Publication venue: University of Chicago Press
Publication date: 08/11/2010

Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as commodity graphics hardware has dropped in price and increased in capability, it is now possible, in principle, to view large, complex data sets on a single workstation. To do this in practice, an investigator needs software written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large, high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) compared with similar packages available today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization, and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields, including astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, finance, and information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage.

Comment: 18 pages, 3 figures, PASP in press; this version corresponds more closely to the version to be published.
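Linked brushing and normalization, two items in the feature set above, can be illustrated with a minimal NumPy sketch. It assumes the data set is a rows-by-variables float array and that a brush is an axis-aligned rectangle drawn on one scatter plot; the function names are hypothetical, and this is a CPU illustration of the concept, not Viewpoints' GPU implementation. The brush yields one boolean mask over the rows, and every linked view reuses that same mask to highlight the selected points.

```python
import numpy as np

def normalize_columns(data):
    """Min-max scale each column to [0, 1]; constant columns map to 0."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid dividing by zero for constant columns
    return (data - lo) / span

def brush(data, x_col, y_col, x_range, y_range):
    """Boolean mask of rows inside a rectangle drawn on the (x_col, y_col) plot."""
    x, y = data[:, x_col], data[:, y_col]
    return ((x >= x_range[0]) & (x <= x_range[1]) &
            (y >= y_range[0]) & (y <= y_range[1]))

def linked_views(data, views, mask):
    """Split every (x_col, y_col) view into selected and unselected points."""
    return {view: (data[mask][:, list(view)], data[~mask][:, list(view)])
            for view in views}

# Usage: brush on the plot of variables 0 and 1, then inspect another view.
data = np.array([[0.0, 10.0, 5.0],
                 [1.0, 20.0, 6.0],
                 [2.0, 30.0, 7.0]])
norm = normalize_columns(data)
mask = brush(norm, 0, 1, (0.4, 1.0), (0.4, 1.0))
views = linked_views(norm, [(0, 1), (1, 2)], mask)
```

Because the selection is a single mask rather than per-view state, adding another linked view costs only one extra indexing operation.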

arXiv.org e-Print Archive

Crossref

Exploratory topic modeling with distributional semantics

Author: A Treisman
DA Keim
DM Blei
J Risch
L Barth
M Bostock
S Fortunato
S Lohmann
S Palmer
Y Bengio
Publication date: 16/07/2015

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. Unsupervised, exploratory analysis requires no prior knowledge about the content and can support highly open-ended tasks. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, representing latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. Starting from the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. The paper also discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration.

Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)
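The core idea of building a term network from word-vector similarities can be sketched briefly. This sketch assumes each term already has a learned embedding, links each term to its k most cosine-similar neighbours above a threshold (`k` and `min_sim` are hypothetical parameters), and uses connected components only as a crude stand-in for the clustered regions described in the abstract; it is not the paper's exact construction or layout method. The toy two-dimensional vectors are made up for illustration.

```python
import numpy as np

def similarity_graph(terms, vectors, k=2, min_sim=0.5):
    """Link each term to its k most cosine-similar terms whose similarity >= min_sim."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a term is never its own neighbour
    edges = set()
    for i in range(len(terms)):
        for j in np.argsort(sims[i])[::-1][:k]:  # most similar first
            if sims[i, j] >= min_sim:
                edges.add(tuple(sorted((terms[i], terms[j]))))
    return edges

def components(terms, edges):
    """Group terms into connected components with a small union-find."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Usage with hypothetical toy embeddings.
terms = ["cat", "dog", "stock", "bond"]
vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
edges = similarity_graph(terms, vectors, k=2, min_sim=0.5)
groups = components(terms, edges)
```

A k-nearest-neighbour rule keeps the graph sparse, which matters for layout once the vocabulary grows to thousands of terms.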

arXiv.org e-Print Archive

Crossref

Visualisation of Origins, Destinations and Flows with OD Maps

Author: Aidan Slingsby
Andrienko G.
Bertin J
Cui W.
Gilbert M.
Guo D.
Hernandez T
Holten D.
Jarvis R.
Jason Dykes
Jo Wood
Openshaw S
Paci R.
Slingsby A.
Tobler W
Wilkinson L.
Yi J. S.
Publication venue: Maney Publishing
Publication date: 01/05/2010

We present a new technique for the visual exploration of origins (O) and destinations (D) arranged in geographic space. Previous attempts to map the flows between origins and destinations have suffered from occlusion, usually requiring some form of generalisation, such as aggregation or flow density estimation, before the flows can be visualised. This can lead to loss of detail or introduce arbitrary artefacts into the visual representation. Here, we propose mapping OD vectors as cells rather than lines, comparable with the process of constructing OD matrices; unlike an OD matrix, however, we preserve the spatial layout of all origin and destination locations by constructing a gridded two-level spatial treemap. The result is a set of spatially ordered small multiples upon which any arbitrary geographic data may be projected. Using a hash grid spatial data structure, we explore the characteristics of the technique through a software prototype that allows interactive query and visualisation of 10^5 to 10^6 simulated and recorded OD vectors. The technique is illustrated using US county-to-county migration and commuting statistics.
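The "destinations within origins" layout behind an OD map can be sketched on a regular square grid. This simplification assumes projected coordinates, an n-by-n grid of square cells, and a plain dictionary keyed by cell indices in place of the paper's hash grid; the paper's gridded two-level spatial treemap is more general than this. Each flow is binned by its origin cell, and within that origin's small multiple it is placed at its destination cell's position, so each small multiple is itself a map of destinations. The function names and parameters are hypothetical.

```python
from collections import defaultdict

def grid_cell(x, y, x0, y0, cell_size):
    """Map a projected coordinate to its (column, row) index in a regular grid."""
    return int((x - x0) // cell_size), int((y - y0) // cell_size)

def od_map(flows, n, x0, y0, cell_size):
    """Aggregate OD vectors into an (n*n) x (n*n) 'destinations within origins' grid.

    flows: iterable of (ox, oy, dx, dy, weight) in projected coordinates.
    The outer cell is the origin; inside it, the destination keeps its own
    grid position, so each small multiple is a map of destinations.
    """
    counts = defaultdict(float)
    for ox, oy, dx, dy, w in flows:
        oc, orow = grid_cell(ox, oy, x0, y0, cell_size)
        dc, drow = grid_cell(dx, dy, x0, y0, cell_size)
        if not all(0 <= v < n for v in (oc, orow, dc, drow)):
            continue  # skip flows that fall outside the grid extent
        counts[(oc * n + dc, orow * n + drow)] += w
    return dict(counts)

# Usage: a 2 x 2 grid of 10-unit cells.
flows = [
    (1.0, 1.0, 15.0, 15.0, 3.0),   # origin cell (0, 0) -> destination cell (1, 1)
    (12.0, 3.0, 2.0, 18.0, 2.0),   # origin cell (1, 0) -> destination cell (0, 1)
    (1.0, 1.0, 15.0, 15.0, 1.0),   # same OD pair again
    (25.0, 0.0, 0.0, 0.0, 5.0),    # origin outside the grid, skipped
]
cells = od_map(flows, n=2, x0=0.0, y0=0.0, cell_size=10.0)
# cells == {(1, 1): 4.0, (2, 1): 2.0}
```

Only occupied cells are stored, so memory grows with the number of distinct OD cell pairs rather than with n to the fourth power.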

City Research Online

Crossref

UCL Discovery

Visual and interactive exploration of point data

Author: Tobon Carolina
Publication venue: Centre for Advanced Spatial Analysis (UCL)
Publication date: 01/01/2001

Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine scales of resolution. For instance, socio-economic attributes are commonly assigned to UPC, so they can be represented as points and observed at the postcode level. Using UPC as a common field allows variables from disparate data sources to be concatenated, which can potentially support sophisticated spatial analysis. However, visualising UPC in urban areas has at least three limitations. First, at small scales UPC occurrences can be very dense, making their visualisation as points difficult; on the other hand, patterns in the associated attribute values are often hard to recognise at large scales. Secondly, because UPC can serve as a common field for concatenating data sets, the resulting data are often highly multivariate. Finally, socio-economic variables assigned to UPC (such as the ones used here) can have non-Normal distributions, owing to a large proportion of zero values and high variances, which constrains their analysis using traditional statistics. This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system developed to explore point data visually. Various well-known visualisation techniques were implemented to enable interactive and dynamic interrogation of the point data. PVT provides multiple representations of point data to facilitate understanding of the relations between attributes or variables as well as their spatial characteristics. Brushing between alternative views is used to link several representations of a single attribute, as well as to explore more than one variable simultaneously. PVT's functionality shows how visual techniques embedded in an interactive environment enable the exploration of large amounts of multivariate point data.

CiteSeerX

UCL Discovery

Self-Organizing Time Map: An Abstraction of Temporal Multivariate Patterns

Author: Agarwal
Andrienko
Aupetit
Back
Barreto
Bertin
Chappell
Cottrell
Deboeck
Denny
Fritzke
Guimarães
Guo
Hagenbuchner
Hammer
Harrower
Horio
Kaski
Kohonen
Koskela
Martín-del-Brío
Peter Sarlin
Sammon
Sarlin
Strickert
Vesanto
Voegtlin
Publication venue: Elsevier BV
Publication date: 09/08/2012

This paper adopts and adapts Kohonen's standard Self-Organizing Map (SOM) for exploratory temporal structure analysis. The Self-Organizing Time Map (SOTM) applies SOM-type learning to one-dimensional arrays for individual time units, preserves their orientation through short-term memory, and arranges the arrays in ascending order of time. The two-dimensional representation of the SOTM thus attempts twofold topology preservation: the horizontal direction preserves time topology and the vertical direction preserves data topology. This enables discovering the occurrence, and exploring the properties, of temporal structural changes in data. For representing the qualities and properties of SOTMs, we adapt measures and visualizations from the standard SOM paradigm and introduce a measure of temporal structural changes. The functioning of the SOTM, together with its visualizations and its quality and property measures, is illustrated on artificial toy data. The usefulness of the SOTM in a real-world setting is shown on poverty, welfare and development indicators.
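A compact sketch of the SOTM idea follows: one batch-trained one-dimensional SOM per time unit, with each array initialised from the previous time unit's prototypes, which plays the role of the short-term memory that preserves orientation. The Gaussian neighbourhood, the fixed `sigma`, the iteration count, and the linear initialisation of the first array are assumptions made for illustration, not the paper's exact training schedule; the function names are hypothetical.

```python
import numpy as np

def train_1d_som(data, init, n_iter=20, sigma=1.0):
    """Batch-train a one-dimensional SOM starting from the prototypes in init."""
    protos = init.copy()
    idx = np.arange(len(protos))
    # Gaussian neighbourhood between units along the 1-D array.
    h = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))
    for _ in range(n_iter):
        dists = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        bmu = dists.argmin(axis=1)           # best-matching unit of each sample
        weights = h[:, bmu]                  # shape (units, samples)
        protos = (weights @ data) / weights.sum(axis=1, keepdims=True)
    return protos

def sotm(data_by_time, n_units=5, sigma=1.0):
    """One 1-D SOM per time unit; each starts from the previous unit's prototypes."""
    first = data_by_time[0]
    lo, hi = first.min(axis=0), first.max(axis=0)
    steps = np.linspace(0.0, 1.0, n_units)[:, None]
    protos = lo + steps * (hi - lo)          # assumed linear initialisation
    columns = []
    for data in data_by_time:                # ascending order of time
        protos = train_1d_som(data, protos, sigma=sigma)
        columns.append(protos)
    # Axis 0 (vertical) indexes units, i.e. data topology; axis 1 (horizontal) is time.
    return np.stack(columns, axis=1)

# Usage: two time units of one-feature data, the second shifted upward.
d0 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
grid = sotm([d0, d0 + 10.0], n_units=3)
```

Every prototype is a positively weighted average of its time unit's samples, so each column stays within the range of that time unit's data.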

arXiv.org e-Print Archive

Crossref

Analyzing big time series data in solar engineering using features and PCA

Author: Dong Zibo
Lim Li Hong I.
Liu Licheng
Yang Dazhi
Publication venue: Elsevier BV
Publication date: 01/09/2017

In solar engineering, we encounter big time series data, such as satellite-derived irradiance data and string-level measurements from a utility-scale photovoltaic (PV) system. While storing and hosting big data is certainly possible with today's data storage technology, visualizing and analyzing the data effectively and efficiently is challenging. In this work, we consider a data analytics algorithm that mitigates some of these challenges. The algorithm computes a set of generic and/or application-specific features to characterize each time series, and then uses principal component analysis to project these features onto a two-dimensional space. Because each time series is represented by its features, it can be treated as a single data point in the feature space, which makes many operations more tractable. Three applications are discussed within the overall framework: (1) PV system type identification, (2) monitoring network design, and (3) anomalous string detection. The proposed framework can be easily translated to many other solar engineering applications.
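The feature-then-PCA pipeline can be sketched in NumPy. The four features below (mean, standard deviation, range, mean absolute step change) are hypothetical generic features chosen for illustration, not the paper's feature set, and standardising the features before PCA is an assumption; PCA itself is computed via the singular value decomposition of the centred, scaled matrix.

```python
import numpy as np

def features(series):
    """Hypothetical generic features for one time series."""
    s = np.asarray(series, dtype=float)
    steps = np.abs(np.diff(s))
    return np.array([s.mean(), s.std(), s.max() - s.min(), steps.mean()])

def pca_2d(feature_matrix):
    """Project standardised feature rows onto the first two principal components."""
    x = np.asarray(feature_matrix, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0                      # leave constant features at zero
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:2].T                      # one 2-D point per time series

# Usage: four similar daily-like profiles plus one flat (possibly faulty) series.
t = np.linspace(0.0, 2.0 * np.pi, 48)
series = [a * np.clip(np.sin(t), 0.0, None) for a in (1.0, 1.1, 0.9, 1.05)]
series.append(np.zeros(48))
points = pca_2d(np.array([features(s) for s in series]))
```

With the features standardised, no single large-valued feature dominates the projection, and each point in the resulting scatter plot stands for one whole time series.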

Crossref

Enlighten