Linguistic Geometries for Unsupervised Dimensionality Reduction
Text documents are complex high dimensional objects. To effectively visualize
such data it is important to reduce its dimensionality and visualize the low
dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
dimensionality reduction methods that draw upon domain knowledge in order to
achieve a better low dimensional embedding and visualization of documents. We
consider the use of geometries specified manually by an expert, geometries
derived automatically from corpus statistics, and geometries computed from
linguistic resources. Comment: 13 pages, 15 figures
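As a rough illustration of what a "geometry" over documents can mean, the sketch below measures the distance between two term-frequency vectors under a positive semidefinite term-similarity matrix T, using d_T(x, y) = sqrt((x - y)^T T (x - y)). This form of the distance, the toy vocabulary, and the values in T are assumptions made for illustration; the abstract does not state them.

```python
import math

def metric_distance(x, y, T):
    """Distance sqrt((x - y)^T T (x - y)) for a PSD matrix T (assumed form)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * T[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative rounding

# Hypothetical vocabulary: ["car", "automobile", "banana"]
doc_a = [1.0, 0.0, 0.0]  # mentions "car" once
doc_b = [0.0, 1.0, 0.0]  # mentions "automobile" once

# Identity matrix: plain Euclidean geometry, all terms unrelated.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical expert-specified matrix: "car" and "automobile" are related.
expert_T = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]

d_euclid = metric_distance(doc_a, doc_b, identity)
d_expert = metric_distance(doc_a, doc_b, expert_T)
```

Under the identity matrix the two documents are as far apart as any two documents with no shared terms. Under the hypothetical expert matrix they move closer, because the two terms are treated as related. Pairwise distances of this kind could then be passed to a standard embedding method to produce a 2-D scatter plot.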
09251 Abstracts Collection -- Scientific Visualization
From 06-14-2009 to 06-19-2009, the Dagstuhl Seminar 09251 "Scientific Visualization" was held in Schloss Dagstuhl - Leibniz Center for Informatics.
During the seminar, over 50 international participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general
Axiomatic geometries for text documents
High-dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modelling. Typical approaches to modelling such data involve, either explicitly or implicitly, arbitrary geometric assumptions. In this chapter, we consider statistical modelling of non-Euclidean data whose geometry is obtained by embedding the data in a statistical manifold. The resulting models perform better than their Euclidean counterparts on real world data and draw an interesting connection between Čencov and Campbell's axiomatic characterisation of the Fisher information and the recently proposed diffusion kernels and square root embedding.
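For readers unfamiliar with the square root embedding, a minimal sketch follows. Mapping a multinomial parameter vector p to its componentwise square root places it on the unit sphere, and the geodesic distance under the Fisher information metric becomes 2 arccos(sum_i sqrt(p_i q_i)). The function name is illustrative and the code is not taken from the chapter, whose own notation is not given in the abstract.

```python
import math

def fisher_geodesic_distance(p, q):
    """Fisher geodesic distance between two multinomial parameter vectors.

    p and q are assumed to be non-negative and to sum to 1.
    """
    # Inner product of the square-root embeddings of p and q.
    inner = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    inner = min(1.0, max(-1.0, inner))
    return 2.0 * math.acos(inner)

# Two hypothetical normalized term-frequency vectors over a 3-word vocabulary.
doc_p = [0.5, 0.3, 0.2]
doc_q = [0.2, 0.3, 0.5]
distance = fisher_geodesic_distance(doc_p, doc_q)
```

Unlike the Euclidean distance, this distance is bounded: two distributions with disjoint support are exactly pi apart.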
Abstract visualization of large-scale time-varying data
The explosion of large-scale time-varying datasets has created critical challenges for scientists to study and digest. One core problem for visualization is to develop effective approaches that can be used to study various data features and temporal relationships among large-scale time-varying datasets.
In this dissertation, we first present two abstract visualization approaches to visualizing and analyzing time-varying datasets. The first approach visualizes time-varying datasets with succinct lines that represent temporal relationships of the datasets. A time line visualizes time steps as points and the temporal sequence as a line. Time lines are generated by sampling the distributions of virtual words across time to study temporal features. The key idea of the time line is to encode various data properties with virtual words. We apply virtual words to characterize feature points and use their distribution statistics to measure temporal relationships. The second approach is ensemble visualization, which provides a highly abstract platform for visualizing an ensemble of datasets. Both approaches can be used for exploration, analysis, and demonstration purposes.
The second component of this dissertation is an animated visualization approach to study dramatic temporal changes. Animation has been widely used to show trends, dynamic features and transitions in scientific simulations, while animated visualization is new. We present an automatic animation generation approach that simulates the composition and transition of storytelling techniques and synthesizes animations to describe various event features. We also extend the concept of animated visualization to non-traditional time-varying datasets--network protocols--for visualizing key information in abstract sequences. We have evaluated the effectiveness of our animated visualization with a formal user study and demonstrated the advantages of animated visualization for studying time-varying datasets
ProjectionPathExplorer: Exploring Visual Patterns in Projected Decision-Making Paths
In problem-solving, a path towards solutions can be viewed as a sequence of
decisions. The decisions, made by humans or computers, describe a trajectory
through a high-dimensional representation space of the problem. By means of
dimensionality reduction, these trajectories can be visualized in
lower-dimensional space. Such embedded trajectories have previously been
applied to a wide variety of data, but analysis has focused almost exclusively
on the self-similarity of single trajectories. In contrast, we describe
patterns emerging from drawing many trajectories (for different initial
conditions, end states, and solution strategies) in the same embedding space.
We argue that general statements about the problem-solving tasks and solving
strategies can be made by interpreting these patterns. We explore and
characterize such patterns in trajectories resulting from human and
machine-made decisions in a variety of application domains: logic puzzles
(Rubik's cube), strategy games (chess), and optimization problems (neural
network training). We also discuss the importance of suitably chosen
representation spaces and similarity metrics for the embedding. Comment: Final
version; accepted for publication in the ACM TiiS Special Issue on "Interactive
Visual Analytics for Making Explainable and Accountable Decisions"
Visualization of dynamic multidimensional and hierarchical datasets
When it comes to tools and techniques designed to help in understanding complex abstract data, visualization methods play a prominent role. They enable human operators to leverage their pattern finding, outlier detection, and questioning abilities to visually reason about a given dataset. Many methods exist that create suitable and useful visual representations of static, abstract, non-spatial data. However, for temporal, abstract, non-spatial datasets, in which the data changes and evolves through time, far fewer visualization techniques exist. This thesis focuses on the particular cases of temporal hierarchical data representation via dynamic treemaps, and temporal high-dimensional data visualization via dynamic projections. We tackle the joint question of how to extend projections and treemaps to stably, accurately, and scalably handle temporal multivariate and hierarchical data. The literature for static visualization techniques is rich and the state-of-the-art methods have proven to be valuable tools in data analysis. Their temporal/dynamic counterparts, however, are not as well studied, and, until recently, there were few hierarchical and high-dimensional methods that explicitly took into consideration the temporal aspect of the data. In addition, there are few or no metrics to assess the quality of these temporal mappings, and even fewer comprehensive benchmarks to compare these methods. This thesis addresses the above-mentioned shortcomings. For both dynamic treemaps and dynamic projections, we propose ways to accurately measure temporal stability; we evaluate existing methods considering the tradeoff between stability and visual quality; and we propose new methods that strike a better balance between stability and visual quality than existing state-of-the-art techniques.
We demonstrate our methods with a wide range of real-world data, including an application of our new dynamic projection methods to support the analysis and classification of hyperkinetic movement disorder data.
Advances in Data Mining Knowledge Discovery and Applications
Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. It may seem that the same things are repeated, but in general the same approaches and techniques can help in different fields and areas of expertise. This book presents knowledge discovery and data mining applications in two sections. Data mining is known to cover statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas. In this book, most of these areas are covered with different data mining applications. The eighteen chapters have been classified into two parts: Knowledge Discovery and Data Mining Applications.