    Effective Visualization Approaches For Ultra-High Dimensional Datasets

    Multivariate informational data, which are abstract as well as complex, are becoming increasingly common in many areas, including science, medicine, social science, and business. Displaying and analyzing large amounts of multivariate data with more than three variables of different types is challenging. Visualizations of such data suffer from heavy clutter when the number of dimensions/variables and the number of observations become too large. We propose multiple approaches to effectively visualize large datasets with an ultrahigh number of dimensions by generalizing two standard multivariate visualization methods, namely the star plot and the parallel coordinates plot. We refine three variants of the star plot (the overlapped star plot, the shifted origin plot, and the multilevel star plot) by embedding distribution plots, displaying the dataset in groups, and supporting adjustable positioning of the star axes. We introduce a bifocal parallel coordinates plot (BPCP) based on the focus + context approach. BPCP vertically splits the overall rendering area into focus and context regions. The focus area maps a few selected dimensions of interest at sufficiently wide spacing. The remaining dimensions are represented compactly in the context area to retain useful information and preserve data continuity. The focus display can be further enriched with options such as axis overlays, scatterplots, and nested PCPs. To accommodate an arbitrarily large number of dimensions, the context display supports a multi-level stacked view. Finally, we present two innovative ways of enhancing parallel coordinates axes to better understand all variables and their interrelationships in high-dimensional datasets. Histograms and circle/ellipse plots based on uniform and non-uniform frequency/density mappings are adopted to visualize distributions of numerical and categorical data values.
Color-mapped axis stripes are designed in the parallel coordinates layout so that correlations can be fully perceived in the same plot irrespective of axis locations. These colors are also propagated to histograms as stacked bars and to categorical values as pie charts to further facilitate data exploration. Using datasets consisting of 25 to 130 variables of different data types, we demonstrate the effectiveness of the proposed multivariate visualization enhancements.
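The focus + context split described for BPCP can be illustrated with a small layout computation. The sketch below is only one plausible reading of the abstract: the function name `bifocal_axis_positions`, the `focus_fraction` parameter, and the choice to put the focus region on the left are assumptions made for illustration, not details taken from the work itself.

```python
def bifocal_axis_positions(n_dims, focus_dims, width=1000.0, focus_fraction=0.75):
    """Return {dimension index: x position} for a focus + context axis layout.

    Hypothetical sketch: the focus region occupies the left `focus_fraction`
    of the drawing width, and the context region takes the remainder.
    """
    focus = list(focus_dims)
    focus_set = set(focus)
    # Every dimension that is not in focus goes to the compact context region.
    context = [d for d in range(n_dims) if d not in focus_set]
    split = width * focus_fraction

    def spread(dims, start, end):
        # Place axes at the centers of equal-width slots in [start, end).
        if not dims:
            return {}
        slot = (end - start) / len(dims)
        return {d: start + slot * (i + 0.5) for i, d in enumerate(dims)}

    positions = spread(focus, 0.0, split)
    positions.update(spread(context, split, width))
    return positions


if __name__ == "__main__":
    layout = bifocal_axis_positions(10, focus_dims=[2, 5])
    for dim in sorted(layout, key=layout.get):
        print(dim, layout[dim])
```

With ten dimensions and two in focus, the two focus axes sit 375 units apart while the eight context axes share the remaining quarter of the width, which is the wide-versus-compact spacing the abstract describes.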

    Correlation is a powerful relationship measure used in many fields to estimate trends and make forecasts. When the data are complex, large, and high dimensional, correlation identification is challenging. Several visualization methods have been proposed to solve these problems, but they all have limitations in accuracy, speed, or scalability. In this dissertation, we propose a methodology that provides new visual designs that show details when possible and aggregates when necessary, along with robust interactive mechanisms that together enable quick identification and investigation of meaningful relationships in large and high-dimensional data. We propose four techniques using this methodology. Depending on data size and dimensionality, the most appropriate visualization technique can be provided to optimize the analysis performance. First, to improve correlation identification tasks between two dimensions, we propose a new correlation task-specific visualization method called correlation coordinate plot (CCP). CCP transforms data into a powerful coordinate system for estimating the direction and strength of correlations among dimensions. Next, we propose three visualization designs to optimize correlation identification tasks in large and multidimensional data. The first is snowflake visualization (Snowflake), a focus+context layout for exploring all pairwise correlations. The next proposed design is a new interactive design for representing and exploring data relationships in parallel coordinate plots (PCPs) for large data, called data scalable parallel coordinate plots (DSPCP). Finally, we propose a novel technique for storing and accessing multiway dependencies through visualization (MultiDepViz). We evaluate these approaches through various use cases, compare them to prior work, and conduct user studies to demonstrate how our proposed approaches help users explore correlation in large data efficiently.
Our results confirmed that the CCP/Snowflake, DSPCP, and MultiDepViz methods outperform some current visualization techniques such as scatterplots (SCPs), PCPs, SCP matrix, Corrgram, Angular Histogram, and UntangleMap in both accuracy and timing. Finally, these approaches are applied in real-world applications such as a debugging tool, large-scale code performance data, and large-scale climate data.
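The quantity that designs such as Snowflake let a user explore, all pairwise correlations, can be computed directly. The sketch below uses the Pearson coefficient as an assumed measure, since the abstract does not say which coefficient the dissertation uses; the function names and the toy columns are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # Correlation is undefined for a constant column.
        return float("nan")
    return cov / (norm_x * norm_y)


def pairwise_correlations(columns):
    """Map each unordered pair of column names to its Pearson coefficient."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


if __name__ == "__main__":
    data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
    for pair, r in pairwise_correlations(data).items():
        print(pair, round(r, 3))
```

The number of pairs grows quadratically with the number of dimensions, which is why an overview layout for all pairs becomes useful on high-dimensional data.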

    Hybrid visualizations for data exploration

    Information Visualization (Infovis) graphically encodes information to help a user explore a data set visually and interactively. This graphical encoding can take the form of widespread visualizations such as bar charts and scatterplots. Multiple visualizations can share the same functional space to form complete tools for visual exploration or for communicating information. There are multiple ways of combining these visualizations. Combining several of them can produce complex assemblies sometimes called hybrid visualizations: a hybrid visualization is the result of assembling multiple simpler visualizations. For example, NodeTrix (Henry et al., 2007a) is composed of a node-link diagram and an adjacency matrix, and MatLink (Henry and Fekete, 2007a) adds arc links to an adjacency matrix. This integration of multiple visualizations can be a way to combine their advantages into a coherent structure. The integration can be achieved, for example, through color coding, through explicit linking (such as with arrows), or through interaction (such as when different visualizations respond to the manipulation of others). Recent literature contains several examples of new hybrid visualizations, most often to deal with complex datasets where the user can benefit from multiple, complementary visual encodings of the same data. However, to date, there is almost no theory or framework to help researchers understand and characterize existing hybrids or design new ones. This thesis advances the state of the art in hybrid visualizations in two ways: first, by developing a framework that defines and characterizes hybrid visualizations to help better identify, describe, and design them, and second, by demonstrating a variety of novel hybrids. The hybrid visualizations we explored cover a wide range of possibilities.
Two of the most general and widely used data types in Infovis, multidimensional multivariate data and graph (i.e., network) data, are each the subject of a chapter in the thesis, with novel hybrid visualization techniques presented for each. A wide range of possibilities for integration is also presented using a pipeline model. After some preliminary material, chapter 2 of the thesis presents a conceptual framework that defines and characterizes hybrid visualizations. This framework was itself derived from experience designing the hybrid visualizations presented in the subsequent chapters. A hybrid visualization is described as a graphical encoding that uses other visualizations as building blocks. We present a pipeline to illustrate the assembly of a visualization: basic shapes or glyphs are generated, placed on a layout, embellished with other graphical elements, sent to view transform operators, and assembled in the same space. Simple charts can be described with this pipeline, as can more complex assemblies and new hybrids. Chapter 3 presents ConnectedCharts, an example of a hybrid assembled at the assembly level of the pipeline, made of multiple multidimensional and multivariate charts explicitly connected by lines or curves showing the relationship between their elements. A user interface enables the interactive assembly of ConnectedCharts, including a wide range of previously published hybrid visualizations, as well as novel hybrid arrangements. ConnectedCharts serve as an illustration of the conceptual framework in chapter 2, by exploring possible connections between different graphics depending on the relationship of their encoded data types. Chapter 4 presents another user interface, this time for graph exploration, that incorporates several highly integrated hybrid visualizations.
A Parallel Scatter Plot Matrix (P-SPLOM) is presented that constitutes a fusion of a Scatter Plot Matrix (SPLOM) and a Parallel Coordinates Plot (PCP). A radial menu called the FlowVizMenu enables the modification of a visualization integrated at the center of the menu. This menu is also used to select the dimensions for configuring a third hybrid based on an Attribute-Driven Layout (ADL) that combines a node-link diagram and a scatterplot. The characterization of hybrid visualizations offered by the conceptual framework, as well as the illustration of the framework by innovative hybrid visualizations, are the main contributions of this thesis to the Infovis community.
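The idea of combining a node-link diagram with a scatterplot can be sketched as a layout in which two chosen attributes determine each node's coordinates, while edges are still drawn between connected nodes. The code below is a minimal illustration under that assumption; the function name, the linear scaling, and the example attributes are hypothetical and are not taken from the thesis.

```python
def attribute_driven_layout(nodes, edges, x_attr, y_attr, size=(800.0, 600.0)):
    """Position nodes by two attributes and return edge line segments.

    nodes: {name: {attribute: numeric value}}
    edges: iterable of (source name, target name)
    """
    def scale(values, extent):
        # Linearly map values to [0, extent]; center them if all are equal.
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            return [extent / 2 for _ in values]
        return [extent * (v - lo) / span for v in values]

    names = list(nodes)
    xs = scale([nodes[n][x_attr] for n in names], size[0])
    ys = scale([nodes[n][y_attr] for n in names], size[1])
    positions = {n: (x, y) for n, x, y in zip(names, xs, ys)}
    # Each edge becomes a straight segment between its endpoints' positions.
    segments = [(positions[a], positions[b]) for a, b in edges]
    return positions, segments


if __name__ == "__main__":
    graph_nodes = {
        "a": {"degree": 1, "age": 10},
        "b": {"degree": 3, "age": 20},
        "c": {"degree": 2, "age": 30},
    }
    graph_edges = [("a", "b"), ("b", "c")]
    pos, segs = attribute_driven_layout(graph_nodes, graph_edges, "degree", "age")
    print(pos)
    print(segs)
```

Because positions come from data attributes rather than a force-directed solver, the same picture can be read both as a scatterplot of the two attributes and as a network.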

    Visual analysis applied to image analysis

    Advisors: Alexandre Xavier Falcão, Alexandru Cristian Telea, Pedro Jussieu de Rezende, Johannes Bernardus Theodorus Maria Roerdink. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação, and University of Groningen.
    Summary: Image analysis is the field of research concerned with extracting information from images. This field is highly important for scientific and commercial applications. The main goal of the work presented in this thesis is to enable user interactivity during several tasks related to image analysis: segmentation, feature selection, and classification. In this context, enabling user interactivity means providing mechanisms that allow humans to assist computers in tasks that are difficult to automate. Regarding image segmentation, we propose a new interactive technique that combines superpixels with the image foresting transform. The main advantage of this technique is that it enables fast interactive segmentation of large images, in addition to enabling the extraction of potentially richer features. Experiments suggest that our technique is as effective as the pixel-based alternative. In the context of feature selection and classification, we propose a new interactive visualization system that combines feature space exploration (based on dimensionality reduction) with automatic feature evaluation. This system aims to reveal information that leads to the development of effective feature sets for image classification. The same system can also be applied to feature selection for image segmentation and for pattern classification, although these tasks are not our focus. We present use cases that show how this system can provide certain kinds of qualitative information about image classification systems that would be difficult to obtain by other methods. We also show how the proposed interactive system can be adapted to explore intermediate computational results of artificial neural networks. These networks currently achieve state-of-the-art results in many image classification applications. Through use cases involving benchmark datasets, we show that our system can provide information about how a network operates that leads to improvements in classification systems. Since the parameters of an artificial neural network are typically adapted iteratively, visualizing its intermediate results can be seen as a time-dependent task. Based on this perspective, we propose a new time-dependent dimensionality reduction technique that reduces unnecessary changes in the results caused by small changes in the data. Preliminary experiments show that this technique is effective in maintaining the desired temporal coherence.
    Abstract: We define image analysis as the field of study concerned with extracting information from images. This field is immensely important for commercial and interdisciplinary applications. The overarching goal behind the work presented in this thesis is enabling user interaction during several tasks related to image analysis: image segmentation, feature selection, and image classification. In this context, enabling user interaction refers to providing mechanisms that allow humans to assist machines in tasks that are difficult to automate. Such tasks are very common in image analysis. Concerning image segmentation, we propose a new interactive technique that combines superpixels with the image foresting transform. The main advantage of our proposed technique is enabling faster interactive segmentation of large images, although it also enables potentially richer feature extraction. Our experiments show that our technique is at least as effective as its pixel-based counterpart.
In the context of feature selection and image classification, we propose a new interactive visualization system that combines feature space exploration (based on dimensionality reduction) with automatic feature scoring. This visualization system aims to provide insights that lead to the development of effective feature sets for image classification. The same system can also be applied to select features for image segmentation and (general) pattern classification, although these tasks are not our focus. We present use cases that show how this system may provide a kind of qualitative feedback about image classification systems that would be very difficult to obtain by other (non-visual) means. We also show how our proposed interactive visualization system can be adapted to explore intermediary computational results of artificial neural networks. Such networks currently achieve state-of-the-art results in many image classification applications. Through use cases involving traditional benchmark datasets, we show that our system may enable insights about how a network operates that lead to improvements along the classification pipeline. Because the parameters of an artificial neural network are typically adapted iteratively, visualizing its intermediary computational results can be seen as a time-dependent task. Motivated by this, we propose a new time-dependent dimensionality reduction technique that enables the reduction of apparently unnecessary changes in results due to small changes in the data (temporal coherence). Preliminary experiments show that this technique is effective in enforcing temporal coherence.
    Doctorate, Computer Science (Doctor in Computer Science). Funding: 2012/24121-9; FAPESP; CAPE
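The notion of temporal coherence, avoiding changes between consecutive projections that the data do not justify, can be illustrated with a much simpler stand-in than the thesis's technique. Many projection methods are defined only up to axis reflections, so two nearly identical time steps can yield mirrored 2D embeddings. The sketch below assumes that setting and picks, for each new embedding, the axis flips that bring it closest to the previous one. The function name and data are hypothetical, and this is not the time-dependent dimensionality reduction method proposed in the thesis.

```python
def align_to_previous(previous, current):
    """Flip the axes of `current` so that it is as close as possible to `previous`.

    Both arguments are lists of (x, y) points in the same order. Only the four
    axis reflections are tried, so this removes mirror flips but not rotations.
    """
    best_cost, best_points = None, None
    for sign_x in (1, -1):
        for sign_y in (1, -1):
            candidate = [(sign_x * x, sign_y * y) for x, y in current]
            # Sum of squared distances between corresponding points.
            cost = sum(
                (cx - px) ** 2 + (cy - py) ** 2
                for (cx, cy), (px, py) in zip(candidate, previous)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_points = cost, candidate
    return best_points


if __name__ == "__main__":
    step_t = [(1.0, 2.0), (3.0, -1.0)]
    # The next step is almost the same layout, but mirrored along x.
    step_t_plus_1 = [(-1.1, 2.0), (-2.9, -1.0)]
    print(align_to_previous(step_t, step_t_plus_1))
```

Applied step by step along a training run, such an alignment keeps points from jumping across the plot when only a small change in the data has occurred.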