12 research outputs found

    FeatureExplorer: Interactive Feature Selection and Exploration of Regression Models for Hyperspectral Images

    Full text link
    Feature selection is used in machine learning to improve predictions, decrease computation time, reduce noise, and tune models based on limited sample data. In this article, we present FeatureExplorer, a visual analytics system that supports the dynamic evaluation of regression models and importance of feature subsets through the interactive selection of features in high-dimensional feature spaces typical of hyperspectral images. The interactive system allows users to iteratively refine and diagnose the model by selecting features based on their domain knowledge, interchangeable (correlated) features, feature importance, and the resulting model performance.Comment: To appear in IEEE VIS 2019 Short Paper

    Projections as visual aids for classification system design.

    Get PDF
    Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low (two or three) dimensional spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including features descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe to improve classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, and supportive tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms

    Dual analysis of DNA microarrays

    Get PDF
    Microarray data represents the expression levels of genes for different samples and for different conditions. It has been a central topic in bioinformatics research for a long time already. Researchers try to discover groups of genes that are responsible for specific biological processes. Statistical analysis tools and visualizations have been widely used in the analysis of microarray data. Researchers try to build hypotheses on both the genes and the samples. Therefore,such analyses require the joint exploration of the genes and the samples. However, current methods in interactive visual analysis fail to provide the necessary mechanisms for this joint analysis. In this paper, we propose an interactive visual analysis framework that enables the dual analysis of the samples and the genes through the use of integrated statistical tools. We introduce a set of specialized views and a detailed analysis procedure to describe the utilization of our framework

    Effective Visualization Approaches For Ultra-High Dimensional Datasets

    Get PDF
    Multivariate informational data, which are abstract as well as complex, are becoming increasingly common in many areas such as scientific, medical, social, business, and so on. Displaying and analyzing large amounts of multivariate data with more than three variables of different types is quite challenging. Visualization of such multivariate data suffers from a high degree of clutter when the numbers of dimensions/variables and data observations become too large. We propose multiple approaches to effectively visualize large datasets of ultrahigh number of dimensions by generalizing two standard multivariate visualization methods, namely star plot and parallel coordinates plot. We refine three variants of the star plot, which include overlapped star plot, shifted origin plot, and multilevel star plot by embedding distribution plots, displaying dataset in groups, and supporting adjustable positioning of the star axes. We introduce a bifocal parallel coordinates plot (BPCP) based on the focus + context approach. BPCP splits vertically the overall rendering area into the focus and context regions. The focus area maps a few selected dimensions of interest at sufficiently wide spacing. The remaining dimensions are represented in the context area in a compact way to retain useful information and provide the data continuity. The focus display can be further enriched with various options, such as axes overlays, scatterplot, and nested PCPs. In order to accommodate an arbitrarily large number of dimensions, the context display supports the multi-level stacked view. Finally, we present two innovative ways of enhancing parallel coordinates axes to better understand all variables and their interrelationships in high-dimensional datasets. Histogram and circle/ellipse plots based on uniform and non-uniform frequency/density mappings are adopted to visualize distributions of numerical and categorical data values. Color-mapped axis stripes are designed in the parallel coordinates layout so that correlations can be fully realized in the same display plot irrespective of axes locations. These colors are also propagated to histograms as stacked bars and categorical values as pie charts to further facilitate data exploration. By using the datasets consisting of 25 to 130 variables of different data types we have demonstrated effectiveness of the proposed multivariate visualization enhancements

    Doctor of Philosophy

    Get PDF
    dissertationWith the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced to reduce the exploration space that identifies a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, obtaining insights and arriving at conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections

    Análise visual aplicada à análise de imagens

    Get PDF
    Orientadores: Alexandre Xavier Falcão, Alexandru Cristian Telea, Pedro Jussieu de Rezende, Johannes Bernardus Theodorus Maria RoerdinkTese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação e Universidade de GroningenResumo: Análise de imagens é o campo de pesquisa preocupado com a extração de informações a partir de imagens. Esse campo é bastante importante para aplicações científicas e comerciais. O objetivo principal do trabalho apresentado nesta tese é permitir interatividade com o usuário durante várias tarefas relacionadas à análise de imagens: segmentação, seleção de atributos, e classificação. Neste contexto, permitir interatividade com o usuário significa prover mecanismos que tornem possível que humanos auxiliem computadores em tarefas que são de difícil automação. Com respeito à segmentação de imagens, propomos uma nova técnica interativa que combina superpixels com a transformada imagem-floresta. A vantagem principal dessa técnica é permitir rápida segmentação interativa de imagens grandes, além de permitir extração de características potencialmente mais ricas. Os experimentos sugerem que nossa técnica é tão eficaz quanto a alternativa baseada em pixels. No contexto de seleção de atributos e classificação, propomos um novo sistema de visualização interativa que combina exploração do espaço de atributos (baseada em redução de dimensionalidade) com avaliação automática de atributos. Esse sistema tem como objetivo revelar informações que levem ao desenvolvimento de conjuntos de atributos eficazes para classificação de imagens. O mesmo sistema também pode ser aplicado para seleção de atributos para segmentação de imagens e para classificação de padrões, apesar dessas tarefas não serem nosso foco. Apresentamos casos de uso que mostram como esse sistema pode prover certos tipos de informação qualitativa sobre sistemas de classificação de imagens que seriam difíceis de obter por outros métodos. Também mostramos como o sistema interativo proposto pode ser adaptado para a exploração de resultados computacionais intermediários de redes neurais artificiais. Essas redes atualmente alcançam resultados no estado da arte em muitas aplicações de classificação de imagens. Através de casos de uso envolvendo conjuntos de dados de referência, mostramos que nosso sistema pode prover informações sobre como uma rede opera que levam a melhorias em sistemas de classificação. Já que os parâmetros de uma rede neural artificial são tipicamente adaptados iterativamente, a visualização de seus resultados intermediários pode ser vista como uma tarefa dependente de tempo. Com base nessa perspectiva, propomos uma nova técnica de redução de dimensionalidade dependente de tempo que permite a redução de mudanças desnecessárias nos resultados causadas por pequenas mudanças nos dados. Experimentos preliminares mostram que essa técnica é eficaz em manter a coerência temporal desejadaAbstract: We define image analysis as the field of study concerned with extracting information from images. This field is immensely important for commercial and interdisciplinary applications. The overarching goal behind the work presented in this thesis is enabling user interaction during several tasks related to image analysis: image segmentation, feature selection, and image classification. In this context, enabling user interaction refers to providing mechanisms that allow humans to assist machines in tasks that are difficult to automate. Such tasks are very common in image analysis. Concerning image segmentation, we propose a new interactive technique that combines superpixels with the image foresting transform. The main advantage of our proposed technique is enabling faster interactive segmentation of large images, although it also enables potentially richer feature extraction. Our experiments show that our technique is at least as effective as its pixel-based counterpart. In the context of feature selection and image classification, we propose a new interactive visualization system that combines feature space exploration (based on dimensionality reduction) with automatic feature scoring. This visualization system aims to provide insights that lead to the development of effective feature sets for image classification. The same system can also be applied to select features for image segmentation and (general) pattern classification, although these tasks are not our focus. We present use cases that show how this system may provide a kind of qualitative feedback about image classification systems that would be very difficult to obtain by other (non-visual) means. We also show how our proposed interactive visualization system can be adapted to explore intermediary computational results of artificial neural networks. Such networks currently achieve state-of-the-art results in many image classification applications. Through use cases involving traditional benchmark datasets, we show that our system may enable insights about how a network operates that lead to improvements along the classification pipeline. Because the parameters of an artificial neural network are typically adapted iteratively, visualizing its intermediary computational results can be seen as a time-dependent task. Motivated by this, we propose a new time-dependent dimensionality reduction technique that enables the reduction of apparently unnecessary changes in results due to small changes in the data (temporal coherence). Preliminary experiments show that this technique is effective in enforcing temporal coherenceDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação2012/24121-9;FAPESPCAPE

    Visualizing multidimensional data similarities:Improvements and applications

    Get PDF
    Multidimensional data is increasingly more prominent and important in many application domains. Such data typically consist of a large set of elements, each of which described by several measurements (dimensions). During the design of techniques and tools to process this data, a key component is to gather insights into their structure and patterns, which can be described by the notion of similarity between elements. Among these techniques, multidimensional projections and similarity trees can effectively capture similarity patterns and handle a large number of data elements and dimensions. However, understanding and interpreting these patterns in terms of the original data dimensions is still hard. This thesis addresses the development of visual explanatory techniques for the easy interpretation of similarity patterns present in multidimensional projections and similarity trees, by several contributions. First, we propose methods that make the computation of similarity trees efficient for large datasets, and also enhance its visual representation to allow the exploration of more data in a limited screen. Secondly, we propose methods for the visual explanation of multidimensional projections in terms of groups of similar elements. These are automatically annotated to describe which dimensions are more important to define their notion of group similarity. We show next how these explanatory mechanisms can be adapted to handle both static and time-dependent data. Our proposed techniques are designed to be easy to use, work nearly automatically, and are demonstrated on a variety of real-world large data obtained from image collections, text archives, scientific measurements, and software engineering
    corecore