229,279 research outputs found

    Interactive Techniques and Exploratory Spatial Data Analysis

    Get PDF
    This chapter reviews the ideas behind interactive and exploratory spatial data analysis and their relation to GIS. Three important aspects are considered. First, an overview is presented of the principles behind interactive spatial data analysis, based on insights from the use of dynamic graphics in statistics and their extension to spatial data. This is followed by a review of spatialised exploratory data analysis (EDA) techniques, that is, ways in which a spatial representation can be given to standard EDA tools by associating them with particular locations or spatial subsets of the data. The third aspect covers the main ideas behind true exploratory spatial data analysis, emphasising the concern with visualising spatial distributions and local patterns of spatial autocorrelation. The geostatistical perspective is considered, typically taken in the physical sciences, as well as the lattice perspective, more familiar in the social sciences. The chapter closes with a brief discussion of implementation issues and future directions

    Enabling Interactive Analytics of Secure Data using Cloud Kotta

    Full text link
    Research, especially in the social sciences and humanities, is increasingly reliant on the application of data science methods to analyze large amounts of (often private) data. Secure data enclaves provide a solution for managing and analyzing private data. However, such enclaves do not readily support discovery science---a form of exploratory or interactive analysis by which researchers execute a range of (sometimes large) analyses in an iterative and collaborative manner. The batch computing model offered by many data enclaves is well suited to executing large compute tasks; however it is far from ideal for day-to-day discovery science. As researchers must submit jobs to queues and wait for results, the high latencies inherent in queue-based, batch computing systems hinder interactive analysis. In this paper we describe how we have augmented the Cloud Kotta secure data enclave to support collaborative and interactive analysis of sensitive data. Our model uses Jupyter notebooks as a flexible analysis environment and Python language constructs to support the execution of arbitrary functions on private data within this secure framework.Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC USA, June 2017 (ScienceCloud 2017), 7 page

    explorase: Multivariate Exploratory Analysis and Visualization for Systems Biology

    Get PDF
    The datasets being produced by high-throughput biological experiments, such as microarrays, have forced biologists to turn to sophisticated statistical analysis and visualization tools in order to understand their data. We address the particular need for an open-source exploratory data analysis tool that applies numerical methods in coordination with interactive graphics to the analysis of experimental data. The software package, known as explorase, provides a graphical user interface (GUI) on top of the R platform for statistical computing and the GGobi software for multivariate interactive graphics. The GUI is designed for use by biologists, many of whom are unfamiliar with the R language. It displays metadata about experimental design and biological entities in tables that are sortable and filterable. There are menu shortcuts to the analysis methods implemented in R, including graphical interfaces to linear modeling tools. The GUI is linked to data plots in GGobi through a brush tool that simultaneously colors rows in the entity information table and points in the GGobi plots.

    An intelligent assistant for exploratory data analysis

    Get PDF
    In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships ina data set rather than simply develop predictive models for selectd variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction and model selection using a genetic algorithm

    DataHub: Science data management in support of interactive exploratory analysis

    Get PDF
    The DataHub addresses four areas of significant needs: scientific visualization and analysis; science data management; interactions in a distributed, heterogeneous environment; and knowledge-based assistance for these functions. The fundamental innovation embedded within the DataHub is the integration of three technologies, viz. knowledge-based expert systems, science visualization, and science data management. This integration is based on a concept called the DataHub. With the DataHub concept, science investigators are able to apply a more complete solution to all nodes of a distributed system. Both computational nodes and interactives nodes are able to effectively and efficiently use the data services (access, retrieval, update, etc), in a distributed, interdisciplinary information system in a uniform and standard way. This allows the science investigators to concentrate on their scientific endeavors, rather than to involve themselves in the intricate technical details of the systems and tools required to accomplish their work. Thus, science investigators need not be programmers. The emphasis on the definition and prototyping of system elements with sufficient detail to enable data analysis and interpretation leading to information. The DataHub includes all the required end-to-end components and interfaces to demonstrate the complete concept
    corecore