
    Interactive data exploration with targeted projection pursuit

    Data exploration is a vital, but little considered, part of the scientific process, yet few visualisation tools can cope with truly complex data. Targeted Projection Pursuit (TPP) is an interactive data exploration technique that provides an intuitive and transparent interface for data exploration. A prototype has been evaluated quantitatively and found to outperform algorithmic techniques on standard visual analysis tasks.

    The use of linear projections in the visual analysis of signals in an indoor optical wireless link


    Node-attribute graph layout for small-world networks

    Small-world networks are a very commonly occurring type of graph in the real world, and they exhibit a clustered structure that is not well represented by current graph layout algorithms. In many cases we also have information about the nodes in such graphs, which is typically depicted on the graph as node colour, shape or size. Here we demonstrate that these attributes can instead be used to lay out the graph in high-dimensional data space. Then, using a dimension reduction technique, targeted projection pursuit, the graph layout can be optimised for displaying clustering. The technique outperforms force-directed layout methods in cluster separation when applied to a sample, artificially generated, small-world network.

    Interactive visual data exploration with subjective feedback : an information-theoretic approach

    Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing projection methods for data visualization use predefined criteria to choose the representation of data. There is a lack of methods that (i) use information on what the user has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as "this set of points forms a cluster", and requires no knowledge of maths. This background knowledge is used to find a maximum entropy distribution of the data, after which the user is provided with data projections for which the data and the maximum entropy distribution differ the most, hence showing the user aspects of data that are maximally informative given the background knowledge. We study the computational performance of our model and present use cases on synthetic and real data. We find that the model allows the user to learn information efficiently from various data sources and works sufficiently fast in practice. In addition, we provide an open source EDA demonstrator system implementing our model with tailored interactive visualizations. We conclude that the information theoretic approach to EDA where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.

    Doctor of Philosophy

    With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced to reduce the exploration space that identifies a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, obtaining insights and arriving at conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections.

    Using adjacency matrices to lay out larger small-world networks

    Many networks exhibit small-world properties. The structure of a small-world network is characterized by short average path lengths and high clustering coefficients. Few graph layout methods capture this structure well, which limits their effectiveness and the utility of the visualization itself. Here we present an extension to our novel graphTPP layout method for laying out small-world networks using only their topological properties rather than their node attributes. The Watts–Strogatz model is used to generate a variety of graphs with a small-world network structure. Community detection algorithms are used to generate six different clusterings of the data. These clusterings, the adjacency matrix and the edge list are loaded into graphTPP and, through user interaction combined with linear projections of the adjacency matrix, graphTPP is able to produce a layout which visually separates these clusters. These layouts are compared to the layouts of two force-based techniques. graphTPP is able to clearly separate each of the communities into a spatially distinct area, and the edges between the clusters show the strength of their relationship. As a secondary contribution, an edge-grouping algorithm for graphTPP is demonstrated as a means to reduce visual clutter in the layout and reinforce the display of the strength of the relationship between two communities.
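
    The two small-world properties named in this abstract, short average path length and high clustering coefficient, can be illustrated with a minimal standard-library sketch of the Watts–Strogatz model. The function names (`watts_strogatz`, `average_clustering`, `average_path_length`) and the parameter values are assumptions made for illustration; this is not the authors' code and it does not reproduce graphTPP.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=0):
    """Return an adjacency dict for a Watts-Strogatz graph.

    n: number of nodes; k: even number of ring neighbours per node;
    p: probability of rewiring each lattice edge. (Illustrative helper.)
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Start from a ring lattice: k/2 neighbours on each side of every node.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # Visit each lattice edge once and rewire its far end with probability p.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if rng.random() < p and old in adj[i]:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        nb = list(nbrs)
        links = sum(
            1 for a in range(d) for b in range(a + 1, d) if nb[b] in adj[nb[a]]
        )
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

    Calling `watts_strogatz(20, 4, 0.0)` gives the unrewired ring lattice. Raising `p` typically shortens paths while the clustering coefficient stays comparatively high for small `p`, which is the small-world regime the abstract refers to.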

    Bagged projection methods for supervised classification in big data

    Classification methods are widely used for problems where rules to sort observations into groups are needed. There are many different methods to fit classification models, but none is universally best. This research develops new classification methods, and visual tools for exploring the algorithms and results introduced in this work. The new classification method is a random forest built on trees using linear combinations of variables, which improves the predictive performance when the separation between classes is in combinations of variables. It is called a projection pursuit random forest (PPF). The benefit of the method is demonstrated using a simulation study, and on a suite of benchmark data. It is implemented in the R package, PPforest, with core functions in Rcpp to improve the computational speed. The process of bagging and combining results from multiple trees produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into the class structure in high dimensions. A web app is designed and developed for this purpose. In the process of developing the PPF some deficiencies were observed in the tree algorithm, PPtree, forming the basic building block. This led to modifications to the algorithm, implemented in the R package, PPtreeExt, and a small web app to help digest differences between various model parameter choices.
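
    To illustrate why a split on a linear combination of variables can help when the class separation lies in a combination of variables, the following is a minimal sketch for two classes in two dimensions. It assumes a Fisher (LDA-style) discriminant direction as the projection; the abstract does not state which projection index PPtree or PPforest use, and the names `fisher_direction` and `linear_split` are hypothetical, not part of either R package.

```python
def _mean(points):
    """Component-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to S^-1 (mean_a - mean_b).

    S is the pooled within-class scatter matrix (2x2). Assumes S is
    invertible, i.e. the points are not all collinear.
    """
    ma, mb = _mean(class_a), _mean(class_b)
    sxx = sxy = syy = 0.0
    for pts, m in ((class_a, ma), (class_b, mb)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    dmx, dmy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of the 2x2 scatter matrix applied to the mean gap.
    wx = (syy * dmx - sxy * dmy) / det
    wy = (-sxy * dmx + sxx * dmy) / det
    norm = (wx * wx + wy * wy) ** 0.5
    return (wx / norm, wy / norm)


def linear_split(class_a, class_b):
    """Return (w, threshold); a point p is assigned to class A when
    w[0]*p[0] + w[1]*p[1] >= threshold."""
    w = fisher_direction(class_a, class_b)
    proj = lambda p: w[0] * p[0] + w[1] * p[1]
    mean_a = sum(proj(p) for p in class_a) / len(class_a)
    mean_b = sum(proj(p) for p in class_b) / len(class_b)
    return w, (mean_a + mean_b) / 2.0
```

    For classes that overlap on each variable separately but differ in `x + y`, the direction found is close to the diagonal and a single threshold on the projection separates them, whereas no single axis-aligned cut does.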

    A Constrained Randomization Approach to Interactive Visual Data Exploration with Subjective Feedback

    Data visualization and iterative/interactive data mining are growing rapidly in attention, both in research and in industry. However, while there is a plethora of advanced data mining methods and much work in the field of visualization, integrated methods that combine advanced visualization and/or interaction with data mining techniques in a principled way are rare. We present a framework based on constrained randomization which lets users explore high-dimensional data via 'subjectively informative' two-dimensional data visualizations. The user is presented with 'interesting' projections and can express observations using visual interactions that update a background model representing the user's belief state. This background model is then considered by a projection-finding algorithm employing data randomization to compute a new 'interesting' projection. By providing users with information that contrasts with the background model, we maximize the chance that the user encounters striking new information present in the data. This process can be iterated until the user runs out of time or until the difference between the randomized and the real data is insignificant. We present two case studies, one controlled study on synthetic data and another on census data, using the proof-of-concept tool SIDE that demonstrates the presented framework.
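
    The idea of comparing real data against randomized data under constraints can be sketched as follows. This is only one simple way such a scheme might be realised, assumed here for illustration: each column is permuted independently, and a user-marked group of rows (for example "these points form a cluster") is shuffled only among itself. The functions `randomize_columns` and `pearson` are hypothetical helpers and are not taken from the SIDE tool.

```python
import math
import random


def randomize_columns(rows, groups=None, seed=0):
    """Return a copy of `rows` with every column permuted independently.

    rows: list of equal-length lists of numbers.
    groups: optional list of lists of row indices; values are shuffled only
    among rows of the same group (the constraint). Default: one group
    containing all rows. Rows not listed in any group are left unchanged.
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    if groups is None:
        groups = [list(range(n))]
    out = [list(r) for r in rows]  # do not mutate the input
    for col in range(m):
        for group in groups:
            values = [rows[i][col] for i in group]
            rng.shuffle(values)
            for i, v in zip(group, values):
                out[i][col] = v
    return out


def pearson(rows, a, b):
    """Pearson correlation between columns a and b."""
    n = len(rows)
    ma = sum(r[a] for r in rows) / n
    mb = sum(r[b] for r in rows) / n
    cov = sum((r[a] - ma) * (r[b] - mb) for r in rows)
    va = sum((r[a] - ma) ** 2 for r in rows)
    vb = sum((r[b] - mb) ** 2 for r in rows)
    return cov / math.sqrt(va * vb)
```

    Unconstrained shuffling preserves each column's marginal distribution while breaking relations between columns, so a statistic such as `pearson` that differs strongly between the real and randomized data points to structure the background model does not yet explain. Adding a group constraint keeps that group's per-column values in place, so structure already marked by the user stops standing out.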