3 research outputs found

    Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction

    Get PDF
    This thesis is about visualizing a kind of data that is trivial to process by computers but difficult to imagine by humans because nature does not allow for intuition with this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution for this task is to imagine the data as vectors in a Euclidean space with object variables as dimensions. Utilizing Euclidean distance as a measure of similarity, objects with similar properties and values accumulate to groups, so-called clusters, that are exposed by cluster analysis on the high-dimensional point cloud. Because similar vectors can be thought of as objects that are alike in terms of their attributes, the point cloud\''s structure and individual cluster properties, like their size or compactness, summarize data categories and their relative importance. The contribution of this thesis is a novel analysis approach for visual exploration of high-dimensional point clouds without suffering from structural occlusion. The work is based on implementing two key concepts: The first idea is to discard those geometric properties that cannot be preserved and, thus, lead to the typical artifacts. Topological concepts are used instead to shift away the focus from a point-centered view on the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data\''s original domain and be preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement. The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of those properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure from data point analysis is that restricting local analysis only to data subsets significantly reduces artifacts and the visual complexity of standard techniques. That is, the additional topological layer enables the analyst to identify structure that was hidden before and to focus on particular features by suppressing irrelevant points during local feature analysis. This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in their number or positions. That is, the analyst explores the clustering of a fixed and constant set of points. The extension to the time-varying case implies the analysis of a varying clustering, where clusters appear as new, merge or split, or vanish. Especially for high-dimensional data, both tracking---which means to relate features over time---but also visualizing changing structure are difficult problems to solve

    Topological visual analysis of clusterings in high-dimensional information spaces

    No full text

    An Effective Approach to Predicting Large Dataset in Spatial Data Mining Area

    Get PDF
    Due to enormous quantities of spatial satellite images, telecommunication images, health related tools etc., it is often impractical for users to have detailed and thorough examination of spatial data (S). Large dataset is very common and pervasive in a number of application areas. Discovering or predicting patterns from these datasets is very vital. This research focused on developing new methods, models and techniques for accomplishing advanced spatial data mining (ASDM) tasks. The algorithms were designed to challenge state-of-the-art data technologies and they are tested with randomly generated and actual real-world data. Two main approaches were adopted to achieve the objectives (1) identifying the actual data types (DTs), data structures and spatial content of a given dataset (to make our model versatile and robust) and (2) integrating these data types into an appropriate database management system (DBMS) framework, for easy management and manipulation. These two approaches helped to discover the general and varying types of patterns that exist within any given dataset non-spatial, spatial or even temporal (because spatial data are always influenced by temporal agents) datasets. An iterative method was adopted for system development methodology in this study. The method was adopted as a strategy to combat the irregularity that often exists within spatial datasets. In the course of this study, some of the challenges we encountered which also doubled as current challenges facing spatial data mining includes: (a) time complexity in availing useful data for analysis, (b) time complexity in loading data to storage and (c) difficulties in discovering spatial, non-spatial and temporal correlations between different data objects. However, despite the above challenges, there are some opportunities that spatial data can benefit from including: Cloud computing, Spark technology, Parallelisation, and Bulk-loading methods. Techniques and application areas of spatial data mining (SDM) were identified and their strength and limitations were equally documented. Finally, new methods and algorithms for mining very large data of spatial/non-spatial bias were created. The proposed models/systems are documented in the sections as follows: (a) Development of a new technique for parallel indexing of large dataset (PaX-DBSCAN), (b) Development of new techniques for clustering (X-DBSCAN) in a learning process, (c) Development of a new technique for detecting human skin in an image, (d) Development of a new technique for finding face in an image, (e) Development of a novel technique for management of large spatial and non-spatial datasets (aX-tree). The most prominent among our methods is the new structure used in (c) above -- packed maintained k-dimensional tree (Pmkd-tree), for fast spatial indexing and querying. The structure is a combination system that combines all the proposed algorithms to produce one solid, standard, useful and quality system. The intention of the new final algorithm (system) is to combine the entire initial proposed algorithms to come up with one strong generic effective tool for predicting large dataset SDM area, which it is capable of finding patterns that exist among spatial or non-spatial objects in a DBMS. In addition to Pmkd-tree, we also implemented a novel spatial structure, packed quad-tree (Pquad-Tree), to balance and speed up the performance of the regular quad-tree. Our systems so far have shown a manifestation of efficiency in terms of performance, storage and speed. The final Systems (Pmkd-tree and Pquad-Tree) are generic systems that are flexible, robust, light and stable. They are explicit spatial models for analysing any given problem and for predicting objects as spatially distributed events, using basic SDM algorithms. They can be applied to pattern matching, image processing, computer vision, bioinformatics, information retrieval, machine learning (classification and clustering) and many other computational tasks
    corecore