Visualization Support to Interactive Cluster Analysis
We demonstrate interactive visual embedding of partition-based clustering of multidimensional data using methods from the open-source machine learning library Weka. According to the visual analytics paradigm, knowledge is gradually built and refined by a human analyst through iterative application of clustering with different parameter settings and to different data subsets. To show clustering results to the analyst, cluster membership is typically represented by color coding. Our tools maintain color consistency between different steps of the process. We shall demonstrate two-way clustering of spatial time series, in which clustering is applied both to places and to time steps.
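The abstract does not say how color consistency is maintained. One plausible approach, sketched below purely as an assumption, is to let each cluster of a new run inherit the color of the earlier cluster with which it shares the most items. The function name `propagate_colors` and the `palette` argument are hypothetical and do not come from the tools described.

```python
from collections import Counter


def propagate_colors(prev_labels, new_labels, prev_colors, palette):
    """Color new clusters, reusing the color of the best-overlapping old cluster.

    prev_labels, new_labels: cluster ids per item, in the same item order.
    prev_colors: dict mapping each old cluster id to its color.
    palette: distinct colors to draw from for clusters without a match
             (assumed large enough; otherwise StopIteration is raised).
    """
    # Count how many items each (new cluster, old cluster) pair shares.
    overlap = Counter(zip(new_labels, prev_labels))
    new_colors = {}
    used = set()
    # Greedy matching: largest overlaps first, so big clusters keep their color.
    for (new_c, old_c), _count in overlap.most_common():
        color = prev_colors[old_c]
        if new_c not in new_colors and color not in used:
            new_colors[new_c] = color
            used.add(color)
    # Clusters left without a color get the next unused palette entry.
    spare = (c for c in palette if c not in used)
    for new_c in sorted(set(new_labels)):
        if new_c not in new_colors:
            new_colors[new_c] = next(spare)
    return new_colors
```

Greedy matching is a simple heuristic; an optimal one-to-one assignment (for example via the Hungarian algorithm) would be an alternative design choice when many clusters split or merge between runs.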
ICAP: An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who evaluates the cluster structure and may elect to modify it. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically; and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
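The alternation between algorithm steps and analyst steps can be illustrated with a minimal Python sketch. This is not the ICAP implementation (which was written in APL); it assumes a nearest-centroid assignment with Euclidean distance, and all function names are hypothetical.

```python
import math


def assign(points, centroids):
    """Algorithm step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Algorithm step: move each centroid to the mean of its member points."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            moved.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            moved.append(old)  # an empty cluster keeps its centroid
    return moved


def delete_cluster(centroids, k):
    """Analyst step: drop cluster k."""
    return [c for i, c in enumerate(centroids) if i != k]


def merge_clusters(points, labels, centroids, a, b):
    """Analyst step: lump clusters a and b into one centroid at their joint mean.

    Assumes at least one of the two clusters has members.
    """
    members = [p for p, lab in zip(points, labels) if lab in (a, b)]
    merged = tuple(sum(x) / len(members) for x in zip(*members))
    rest = [c for i, c in enumerate(centroids) if i not in (a, b)]
    return rest + [merged]


def add_centroid(centroids, point):
    """Analyst step: seed a new cluster at the given point."""
    return centroids + [tuple(point)]
```

After any analyst step, calling `assign` and `update` again hands control back to the algorithm, which mirrors the alternating control described in the abstract.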
Interactive spatiotemporal cluster analysis of vast challenge 2008 datasets
We describe a visual analytics method supporting the analysis of two different types of spatio-temporal data: point events and trajectories of moving agents. The method combines clustering with interactive visual displays, in particular a map and a space-time cube. We demonstrate the use of the method by applying it to two datasets from the VAST Challenge 2008: evacuation traces (trajectories of people's movement) and landings and interdictions of migrant boats (point events).
SEQOPTICS: a protein sequence clustering system
BACKGROUND: Protein sequence clustering has been widely used as a part of the analysis of protein structure and function. In most cases single linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach due to its emphasis on visualization of results and support for interactive work, e.g., in choosing parameters. However, OPTICS has not been used, as far as we know, for protein sequence clustering. RESULTS: In this paper, a system for clustering proteins, SEQOPTICS (SEQuence clustering with OPTICS), is demonstrated. The system uses Smith-Waterman as the protein distance measure and OPTICS at its core to perform protein sequence clustering. SEQOPTICS is tested with four data sets from different data sources. Visualization of the sequence clustering structure is demonstrated as well. CONCLUSION: The system was evaluated by comparison with other existing methods. Analysis of the results demonstrates that SEQOPTICS performs better on some evaluation criteria, including Jaccard coefficient, Precision, and Recall. It is a promising protein sequence clustering method, with possible future improvements in parallel computing and other protein distance measures.
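The abstract does not state how the Jaccard coefficient, Precision, and Recall were computed. A common convention for comparing a clustering against a reference grouping is pair counting, sketched below as an assumption rather than as the evaluation actually used for SEQOPTICS. The function name `pair_counting_scores` is hypothetical.

```python
from itertools import combinations


def pair_counting_scores(predicted, reference):
    """Compare two labelings of the same items by counting item pairs.

    A pair is a true positive if both labelings put the two items together,
    a false positive if only the predicted labeling does, and a false
    negative if only the reference labeling does.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_ref:
            fn += 1
    # Guard against empty denominators (e.g. every item in its own cluster).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"jaccard": jaccard, "precision": precision, "recall": recall}
```

The loop visits every pair of items, so the cost grows quadratically with the number of sequences; for large data sets the same counts can be derived from a contingency table instead.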
Incremental procedures for partitioning highly intermixed multi-class datasets into hyper-spherical and hyper-ellipsoidal clusters
Two procedures for partitioning large collections of highly intermixed datasets of different classes into a number of hyper-spherical or hyper-ellipsoidal clusters are presented. The incremental procedures aim to generate a minimum number of hyper-spherical or hyper-ellipsoidal clusters, with each cluster containing a maximum number of data points of the same class. The procedures extend the move-to-front algorithms originally designed for constructing minimum-sized enclosing balls or ellipsoids for a dataset of a single class. The resulting clusters of the dataset can be used for data modeling, outlier detection, discrimination analysis, and knowledge discovery.
Integrating cluster formation and cluster evaluation in interactive visual analysis
Cluster analysis is a popular method for data investigation where data items are structured into groups called clusters. This analysis involves two sequential steps, namely cluster formation and cluster evaluation. In this paper, we propose the tight integration of cluster formation and cluster evaluation in interactive visual analysis in order to overcome the challenges that relate to the black-box nature of clustering algorithms. We present our conceptual framework in the form of an interactive visual environment. In this realization of our framework, we build upon general concepts such as cluster comparison, clustering tendency, cluster stability, and cluster coherence. Additionally, we showcase our framework on the cluster analysis of mixed lipid bilayers.
Revisiting Bertin Matrices: New Interactions for Crafting Tabular Visualizations
We present Bertifier, a web app for rapidly creating tabular visualizations from spreadsheets. Bertifier draws from Jacques Bertin's matrix analysis method, whose goal was to “simplify without destroying” by encoding cell values visually and grouping similar rows and columns. Although there have been several attempts to bring this method to computers, no implementation exists today that is both exhaustive and accessible to a large audience. Bertifier remains faithful to Bertin's method while leveraging the power of today's interactive computers. Tables are formatted and manipulated through crossets, a new interaction technique for rapidly applying operations on rows and columns. We also introduce visual reordering, a semi-interactive reordering approach that lets users apply and tune automatic reordering algorithms in a WYSIWYG manner. Sessions with eight users from different backgrounds suggest that Bertifier has the potential to bring Bertin's method to a wider audience of both technical and non-technical users, and empower them with data analysis and communication tools that have so far been accessible only to a handful of specialists.
ClusterNet: A Perception-Based Clustering Model for Scattered Data
Visualizations for scattered data are used to make users understand certain
attributes of their data by solving different tasks, e.g. correlation
estimation, outlier detection, cluster separation. In this paper, we focus on
the latter task and develop a technique that is aligned with human perception,
that can be used to understand how human subjects perceive clusterings in
scattered data and possibly optimize for better understanding. Cluster
separation in scatterplots is a task that is typically tackled by widely used
clustering techniques such as k-means or DBSCAN. However, because
these algorithms are based on non-perceptual metrics, our experiments show
that their output does not reflect human cluster perception. We
propose a learning strategy which directly operates on scattered data. To learn
perceptual cluster separation on this data, we crowdsourced a large-scale
dataset consisting of 7,320 point-wise cluster affiliations for bivariate
data, labeled by 384 human crowd workers. Based on this data, we
trained ClusterNet, a point-based deep learning model, to
reflect human perception of cluster separability. In order to train ClusterNet
on human-annotated data, we use a PointNet++ architecture enabling inference on
point clouds directly. In this work, we provide details on how we collected our
dataset, report statistics of the resulting annotations, and investigate
perceptual agreement of cluster separation for real-world data. We further
report the training and evaluation protocol of ClusterNet and introduce a novel
metric that measures the accuracy of a clustering technique relative to a group
of human annotators. Finally, we compare our approach against existing
state-of-the-art clustering techniques and show that ClusterNet is able to
generalize to unseen and out-of-scope data.
Comment: Currently, this manuscript is under revision at TVC.