9,689 research outputs found

    Optimized bi-dimensional data projection for clustering visualization

    Get PDF
    We propose a new method to project n-dimensional data onto two dimensions, for visualization purposes. Our goal is to produce a bi-dimensional representation that better separate existing clusters. Accordingly, to generate this projection we apply Differential Evolution as a meta-heuristic to optimize a divergence measure of the projected data. This divergence measure is based on the Cauchy–Schwartz divergence, extended for multiple classes. It accounts for the separability of the clusters in the projected space using the Renyi entropy and Information Theoretical Clustering analysis. We test the proposed method on two synthetic and five real world data sets, obtaining well separated projected clusters in two dimensions. These results were compared with results generated by PCA and a recent likelihood based visualization method

    Information visualization for DNA microarray data analysis: A critical review

    Get PDF
    Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use ldquoabstractrdquo graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to particular records they are interested in, and therefore, gain deeper insights in understanding the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, and then, explains how graphical representation can be applied in general to this problem domain, followed by exploring the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas as well as possible solutions to them are discussed as being a source for future work

    Dynamic Metric Learning from Pairwise Comparisons

    Full text link
    Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes in the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates, demonstrate RICE-OCELAD on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms.Comment: to appear Allerton 2016. arXiv admin note: substantial text overlap with arXiv:1603.0367
    corecore