
    Unsupervised segmentation of mitochondria using model-based spectral clustering

    Segmentation of mitochondria in microscopic images is a significant challenge, owing to the wide morphological and structural variation that is characteristic of this class of membrane-enclosed subcellular organelles. To address the drawbacks of manual mark-up procedures (which are common in current clinical evaluations), a recent line of research investigates the application of statistical machine learning methods to mitochondria segmentation. Within this field, the main difficulty has been building a training set that can describe the vast structural variation associated with mitochondria. To avoid this problem, in this paper we apply perceptual organization models such as Figure-Ground, Similarity, Proximity and Closure, which target the identification of the closed membranes in EM images using multistage spectral clustering [1,2]. Our unsupervised mitochondria segmentation algorithm is outlined in Fig. 1. The first stage of the spectral clustering implements foreground segmentation with the similarity model S1, which aims to identify the dark contours formed by the outer membrane of the mitochondrion. In the second stage, the foreground data is re-clustered with a different similarity model S2 to identify the inner membrane of the mitochondrion. The last stage is a contour processing step that eliminates pixels that are not consistent with the minimum distance between the inner and outer membranes of the mitochondrion. The algorithm has been tested on a suite of EM images provided by the American Society of Cell Biology, and a number of experimental results are presented in Fig. 2.
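
The first stage above is a spectral bipartition of pixels under a similarity model. The sketch below illustrates that idea on a handful of 1-D pixel intensities. The Gaussian intensity kernel, the width `sigma`, and the function name `spectral_bipartition` are hypothetical stand-ins, not the paper's S1 or S2, which the abstract does not specify. It finds the second eigenvector of the normalized affinity matrix by power iteration and splits the pixels by its sign.

```python
import math

def spectral_bipartition(values, sigma, iters=200):
    """Split 1-D pixel intensities into two groups with a normalized spectral cut.

    Hypothetical similarity model: w_ij = exp(-(v_i - v_j)^2 / (2 * sigma^2)).
    """
    n = len(values)
    w = [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values] for a in values]
    s = [math.sqrt(sum(row)) for row in w]  # square roots of the degrees
    # Normalized affinity M = D^-1/2 W D^-1/2; its top eigenvector is sqrt(degree).
    m = [[w[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    norm_top = math.sqrt(sum(x * x for x in s))
    top = [x / norm_top for x in s]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        # Deflation: remove the component along the known top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Multiply by (M + I) so every eigenvalue is non-negative during power iteration.
        v = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        nrm = math.sqrt(sum(x * x for x in v))
        v = [x / nrm for x in v]
    # The sign of D^-1/2 v assigns each pixel to one side of the cut.
    return [1 if v[i] / s[i] >= 0 else 0 for i in range(n)]
```

For example, intensities `[10, 12, 11, 200, 210, 205]` with `sigma=20.0` split into the three dark values and the three bright values. Which side receives label 1 is arbitrary. A real pipeline would work on 2-D pixel neighbourhoods and would typically use a sparse eigensolver rather than dense power iteration.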

    Nonparametric Feature Extraction from Dendrograms

    We propose nonparametric feature extraction from dendrograms. Minimax distance measures correspond to building a dendrogram with the single linkage criterion and defining specific forms of a level function and a distance function over it. We therefore extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, so that many numerical machine learning algorithms can employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances, in solution space and in representation space respectively, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
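
The single-linkage correspondence stated above can be checked directly. The minimax distance between two objects is the smallest value, over all paths joining them, of the largest edge on the path. It is also the height at which the two objects merge in a single-linkage dendrogram. Below is a minimal sketch, assuming a dense symmetric distance matrix. It runs a Floyd-Warshall-style pass with (min, max) in place of (min, +). The function name is hypothetical, and the sketch covers only the single-linkage case, not the generalized dendrogram framework.

```python
def minimax_distances(d):
    """Minimax (bottleneck path) distances from a full pairwise distance matrix d."""
    n = len(d)
    mm = [row[:] for row in d]  # work on a copy; the input is left unchanged
    # Floyd-Warshall over the (min, max) path algebra: a path through k costs
    # the larger of its two legs, and we keep the cheapest such path.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(mm[i][k], mm[k][j])
                if via_k < mm[i][j]:
                    mm[i][j] = via_k
    return mm
```

Consider the points 0, 1, 2, 10, 11 on a line. Objects within each tight group are at minimax distance 1. Every cross-group pair is at 8, which is the gap single linkage must bridge to merge the two groups.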

    The Metric Nearness Problem

    Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric, principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
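
The abstract does not spell out the projection scheme. The sketch below is a hedged illustration of what triangle fixing by iterative projection can look like for a least-squares objective. It visits every triangle inequality in turn and projects onto it. Dykstra's correction terms make the iterates approach the nearest metric rather than merely some metric. The function name `triangle_fix`, the pair-keyed dictionary layout, and the fixed number of sweeps are assumptions. The paper's own algorithms may differ in constraint ordering, stopping rule, and norm.

```python
from itertools import combinations

def triangle_fix(d, n, sweeps=100):
    """Approximate the least-squares nearest metric to dissimilarities d.

    d maps each pair (i, j) with i < j < n to a dissimilarity. Each triangle
    inequality is a half-space. Dykstra's cyclic projections over all of them
    converge to the closest point (in the l2 sense) of their intersection.
    """
    x = dict(d)  # current iterate; the input dict is not modified
    corr = {}  # Dykstra correction term remembered for each constraint
    constraints = []
    for i, j, k in combinations(range(n), 3):
        ij, ik, jk = (i, j), (i, k), (j, k)
        # (edge, other, other) encodes: edge <= other + other
        constraints += [(ij, ik, jk), (ik, ij, jk), (jk, ij, ik)]
    for _ in range(sweeps):
        for c in constraints:
            e, a, b = c
            pe, pa, pb = corr.get(c, (0.0, 0.0, 0.0))
            # Add back what this constraint removed on its previous visit.
            ye, ya, yb = x[e] + pe, x[a] + pa, x[b] + pb
            # Project onto ye - ya - yb <= 0; the normal (1, -1, -1) has squared norm 3.
            step = max(0.0, ye - ya - yb) / 3.0
            x[e], x[a], x[b] = ye - step, ya + step, yb + step
            corr[c] = (step, -step, -step)
    return x
```

Take the three-point input d(0,1) = 1, d(1,2) = 1, d(0,2) = 5. Only one inequality is violated, by 3. A single projection spreads that excess equally and returns 2, 2, 4. Each sweep visits O(n^3) constraints.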

    NeatMap - non-clustering heat map alternatives in R

    Background: The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis, which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map.
    Results: NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets.
    Conclusions: NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.

    Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing and Proximity Analysis

    In the past decade there has been a resurgence of interest in nonlinear dimension reduction. Among new proposals are “Local Linear Embedding,” “Isomap,” and Kernel Principal Components Analysis, which all construct global low-dimensional embeddings from local affine or metric information. We introduce a competing method called “Local Multidimensional Scaling” (LMDS). Like LLE, Isomap, and KPCA, LMDS constructs its global embedding from local information, but it instead uses a combination of MDS and “force-directed” graph drawing. We apply the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces. We solve the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding. Tuned LMDS seems to be able to outperform MDS, PCA, LLE, Isomap, and KPCA, as illustrated with two well-known image datasets. The meta-criterion can also be used in a pointwise version as a diagnostic tool for measuring the local adequacy of embeddings and thereby detect local problems in dimension reductions.
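
The tuning meta-criterion compares K-nearest-neighbor sets computed in the data and in the embedding. Below is a minimal sketch with three assumptions: Euclidean distances, ties broken by index, and a score normalized by K so that 1.0 means every neighborhood is preserved. The function names are hypothetical. The per-point list corresponds to the pointwise diagnostic version mentioned above.

```python
import math

def knn_sets(points, k):
    """For each point, the indices of its k nearest other points (ties broken by index)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in ranked[:k]})
    return result

def knn_agreement(data, embedding, k):
    """Return (mean agreement, per-point agreement) between K-NN sets of data and embedding."""
    before = knn_sets(data, k)
    after = knn_sets(embedding, k)
    # Fraction of each point's K data-space neighbors that survive in the embedding.
    pointwise = [len(b & a) / k for b, a in zip(before, after)]
    return sum(pointwise) / len(pointwise), pointwise
```

Suppose you compare several candidate tuning-parameter values. You would keep the embedding with the highest mean agreement and inspect low per-point values to locate where the embedding distorts neighborhoods.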

    A methodology to compare dimensionality reduction algorithms in terms of loss of quality

    Dimensionality Reduction (DR) is attracting more attention these days as a result of the increasing need to handle huge amounts of data effectively. DR methods allow the number of initial features to be reduced considerably, until a set of features is found that preserves the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data, in terms of data analysis. Because each method behaves differently, this loss of quality can be decisive when selecting a DR method. In this paper, we propose a methodology that allows different DR methods to be analyzed and compared with regard to the loss of quality they produce. This methodology uses the concept of preservation of geometry (quality assessment criteria) to assess the loss of quality. Experiments have been carried out using the most well-known DR algorithms and quality assessment criteria from the literature. These experiments have been applied to 12 real-world datasets. The results obtained so far show that it is possible to establish a method for selecting the most appropriate DR method, in terms of minimum loss of quality. The experiments have also highlighted some interesting relationships between the quality assessment criteria. Finally, the methodology makes it possible to choose the appropriate dimensionality for the reduced data while incurring a minimum loss of quality.
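
The abstract does not name the specific quality assessment criteria. The sketch below uses one hedged example of a geometry-preservation criterion. It scores each candidate reduction by normalized (Kruskal-style) stress between the original and reduced pairwise distances, then picks the candidate with the smallest loss. The choice of criterion, the toy "drop a coordinate" reductions, and the function names are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pairwise(points):
    """All pairwise Euclidean distances, in a fixed (i < j) order."""
    return [math.dist(points[i], points[j]) for i, j in combinations(range(len(points)), 2)]

def normalized_stress(original, reduced):
    """Kruskal-style stress: 0 means pairwise distances are preserved exactly."""
    d_hi = pairwise(original)
    d_lo = pairwise(reduced)
    num = sum((a - b) ** 2 for a, b in zip(d_hi, d_lo))
    den = sum(a * a for a in d_hi)
    return math.sqrt(num / den)

def least_lossy(original, candidates):
    """Return the name of the candidate embedding with the smallest stress."""
    return min(candidates, key=lambda name: normalized_stress(original, candidates[name]))
```

Consider the four points (0, 0), (1, 0.1), (2, -0.1), (3, 0). Keeping the x coordinate loses far less geometry than keeping y, so `least_lossy` returns `"drop_y"`. Running the same comparison across several target dimensionalities is one way to choose a dimensionality with minimum loss.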

    Computation of Heterogeneous Object Co-embeddings from Relational Measurements

    Dimensionality reduction and data embedding methods generate low-dimensional representations of a single type of homogeneous data objects. In this work, we examine the problem of generating co-embeddings, or pattern representations, of two different types of objects within a joint common space of controlled dimensionality, where the only available information is assumed to be a set of pairwise relations or similarities between instances of the two groups. We propose a new method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimization carried out via matrix decomposition, which is also extended to support multidimensional co-embeddings. We also propose a scheme for heuristically reducing the number of model parameters, and a simple way of measuring the conformity between the original object relations and those re-estimated from the co-embeddings, so that model selection can be performed by identifying the optimal model parameters with a simple search procedure. The capabilities of the proposed method are demonstrated with multiple synthetic and real-world datasets from the text mining domain. The experimental results and comparative analyses indicate that the proposed algorithm outperforms existing methods for co-embedding generation.
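
The sketch below is a hedged illustration of co-embedding from a relation matrix alone. It factorizes the rows-by-columns relation matrix with a truncated SVD and gives each object type the square root of the singular values. This treats the two types symmetrically, and their inner products re-estimate the relations. It is a plain SVD baseline, not the proposed method, because it has no scale constraints or weighting parameters. The function names are hypothetical, and power iteration is used only to keep the example dependency-free. `conformity` mirrors the idea of comparing original and re-estimated relations, using relative Frobenius error as an assumed measure.

```python
import math

def _matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def _transpose(a):
    return [list(col) for col in zip(*a)]

def co_embed(r, k, iters=300):
    """Embed row objects X and column objects Y so that X @ Y.T approximates r.

    Truncated SVD r ~ U S V^T by power iteration with deflation, split
    symmetrically as X = U S^1/2 and Y = V S^1/2.
    """
    rows, cols = len(r), len(r[0])
    resid = [row[:] for row in r]
    xs = [[] for _ in range(rows)]
    ys = [[] for _ in range(cols)]
    for _ in range(k):
        rt = _transpose(resid)
        v = [1.0 + 0.1 * j for j in range(cols)]
        for _ in range(iters):
            w = _matvec(rt, _matvec(resid, v))  # w = R^T R v
            nrm = math.sqrt(sum(x * x for x in w))
            if nrm == 0.0:
                break
            v = [x / nrm for x in w]
        u = _matvec(resid, v)
        sigma = math.sqrt(sum(x * x for x in u))  # singular value for this component
        if sigma == 0.0:
            break
        u = [x / sigma for x in u]
        root = math.sqrt(sigma)
        for i in range(rows):
            xs[i].append(root * u[i])
        for j in range(cols):
            ys[j].append(root * v[j])
        # Deflate: subtract the rank-1 component just captured.
        resid = [[resid[i][j] - sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
    return xs, ys

def conformity(r, xs, ys):
    """Relative Frobenius error between r and the relations re-estimated from the co-embeddings."""
    num = den = 0.0
    for i, row in enumerate(r):
        for j, val in enumerate(row):
            est = sum(a * b for a, b in zip(xs[i], ys[j]))
            num += (val - est) ** 2
            den += val * val
    return math.sqrt(num / den)
```

A rank-1 relation matrix is reproduced essentially exactly with `k=1`. On real data, you would scan `k` (and, in the full method, the model's weighting parameters) and keep the setting whose conformity error is acceptably small.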

    Multidimensional Scaling: Infinite Metric Measure Spaces

    Multidimensional scaling (MDS) is a popular technique for mapping a finite metric space into a low-dimensional Euclidean space in a way that best preserves pairwise distances. We study a notion of MDS on infinite metric measure spaces, along with its optimality properties and goodness of fit. This allows us to study the MDS embeddings of the geodesic circle $S^1$ into $\mathbb{R}^m$ for all $m$, and to ask questions about the MDS embeddings of the geodesic $n$-spheres $S^n$ into $\mathbb{R}^m$. Furthermore, we address questions on convergence of MDS. For instance, if a sequence of metric measure spaces converges to a fixed metric measure space $X$, then in what sense do the MDS embeddings of these spaces converge to the MDS embedding of $X$? Convergence is understood when each metric space in the sequence has the same finite number of points, or when each metric space has a finite number of points tending to infinity. We are also interested in notions of convergence when each metric space in the sequence has an arbitrary (possibly infinite) number of points.
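
The abstract starts from the finite case. There, classical (Torgerson) MDS double-centers the squared distance matrix and embeds the points using its top eigenvectors, scaled by the square roots of their eigenvalues. Below is a minimal dependency-free sketch, assuming the double-centered matrix is (close to) positive semidefinite. Power iteration with deflation, the fixed seed, and the function name are implementation choices made here for illustration.

```python
import math
import random

def classical_mds(d, m, iters=500, seed=0):
    """Classical MDS of a finite metric space (distance matrix d) into R^m."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]  # equals column means, since sq is symmetric
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 J D2 J with J = I - (1/n) 1 1^T.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            nrm = math.sqrt(sum(x * x for x in w))
            if nrm == 0.0:
                break
            v = [x / nrm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # a negative eigenvalue is clipped to zero
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For the collinear points 0, 1, 3, 6, a one-dimensional embedding recovers every pairwise distance up to reflection and translation. Power iteration returns the eigenvalue of largest magnitude. Some inputs have a double-centered matrix with large negative eigenvalues, which is common for non-Euclidean metrics such as geodesic distances on a circle. Those inputs would need a proper symmetric eigensolver.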