
    Nonparametric Feature Extraction from Dendrograms

    We propose feature extraction from dendrograms in a nonparametric way. The Minimax distance measures correspond to building a dendrogram with the single linkage criterion, together with specific forms of a level function and a distance function defined over it. We therefore extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances in solution space and in representation space, respectively, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
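The link between Minimax distances and single linkage stated in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `minimax_distances` and the toy data are assumptions made for illustration. The sketch relies on the standard fact that the Minimax (bottleneck path) distance between two objects equals the level at which single linkage first merges their clusters.

```python
from itertools import combinations

def minimax_distances(weights):
    """Return the matrix of Minimax distances for a symmetric
    n x n matrix of pairwise dissimilarities (hypothetical helper)."""
    n = len(weights)
    label = list(range(n))               # current cluster label of each object
    members = {i: [i] for i in range(n)}  # objects in each cluster
    result = [[0.0] * n for _ in range(n)]
    # Processing edges in increasing order reproduces single linkage merges.
    edges = sorted((weights[i][j], i, j) for i, j in combinations(range(n), 2))
    for w, i, j in edges:
        a, b = label[i], label[j]
        if a == b:
            continue  # already in the same cluster, no merge at this level
        # The merge level w is the Minimax distance for every cross pair.
        for u in members[a]:
            for v in members[b]:
                result[u][v] = result[v][u] = w
        if len(members[a]) < len(members[b]):
            a, b = b, a                   # merge the smaller cluster into the larger
        for v in members[b]:
            label[v] = a
        members[a].extend(members.pop(b))
    return result

# Toy example: four points on a line, dissimilarity = absolute difference.
points = [0.0, 1.0, 2.0, 10.0]
weights = [[abs(p - q) for q in points] for p in points]
d = minimax_distances(weights)
```

In this toy example the direct dissimilarity between the first and third points is 2, but the Minimax distance is 1, because the path through the second point never uses an edge longer than 1.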

    Developments in the theory of randomized shortest paths with a comparison of graph node distances

    There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has arisen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity, as it defines a metric and, in addition, satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments. Comment: 30 pages, 4 figures, 3 tables

    An Approximation Scheme for the Generalized Geometric Minimum Spanning Tree Problem with Grid Clustering

    This paper is concerned with a special case of the Generalized Minimum Spanning Tree Problem. The Generalized Minimum Spanning Tree Problem is defined on an undirected graph, where the vertex set is partitioned into clusters, and non-negative costs are associated with the edges. The problem is to find a tree of minimum cost containing exactly one vertex in each cluster. We consider a geometric case of the problem where the graph is complete, all vertices are situated in the plane, and Euclidean distance defines the edge cost. We prove that the problem admits a PTAS if restricted to grid clustering. Subject: operations research and management science
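To make the problem statement concrete, the following is a minimal brute-force sketch of the geometric Generalized Minimum Spanning Tree Problem. It is an exact enumeration for tiny instances, not the approximation scheme of the paper. The function names and the example clusters are assumptions made for illustration, and the clusters are arbitrary small point sets rather than grid cells.

```python
from itertools import product
from math import dist, inf

def mst_cost(points):
    """Cost of a Euclidean minimum spanning tree over `points` (Prim's algorithm)."""
    n = len(points)
    best = [inf] * n        # cheapest known connection of each point to the tree
    best[0] = 0.0
    in_tree = [False] * n
    total = 0.0
    for _ in range(n):
        # Pick the cheapest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], dist(points[u], points[v]))
    return total

def generalized_mst_bruteforce(clusters):
    """Try every choice of one point per cluster and keep the cheapest tree.
    Exponential in the number of clusters, so only suitable for toy inputs."""
    best_cost, best_choice = inf, None
    for choice in product(*clusters):
        cost = mst_cost(choice)
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice

# Hypothetical instance: three clusters of two points each in the plane.
clusters = [
    [(0, 0), (0, 5)],
    [(1, 0), (9, 9)],
    [(2, 0), (7, 7)],
]
cost, choice = generalized_mst_bruteforce(clusters)
```

Here the cheapest tree picks the three collinear points on the x-axis. The enumeration grows as the product of the cluster sizes, which is why approximation schemes are of interest for larger instances.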

    A DC Programming Approach for Solving Multicast Network Design Problems via the Nesterov Smoothing Technique

    This paper continues our effort initiated in [9] to study Multicast Communication Networks, modeled as bilevel hierarchical clustering problems, by using mathematical optimization techniques. Given a finite number of nodes, we consider two different models of multicast networks by identifying a certain number of nodes as cluster centers, and at the same time, locating a particular node that serves as a total center so as to minimize the total transportation cost through the network. The fact that the cluster centers and the total center have to be among the given nodes makes this problem a discrete optimization problem. Our approach is to reformulate the discrete problem as a continuous one and to apply the Nesterov smoothing approximation technique to the Minkowski gauges that are used as distance measures. This approach enables us to propose two implementable DCA-based algorithms for solving the problems. Numerical results and practical applications are provided to illustrate our approach.