
    Efficient Computation of Multiple Density-Based Clustering Hierarchies

    HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of the clusters in a dataset with respect to a parameter mpts. Although HDBSCAN* is robust with respect to mpts, in the sense that a small change in mpts typically leads to little or no change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low value of mpts may be more appropriate, and certain clusters may reveal themselves at different values of mpts. To explore results for a range of mpts values, however, one has to run HDBSCAN* independently for each value in the range, which is computationally inefficient. In this paper, we propose an efficient approach that computes all HDBSCAN* hierarchies for a range of mpts values by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. An extensive experimental evaluation shows that our approach obtains over one hundred hierarchies for a computational cost equivalent to running HDBSCAN* about twice.
    Comment: A short version of this paper appears at IEEE ICDM 2017. Corrected typos. Revised abstract.
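    In HDBSCAN*, the core distance of a point for a given mpts is the distance to its mpts-th nearest neighbor (counting the point itself), and the mutual reachability distance between two points is the largest of their two core distances and their actual distance. The sketch below only illustrates why a range of mpts values can share work: one sorted distance list per point yields the core distances for every mpts up to a chosen maximum. It is a naive O(n^2 log n) illustration under those definitions, not the paper's approach (which replaces the graph used by HDBSCAN* with a much smaller one), and the function names are hypothetical.

```python
import math

def core_distances(points, max_mpts):
    """Return core[m][i]: the distance from point i to its m-th nearest
    neighbor, counting the point itself as its own 1st neighbor, for every
    m in 1..max_mpts, computed from one sorted distance list per point."""
    n = len(points)
    if not 1 <= max_mpts <= n:
        raise ValueError("max_mpts must be between 1 and the number of points")
    core = {m: [0.0] * n for m in range(1, max_mpts + 1)}
    for i, p in enumerate(points):
        # Sorted distances to all points, including p itself (distance 0).
        dists = sorted(math.dist(p, q) for q in points)
        for m in range(1, max_mpts + 1):
            core[m][i] = dists[m - 1]
    return core

def mutual_reachability(points, core_m, i, j):
    """Mutual reachability distance between points i and j for one mpts
    value, given that value's list of core distances core_m."""
    return max(core_m[i], core_m[j], math.dist(points[i], points[j]))
```

    For example, core_distances([(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)], 3) returns core distances for mpts = 1, 2 and 3 at once; any one of those lists can then be passed to mutual_reachability.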

    Extension of One-Dimensional Proximity Regions to Higher Dimensions

    Proximity maps and regions are defined based on the relative allocation of points from two or more classes in an area of interest, and are used to construct random graphs called proximity catch digraphs (PCDs), which have applications in various fields. The simplest such map is the spherical proximity map, which maps a point from the class of interest to a disk centered at that point whose radius is the distance to the closest point from the other class in the region. The spherical proximity map gave rise to the class cover catch digraph (CCCD), which has been applied to pattern classification. Furthermore, for uniform data on the real line, the exact and asymptotic distributions of the domination number of CCCDs are analytically available. In this article, we determine some appealing properties of the spherical proximity map in compact intervals on the real line and use these properties as a guideline for defining new proximity maps in higher dimensions, where Delaunay triangulation is used to partition the region of interest. We also introduce the auxiliary tools used to construct the new proximity maps, as well as related concepts used to investigate and compare these maps and the resulting graphs. We characterize the geometry invariance of PCDs for uniform data. Finally, we provide some newly defined proximity maps in higher dimensions as illustrative examples.
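    On the real line, the spherical proximity region of a point x from the class of interest is the interval centered at x whose radius is the distance from x to the nearest point of the other class, and the CCCD has an arc from x to every other class-of-interest point inside that region. The sketch below builds that digraph and finds its domination number by brute force. It assumes open intervals, is only meant for tiny inputs, and does not cover the higher-dimensional maps introduced in the article; all function names are hypothetical.

```python
from itertools import combinations

def spherical_radius(x, others):
    """Radius of the spherical proximity region of x: the distance to the
    closest point of the other class."""
    return min(abs(x - y) for y in others)

def cccd_arcs(xs, ys):
    """Arcs (i, j) of the class cover catch digraph on the real line:
    i -> j whenever xs[j] lies in the open interval centered at xs[i]
    whose radius is the distance from xs[i] to the nearest ys point."""
    radii = [spherical_radius(x, ys) for x in xs]
    return {(i, j) for i, xi in enumerate(xs) for j, xj in enumerate(xs)
            if i != j and abs(xj - xi) < radii[i]}

def domination_number(n, arcs):
    """Size of the smallest vertex set S such that every vertex is in S or
    receives an arc from S (exhaustive search over subsets)."""
    for k in range(1, n + 1):
        for s in combinations(range(n), k):
            covered = set(s) | {j for (i, j) in arcs if i in s}
            if len(covered) == n:
                return k
    return 0
```

    With xs = [1, 2, 3, 8, 9] and a single other-class point at 5, the digraph splits into two groups ({1, 2, 3} and {8, 9}), so one dominating vertex per group is needed and the domination number is 2.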

    TS2PACK: A Two-Level Tabu Search for the Three-dimensional Bin Packing Problem

    Three-dimensional orthogonal bin packing is a strongly NP-hard problem in which a set of boxes must be orthogonally packed into the minimum number of three-dimensional bins. We present a two-level tabu search for this problem. The first level aims to reduce the number of bins; the second optimizes the packing of the bins. This latter procedure is based on the interval graph representation of the packing proposed by Fekete and Schepers, which reduces the size of the search space. We also introduce a general method to increase the size of the associated neighborhoods, and thus the quality of the search, without increasing the overall complexity of the algorithm. Extensive computational results on benchmark problem instances show the effectiveness of the proposed approach, which obtains better results than the existing ones.
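    In the Fekete and Schepers representation, a packing is described by one interval graph per axis: boxes are vertices, and two boxes are adjacent in an axis's graph when their projections onto that axis overlap. Two boxes intersect exactly when they are adjacent in all three graphs. The sketch below derives the three edge sets from a concrete placement and checks that no edge is shared by all of them. It illustrates the representation only: it does not check the other conditions of their characterization (such as every set of pairwise non-adjacent boxes on an axis fitting within the bin along that axis), and it is not TS2PACK's search procedure. The function names and the (origin, size) box layout are hypothetical.

```python
from itertools import combinations

# A placed box is (origin, size), each a tuple of three numbers (x, y, z).

def overlaps_1d(a0, alen, b0, blen):
    """True if the open intervals (a0, a0 + alen) and (b0, b0 + blen) intersect."""
    return a0 < b0 + blen and b0 < a0 + alen

def interval_graphs(boxes):
    """Edge sets E_0, E_1, E_2: (i, j) is an edge of E_d when the
    projections of boxes i and j onto axis d overlap."""
    edges = [set(), set(), set()]
    for (i, (oi, si)), (j, (oj, sj)) in combinations(enumerate(boxes), 2):
        for d in range(3):
            if overlaps_1d(oi[d], si[d], oj[d], sj[d]):
                edges[d].add((i, j))
    return edges

def is_overlap_free(boxes):
    """Two boxes intersect exactly when they overlap on all three axes,
    so a placement is overlap-free iff E_0, E_1 and E_2 share no edge."""
    e0, e1, e2 = interval_graphs(boxes)
    return not (e0 & e1 & e2)
```

    Two 5x10x10 boxes placed side by side along x in a 10x10x10 bin touch but do not overlap on the x axis, so their x-graph has no edge and the placement is overlap-free; adding a small box at x = 4 creates an edge common to all three graphs.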

    Generalizations of the Kolmogorov-Barzdin embedding estimates

    We consider several ways to measure the "geometric complexity" of an embedding of a simplicial complex into Euclidean space. One of these is a version of "thickness", based on a paper of Kolmogorov and Barzdin. We prove inequalities relating the thickness and the number of simplices in the simplicial complex, generalizing an estimate that Kolmogorov and Barzdin proved for graphs. We also consider the distortion of knots. We give an alternative proof of a theorem of Pardon that there are isotopy classes of knots requiring arbitrarily large distortion. This proof is based on the expander-like properties of arithmetic hyperbolic manifolds.
    Comment: 45 pages
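    The abstract does not define distortion. A common definition, due to Gromov, takes for a closed curve the supremum, over pairs of distinct points p and q on the curve, of the length of the shorter arc joining them divided by the Euclidean distance |p - q|; the distortion of a knot is then the infimum of this quantity over curves in its isotopy class. The round circle has distortion pi/2, attained at antipodal points. The sketch below assumes that definition and evaluates the ratio only over vertex pairs of a closed polygon, so it gives a lower bound on the polygon's distortion. It is unrelated to the paper's proof technique, and the function name is hypothetical.

```python
import math

def polygon_distortion(vertices):
    """Largest ratio (shorter arc length along the closed polygon) /
    (Euclidean distance), taken over pairs of vertices only. Because it
    samples vertex pairs, it can only under-estimate the polygon's
    distortion. Assumes the vertices are distinct."""
    n = len(vertices)
    # cum[k] is the arc length along the polygon from vertex 0 to vertex k;
    # cum[n] is the total perimeter.
    cum = [0.0]
    for k in range(n):
        cum.append(cum[-1] + math.dist(vertices[k], vertices[(k + 1) % n]))
    total = cum[-1]
    best = 1.0  # the arc is never shorter than the chord
    for i in range(n):
        for j in range(i + 1, n):
            along = cum[j] - cum[i]
            intrinsic = min(along, total - along)
            chord = math.dist(vertices[i], vertices[j])
            best = max(best, intrinsic / chord)
    return best
```

    For a unit square the worst pair is a diagonal (arc length 2, chord sqrt(2)), giving sqrt(2); for a regular polygon with many vertices inscribed in a circle, the value approaches pi/2 from below.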