4,776 research outputs found

    Data clustering using a model granular magnet

    We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures it is completely ordered; all spins are aligned. At very high temperatures the system does not exhibit any ordering. In an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method. Comment: 46 pages, postscript, 15 ps figures included
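    The coupling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only says the interaction strength is a decreasing function of distance, so the Gaussian decay, the k-nearest-neighbor choice of "neighboring points", and the names `potts_couplings`, `k`, and `scale` are all assumptions made for the example.

```python
import math

def potts_couplings(points, k, scale):
    """Return {(i, j): J_ij} linking each point to its k nearest neighbors.

    Assumption: J_ij = exp(-d_ij^2 / (2 * scale^2)). The abstract only
    requires that the strength decreases with distance.
    """
    couplings = {}
    for i, p in enumerate(points):
        # Indices of the k points closest to point i (excluding i itself).
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        for j in neighbors:
            d = math.dist(p, points[j])
            # Store each undirected pair once, keyed with the smaller index first.
            couplings[(min(i, j), max(i, j))] = math.exp(-d * d / (2 * scale * scale))
    return couplings

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
J = potts_couplings(points, k=1, scale=1.0)
```

    In this toy input the close pair (0, 1) receives a much stronger coupling than the distant pair (1, 2), which is the property the clustering relies on.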

    Discovery of new stellar groups in the Orion complex

    We test the ability of two unsupervised machine learning algorithms, EnLink and Shared Nearest Neighbour (SNN), to identify stellar groupings in the Orion star-forming complex as an application to the 5-dimensional astrometric data from Gaia DR2. The algorithms represent two distinct approaches to limiting user bias when selecting parameter values and evaluating the relative weights among astrometric parameters. EnLink adopts a locally adaptive distance metric and eliminates the need for parameter tuning through automation. The original SNN relies only on human input for parameter tuning, so we modified SNN to run in two stages. We first ran the original SNN 7,000 times, each with a randomly generated sample according to within-source covariance matrices provided in Gaia DR2 and random parameter values within reasonable ranges. During the second stage, we modified SNN to identify the most repeating stellar groups from the 25,798 we obtained in the first stage. We reveal 21 spatially and kinematically coherent groups in the Orion complex, 12 of which were previously unknown. The groups show a wide distribution of distances extending as far as about 150 pc in front of the star-forming Orion molecular clouds, to about 50 pc beyond them where we find, unexpectedly, several groups. Our results expose to view the wealth of sub-structure in the OB association, within and beyond the classical Blaauw Orion OBI sub-groups. A full characterization of the new groups is of the essence as it offers the potential to unveil how star formation proceeds globally in large complexes such as Orion. The data and code that generated the groups in this work as well as the final table can be found at https://github.com/BoquanErwinChen/GaiaDR2_Orion_Dissection. Comment: 9 pages, 4 figures. Accepted by A&A. Comments welcome
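    For readers unfamiliar with shared-nearest-neighbor methods, the core similarity can be sketched as below. This is a generic illustration under stated assumptions, not the modified two-stage SNN from the paper: similarity is taken to be the number of k-nearest neighbors two points have in common, distances are plain Euclidean, and `knn_sets`, `snn_similarity`, and `k` are names chosen for the example.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points."""
    sets = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        sets.append(set(order[:k]))
    return sets

def snn_similarity(points, k):
    """Matrix whose (i, j) entry counts the k-nearest neighbors shared by i and j."""
    nn = knn_sets(points, k)
    n = len(points)
    return [
        [len(nn[i] & nn[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]

# Two well-separated 1-D groups (hypothetical data).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
sim = snn_similarity(points, k=2)
```

    Points in the same group share neighbors and get a positive similarity, while points in different groups share none; a grouping step would then link pairs whose similarity exceeds a chosen threshold.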

    A novel double-hybrid learning method for modal frequency-based damage assessment of bridge structures under different environmental variation patterns

    Monitoring of modal frequencies under an unsupervised learning framework is a practical strategy for damage assessment of civil structures, especially bridges. However, the key challenge is related to high sensitivity of modal frequencies to environmental and/or operational changes that may lead to economic and safety losses. The other challenge pertains to different environmental and/or operational variation patterns in modal frequencies due to differences in structural types, materials, and applications, measurement periods in terms of short and/or long monitoring programs, geographical locations of structures, weather conditions, and influences of single or multiple environmental and/or operational factors, which may cause barriers to employing state-of-the-art unsupervised learning approaches. To cope with these issues, this paper proposes a novel double-hybrid learning technique in an unsupervised manner. It contains two stages of data partitioning and anomaly detection, both of which comprise two hybrid algorithms. For the first stage, an improved hybrid clustering method based on a coupling of shared nearest neighbor searching and density peaks clustering is proposed to prepare local information for anomaly detection with the focus on mitigating environmental and/or operational effects. For the second stage, this paper proposes an innovative non-parametric hybrid anomaly detector based on local outlier factor. In both stages, the number of nearest neighbors is the key hyperparameter that is automatically determined by leveraging a self-adaptive neighbor searching algorithm. Modal frequencies of two full-scale bridges are utilized to validate the proposed technique with several comparisons. Results indicate that this technique is able to successfully eliminate different environmental and/or operational variations and correctly detect damage.

    PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search

    This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering, a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequential, and cannot scale to large data, or are specialized for low-dimensional data. This paper unifies the different variants of density peaks clustering into a single framework, PECANN, by abstracting out several key steps common to this class of algorithms. One such key step is to find nearest neighbors that satisfy a predicate function, and one of the main contributions of this paper is an efficient way to do this predicate search using graph-based approximate nearest neighbor search (ANNS). To provide ample parallelism, we propose a doubling search technique that enables points to find an approximate nearest neighbor satisfying the predicate in a small number of rounds. Our technique can be applied to many existing graph-based ANNS algorithms, which can all be plugged into PECANN. We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading. Compared to the state-of-the-art FASTDP algorithm for high-dimensional density peaks clustering, which is sequential, our best algorithm is 45x-734x faster while achieving competitive ARI scores. Compared to the state-of-the-art parallel DPC-based algorithm, which is optimized for low dimensions, we show that PECANN is two orders of magnitude faster. As far as we know, our work is the first to evaluate DPC variants on large high-dimensional real-world image and text embedding datasets
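    The predicate search mentioned in this abstract can be illustrated with a brute-force sketch of the classic density peaks step: each point looks for its nearest neighbor that has a strictly higher density. This is not PECANN itself, which uses graph-based approximate search and a doubling technique; the exact O(n^2) search, the radius-count density, and the names `density_peaks_dependents` and `radius` are assumptions made for the example.

```python
import math

def density_peaks_dependents(points, radius):
    """Return (density, dependents) for a brute-force density peaks step.

    density[i]    : number of other points within `radius` of point i
                    (an assumed density estimate).
    dependents[i] : index of the nearest point with strictly higher density,
                    or None if no such point exists (a density peak).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    density = [
        sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
        for i in range(n)
    ]
    dependents = []
    for i in range(n):
        # Predicate: only neighbors denser than point i are eligible.
        candidates = [j for j in range(n) if density[j] > density[i]]
        if candidates:
            dependents.append(min(candidates, key=lambda j: dist[i][j]))
        else:
            dependents.append(None)
    return density, dependents

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0)]
density, dependents = density_peaks_dependents(points, radius=1.2)
```

    Following the dependent links from any point leads to a density peak, so clusters can be read off as the trees rooted at the peaks. The exact search shown here is the part that an approximate nearest neighbor index would replace at scale.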