36 research outputs found

    Theoretically-Efficient and Practical Parallel DBSCAN

    The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case, making them inefficient for large datasets. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with hyper-threading show that we outperform existing parallel DBSCAN implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
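For readers unfamiliar with the method, the following is a minimal brute-force sketch of what exact DBSCAN computes. It is not the algorithm from the paper above: it builds every neighbourhood by pairwise comparison, so it does the quadratic work that the abstract describes as the worst case. The function name `dbscan` and the parameter names `eps` and `min_pts` are illustrative assumptions.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Brute-force exact DBSCAN sketch; returns one label per point."""
    n = len(points)
    # Quadratic work: compare every pair of points.
    # A point's eps-neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabelled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    if is_core[q]:  # border points are labelled but not expanded
                        stack.append(q)
        cluster += 1
    # Anything never reached from a core point is noise.
    return [NOISE if lab is None else lab for lab in labels]
```

Under these assumptions, two tight groups of three points and one distant point yield two clusters and one noise label when `eps=1.5` and `min_pts=3`.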

    Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces

    We present a new algorithm for the widely used density-based clustering method DBScan. Our algorithm computes the DBScan-clustering in O(n log n) time in R^2, irrespective of the scale parameter eps, but assuming the second parameter MinPts is set to a fixed constant, as is the case in practice. We also present an O(n log n) randomized algorithm for HDBScan in the plane (HDBScan is a hierarchical version of DBScan introduced recently), and we show how to compute an approximate version of HDBScan in near-linear time in any fixed dimension.

    Clustering image sets with features from deep convolutional neural networks

    Abstract. This thesis compares the results of clustering image sets using features extracted from different layers of a convolutional neural network (CNN). The features were extracted with the layers of a pre-trained image classification network whose weights were trained on the ImageNet dataset. Eight image sets were used to test which extracted features achieve the best clustering accuracy. Features were extracted from the test image sets with each layer of the network architecture and clustered on a layer-by-layer basis. Clustering accuracy was measured with normalized mutual information (NMI). The results show that clustering accuracy depends on the characteristics of the image set being clustered. The image sets with more than two image categories had the best NMI scores with features from the second-to-last layer of the architecture, while for the image sets with two categories the best NMI scores came from different layers. Moreover, the image set with blurred images obtained its best result from a few of the first layers, showing that the current practice of selecting the second-to-last layer for feature extraction in pre-trained CNNs is not always optimal.
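As a point of reference for the NMI metric named in this abstract, here is a minimal sketch of how such a score can be computed from two label lists. The thesis does not state which normalization it uses; dividing by the arithmetic mean of the two entropies is an assumption made here, and the function names `entropy` and `nmi` are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Normalized mutual information between two labelings of the same items.

    Assumption: normalize by the arithmetic mean of the two entropies.
    """
    n = len(truth)
    joint = Counter(zip(truth, pred))      # co-occurrence counts
    n_truth, n_pred = Counter(truth), Counter(pred)
    # I(U;V) = sum p(u,v) * log(p(u,v) / (p(u) * p(v)))
    mi = sum(
        (c / n) * log(c * n / (n_truth[t] * n_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    # Both labelings are a single cluster: treat them as identical.
    return mi / denom if denom > 0 else 1.0
```

The score does not depend on how clusters are numbered, so a clustering that matches the true categories up to relabeling scores 1, while a clustering that is statistically independent of them scores 0.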

    A DATA-DRIVEN METHODOLOGY TO ANALYZE AIR TRAFFIC MANAGEMENT SYSTEM OPERATIONS WITHIN THE TERMINAL AIRSPACE

    Air Traffic Management (ATM) systems are the systems responsible for managing the operations of all aircraft within an airspace. In the past two decades, global modernization efforts have been underway to increase ATM system capacity and efficiency, while maintaining safety. Gaining a comprehensive understanding of both flight-level and airspace-level operations enables ATM system operators, planners, and decision-makers to make better-informed and more robust decisions related to the implementation of future operational concepts. The increased availability of operational data, including widely-accessible ADS-B trajectory data, and advances in modern machine learning techniques provide the basis for offline data-driven methods to be applied to analyze ATM system operations. Further, analysis of ATM system operations of arriving aircraft within the terminal airspace has the highest potential to impact safety, capacity, and efficiency levels due to the highest rate of accidents and incidents occurring during the arrival flight phases. Therefore, motivating this research is the question of how offline data-driven methods may be applied to ADS-B trajectory data to analyze ATM system operations at both the flight and airspace levels for arriving aircraft within the terminal airspace to extract novel insights relevant to ATM system operators, planners, and decision-makers. An offline data-driven methodology to analyze ATM system operations is proposed involving the following three steps: (i) Air Traffic Flow Identification, (ii) Anomaly Detection, and (iii) Airspace-Level Analysis. The proposed methodology is implemented considering ADS-B trajectory data that was extracted, cleaned, processed, and augmented for aircraft arriving at San Francisco International Airport (KSFO) during the full year of 2019 as well as the corresponding extracted and processed ASOS weather data. 
The Air Traffic Flow Identification step contributes a method to more reliably identify air traffic flows for arriving aircraft trajectories through a novel implementation of the HDBSCAN clustering algorithm with a weighted Euclidean distance function. The Anomaly Detection step contributes the novel distinction between spatial and energy anomalies in ADS-B trajectory data and provides key insights into the relationship between the two types of anomalies. Spatial anomalies are detected leveraging the aforementioned air traffic flow identification method, whereas energy anomalies are detected leveraging the DBSCAN clustering algorithm. Finally, the Airspace-Level Analysis step contributes a novel method to identify operational patterns and characterize operational states of aircraft arriving within the terminal airspace during specified time intervals, leveraging the UMAP dimensionality reduction technique and the DBSCAN clustering algorithm. Additionally, the methodology provides the ability to predict, in advance, a time interval's operational pattern by training a gradient-boosted decision tree (XGBoost) algorithm on metrics derived from the ASOS weather data. Ph.D.
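The weighted Euclidean distance mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which trajectory features are compared or how the weights are chosen, so the function name `weighted_euclidean`, the feature vectors, and the weights below are all hypothetical.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """Distance sqrt(sum_i w_i * (x_i - y_i)^2) between two feature vectors.

    Assumption: weights are non-negative, one per feature.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical example: a larger weight makes differences in the first
# feature count more than differences in the second.
d = weighted_euclidean((0.0, 0.0), (3.0, 4.0), (4.0, 1.0))
```

With all weights equal to 1 this reduces to the ordinary Euclidean distance, and a weight of 0 removes a feature from the comparison entirely.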

    Density-based clustering: algorithms and evaluation techniques

    Density-based clustering algorithms form a relevant subset of the methods developed for cluster analysis, which is one of the fundamental pillars of unsupervised learning [2]. While the origins of clustering can be traced to the early 20th century [3], it was not until the 1990s that the concerns that would lead to the development of density-based clustering algorithms were raised [4]. In 1996, the most popular density-based clustering algorithm to date (DBSCAN) was published [5] and, with it, many applications for density-based clustering were found in an increasingly wide range of fields over the following decades. In this introductory chapter, we present an overview of the research that led to this dissertation, focused mainly on density-based clustering. The work presented in this document can be divided into two main blocks, which, briefly stated, are: (1) research on the development of novel density-based algorithms and (2) research on evaluation techniques and metrics for density-based clustering. The motivation that led to this approach is expressed in Section 1.1. First, the original motivation to pursue the study of density-based clustering algorithms (landmark discovery) is introduced in Section 1.1.1. After that, in Section 1.1.2, we explain the demand for an evaluation benchmark applicable to density-based clustering algorithms. In Section 1.2, the main objectives of this thesis, which emerge from the demands and opportunities introduced in the motivation section, are presented and justified. Subsequently, we introduce the main scientific contributions of this thesis (Section 1.3). A notation guide is then included to serve as a reference for the reader (Section 1.4). 
Lastly, the structure of this document is described in Section 1.5. Doctoral Program in Multimedia and Communications of the Universidad Carlos III de Madrid and the Universidad Rey Juan Carlos. Chair: Emilio Parrado Hernández. Secretary: Fernando Fernández Martínez. Member: Raúl Santos Rodrígue

    COMET-AR User's Manual: COmputational MEchanics Testbed with Adaptive Refinement

    The COMET-AR User's Manual is a reference manual for the Computational Structural Mechanics Testbed with Adaptive Refinement (COMET-AR), a software system developed jointly by Lockheed Palo Alto Research Laboratory and NASA Langley Research Center under contract NAS1-18444. The COMET-AR system is an extended version of an earlier finite element based structural analysis system called COMET, also developed by Lockheed and NASA. The primary extensions are the adaptive mesh refinement capabilities and a new "object-like" database interface that makes COMET-AR easier to extend further. This User's Manual provides a detailed description of the user interface to COMET-AR from the viewpoint of a structural analyst.

    Introductory Computer Forensics

    INTERPOL (International Police) built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for fighting crimes involving computers. Although significant international efforts are being made in dealing with cybercrime and cyber-terrorism, finding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven difficult in practice.