4 research outputs found

    Retrieving Top-N Weighted Spatial k-cliques

    Spatial data analysis is a classic yet important topic because of its wide range of applications. Recently, neighbor graphs of a set P of spatial points have often been employed as a spatial data analysis approach. This paper also considers a spatial neighbor graph and addresses a new problem, namely top-N weighted spatial k-clique retrieval. This problem searches for the N minimum-weight cliques consisting of k points in P, and it has important applications, such as community detection and co-location pattern mining. Recent spatial datasets have many points, and dealing efficiently with such big datasets is one of the main requirements of applications. A straightforward approach to solving our problem is to enumerate all k-cliques, which incurs O(n^k k^2) time. Since k ≥ 3, this approach cannot meet the main requirement, so the result must be computed without enumerating unnecessary k-cliques. This paper achieves this challenging task and proposes a simple, practically efficient algorithm that returns the exact answer. We conduct experiments using two real spatial datasets consisting of millions of points, and the results show the efficiency of our algorithm, e.g., it can return the exact top-N result within 1 second when N ≤ 1000 and k ≤ 7.
    Taniguchi R., Amagata D., Hara T. Retrieving Top-N Weighted Spatial k-cliques. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022, 4952 (2022); https://doi.org/10.1109/BigData55660.2022.10021071
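
The straightforward enumeration mentioned in the abstract above can be sketched as follows. This is only an illustrative baseline, not the paper's algorithm. Several details are assumptions, since the abstract does not state them: two points are taken to be neighbors when their Euclidean distance is at most a hypothetical `radius`, and the weight of a clique is taken to be the sum of its edge lengths. The function name and parameters are likewise hypothetical.

```python
import heapq
import itertools
import math


def top_n_k_cliques(points, k, n_top, radius):
    """Brute-force baseline: test every k-subset of the points.

    Assumptions (not taken from the abstract): points are neighbors when
    their Euclidean distance is <= radius, and the clique weight is the
    sum of all pairwise edge lengths inside the clique.
    """
    cliques = []
    # There are O(n^k) subsets, and each needs O(k^2) pair checks.
    for combo in itertools.combinations(range(len(points)), k):
        weight = 0.0
        is_clique = True
        for a, b in itertools.combinations(combo, 2):
            d = math.dist(points[a], points[b])
            if d > radius:
                # A missing edge means this subset is not a clique.
                is_clique = False
                break
            weight += d
        if is_clique:
            cliques.append((weight, combo))
    # Keep the n_top cliques with the smallest total weight.
    return heapq.nsmallest(n_top, cliques)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(top_n_k_cliques(pts, k=3, n_top=2, radius=2.0))
```

Even for k = 3 the loop visits every triple of points, which is why this approach does not scale to the dataset sizes the abstract describes.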

    Efficient Retrieval of Top-k Weighted Triangles on Static and Dynamic Spatial Data

    Due to the proliferation of location-based services, spatial data analysis is becoming more and more important. We consider graphs consisting of spatial points, where each point has edges to its nearby points and the weight of each edge is the distance between the corresponding points, as they have been receiving attention as spatial data analysis tools. We focus on triangles in such graphs and address the problem of retrieving the top-k weighted spatial triangles. This problem is computationally challenging, because the number of triangles in a graph is generally huge and enumerating all of them is not feasible. To overcome this challenge, we propose an algorithm that returns the exact result efficiently. We moreover consider two dynamic data models: (i) fully dynamic data that allow arbitrary point insertions and deletions and (ii) streaming data in a sliding-window model. Both often appear in location-based services. The results of our experiments on real datasets show the efficiency of our algorithms for static and dynamic data.
    Taniguchi R., Amagata D., Hara T. Efficient Retrieval of Top-k Weighted Triangles on Static and Dynamic Spatial Data. IEEE Access 10, 55298 (2022); https://doi.org/10.1109/ACCESS.2022.3177620

    A new compressed cover tree for k-nearest neighbour search and the stable-under-noise mergegram of a point cloud

    The analysis of data sets mathematically representable as finite metric spaces plays a significant role in every scientific study. In this thesis we focus on constructing new effective algorithms in the area of computational geometry that can be deployed for the study and classification of large data sets prevalent in natural science, economic analysis, medicine, environmental protection, etc. The first major contribution of this thesis is a new near-linear time algorithm that resolves the classical problem of finding the k-nearest neighbors (KNN) of a query set Q in a larger reference set R, where Q and R both belong to some metric space X. This project was inspired by the work of Beygelzimer, Kakade, and Langford in ICML 2006, which attempted to show that this problem is solvable for k = 1 with near-linear time complexity. However, in 2015 it was pointed out that the proof of their time complexity might contain mistakes, which has been confirmed in this thesis by showing that the proposed proof does not withstand a concrete counterexample. An important application of the KNN algorithm is the KNN graph on a finite metric space R, whose edge set is formed by connecting every point p ∈ R with its k-nearest neighbors. The KNN graph finds application in data skeletonization, where it can serve as an initial skeleton of the data set, and in cluster analysis, where connected components of the KNN graph can represent the clusters. Another application of the KNN algorithm is the minimum spanning tree (MST), which is an efficient way to visualize any unstructured data while knowing only distances, for example any metric graph connecting abstract data points. Although many efficient algorithms for the MST in metric spaces have been devised, there existed only one past attempt to justify a near-linear time complexity in the size of a given metric space.
    In 2010, March, Ram, and Gray claimed that the MST of any finite metric space can be built in parametrized near-linear time. In this work we have demonstrated, with multiple counterexamples, that the attempted proof was incorrect by showing that one of its steps fails. Encouraged by the results of the 2010 work, this thesis produces a new algorithm based on Borůvka's algorithm, which is combined with the KNN method to resolve the metric MST problem in near-linear time. In the final chapter of the thesis, the MST algorithm is applied to the computation of a new isometry invariant, the mergegram, in Topological Data Analysis (TDA). TDA quantifies topological shapes hidden in unorganized data such as clouds of unordered points. In the 0-dimensional (0D) case, the distance-based persistence is determined by a single-linkage (SL) clustering of a finite set in a metric space. Equivalently, the 0D persistence captures only the edge lengths of a minimum spanning tree (MST). Both the SL dendrogram and the MST are unstable under perturbations of points. In this thesis, we define the new stable-under-noise mergegram, which outperforms previous isometry invariants on a classification of point clouds. In conclusion, the fast algorithms developed in this thesis can cater to a wide variety of tasks in data science and beyond. The newly proposed corrected time complexity analysis of KNN and MST not only rectifies the past issues in their theoretical justifications but also gives a way to fix analogous issues in other similar methods based on the cover tree data structure.
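
As a rough illustration of the Borůvka-style MST construction the abstract refers to, the sketch below runs Borůvka rounds on the complete distance graph of a small point set. It is not the thesis algorithm: the nearest point outside each component is found by a brute-force scan rather than by the cover-tree KNN method, so each round costs quadratic time. The function name is hypothetical, and Euclidean distance is assumed as the metric.

```python
import math


def boruvka_mst(points):
    """Naive Boruvka MST on the complete Euclidean graph of `points`.

    Assumption: the cheapest outgoing edge of each component is found by
    scanning all pairs, which stands in for a fast nearest-neighbor search.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    components = n
    while components > 1:
        best = {}  # component root -> cheapest outgoing edge (dist, u, v)
        for u in range(n):
            ru = find(u)
            for v in range(u + 1, n):
                rv = find(v)
                if ru == rv:
                    continue
                # Tuples (dist, u, v) give a strict total order on edges,
                # so ties are broken consistently and no cycle can form.
                cand = (math.dist(points[u], points[v]), u, v)
                if ru not in best or cand < best[ru]:
                    best[ru] = cand
                if rv not in best or cand < best[rv]:
                    best[rv] = cand
        for d, u, v in best.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                edges.append((d, u, v))
                components -= 1
    return edges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
    print(sorted(d for d, _, _ in boruvka_mst(pts)))
```

The sorted edge lengths of the resulting tree are the quantities that, according to the abstract, the 0D persistence captures.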

    Geometric Minimum Spanning Trees with GEOFILTERKRUSKAL

    Let P be a set of points in R^d. We propose GEOFILTERKRUSKAL, an algorithm that computes the minimum spanning tree of P using a well-separated pair decomposition in combination with a simple modification of Kruskal’s algorithm. When P is sampled from a uniform random distribution, we show that our algorithm takes one parallel sort plus a linear number of additional steps, with high probability, to compute the minimum spanning tree. Experiments show that our algorithm works better in practice for most data distributions compared to the current state of the art [31]. Our algorithm is easy to parallelize and, to our knowledge, is currently the best practical algorithm on multi-core machines for d > 2.
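
For context, the unmodified Kruskal procedure that the abstract builds on can be sketched as below. This is not GEOFILTERKRUSKAL: it sorts all O(n^2) point pairs instead of using a well-separated pair decomposition, and it applies no filtering step. The function name is hypothetical, and Euclidean distance is assumed.

```python
import itertools
import math


def kruskal_geometric_mst(points):
    """Plain Kruskal on all point pairs (no WSPD, no filtering)."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every pair of points is a candidate edge, sorted by length.
    candidate_edges = sorted(
        (math.dist(points[u], points[v]), u, v)
        for u, v in itertools.combinations(range(n), 2)
    )

    mst = []
    for d, u, v in candidate_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two different components, so it is kept.
            parent[ru] = rv
            mst.append((d, u, v))
            if len(mst) == n - 1:
                break
    return mst


if __name__ == "__main__":
    square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    print(kruskal_geometric_mst(square))
```

Sorting the quadratic number of candidate edges dominates the cost here, which is the part a pair decomposition is meant to avoid.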