2,585 research outputs found

    DKWS: A Distributed System for Keyword Search on Massive Graphs (Complete Version)

    Full text link
    Due to the unstructuredness and the lack of schemas of graphs, such as knowledge graphs, social networks, and RDF graphs, keyword search for querying such graphs has been proposed. As graphs have become voluminous, large-scale distributed processing has attracted much interest from the database research community. While there have been several distributed systems, distributed querying techniques for keyword search are still limited. This paper proposes a novel distributed keyword search system called \DKWS. First, we \revise{present} a {\em monotonic} property with keyword search algorithms that guarantees correct parallelization. Second, we present a keyword search algorithm as monotonic backward and forward search phases. Moreover, we propose new tight bounds for pruning nodes being searched. Third, we propose a {\em notify-push} paradigm and \PINE {\em programming model} of \DKWS. The notify-push paradigm allows {\em asynchronously} exchanging the upper bounds of matches across the workers and the coordinator in \DKWS. The \PINE programming model naturally fits keyword search algorithms, as they have distinguished phases, to allow {\em preemptive} searches to mitigate staleness in a distributed system. Finally, we investigate the performance and effectiveness of \DKWS through experiments using real-world datasets. We find that \DKWS is up to two orders of magnitude faster than related techniques, and its communication costs are 7.67.6 times smaller than those of other techniques

    Reverse spatial visual top-k query

    Get PDF
    With the wide application of mobile Internet techniques an location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named reverse spatial visual top- kk query (RSVQ k ) that aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top- kk queries are not suitable to address this problem because they cannot effectively process unstructured data, such as image. To this end, firstly we propose the definition of RSVQ k problem and introduce the similarity measurement. A novel hybrid index, named VR 2 -Tree is designed, which is a combination of visual representation of geo-image and R-Tree. Besides, an extension of VR 2 -Tree, called CVR 2 -Tree is introduced and then we discuss the calculation of lower/upper bound, and then propose the optimization technique via CVR 2 -Tree for further pruning. In addition, a search algorithm named RSVQ k algorithm is developed to support the efficient RSVQ k query. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQ k problem effectively and efficiently

    Geo-Social Group Queries with Minimum Acquaintance Constraint

    Full text link
    The prosperity of location-based social networking services enables geo-social group queries for group-based activity planning and marketing. This paper proposes a new family of geo-social group queries with minimum acquaintance constraint (GSGQs), which are more appealing than existing geo-social group queries in terms of producing a cohesive group that guarantees the worst-case acquaintance level. GSGQs, also specified with various spatial constraints, are more complex than conventional spatial queries; particularly, those with a strict kkNN spatial constraint are proved to be NP-hard. For efficient processing of general GSGQ queries on large location-based social networks, we devise two social-aware index structures, namely SaR-tree and SaR*-tree. The latter features a novel clustering technique that considers both spatial and social factors. Based on SaR-tree and SaR*-tree, efficient algorithms are developed to process various GSGQs. Extensive experiments on real-world Gowalla and Dianping datasets show that our proposed methods substantially outperform the baseline algorithms based on R-tree.Comment: This is the preprint version that is accepted by the Very Large Data Bases Journa

    On Graph Stream Clustering with Side Information

    Full text link
    Graph clustering becomes an important problem due to emerging applications involving the web, social networks and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph streams is significantly challenging because of the complex structures of graphs and computational difficulties of continuous data. Meanwhile, a large volume of side information is associated with graphs, which can be of various types. The examples include the properties of users in social network activities, the meta attributes associated with web click graph streams and the location information in mobile communication networks. Such attributes contain extremely useful information and has the potential to improve the clustering process, but are neglected by most recent graph stream mining techniques. In this paper, we define a unified distance measure on both link structures and side attributes for clustering. In addition, we propose a novel optimization framework DMO, which can dynamically optimize the distance metric and make it adapt to the newly received stream data. We further introduce a carefully designed statistics SGS(C) which consume constant storage spaces with the progression of streams. We demonstrate that the statistics maintained are sufficient for the clustering process as well as the distance optimization and can be scalable to massive graphs with side attributes. We will present experiment results to show the advantages of the approach in graph stream clustering with both links and side information over the baselines.Comment: Full version of SIAM SDM 2013 pape

    Retrieving Top-N Weighted Spatial k-cliques

    Full text link
    Spatial data analysis is a classic yet important topic because of its wide range of applications. Recently, as a spatial data analysis approach, a neighbor graph of a set P of spatial points has often been employed. This paper also considers a spatial neighbor graph and addresses a new problem, namely top-N weighted spatial k-clique retrieval. This problem searches for the N minimum weighted cliques consisting of k points in P, and it has important applications, such as community detection and co-location pattern mining. Recent spatial datasets have many points, and efficiently dealing with such big datasets is one of the main requirements of applications. A straightforward approach to solving our problem is to try to enumerate all k-cliques, which incurs O(nkk2) time. Since k ≥ 3, this approach cannot achieve the main requirement, so computing the result without enumerating unnecessary k-cliques is required. This paper achieves this challenging task and proposes a simple practically-efficient algorithm that returns the exact answer. We conduct experiments using two real spatial datasets consisting of million points, and the results show the efficiency of our algorithm, e.g., it can return the exact top-N result within 1 second when N ≤ 1000 and k ≤ 7.Taniguchi R., Amagata D., Hara T.. Retrieving Top-N Weighted Spatial k-cliques. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022 , 4952 (2022); https://doi.org/10.1109/BigData55660.2022.10021071
    corecore