DKWS: A Distributed System for Keyword Search on Massive Graphs (Complete Version)
Because graphs such as knowledge graphs, social networks, and RDF graphs are
unstructured and lack schemas, keyword search has been proposed for querying
them. As graphs have become voluminous, large-scale distributed processing has
attracted much interest from the database research community. While several
distributed systems exist, distributed querying techniques for keyword search
are still limited. This paper proposes a novel distributed keyword search
system called DKWS. First, we present a monotonic property of keyword search
algorithms that guarantees correct parallelization. Second, we present a
keyword search algorithm consisting of monotonic backward and forward search
phases, and we propose new tight bounds for pruning the nodes being searched.
Third, we propose a notify-push paradigm and the PINE programming model of
DKWS. The notify-push paradigm allows the upper bounds of matches to be
exchanged asynchronously between the workers and the coordinator in DKWS. The
PINE programming model naturally fits keyword search algorithms, which have
distinct phases, and allows preemptive searches that mitigate staleness in a
distributed system. Finally, we investigate the performance and effectiveness
of DKWS through experiments on real-world datasets. We find that DKWS is up to
two orders of magnitude faster than related techniques, and its communication
costs are times smaller than those of other techniques.
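To make the keyword search semantics concrete, the following is a minimal single-machine sketch, not DKWS itself. It assumes an undirected weighted graph given as adjacency lists, in which every node appears as a key. It scores each candidate root by the sum of its shortest distances to the nearest node holding each keyword (a common distinct-root semantics, assumed here) and returns the k best roots. The helper names `keyword_distances` and `top_k_roots` are hypothetical. In the multi-source Dijkstra, distance estimates only ever decrease, which loosely illustrates why monotone updates can be exchanged asynchronously. The paper's formal monotonic property, its forward phase, and its pruning bounds are not reproduced.

```python
import heapq


def keyword_distances(adj, sources):
    """Multi-source Dijkstra: shortest distance from the nearest source to every reachable node.

    Estimates in `dist` only ever decrease (a monotone update rule).
    """
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter distance was already settled
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def top_k_roots(adj, labels, keywords, k):
    """Return the k roots with the smallest summed distance to all keywords.

    adj:    node -> list of (neighbour, weight); assumed undirected, every node is a key
    labels: node -> set of keywords held by that node (hypothetical input format)
    """
    per_keyword = []
    for kw in keywords:
        sources = [n for n, kws in labels.items() if kw in kws]
        per_keyword.append(keyword_distances(adj, sources))
    scores = {}
    for node in adj:
        # A node is a valid root only if it reaches every keyword.
        if all(node in d for d in per_keyword):
            scores[node] = sum(d[node] for d in per_keyword)
    # Ties on score are broken by node name so the output is deterministic.
    return heapq.nsmallest(k, scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Star graph: centre "b", leaves "a" (x), "c" (y), "d" (z).
    adj = {
        "a": [("b", 1)],
        "b": [("a", 1), ("c", 1), ("d", 1)],
        "c": [("b", 1)],
        "d": [("b", 1)],
    }
    labels = {"a": {"x"}, "c": {"y"}, "d": {"z"}}
    print(top_k_roots(adj, labels, ["x", "y", "z"], 2))
```

In the star-shaped usage example, the centre `b` is one hop from every keyword, so it ranks first with score 3. A distributed system would partition `adj` across workers and share the current k-th best score as a global bound. This is the role the abstract assigns to the asynchronously exchanged upper bounds.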
Reverse spatial visual top-k query
With the wide application of mobile Internet techniques and location-based services (LBS), massive amounts of multimedia data with geo-tags have been generated and collected. In this paper, we investigate a novel type of spatial query, named the reverse spatial visual top-k query (RSVQ_k), which aims to retrieve the set of geo-images that have the query as one of their most relevant geo-images in terms of both geographical proximity and visual similarity. Existing approaches for reverse top-k queries are not suitable for this problem because they cannot effectively process unstructured data such as images. To this end, we first define the RSVQ_k problem and introduce the similarity measurement. We then design a novel hybrid index, named VR²-Tree, which combines the visual representation of geo-images with the R-Tree. We also introduce an extension of VR²-Tree, called CVR²-Tree, discuss the calculation of lower and upper bounds, and propose an optimization technique based on CVR²-Tree for further pruning. In addition, we develop a search algorithm, named the RSVQ_k algorithm, to process RSVQ_k queries efficiently. Comprehensive experiments on four geo-image datasets show that our approach addresses the RSVQ_k problem effectively and efficiently.
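The reverse top-k semantics can be illustrated with a brute-force sketch. It assumes a hypothetical relevance score that linearly combines the Euclidean geographic distance with the Euclidean distance between visual feature vectors, using a weight `alpha`. The paper's actual similarity measurement may differ. The function names and the dictionary layout of a geo-image are illustrative only. Without an index such as VR²-Tree, every candidate must be compared with every other image. That quadratic cost is what the proposed bounds and pruning are meant to avoid.

```python
import math


def relevance(a, b, alpha=0.5):
    """Hypothetical combined distance between two geo-images (smaller = more relevant)."""
    spatial = math.dist(a["loc"], b["loc"])
    visual = math.dist(a["feat"], b["feat"])
    return alpha * spatial + (1 - alpha) * visual


def reverse_top_k(query, images, k, alpha=0.5):
    """Return ids of images that have `query` among their k most relevant geo-images.

    Brute force: for each image o, count the other images strictly more relevant
    to o than the query is. Ties are resolved in the query's favour.
    """
    result = []
    for o in images:
        d_query = relevance(o, query, alpha)
        closer = sum(
            1 for p in images if p is not o and relevance(o, p, alpha) < d_query
        )
        if closer < k:
            result.append(o["id"])
    return result


if __name__ == "__main__":
    images = [
        {"id": 1, "loc": (0, 0), "feat": (0, 0)},
        {"id": 2, "loc": (10, 10), "feat": (5, 5)},
        {"id": 3, "loc": (0, 1), "feat": (0, 0)},
    ]
    query = {"loc": (0, 0.5), "feat": (0, 0)}
    print(reverse_top_k(query, images, 1))
    print(reverse_top_k(query, images, 2))
```

Image 2 is excluded at k = 1 because image 3 is slightly closer to it than the query is. It is included at k = 2.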
Geo-Social Group Queries with Minimum Acquaintance Constraint
The prosperity of location-based social networking services enables
geo-social group queries for group-based activity planning and marketing. This
paper proposes a new family of geo-social group queries with minimum
acquaintance constraint (GSGQs), which are more appealing than existing
geo-social group queries in terms of producing a cohesive group that guarantees
the worst-case acquaintance level. GSGQs, also specified with various spatial
constraints, are more complex than conventional spatial queries; particularly,
those with a strict NN spatial constraint are proved to be NP-hard. For
efficient processing of general GSGQ queries on large location-based social
networks, we devise two social-aware index structures, namely SaR-tree and
SaR*-tree. The latter features a novel clustering technique that considers both
spatial and social factors. Based on SaR-tree and SaR*-tree, efficient
algorithms are developed to process various GSGQs. Extensive experiments on
real-world Gowalla and Dianping datasets show that our proposed methods
substantially outperform the baseline algorithms based on R-tree.
Comment: This is the preprint version accepted by the Very Large Data Bases
Journal.
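The following brute-force sketch shows what a GSGQ with a nearest-neighbour style constraint asks for. Two assumptions are made, and neither is taken from the abstract. First, the minimum acquaintance constraint is modelled as "every member knows at least `min_known` other members of the group". Second, the NN constraint is read as "minimise the farthest member's distance from the query user's location". The names `acquainted_enough` and `gsgq_nn` are hypothetical. The exhaustive search over all groups is consistent with the stated NP-hardness, and it is exactly what SaR-tree and SaR*-tree are designed to avoid on large networks.

```python
import math
from itertools import combinations


def acquainted_enough(group, friends, min_known):
    """Assumed constraint: each member knows at least min_known other members of the group."""
    members = set(group)
    return all(len(friends[u] & (members - {u})) >= min_known for u in group)


def gsgq_nn(query_user, users, loc, friends, p, min_known):
    """Exhaustively search groups of size p that contain query_user.

    Among groups satisfying the acquaintance constraint, return the one whose
    farthest member is closest to the query user's location, with that distance.
    Returns (None, inf) if no group qualifies.
    """
    q = loc[query_user]
    others = [u for u in users if u != query_user]
    best, best_cost = None, math.inf
    for rest in combinations(others, p - 1):
        group = (query_user,) + rest
        if not acquainted_enough(group, friends, min_known):
            continue
        cost = max(math.dist(q, loc[u]) for u in group)
        if cost < best_cost:
            best, best_cost = group, cost
    return best, best_cost


if __name__ == "__main__":
    users = ["q", "a", "b", "c"]
    loc = {"q": (0, 0), "a": (1, 0), "b": (0, 2), "c": (5, 0)}
    friends = {"q": {"a", "b", "c"}, "a": {"q"}, "b": {"q", "c"}, "c": {"q", "b"}}
    print(gsgq_nn("q", users, loc, friends, 3, 2))
    print(gsgq_nn("q", users, loc, friends, 3, 1))
```

Raising `min_known` from 1 to 2 forces a fully acquainted trio. This trades spatial proximity for cohesion, which is the trade-off that the worst-case acquaintance guarantee in the abstract refers to.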
On Graph Stream Clustering with Side Information
Graph clustering has become an important problem due to emerging applications
involving the web, social networks and bio-informatics. Recently, many such
applications generate data in the form of streams. Clustering massive, dynamic
graph streams is significantly challenging because of the complex structures of
graphs and computational difficulties of continuous data. Meanwhile, a large
volume of side information is associated with graphs, which can be of various
types. The examples include the properties of users in social network
activities, the meta attributes associated with web click graph streams and the
location information in mobile communication networks. Such attributes contain
extremely useful information and have the potential to improve the clustering
process, but they are neglected by most recent graph stream mining techniques. In
this paper, we define a unified distance measure on both link structures and
side attributes for clustering. In addition, we propose a novel optimization
framework DMO, which can dynamically optimize the distance metric and make it
adapt to the newly received stream data. We further introduce a carefully
designed statistic, SGS(C), which consumes constant storage space as the stream
progresses. We demonstrate that the maintained statistics are sufficient for
both the clustering process and the distance optimization, and that they scale
to massive graphs with side attributes. We present experimental results showing
the advantages of the approach over the baselines in graph stream clustering
with both links and side information.
Comment: Full version of SIAM SDM 2013 paper
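The idea of a single distance over both link structure and side attributes can be sketched as a weighted mix of two dissimilarities. This sketch assumes Jaccard distance on neighbour sets, the fraction of mismatched attribute values, and a fixed weight `w_link`. None of these choices comes from the abstract. This is not the paper's measure, its DMO optimization framework, or its SGS(C) statistic. The functions and the cluster representative format are hypothetical. The usage example shows why learning the weight matters: changing `w_link` flips which cluster a new node joins.

```python
def unified_distance(links_a, links_b, attrs_a, attrs_b, w_link=0.5):
    """Hypothetical unified distance in [0, 1] mixing structure and side attributes."""
    union = links_a | links_b
    # Structural part: Jaccard distance between neighbour sets.
    link_dist = 1 - len(links_a & links_b) / len(union) if union else 0.0
    # Attribute part: fraction of attribute keys whose values differ (missing counts as differing).
    keys = set(attrs_a) | set(attrs_b)
    mismatches = sum(1 for key in keys if attrs_a.get(key) != attrs_b.get(key))
    attr_dist = mismatches / len(keys) if keys else 0.0
    return w_link * link_dist + (1 - w_link) * attr_dist


def assign(node_links, node_attrs, clusters, w_link=0.5):
    """Assign a newly arrived node to the cluster whose representative is closest."""
    return min(
        clusters,
        key=lambda cid: unified_distance(
            node_links, clusters[cid]["links"], node_attrs, clusters[cid]["attrs"], w_link
        ),
    )


if __name__ == "__main__":
    clusters = {
        "c1": {"links": {1, 2, 3}, "attrs": {"city": "NY"}},
        "c2": {"links": {7, 8}, "attrs": {"city": "SF"}},
    }
    new_links, new_attrs = {1, 2}, {"city": "SF"}
    print(assign(new_links, new_attrs, clusters, w_link=0.5))  # attributes dominate
    print(assign(new_links, new_attrs, clusters, w_link=0.9))  # links dominate
```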
Retrieving Top-N Weighted Spatial k-cliques
Spatial data analysis is a classic yet important topic because of its wide range of applications. Recently, neighbor graphs of a set P of spatial points have often been employed as a spatial data analysis approach. This paper also considers a spatial neighbor graph and addresses a new problem, namely top-N weighted spatial k-clique retrieval. This problem searches for the N minimum-weight cliques consisting of k points in P, and it has important applications such as community detection and co-location pattern mining. Recent spatial datasets contain many points, and efficiently handling such big datasets is one of the main requirements of applications. A straightforward approach to our problem is to enumerate all k-cliques, which incurs O(n^k k^2) time. Since k ≥ 3, this approach cannot meet that requirement, so the result must be computed without enumerating unnecessary k-cliques. This paper achieves this challenging task and proposes a simple, practically efficient algorithm that returns the exact answer. We conduct experiments using two real spatial datasets consisting of million points, and the results show the efficiency of our algorithm; e.g., it can return the exact top-N result within 1 second when N ≤ 1000 and k ≤ 7.
Taniguchi R., Amagata D., Hara T. Retrieving Top-N Weighted Spatial k-cliques. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022, 4952 (2022); https://doi.org/10.1109/BigData55660.2022.10021071
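The straightforward approach mentioned above can be sketched directly. It enumerates every k-subset of the n points, which gives O(n^k) subsets. It checks each subset with O(k^2) pairwise tests, which is where the O(n^k k^2) bound comes from. It keeps the N lightest cliques in a bounded max-heap. Two modelling choices are assumptions: the neighbor graph connects points within distance `r`, and a clique's weight is the sum of its pairwise distances. The paper's definitions may differ. This is the baseline the paper improves on, not its algorithm.

```python
import heapq
import math
from itertools import combinations


def top_n_k_cliques(points, r, k, n):
    """Naive top-N weighted k-clique retrieval on an r-neighbour graph.

    Returns up to n pairs (weight, indices) sorted by ascending weight, where
    weight is the sum of pairwise Euclidean distances inside the clique.
    """
    if n <= 0:
        return []
    heap = []  # max-heap by weight (stored negated), holds the n best cliques seen so far
    for combo in combinations(range(len(points)), k):
        weight = 0.0
        is_clique = True
        for i, j in combinations(combo, 2):
            d = math.dist(points[i], points[j])
            if d > r:  # missing edge in the neighbour graph, so not a clique
                is_clique = False
                break
            weight += d
        if not is_clique:
            continue
        if len(heap) < n:
            heapq.heappush(heap, (-weight, combo))
        elif -heap[0][0] > weight:  # lighter than the current worst kept clique
            heapq.heapreplace(heap, (-weight, combo))
    return sorted((-w, c) for w, c in heap)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 11)]
    print(top_n_k_cliques(pts, r=1.5, k=3, n=2))
```

Even on this tiny input, all C(5, 3) = 10 triples are examined to find a single clique. Pruning the enumeration, as the abstract describes, targets exactly this waste.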