1 research outputs found
Query-driven Frequent Co-occurring Term Extraction over Relational Data using MapReduce
In this paper we study how to efficiently compute \textit{frequent
co-occurring terms} (FCT) in the results of a keyword query in parallel using
the popular MapReduce framework. Taking as input a keyword query q and an
integer k, an FCT query reports the k terms that are not in q, but appear most
frequently in the results of the keyword query q over multiple joined
relations. The returned terms of FCT search can be used to do query expansion
and query refinement for traditional keyword search. Different from the method
of FCT search in a single platform, our proposed approach can efficiently
answer a FCT query using the MapReduce Paradigm without pre-computing the
results of the original keyword query, which is run in parallel platform. In
this work, we can output the final FCT search results by two MapReduce jobs:
the first is to extract the statistical information of the data; and the second
is to calculate the total frequency of each term based on the output of the
first job. At the two MapReduce jobs, we would guarantee the load balance of
mappers and the computational balance of reducers as much as possible.
Analytical and experimental evaluations demonstrate the efficiency and
scalability of our proposed approach using TPC-H benchmark datasets with
different sizes