
    Binary search join between an IR system and an RDBMS

    Integrating relational database technologies into Web Information Retrieval lets users pose complex queries that go beyond traditional keyword searches over web pages. One approach to this integration is to place a software layer on top of an Information Retrieval (IR) system and an RDBMS (Relational Database Management System). A core operation in this top layer is joining the intermediate results from the two underlying systems (the IR results and the DB results, respectively) to produce the final ranked results for each query. Unfortunately, most conventional join algorithms are inefficient for this operation. In this paper, we propose a simple join algorithm called Binary Search Join (BSJ) for joining the IR results and the DB results. The algorithm exploits two facts: the IR results are already ranked by relevance, and the DB results are already sorted by the join attribute. It scans the IR results and, for each IR result tuple, performs a binary search over the DB results. We study the performance of BSJ analytically and empirically, comparing it with several conventional join algorithms on a repository of Chinese news web pages. The experimental results show that BSJ performs best in most cases. © 2006 IEEE
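    The scan-and-probe idea described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that IR results are (doc_id, score) pairs already in rank order and that DB results are tuples sorted by their first field, the join attribute. The function name and data layout are hypothetical and are not taken from the paper. Each probe costs O(log |DB|), so the whole join costs roughly O(|IR| log |DB|), and the output keeps the IR ranking order without a re-sort.

```python
from bisect import bisect_left
from typing import Any, List, Sequence, Tuple


def binary_search_join(
    ir_results: Sequence[Tuple[Any, float]],
    db_results: Sequence[Tuple[Any, ...]],
) -> List[Tuple[Any, float, Tuple[Any, ...]]]:
    """Hypothetical sketch: join ranked IR results with DB rows sorted on row[0].

    Assumes db_results is sorted by its first field (the join attribute).
    Output preserves the IR ranking order.
    """
    joined = []
    n = len(db_results)
    for doc_id, score in ir_results:  # scan IR results in rank order
        # (doc_id,) compares below every row whose first field equals doc_id,
        # so bisect_left lands on the first DB row with that join key.
        i = bisect_left(db_results, (doc_id,))
        # The join attribute may repeat in the DB results; emit every match.
        while i < n and db_results[i][0] == doc_id:
            joined.append((doc_id, score, db_results[i]))
            i += 1
    return joined
```

    For example, with `ir = [(42, 0.9), (7, 0.8), (99, 0.5)]` and `db = [(7, "sports"), (42, "finance"), (42, "tech"), (63, "world")]`, the call `binary_search_join(ir, db)` returns the two rows for document 42 followed by the row for document 7, and drops document 99 because it has no DB match. Probing with the one-element tuple `(doc_id,)` avoids copying the key column out of the DB results, which would otherwise add an O(|DB|) pass per query.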