
    An efficient stream-based join to process end user transactions in real-time data warehousing

    In the field of real-time data warehousing, semi-stream processing has become a promising area of research over the last decade. One important operation in semi-stream processing is joining stream data with slowly changing, disk-based master data. A join operator is usually required to implement this operation. This operator typically works with limited main memory, which is generally not large enough to hold all of the disk-based master data. Recently, a seminal join algorithm called MESHJOIN (Mesh Join) has been proposed in the literature to process semi-stream data. MESHJOIN is a candidate for a resource-aware system setup. However, MESHJOIN is not very selective. In particular, it does not consider the characteristics of the stream data, and its performance is suboptimal for skewed stream data. In this paper we propose a novel Semi-Stream Join (SSJ) that uses a new cache module. The algorithm is more appropriate for skewed distributions, and we present results for Zipfian distributions of the type that appears in many applications. We present the cost model for SSJ and validate it with experiments. Based on the cost model, we also tune the algorithm for maximum performance. We conduct a rigorous experimental study to test our algorithm. Our experiments show that SSJ outperforms MESHJOIN significantly.
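The general idea of putting a cache in front of disk-based master data can be illustrated with a minimal sketch. This is not the SSJ algorithm from the paper, whose details the abstract does not give. It is a plain LRU-cached lookup join written under stated assumptions: the class name `CachedLookupJoin`, the in-memory dictionary standing in for the disk-based master data, and the `disk_reads` counter are all hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class CachedLookupJoin:
    """Illustrative only: joins stream tuples with master data via a bounded LRU cache."""

    def __init__(self, master, cache_size):
        self.master = master          # assumption: dict standing in for disk-based master data
        self.cache_size = cache_size  # assumption: memory budget expressed as a row count
        self.cache = OrderedDict()    # key -> master row, least recently used first
        self.disk_reads = 0           # number of simulated disk accesses

    def _lookup(self, key):
        if key in self.cache:
            # Cache hit: mark the key as most recently used, no disk access.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: simulate one access to the disk-based master data.
        self.disk_reads += 1
        row = self.master.get(key)
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.cache_size:
                # Evict the least recently used entry to respect the budget.
                self.cache.popitem(last=False)
        return row

    def join(self, stream):
        """Yield (key, stream_payload, master_row) for every stream tuple with a match."""
        for key, payload in stream:
            row = self._lookup(key)
            if row is not None:
                yield key, payload, row


if __name__ == "__main__":
    master = {k: f"product-{k}" for k in range(1000)}
    # A small skewed stream: key 1 dominates, as it might under a Zipfian distribution.
    stream = [(1, "t1"), (1, "t2"), (2, "t3"), (1, "t4"), (3, "t5"), (1, "t6")]
    op = CachedLookupJoin(master, cache_size=2)
    for result in op.join(stream):
        print(result)
    print("simulated disk reads:", op.disk_reads)
```

With a skewed stream, most tuples hit the cache, so the number of simulated disk reads stays well below the number of stream tuples. A uniform stream over many keys would gain much less from the same cache.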