46 research outputs found
GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins hold far fewer tuples than the sliding windows contain. A stream buffer management policy is therefore needed, and we show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. Our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
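The GreedyDual family of replacement policies keeps a priority per cached item, evicts the minimum, and ages survivors through a global inflation value; a locality-aware join buffer can raise a tuple's priority whenever it produces a match. The following is a minimal sketch of a GreedyDual-style policy applied to a join buffer; the `benefit` weighting and method names are illustrative assumptions, not the paper's exact formulation.

```python
import heapq

class GreedyDualBuffer:
    """GreedyDual-style replacement for a join buffer (illustrative sketch):
    tuples that recently produced join matches get higher priority, and
    eviction removes the lowest-priority tuple."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inflation = 0.0   # global aging offset L
        self.priority = {}     # key -> current priority H
        self.heap = []         # (H, key) heap with lazy deletion

    def _touch(self, key, benefit):
        h = self.inflation + benefit
        self.priority[key] = h
        heapq.heappush(self.heap, (h, key))

    def insert(self, key, benefit=1.0):
        while len(self.priority) >= self.capacity:
            self._evict()
        self._touch(key, benefit)

    def on_match(self, key, benefit=2.0):
        # A match raises the tuple's priority, exploiting temporal locality.
        if key in self.priority:
            self._touch(key, benefit)

    def _evict(self):
        while self.heap:
            h, key = heapq.heappop(self.heap)
            if self.priority.get(key) == h:   # skip stale heap entries
                del self.priority[key]
                self.inflation = h            # age the remaining entries
                return key
```

For example, with capacity 2, a tuple that just matched survives eviction while an idle one is dropped.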
CQ-Buddy: Harnessing Peers For Distributed Continuous Query Processing
In this paper, we present the design and evaluation of CQ-Buddy, a peer-to-peer (p2p) continuous query (CQ) processing system that is distributed and highly scalable. CQ-Buddy exploits the differences in capabilities (processing and memory) of peers and load-balances tasks across powerful and weak peers. Our main contributions are as follows: First, CQ-Buddy introduces the notion of pervasive continuous queries to tackle the frequent disconnection problems common in a peer-to-peer environment. Second, CQ-Buddy allows for inter-sharing and intra-sharing in the processing of continuous queries amongst peers. Third, CQ-Buddy peers perform query-centric load balancing for overloaded data source providers by acting as proxies. We have conducted extensive studies to evaluate CQ-Buddy's performance. Our results show that CQ-Buddy is highly scalable and is able to process continuous queries in an effective and efficient manner.
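The capability-aware load balancing described above can be illustrated with a simple greedy assignment: each continuous query is placed on the peer with the most spare capacity, so powerful peers absorb more work than weak ones. This is a hedged sketch under assumed cost and capacity units, not CQ-Buddy's actual protocol.

```python
import heapq

def assign_queries(queries, peers):
    """Greedy capability-aware placement (illustrative sketch):
    queries  -- dict mapping query name -> estimated processing cost
    peers    -- dict mapping peer name  -> capacity
    Returns a dict mapping each query to its assigned peer."""
    # Max-heap on spare capacity (negated for heapq's min-heap).
    heap = [(-capacity, name) for name, capacity in peers.items()]
    heapq.heapify(heap)
    assignment = {}
    # Place the most expensive queries first.
    for query, cost in sorted(queries.items(), key=lambda kv: -kv[1]):
        spare, name = heapq.heappop(heap)
        assignment[query] = name
        heapq.heappush(heap, (spare + cost, name))  # spare is negative
    return assignment
```

A strong peer ends up hosting the heavy queries while a weak peer takes only what fits its capacity.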
Exploiting the Power of Relational Databases for Efficient Stream Processing
Stream applications have gained significant popularity in recent years, leading to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system, which exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after a batch of tuples has arrived or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
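The basket life cycle described above (tuples are appended on arrival, each continuous query runs as an ordinary one-time query over the basket's current contents, and a tuple is dropped once every registered query has seen it) can be sketched as follows; class and method names are illustrative assumptions, not the DataCell API.

```python
class Basket:
    """Basket-style stream buffer (illustrative sketch): a table-like
    buffer shared by n_queries continuous queries."""

    def __init__(self, n_queries):
        self.n_queries = n_queries
        self.tuples = []  # entries of the form [tuple, seen_count]

    def append(self, tup):
        # Incoming tuple lands in the basket on arrival.
        self.tuples.append([tup, 0])

    def evaluate(self, predicate):
        """One query pass, executed as a one-time scan. A basket
        expression could pick tuples selectively; here it is a plain
        filter predicate."""
        results = [t for t, _ in self.tuples if predicate(t)]
        for entry in self.tuples:
            entry[1] += 1
        # Drop tuples that every registered query has now seen.
        self.tuples = [e for e in self.tuples if e[1] < self.n_queries]
        return results
```

After all registered queries have scanned the basket once, it is empty again, mirroring the drop-after-seen rule.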
RFID REAL TIME TRACKER
The author has successfully completed a Dissertation on the RFID Real Time Tracker. A brief
introduction to Radio Frequency Identification (RFID) is given, including the objectives,
problem statement, scope of study, methodology, and findings of the research on
RFID techniques. The purpose of this Dissertation is mainly to allow the supervisor and
examiners to evaluate her work on the RFID Real Time Tracker based on the report, which
explains in writing the contents of the project and its significance: the problem
statement, objectives, scope, literature review, methodology used, results, conclusions, and
recommendations. The author gained experience in applying RFID knowledge and in using
RFID concepts to solve the problem of tracking students in real time. The Dissertation is
divided into five (5) chapters: Introduction, Literature Review/Theory,
Methodology/Project Work, Results and Discussion, and finally Conclusion and
Recommendations. Through this work the author learnt how to carry out simple support tasks, which
enhanced her professional knowledge and soft skills.
RFID Real Time Tracker is a system that applies the advantages of RFID technology to
track the students entering Building 1 in real time, which can help the security guards
address the theft cases that frequently occur in UTP. Based on the research
work from FYP I-II, the author divided the methodology into five stages: system design,
software development (interface), hardware testing (HyperTerminal), hardware and
software integration, and model development. Testing is needed to verify that the system
works, and the findings prove that the system works as the objectives
desired.