Search CORE

46 research outputs found

GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

Author: Chang Ching
Li Feifei
Bestavros Azer
Kollios
Publication venue: Boston University Computer Science Department
Publication date: 01/01/1997

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain a much smaller number of tuples than the sliding windows. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted to other recently proposed techniques.
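The effect described in this abstract, where a buffer smaller than the window yields only an approximate join result, can be illustrated with a minimal sketch. This is not the GDJ algorithm: the eviction policy is plain FIFO as a stand-in baseline, the whole input is treated as one window, and the names `bounded_join` and `capacity` are hypothetical.

```python
from collections import deque

def bounded_join(events, capacity):
    """Count equi-join matches between two interleaved streams.

    events: iterable of (stream_id, key) pairs, stream_id in {0, 1}.
    capacity: maximum tuples kept per stream buffer (assumed knob).
    Eviction is plain FIFO, a baseline stand-in and NOT GDJ.
    """
    buffers = (deque(), deque())
    matches = 0
    for stream_id, key in events:
        other = buffers[1 - stream_id]
        # Probe the opposite stream's buffer for tuples with the same key.
        matches += sum(1 for k in other if k == key)
        own = buffers[stream_id]
        if len(own) >= capacity:
            own.popleft()  # buffer full: evict the oldest tuple
        own.append(key)
    return matches

events = [(0, "a"), (0, "b"), (0, "c"), (1, "a"), (1, "c")]
exact = bounded_join(events, capacity=3)   # buffer holds every tuple
approx = bounded_join(events, capacity=2)  # "a" is evicted before its match arrives
print(exact, approx)
```

With the smaller buffer one match is lost, which is the kind of loss a replacement policy tries to limit by choosing more carefully which tuple to evict.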

Boston University Institutional Repository (OpenBU)

CQ-Buddy: Harnessing Peers For Distributed Continuous Query Processing

Author: Ng Wee Siong
Shu Yanfeng
Tok Wee Hyong
Publication date: 01/01/2004

In this paper, we present the design and evaluation of CQ-Buddy, a peer-to-peer (p2p) continuous query (CQ) processing system that is distributed and highly scalable. CQ-Buddy exploits the differences in capabilities (processing and memory) of peers and load-balances the tasks across powerful and weak peers. Our main contributions are as follows: First, CQ-Buddy introduces the notion of pervasive continuous queries to tackle the frequent disconnection problems common in a peer-to-peer environment. Second, CQ-Buddy allows for inter-sharing and intra-sharing in the processing of continuous queries amongst peers. Third, CQ-Buddy peers perform query-centric load balancing for overloaded data source providers by acting as proxies. We have conducted extensive studies to evaluate CQ-Buddy's performance. Our results show that CQ-Buddy is highly scalable and is able to process continuous queries in an effective and efficient manner.

Singapore-MIT Alliance (SMA)

DSpace@MIT

Exploiting the Power of Relational Databases for Efficient Stream Processing

Author: Idreos S. (Stratos)
Liarou E. (Erietta)
Pereira Goncalves R.A. (Romulo Antonio)
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2009

Stream applications have gained significant popularity over the last years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are directly stored upon arrival in a new kind of system table, called baskets. A continuous query can then be evaluated over its relevant baskets as a typical one-time query exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single or multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each one with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., query a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expressions. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.
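The basket idea in this abstract (store tuples on arrival, query only after a count or time threshold, then pick tuples selectively) might be sketched as follows. This is an illustrative toy and not DataCell's implementation: the class `Basket`, its parameters `batch_size` and `max_wait_seconds`, and the `drain` predicate are all hypothetical names, and the real system keeps baskets as tables inside a DBMS kernel.

```python
import time

class Basket:
    """Toy in-memory basket; every name here is an assumption for illustration."""

    def __init__(self, batch_size, max_wait_seconds, clock=time.monotonic):
        self.batch_size = batch_size
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self.tuples = []
        self.first_arrival = None

    def append(self, tup):
        # Remember when the oldest pending tuple arrived.
        if not self.tuples:
            self.first_arrival = self.clock()
        self.tuples.append(tup)

    def ready(self):
        # Ready once enough tuples arrived or the oldest has waited long enough.
        if not self.tuples:
            return False
        waited = self.clock() - self.first_arrival
        return len(self.tuples) >= self.batch_size or waited >= self.max_wait_seconds

    def drain(self, predicate=lambda t: True):
        # Selectively pick matching tuples; the rest stay in the basket.
        picked, kept = [], []
        for t in self.tuples:
            (picked if predicate(t) else kept).append(t)
        self.tuples = kept
        # Simplification: restart the wait timer for any remaining tuples.
        self.first_arrival = self.clock() if kept else None
        return picked

basket = Basket(batch_size=3, max_wait_seconds=60.0)
for reading in [5, 12, 7]:
    basket.append(reading)
batch = basket.drain(lambda r: r > 6) if basket.ready() else []
print(batch, basket.tuples)
```

Here the predicate passed to `drain` plays the role the abstract assigns to basket expressions: it picks tuples by query requirements rather than by arrival order.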

CWI's Institutional Repository

Exploiting the power of relational databases for efficient stream processing

Author: Idreos Stratos
Liarou Erietta
Pereira Goncalves Romulo Antonio
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Stream applications have gained significant popularity over the last years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are directly stored upon arrival in a new kind of system table, called baskets. A continuous query can then be evaluated over its relevant baskets as a typical one-time query exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single or multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each one with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., query a basket only after x

Crossref

CWI's Institutional Repository

RFID REAL TIME TRACKER

Author: WAE-ESOR JETSRAPHORN
Publication venue: Universiti Teknologi Petronas
Publication date: 01/01/2008

The author has successfully completed a dissertation on the RFID Real Time Tracker. Radio Frequency Identification (RFID) is briefly introduced, together with the objectives, problem statement, scope of study, methodology, and findings based on research into RFID techniques. The purpose of this dissertation is mainly to allow the supervisor and examiners to evaluate her work on the RFID Real Time Tracker based on the report, which explains the contents of the project and its significance, including the problem statement, objectives, scope, literature review, methodology, results, conclusions, and recommendations. A further aim is to gain experience in applying RFID knowledge and to use RFID concepts to track students in real time. The dissertation is divided into five (5) chapters: Introduction, Literature Review/Theory, Methodology/Project Work, Results and Discussion, and Conclusion and Recommendation. Through this work the author learnt how to carry out simple support tasks, which enhanced the author's professional knowledge and soft skills. RFID Real Time Tracker is a system that applies the advantages of RFID technology to track, in real time, the students entering building 1, which can help the security guards address the theft cases that frequently occur at UTP. Based on the research work in FYP I and II, the author divided the methodology into five stages: System Design, Software Development (Interface), Hardware Testing (Hyper Terminal), Hardware and Software Integration, and Model Development. Testing is needed to make sure that the system works. The findings show that the system works as the objectives require.

UTPedia