9 research outputs found

    Efficient Probabilistic Subsumption Checking for Content-Based Publish/Subscribe Systems

    Abstract. Efficient subsumption checking, deciding whether a subscription or publication is covered by a set of previously defined subscriptions, is of paramount importance for publish/subscribe systems. It provides the core system functionality (matching of publications to subscriber needs expressed as subscriptions) and additionally reduces the overall system load and generated traffic, since covered subscriptions are not propagated in distributed environments. As the subsumption problem was shown previously to be co-NP complete and existing solutions typically apply pairwise comparisons to detect the subsumption relationship, we propose a ‘Monte Carlo type’ probabilistic algorithm for the general subsumption problem. It determines whether a publication/subscription is covered by a disjunction of subscriptions in O(k m d), where k is the number of subscriptions, m is the number of distinct attributes in subscriptions, and d is the number of tests performed to answer a subsumption question. The probability of error is problem-specific and typically very small, and sets an upper bound on d. Our experimental results show significant gains in terms of subscription set reduction, which has a favorable impact on overall system performance as it reduces the total computational costs and network traffic. Furthermore, the expected theoretical bounds underestimate algorithm performance because it performs much better in practice due to the introduced optimizations, and it is adequate for fast forwarding of subscriptions under high subscription rates.
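
The Monte Carlo idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and omits their optimizations. It assumes that every subscription is a conjunction of closed numeric intervals over the same attributes, and that test points are drawn uniformly. The names `contains` and `probably_covered` are hypothetical. Each of the d tests costs O(k m), which gives the O(k m d) total stated above.

```python
import random

# Assumption for this sketch: a subscription is a dict mapping each attribute
# name to a closed interval (low, high), and all subscriptions constrain the
# same attributes.

def contains(sub, point):
    """Return True if the point satisfies every interval constraint of sub."""
    return all(lo <= point[attr] <= hi for attr, (lo, hi) in sub.items())

def probably_covered(s, subs, d, rng):
    """Monte Carlo test: is s covered by the disjunction of subs?

    A False answer is always correct, because an uncovered witness point was
    found. A True answer can be wrong with a probability that shrinks as the
    number of tests d grows.
    """
    for _ in range(d):
        # Draw one random point inside s (m attributes).
        point = {attr: rng.uniform(lo, hi) for attr, (lo, hi) in s.items()}
        # Check the point against all k subscriptions.
        if not any(contains(t, point) for t in subs):
            return False
    return True

s = {"x": (0.0, 10.0), "y": (0.0, 10.0)}
left = {"x": (0.0, 5.0), "y": (0.0, 10.0)}
right = {"x": (5.0, 10.0), "y": (0.0, 10.0)}
rng = random.Random(0)
print(probably_covered(s, [left, right], 50, rng))  # covered by the union
print(probably_covered(s, [left], 50, rng))         # right half is uncovered
```

Note that neither `left` nor `right` covers `s` on its own, so a pairwise comparison would miss the subsumption that the test against the whole disjunction detects.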

    Self-adaptive event recognition for intelligent transport management

    Intelligent transport management involves the use of voluminous amounts of uncertain sensor data to identify and effectively manage issues of congestion and quality of service. In particular, urban traffic has been in the eye of the storm for many years now and gathers increasing interest as cities become bigger, more crowded, and “smart”. In this work we tackle the issue of uncertainty in transportation systems stream reporting. The variety of existing data sources opens new opportunities for testing the validity of sensor reports and self-adapting the recognition of complex events as a result. We report on the use of a logic-based event reasoning tool to identify regions of uncertainty within a stream and demonstrate our method with a real-world use case from the city of Dublin. Our empirical analysis shows the feasibility of the approach when dealing with voluminous and highly uncertain streams.

    Evaluation of Expressions with Uncertainty in Databases

    Expressions are used in a range of applications such as publish/subscribe and e-commerce. Integrating support for expressions in a database management system (DBMS) provides an efficient and scalable platform for applications that use expressions. Support for uncertain data and expressions can be beneficial but is not currently provided. In this thesis, we investigate how expressions with uncertainty can be integrated in a DBMS like other data. We describe the underlying theory and implementation of UNXS (UNcertain eXpression System), a system that we have developed to handle uncertainty in expressions and data. We develop a theoretical model to compare and contrast previous work on supporting uncertainty in DBMS and publish/subscribe systems. We extend the existing approaches to propose new techniques for matching uncertain expressions to uncertain data in UNXS. We then describe an implementation that integrates this support in the PostgreSQL DBMS, which to our knowledge is the first such implementation.

    Top-k/w publish/subscribe: finding k most relevant publications in sliding time window w

    Existing content-based publish/subscribe systems are designed assuming that all matching publications are equally relevant to a subscription. As we cannot know in advance the distribution of publication content, the following two unwanted situations are highly possible: a subscriber receives either too many or only a few publications. In this paper we present a new publish/subscribe model which is based on the sliding window computation model. Our model assumes that publications have different relevance to a subscription. In the model, a subscriber receives the k most relevant publications published within a time window w, where k and w are parameters defined per subscription. As a consequence, the arrival rate of incoming relevant publications per subscription is predefined, and does not depend on the publication rate. Since all relevant objects (i.e. publications in our case) cannot be kept in main memory, existing solutions immediately discard less relevant objects, and store only a small representative set for subsequent delivery. In this paper we develop a probabilistic criterion to decide, upon the arrival of a new object, whether it may become a top-k object at some future point in time and should thus be stored in a special publications queue. We show that by accepting a typically very small probability of error, the queue length is reasonably small and does not significantly depend on the publication rate. Furthermore, we experimentally evaluate our approach to demonstrate its applicability in practice.
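
The top-k/w model described in this abstract can be sketched as follows. This is not the paper's probabilistic criterion. As a stand-in, the sketch uses a simple deterministic rule: a publication is discarded once k newer publications with equal or higher relevance exist, because those outlive it in the window. The class name `TopKWindow`, its methods, and the (timestamp, relevance) representation are assumptions made for illustration.

```python
import heapq

class TopKWindow:
    """Keep candidates for the k most relevant publications in a window w."""

    def __init__(self, k, w):
        self.k = k
        self.w = w
        self.queue = []  # (timestamp, relevance) pairs in arrival order

    def publish(self, t, relevance):
        """Add a publication arriving at time t and prune the queue."""
        # Drop publications that have left the window (t - w, t].
        live = [(ts, r) for ts, r in self.queue if ts > t - self.w]
        live.append((t, relevance))
        kept, newer = [], []
        # Scan from newest to oldest, counting newer and no-less-relevant items.
        for ts, r in reversed(live):
            if sum(1 for x in newer if x >= r) < self.k:
                kept.append((ts, r))
            newer.append(r)
        kept.reverse()
        self.queue = kept

    def top(self, t):
        """Return the k most relevant stored publications still live at t."""
        live = [(ts, r) for ts, r in self.queue if ts > t - self.w]
        return heapq.nlargest(self.k, live, key=lambda e: e[1])

win = TopKWindow(k=2, w=10)
for t, rel in [(1, 0.9), (2, 0.1), (3, 0.5), (4, 0.6)]:
    win.publish(t, rel)
print(win.top(4))   # the two most relevant publications in the window at t=4
print(win.top(12))  # the publication from t=1 has expired by t=12
```

The publication with relevance 0.1 is discarded at t=4 because two newer, more relevant publications exist. The one with relevance 0.5 is kept, since it becomes a top-2 result once the publication from t=1 expires.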

    Preference-aware publish/subscribe delivery with diversity

    In publish/subscribe systems, users describe their interests via subscriptions and are notified whenever new interesting events become available. Typically, in such systems, all subscriptions are considered equally important. However, due to the abundance of information, users may receive overwhelming amounts of events. In this paper, we propose using a ranking mechanism based on user preferences, so that only top-ranked events are delivered to each user. Since many times top-ranked events are similar to each other, we also propose increasing the diversity of delivered events. Furthermore, we examine a number of different delivering policies for forwarding ranked events to users, namely a periodic, a sliding-window and a history-based one. We have fully implemented our approach in SIENA, a popular publish/subscribe middleware system, and report experimental results of its deployment.

    Fast Probabilistic Subsumption Checking for Publish/Subscribe Systems

    Efficient subsumption checking, deciding whether a subscription or publication is subsumed (covered) by a set of previously defined subscriptions, is of paramount importance for publish/subscribe systems. It provides the core system functionality, and additionally, reduces the overall system load and generated traffic in distributed environments. As the deterministic solution was shown previously to be co-NP complete and existing solutions typically employ costly pairwise comparisons to detect the subsumption relationship, we propose a probabilistic algorithm for the general subsumption problem. It efficiently determines whether a publication/subscription is covered by a disjunction of subscriptions in O(k m d), where k is the number of subscriptions, m is the number of distinct attributes in subscriptions, and d is the number of tests performed to answer a subsumption question. The probability of error is problem-specific and typically very small, and determines an upper bound on d in polynomial time prior to the algorithm execution. Our experimental results demonstrate that the algorithm performs even better in practice due to the introduced optimizations, and is adequate for fast forwarding of publications/subscriptions, especially in resource-scarce environments, e.g. sensor networks.

    Forecasting in Database Systems

    Time series forecasting is a fundamental prerequisite for decision-making processes and crucial in a number of domains such as production planning and energy load balancing. In the past, forecasting was often performed by statistical experts in dedicated software environments outside of current database systems. However, forecasts are increasingly required by non-expert users or have to be computed fully automatically without any human intervention. Furthermore, we can observe an ever increasing data volume and the need for accurate and timely forecasts over large multi-dimensional data sets. As most data subject to analysis is stored in database management systems, a rising trend addresses the integration of forecasting inside a DBMS. Yet, many existing approaches follow a black-box style and try to keep changes to the database system as small as possible. While such approaches are more general and easier to realize, they miss significant opportunities for improved performance and usability. In this thesis, we introduce a novel approach that seamlessly integrates time series forecasting into a traditional database management system. In contrast to flash-back queries that allow a view on the data in the past, we have developed a Flash-Forward Database System (F2DB) that provides a view on the data in the future. It supports a new query type, the forecast query, that enables forecasting of time series data and is automatically and transparently processed by the core engine of an existing DBMS. We discuss necessary extensions to the parser, optimizer, and executor of a traditional DBMS. We furthermore introduce various optimization techniques for three different types of forecast queries: ad-hoc queries, recurring queries, and continuous queries. First, we ease the expensive model creation step of ad-hoc forecast queries by reducing the amount of processed data with traditional sampling techniques. Second, we decrease the runtime of recurring forecast queries by materializing models in a specialized index structure. However, a large number of time series as well as high model creation and maintenance costs require a careful selection of such models. Therefore, we propose a model configuration advisor that determines a set of forecast models for a given query workload and multi-dimensional data set. Finally, we extend forecast queries with continuous aspects, allowing an application to register a query once at our system. As new time series values arrive, we send notifications to the application based on predefined time and accuracy constraints. All of our optimization approaches intend to increase the efficiency of forecast queries while ensuring high forecast accuracy.

    Modeling Uncertainties in Publish/Subscribe Systems

    In the publish/subscribe paradigm, information providers disseminate publications to all consumers who have expressed interest by registering subscriptions. This paradigm has found widespread applications, ranging from selective information dissemination to network management. However, no existing publish/subscribe system can capture the uncertainty inherent to the information in either subscriptions or publications. In many situations, exact knowledge of either specific subscriptions or publications is not available. Moreover, especially in selective information dissemination applications, it is often more appropriate for a user to formulate her search requests or information offers in less precise terms, rather than defining a sharp limit. To address this problem, this paper proposes a new publish/subscribe model based on possibility theory and fuzzy set theory to process uncertainties for both subscriptions and publications. Furthermore, an approximate publish/subscribe matching problem is defined and algorithms for solving it are developed and evaluated.
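
A minimal sketch of approximate matching with fuzzy sets may make the idea of "less precise terms" concrete. It is a generic fuzzy-set illustration, not the model or algorithms of the paper. It assumes trapezoidal membership functions for subscription constraints, crisp publication values, and a minimum (fuzzy AND) across attributes. All names and the threshold parameter are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    if x <= c:
        return 1.0                # plateau: fully satisfied
    return (d - x) / (d - c)      # falling edge

def match_degree(subscription, publication):
    """Degree to which a publication matches: minimum over all attributes."""
    return min(
        trapezoid(publication[attr], *shape)
        for attr, shape in subscription.items()
    )

def matches(subscription, publication, threshold):
    """Approximate match: deliver if the degree reaches the threshold."""
    return match_degree(subscription, publication) >= threshold

# "Price around 100, size around 50 to 60" instead of sharp limits.
sub = {"price": (80, 95, 105, 120), "size": (40, 50, 60, 70)}
print(match_degree(sub, {"price": 100, "size": 55}))  # fully satisfied
print(match_degree(sub, {"price": 110, "size": 55}))  # partially satisfied
print(matches(sub, {"price": 110, "size": 55}, threshold=0.5))
```

With a sharp limit of 105 on price, the second publication would be rejected outright. Here it matches to a partial degree, and the threshold decides whether it is delivered.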