670,916 research outputs found

    Top-k spatial-keyword publish/subscribe over sliding window

    Full text link
    © 2017, Springer-Verlag Berlin Heidelberg. With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data have been generated in a stream fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-k monitoring problem over sliding window of streaming data; that is, we continuously maintain the top-k most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. To provide the most recent information under controllable memory cost, sliding window model is employed on the streaming geo-textual data. To the best of our knowledge, this is the first work to study top-k spatial-keyword publish/subscribe over sliding window. A novel centralized system, called Skype (Top-kSpatial-keyword Publish/Subscribe), is proposed in this paper. In Skype, to continuously maintain top-k results for massive subscriptions, we devise a novel indexing structure upon subscriptions such that each incoming message can be immediately delivered on its arrival. To reduce the expensive top-k re-evaluation cost triggered by message expiration, we develop a novel cost-basedk-skyband technique to reduce the number of re-evaluations in a cost-effective way. Extensive experiments verify the great efficiency and effectiveness of our proposed techniques. Furthermore, to support better scalability and higher throughput, we propose a distributed version of Skype, namely DSkype, on top of Storm, which is a popular distributed stream processing system. With the help of fine-tuned subscription/message distribution mechanisms, DSkype can achieve orders of magnitude speed-up than its centralized version

    Monitoring frequent items over distributed data streams.

    Get PDF
    Many important applications require the discovery of items which have occurred frequently. Knowledge of these items is commonly used in anomaly detection and network monitoring tasks. Effective solutions for this problem focus mainly on reducing memory requirements in a centralized environment. These solutions, however, ignore the inherently distributed nature of many systems. Naively forwarding data to a centralized location is not practical when dealing with high speed data streams and will result in significant communication overhead. This thesis proposes a new approach designed for continuously tracking frequent items over distributed data streams, providing either exact or approximate answers. The method introduced is a direct modification to an existing communication efficient algorithm called Top-K, Monitoring. Experimental results demonstrated that the proposed modifications significantly reduced communication cost and improved scalability. Also examined in this thesis is the applicability of frequent item monitoring at detecting distributed denial of service attacks. Simulation of the proposed tracking method against four different attack patterns was conducted. The outcome of these experiments showed promising results when compared to previous detection methods

    Mint views: Materialized in-network top-k views in sensor networks

    Get PDF
    In this paper we introduce MINT (materialized in-network top-k) Views, a novel framework for optimizing the execution of continuous monitoring queries in sensor networks. A typical materialized view V maintains the complete results of a query Q in order to minimize the cost of future query executions. In a sensor network context, maintaining consistency between V and the underlying and distributed base relation R is very expensive in terms of communication. Thus, our approach focuses on a subset V(sube. V) that unveils only the k highest-ranked answers at the sink for some user defined parameter k. We additionally provide an elaborate description of energy-conscious algorithms for constructing, pruning and maintaining such recursively- defined in-network views. Our trace-driven experimentation with real datasets show that MINT offers significant energy reductions compared to other predominant data acquisition models

    Ranking in Distributed Uncertain Database Environments

    Get PDF
    Distributed data processing is a major field in nowadays applications. Many applications collect and process data from distributed nodes to gain overall results. Large amount of data transfer and network delay made data processing in a centralized manner a hard operation representing an important problem. A very common way to solve this problem is ranking queries. Ranking or top-k queries concentrate only on the highest ranked tuples according to user's interest. Another issue in most nowadays applications is data uncertainty. Many techniques were introduced for modeling, managing, and processing uncertain databases. Although these techniques were efficient, they didn't deal with distributed data uncertainty. This paper deals with both data uncertainty and distribution based on ranking queries. A novel framework is proposed for ranking distributed uncertain data. The framework has a suite of novel algorithms for ranking data and monitoring updates. These algorithms help in reducing the communication rounds used and amount of data transmitted while achieving efficient and effective ranking. Experimental results show that the proposed framework has a great impact in reducing communication cost compared to other techniques.DOI:http://dx.doi.org/10.11591/ijece.v4i4.592

    Top-k aggregation queries in large-scale distributed systems

    Get PDF
    Distributed top-k query processing has recently become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. More precisely, in this thesis, we make the following distributions: We present the family of KLEE algorithms that are a fundamental building-block towards efficient top-k query processing in distributed systems. We present means to model score distributions and show how these score models can be used to reason about parameter values that play an important role in the overall performance of KLEE. We present GRASS, a family of novel algorithms based on three optimization techniques significantly increased overall performance of KLEE and related algorithms. We present probabilistic guarantees for the result quality. Moreover, we present Minerva1, a distributed search engine. Minerva offers a highly distributed (in both the data dimension and the computational dimension), scalable, and efficient solution toward the development of internet-scale search engines.Top-k Anfragen spielen eine große Rolle in einer Vielzahl von Anwendungen, insbesondere im Bereich von Informationssystemen, bei denen eine kleine, sorgfältig ausgewählte Teilmenge der Ergebnisse den Benutzern präsentiert werden soll. Beispiele hierfür sind Suchmaschinen wie Google, Yahoo oder MSN. Obwohl die Forschung in diesem Bereich in den letzten Jahren große Fortschritte gemacht hat, haben Top-k-Anfragen in verteilten Systemen, bei denen die Daten auf verschiedenen Rechnern verteilt sind, vergleichsweise wenig Aufmerksamkeit erlangt. In dieser Arbeit beschäftigen wir uns mit der effizienten Verarbeitung eben dieser Anfragen. Die Hauptbeiträge gliedern sich wie folgt. Wir präsentieren KLEE, eine Familie neuartiger Top-k-Algorithmen. Wir entwickeln Modelle mit denen Datenverteilungen beschrieben werden können. Diese Modelle sind die Grundlage für eine Schätzung diverser Parameter, die einen großen Einfluss auf die Performanz von KLEE und anderen ähnlichen Algorithmen haben. Wir präsentieren GRASS, eine Familie von Algorithmen, basierend auf drei neuartigen Optimierungstechniken, mit denen die Performanz von KLEE und ähnlichen Algorithmen verbessert wird. Wir präsentieren probabilistische Garantien für die Ergebnisgüte. Wir präsentieren Minerva, eine neuartige verteilte Peer-to-Peer-Suchmaschine

    Distributed Evaluation of Top-k Temporal Joins

    No full text
    To appear in SIGMOD'16We study a particular kind of join, coined Ranked Temporal Join (RTJ), featuring predicates that compare time intervals and a scoring function associated with each predicate to quantify how well it is satisfied. RTJ queries are prevalent in a variety of applications such as network traffic monitoring , task scheduling, and tweet analysis. RTJ queries are often best interpreted as top-k queries where only the best matches are returned. We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ , an efficient query evaluation approach on a distributed Map-Reduce architecture. TKIJ relies on an offline statistics computation that, given a time partitioning into granules, computes the distribution of intervals' endpoints in each granule, and an online computation that generates query-dependent score bounds. Those statistics are used for workload assignment to reducers. This aims at reducing data replication, to limit I/O cost. Additionally , high-scoring results are distributed evenly to enable each reducer to prune unnecessary results. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data
    corecore