605 research outputs found

    Verifiable and private top-k monitoring

    Get PDF

    Secure server-aided top-k monitoring

    Get PDF
    National Research Foundation (NRF) Singapor

    Reverse spatial visual top-k query

    Get PDF
    With the wide application of mobile Internet techniques an location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named reverse spatial visual top- kk query (RSVQ k ) that aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top- kk queries are not suitable to address this problem because they cannot effectively process unstructured data, such as image. To this end, firstly we propose the definition of RSVQ k problem and introduce the similarity measurement. A novel hybrid index, named VR 2 -Tree is designed, which is a combination of visual representation of geo-image and R-Tree. Besides, an extension of VR 2 -Tree, called CVR 2 -Tree is introduced and then we discuss the calculation of lower/upper bound, and then propose the optimization technique via CVR 2 -Tree for further pruning. In addition, a search algorithm named RSVQ k algorithm is developed to support the efficient RSVQ k query. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQ k problem effectively and efficiently

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Full text link
    Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms are present to enhance this process but are mostly efficient only when dealing with a small-size incomplete data. One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to find the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply TKD query on incomplete data; however, all these had weak performances or were not compatible with the incomplete data. This thesis proposes MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach uses the MapReduce parallel computing approach using multiple computing nodes. The framework separates the tasks between several computing nodes that independently and simultaneously work to find the result. This method has achieved up to two times faster processing time in finding the TKD query result in comparison to previously presented algorithms

    Cleaning uncertain data for top-k queries

    Get PDF
    The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this 'cleaning operation' may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal. © 2013 IEEE.published_or_final_versio

    Temporal search in document streams

    Get PDF
    In this thesis, we address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on termmatching only can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. On the other hand, time-only-based methods fall short when it comes to reasoning about events in social media. During the last few years users create chronologically ordered documents about topics that draw their attention in an ever increasing pace. However, with the vast adoption of social media, new types of marketing campaigns have been developed in order to promote content, i.e. brands, products, celebrities, etc

    Load shedding in network monitoring applications

    Get PDF
    Monitoring and mining real-time network data streams are crucial operations for managing and operating data networks. The information that network operators desire to extract from the network traffic is of different size, granularity and accuracy depending on the measurement task (e.g., relevant data for capacity planning and intrusion detection are very different). To satisfy these different demands, a new class of monitoring systems is emerging to handle multiple and arbitrary monitoring applications. Such systems must inevitably cope with the effects of continuous overload situations due to the large volumes, high data rates and bursty nature of the network traffic. These overload situations can severely compromise the accuracy and effectiveness of monitoring systems, when their results are most valuable to network operators. In this thesis, we propose a technique called load shedding as an effective and low-cost alternative to over-provisioning in network monitoring systems. It allows these systems to handle efficiently overload situations in the presence of multiple, arbitrary and competing monitoring applications. We present the design and evaluation of a predictive load shedding scheme that can shed excess load in front of extreme traffic conditions and maintain the accuracy of the monitoring applications within bounds defined by end users, while assuring a fair allocation of computing resources to non-cooperative applications. The main novelty of our scheme is that it considers monitoring applications as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Without any explicit knowledge of the application internals, the proposed scheme extracts a set of features from the traffic streams to build an on-line prediction model of the resource requirements of each monitoring application, which is used to anticipate overload situations and control the overall resource usage by sampling the input packet streams. This way, the monitoring system preserves a high degree of flexibility, increasing the range of applications and network scenarios where it can be used. Since not all monitoring applications are robust against sampling, we then extend our load shedding scheme to support custom load shedding methods defined by end users, in order to provide a generic solution for arbitrary monitoring applications. Our scheme allows the monitoring system to safely delegate the task of shedding excess load to the applications and still guarantee fairness of service with non-cooperative users. We implemented our load shedding scheme in an existing network monitoring system and deployed it in a research ISP network. We present experimental evidence of the performance and robustness of our system with several concurrent monitoring applications during long-lived executions and using real-world traffic traces.Postprint (published version
    corecore