34,875 research outputs found

    Privacy and accountability for location-based aggregate statistics

    Get PDF
    A significant and growing class of location-based mobile applications aggregate position data from individual devices at a server and compute aggregate statistics over these position streams. Because these devices can be linked to the movement of individuals, there is significant danger that the aggregate computation will violate the location privacy of individuals. This paper develops and evaluates PrivStats, a system for computing aggregate statistics over location data that simultaneously achieves two properties: first, provable guarantees on location privacy even in the face of any side information about users known to the server, and second, privacy-preserving accountability (i.e., protection against abusive clients uploading large amounts of spurious data). PrivStats achieves these properties using a new protocol for uploading and aggregating data anonymously as well as an efficient zero-knowledge proof of knowledge protocol we developed from scratch for accountability. We implemented our system on Nexus One smartphones and commodity servers. Our experimental results demonstrate that PrivStats is a practical system: computing a common aggregate (e.g., count) over the data of 10,000 clients takes less than 0.46 s at the server and the protocol has modest latency (0.6 s) to upload data from a Nexus phone. We also validated our protocols on real driver traces from the CarTel project.National Science Foundation (U.S.) (grant 0931550)National Science Foundation (U.S.) (grant 0716273

    Fully decentralized computation of aggregates over data streams

    Get PDF
    In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at few central nodes of the network is unfeasible in networks of low capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is to cope with duplicate detections. These arise both from the observation of the same event at several nodes of the network and/or receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that answer locally continuous aggregate queries on the number of distinct events, total number of events and the second frequency moment in the scenario outlined above. The proposed algorithms use in the worst case or on realistic distributions sublinear space at every node. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate for the efficiency and accuracy of our algorithms on realistic simulated scenarios

    Continuous Nearest Neighbor Queries over Sliding Windows

    Get PDF
    Abstract—This paper studies continuous monitoring of nearest neighbor (NN) queries over sliding window streams. According to this model, data points continuously stream in the system, and they are considered valid only while they belong to a sliding window that contains 1) the W most recent arrivals (count-based) or 2) the arrivals within a fixed interval W covering the most recent time stamps (time-based). The task of the query processor is to constantly maintain the result of long-running NN queries among the valid data. We present two processing techniques that apply to both count-based and time-based windows. The first one adapts conceptual partitioning, the best existing method for continuous NN monitoring over update streams, to the sliding window model. The second technique reduces the problem to skyline maintenance in the distance-time space and precomputes the future changes in the NN set. We analyze the performance of both algorithms and extend them to variations of NN search. Finally, we compare their efficiency through a comprehensive experimental evaluation. The skyline-based algorithm achieves lower CPU cost, at the expense of slightly larger space overhead. Index Terms—Location-dependent and sensitive, spatial databases, query processing, nearest neighbors, data streams, sliding windows.

    Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

    Full text link
    Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios such as diagnostics of turbines in Siemens. OBDA approach has a great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended and become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first class citizens. Moreover, we develop query optimisation techniques that allow to efficiently process analytical tasks over static and streaming data. We implement our approach in a system and evaluate our system with Siemens turbine data
    • …
    corecore