1,831 research outputs found

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Full text link
    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data

    An Approach for Ensuring Robust Support for Location Privacy and Identity Inference Protection

    Get PDF
    The challenge of preserving a user\u27s location privacy is more important now than ever before with the proliferation of handheld devices and the pervasive use of location based services. To protect location privacy, we must ensure k-anonymity so that the user remains indistinguishable among k-1 other users. There is no better way but to use a location anonymizer (LA) to achieve k-anonymity. However, its knowledge of each user\u27s current location makes it susceptible to be a single-point-of-failure. In this thesis, we propose a formal location privacy framework, termed SafeGrid that can work with or without an LA. In SafeGrid, LA is designed in such a way that it is no longer a single point of failure. In addition, it is resistant to known attacks and most significantly, the cloaking algorithm it employs meets reciprocity condition. Simulation results exhibit its better performance in query processing and cloaking region calculation compared with existing solutions. In this thesis, we also show that satisfying k-anonymity is not enough in preserving privacy. Especially in an environment where a group of colluded service providers collaborate with each other, a user\u27s privacy can be compromised through identity inference attacks. We present a detailed analysis of such attacks on privacy and propose a novel and powerful privacy definition called s-proximity. In addition to building a formal definition for s-proximity, we show that it is practical and it can be incorporated efficiently into existing systems to make them secure

    L-SRR: Local Differential Privacy for Location-Based Services with Staircase Randomized Response

    Full text link
    Location-based services (LBS) have been significantly developed and widely deployed in mobile devices. It is also well-known that LBS applications may result in severe privacy concerns by collecting sensitive locations. A strong privacy model ''local differential privacy'' (LDP) has been recently deployed in many different applications (e.g., Google RAPPOR, iOS, and Microsoft Telemetry) but not effective for LBS applications due to the low utility of existing LDP mechanisms. To address such deficiency, we propose the first LDP framework for a variety of location-based services (namely ''L-SRR''), which privately collects and analyzes user locations with high utility. Specifically, we design a novel randomization mechanism ''Staircase Randomized Response'' (SRR) and extend the empirical estimation to significantly boost the utility for SRR in different LBS applications (e.g., traffic density estimation, and k-nearest neighbors). We have conducted extensive experiments on four real LBS datasets by benchmarking with other LDP schemes in practical applications. The experimental results demonstrate that L-SRR significantly outperforms them.Comment: accepted to CCS'22; full versio

    Privacy protection in context aware systems.

    Get PDF
    Smartphones, loaded with users’ personal information, are a primary computing device for many. Advent of 4G networks, IPV6 and increased number of subscribers to these has triggered a host of application developers to develop softwares that are easy to install on the mobile devices. During the application download process, users accept the terms and conditions that permit revelation of private information. The free application markets are sustainable as the revenue model for most of these service providers is through profiling of users and pushing advertisements to the users. This creates a serious threat to users privacy and hence it is important that “privacy protection mechanisms” should be in place to protect the users’ privacy. Most of the existing solutions falsify or modify the information in the service request and starve the developers of their revenue. In this dissertation, we attempt to bridge the gap by proposing a novel integrated CLOPRO framework (Context Cloaking Privacy Protection) that achieves Identity privacy, Context privacy and Query privacy without depriving the service provider of sustainable revenue made from the CAPPA (Context Aware Privacy Preserving Advertising). Each service request has three parameters: identity, context and actual query. The CLOPRO framework reduces the risk of an adversary linking all of the three parameters. The main objective is to ensure that no single entity in the system has all the information about the user, the queries or the link between them, even though the user gets the desired service in a viable time frame. The proposed comprehensive framework for privacy protecting, does not require the user to use a modified OS or the service provider to modify the way an application developer designs and deploys the application and at the same time protecting the revenue model of the service provider. The system consists of two non-colluding servers, one to process the location coordinates (Location server) and the other to process the original query (Query server). This approach makes several inherent algorithmic and research contributions. First, we have proposed a formal definition of privacy and the attack. We identified and formalized that the privacy is protected if the transformation functions used are non-invertible. Second, we propose use of clustering of every component of the service request to provide anonymity to the user. We use a unique encrypted identity for every service request and a unique id for each cluster of users that ensures Identity privacy. We have designed a Split Clustering Anonymization Algorithms (SCAA) that consists of two algorithms Location Anonymization Algorithm (LAA) and Query Anonymization Algorithm (QAA). The application of LAA replaces the actual location for the users in the cluster with the centroid of the location coordinates of all users in that cluster to achieve Location privacy. The time of initiation of the query is not a part of the message string to the service provider although it is used for identifying the timed out requests. Thus, Context privacy is achieved. To ensure the Query privacy, the generic queries (created using QAA) are used that cover the set of possible queries, based on the feature variations between the queries. The proposed CLOPRO framework associates the ads/coupons relevant to the generic query and the location of the users and they are sent to the user along with the result without revealing the actual user, the initiation time of query or the location and the query, of the user to the service provider. Lastly, we introduce the use of caching in query processing to improve the response time in case of repetitive queries. The Query processing server caches the query result. We have used multiple approaches to prove that privacy is preserved in CLOPRO system. We have demonstrated using the properties of the transformation functions and also using graph theoretic approaches that the user’s Identity, Context and Query is protected against the curious but honest adversary attack, fake query and also replay attacks with the use of CLOPRO framework. The proposed system not only provides \u27k\u27 anonymity, but also satisfies the \u3c k; s \u3e and \u3c k; T \u3e anonymity properties required for privacy protection. The complexity of our proposed algorithm is O(n)
    • …
    corecore