32 research outputs found

    Proxy-secure computation model: application to k-means clustering implementation, analysis and improvements

    Get PDF
    Distributed privacy preserving data mining applications, where data is divided among several parties, require high amounts of network communication. In order to overcome this overhead, we propose a scheme that reduces remote computations in distributed data mining applications into local computations on a trusted hardware. Cell BE is used to realize the trusted hardware acting as a proxy for the parties. We design a secure two-party computation protocol that can be instrumental in realizing non-colluding parties in privacy-preserving data mining applications. Each party is represented with a signed and encrypted thread on a separate core of Cell BE running in an isolated mode, whereby its execution and data are secured by hardware means. Our implementations and experiments demonstrate that a significant speed up is gained through the new scheme. It is also possible to increase the number of non-colluding parties on Cell BE, which extends the proposed technique to implement most distributed privacy-preserving data mining protocols proposed in literature that require several non-colluding parties

    Depth optimized efficient homomorphic sorting

    Get PDF
    We introduce a sorting scheme which is capable of efficiently sorting encrypted data without the secret key. The technique is obtained by focusing on the multiplicative depth of the sorting circuit alongside the more traditional metrics such as number of comparisons and number of iterations. The reduced depth allows much reduced noise growth and thereby makes it possible to select smaller parameter sizes in somewhat homomorphic encryption instantiations resulting in greater efficiency savings. We first consider a number of well known comparison based sorting algorithms as well as some sorting networks, and analyze their circuit implementations with respect to multiplicative depth. In what follows, we introduce a new ranking based sorting scheme and rigorously analyze the multiplicative depth complexity as O(log(N) + log(l)), where N is the size of the array to be sorted and l is the bit size of the array elements. Finally, we simulate our sorting scheme using a leveled/batched instantiation of a SWHE library. Our sorting scheme performs favorably over the analyzed classical sorting algorithms

    Learning structure and schemas from heterogeneous domains in networked systems: a survey

    Get PDF
    The rapidly growing amount of available digital documents of various formats and the possibility to access these through internet-based technologies in distributed environments, have led to the necessity to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections make it impossible to manually organize such documents. Additionally, most of the document sexist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically infer structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked system. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely, bio-informatics, sensor networks, social networks, P2Psystems, automation and control, transportation and privacy preserving for which we analyze the recent developments on dealing with unstructured data in such domains.Peer ReviewedPostprint (published version

    Sorting problem in fully homomorphic encrypted data

    Get PDF
    Fully Homomorphic Encryption (FHE) schemes allow users to perform computations over encrypted data without decrypting the ciphertext. This is possible via two operations which are bitwise addition and multiplication, namely logical XOR and logical AND operations, which can be applied over the bits individually encrypted under the fully homomorphic encryption scheme. Since any Boolean circuit can be realized using only AND and XOR gates, they can be used to build circuits for the computation of even more complicated operations over encrypted data. This property of FHE cryptosystems is especially useful in cloud computing applications, since data owners who use cloud computing for storage and computation, usually tend not to trust servers and for security reasons, they prefer storing their data in encrypted form. By using FHE cryptographic primitives, now servers are allowed to perform any desired task over the encrypted user data without the knowledge of secret key or plaintext. In this thesis, we focus on solving one such task that cloud server performs over encrypted data; sorting the elements of an integer array. We introduce two sorting schemes, both of which are capable of e ciently sorting data in fully homomorphic encrypted form. The technique is obtained by focusing on the minimization of the depth of the sorting circuit in addition to more traditional metrics such as the number of comparisons. The reduced depth of the sorting network allows a slower growth in the noise of encrypted bits and thereby makes it possible to select smaller parameter sizes for the underlying homomorphic encryption scheme resulting in much faster computation of homomorphic sorting. We present a leveled/batched implementation for the proposed sorting algorithms, using an NTRU based homomorphic encryption library, which yields significant improvements over classical sorting algorithms

    Aggregating privatized medical data for secure querying applications

    Full text link
     This thesis analyses and examines the challenges of aggregation of sensitive data and data querying on aggregated data at cloud server. This thesis also delineates applications of aggregation of sensitive medical data in several application scenarios, and tests privatization techniques to assist in improving the strength of privacy and utility

    Leveraging Client Processing for Location Privacy in Mobile Local Search

    Get PDF
    Usage of mobile services is growing rapidly. Most Internet-based services targeted for PC based browsers now have mobile counterparts. These mobile counterparts often are enhanced when they use user\u27s location as one of the inputs. Even some PC-based services such as point of interest Search, Mapping, Airline tickets, and software download mirrors now use user\u27s location in order to enhance their services. Location-based services are exactly these, that take the user\u27s location as an input and enhance the experience based on that. With increased use of these services comes the increased risk to location privacy. The location is considered an attribute that user\u27s hold as important to their privacy. Compromise of one\u27s location, in other words, loss of location privacy can have several detrimental effects on the user ranging from trivial annoyance to unreasonable persecution. More and more companies in the Internet economy rely exclusively on the huge data sets they collect about users. The more detailed and accurate the data a company has about its users, the more valuable the company is considered. No wonder that these companies are often the same companies that offer these services for free. This gives them an opportunity to collect more accurate location information. Research community in the location privacy protection area had to reciprocate by modeling an adversary that could be the service provider itself. To further drive this point, we show that a well-equipped service provider can infer user\u27s location even if the location information is not directly available by using other information he collects about the user. There is no dearth of proposals of several protocols and algorithms that protect location privacy. A lot of these earlier proposals require a trusted third party to play as an intermediary between the service provider and the user. These protocols use anonymization and/or obfuscation techniques to protect user\u27s identity and/or location. This requirement of trusted third parties comes with its own complications and risks and makes these proposals impractical in real life scenarios. Thus it is preferable that protocols do not require a trusted third party. We look at existing proposals in the area of private information retrieval. We present a brief survey of several proposals in the literature and implement two representative algorithms. We run experiments using different sizes of databases to ascertain their practicability and performance features. We show that private information retrieval based protocols still have long ways to go before they become practical enough for local search applications. We propose location privacy preserving mechanisms that take advantage of the processing power of modern mobile devices and provide configurable levels of location privacy. We propose these techniques both in the single query scenario and multiple query scenario. In single query scenario, the user issues a query to the server and obtains the answer. In the multiple query scenario, the user keeps sending queries as she moves about in the area of interest. We show that the multiple query scenario increases the accuracy of adversary\u27s determination of user\u27s location, and hence improvements are needed to cope with this situation. So, we propose an extension of the single query scenario that addresses this riskier multiple query scenario, still maintaining the practicability and acceptable performance when implemented on a modern mobile device. Later we propose a technique based on differential privacy that is inspired by differential privacy in statistical databases. All three mechanisms proposed by us are implemented in realistic hardware or simulators, run against simulated but real life data and their characteristics ascertained to show that they are practical and ready for adaptation. This dissertation study the privacy issues for location-based services in mobile environment and proposes a set of new techniques that eliminate the need for a trusted third party by implementing efficient algorithms on modern mobile hardware

    New Fundamental Technologies in Data Mining

    Get PDF
    The progress of data mining technology and large public popularity establish a need for a comprehensive text on the subject. The series of books entitled by "Data Mining" address the need by presenting in-depth description of novel mining algorithms and many useful applications. In addition to understanding each section deeply, the two books present useful hints and strategies to solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence will lead to significant development in the field of data mining

    Models and Algorithms for Private Data Sharing

    Get PDF
    In recent years, there has been a tremendous growth in the collection of digital information about individuals. Many organizations such as governmental agencies, hospitals, and financial companies collect and disseminate various person-specific data. Due to the rapid advance in the storing, processing, and networking capabilities of the computing devices, the collected data can now be easily analyzed to infer valuable information for research and business purposes. Data from different sources can be integrated and further analyzed to gain better insights. On one hand, the collected data offer tremendous opportunities for mining useful information. On the other hand, the mining process poses a threat to individual privacy since these data often contain sensitive information. In this thesis, we address the problem of developing anonymization algorithms to thwart potential privacy attacks in different real-life data sharing scenarios. In particular, we study two privacy models: LKC-privacy and differential privacy. For each of these models, we develop algorithms for anonymizing different types of data such as relational data, trajectory data, and heterogeneous data. We also develop algorithms for distributed data where multiple data publishers cooperate to integrate their private data without violating the given privacy requirements. Experimental results on the real-life data demonstrate that the proposed anonymization algorithms can effectively retain the essential information for data analysis and are scalable for large data sets

    Exécutions de requêtes respectueuses de la vie privée par utilisation de composants matériels sécurisés

    Get PDF
    Current applications, from complex sensor systems (e.g. quantified self) to online e-markets acquire vast quantities of personal information which usually end-up on central servers. This massive amount of personal data, the new oil, represents an unprecedented potential for applications and business. However, centralizing and processing all one's data in a single server, where they are exposed to prying eyes, poses a major problem with regards to privacy concern.Conversely, decentralized architectures helping individuals keep full control of their data, but they complexify global treatments and queries, impeding the development of innovative services.In this thesis, we aim at reconciling individual's privacy on one side and global benefits for the community and business perspectives on the other side. It promotes the idea of pushing the security to secure hardware devices controlling the data at the place of their acquisition. Thanks to these tangible physical elements of trust, secure distributed querying protocols can reestablish the capacity to perform global computations, such as SQL aggregates, without revealing any sensitive information to central servers.This thesis studies the subset of SQL queries without external joins and shows how to secure their execution in the presence of honest-but-curious attackers. It also discusses how the resulting querying protocols can be integrated in a concrete decentralized architecture. Cost models and experiments on SQL/AA, our distributed prototype running on real tamper-resistant hardware, demonstrate that this approach can scale to nationwide applications.Les applications actuelles, des systèmes de capteurs complexes (par exemple auto quantifiée) aux applications de e-commerce, acquièrent de grandes quantités d'informations personnelles qui sont habituellement stockées sur des serveurs centraux. Cette quantité massive de données personnelles, considéré comme le nouveau pétrole, représente un important potentiel pour les applications et les entreprises. Cependant, la centralisation et le traitement de toutes les données sur un serveur unique, où elles sont exposées aux indiscrétions de son gestionnaire, posent un problème majeur en ce qui concerne la vie privée.Inversement, les architectures décentralisées aident les individus à conserver le plein de contrôle sur leurs données, toutefois leurs traitements en particulier le calcul de requêtes globales deviennent complexes.Dans cette thèse, nous visons à concilier la vie privée de l'individu et l'exploitation de ces données, qui présentent des avantages manifestes pour la communauté (comme des études statistiques) ou encore des perspectives d'affaires. Nous promouvons l'idée de sécuriser l'acquisition des données par l'utilisation de matériel sécurisé. Grâce à ces éléments matériels tangibles de confiance, sécuriser des protocoles d'interrogation distribués permet d'effectuer des calculs globaux, tels que les agrégats SQL, sans révéler d'informations sensibles à des serveurs centraux.Cette thèse étudie le sous-groupe de requêtes SQL sans jointures et montre comment sécuriser leur exécution en présence d'attaquants honnêtes-mais-curieux. Cette thèse explique également comment les protocoles d'interrogation qui en résultent peuvent être intégrés concrètement dans une architecture décentralisée. Nous démontrons que notre approche est viable et peut passer à l'échelle d'applications de la taille d'un pays par un modèle de coût et des expériences réelles sur notre prototype, SQL/AA
    corecore