
    Contributions to security and privacy protection in recommendation systems

    A recommender system is an automatic system that, given a customer model and a set of available documents, is able to select and offer those documents that are most interesting to the customer. From the point of view of security, there are two main issues that recommender systems must face: protection of the users' privacy and protection of the other participants in the recommendation process. Recommenders issue personalized recommendations taking into account not only the profile of the documents, but also the private information that customers send to the recommender. Hence, the users' profiles include personal and highly sensitive information, such as their likes and dislikes. In order to have a really useful recommender system and improve its efficiency, we believe that users should not be afraid of stating their preferences. The second challenge from the point of view of security involves protection against a new kind of attack. Copyright holders have shifted their targets to attack the document providers and any other participant that aids in the process of distributing documents, even unknowingly. In addition, new legislation trends such as ACTA or the "Sinde-Wert law" in Spain show the interest of states all over the world in controlling and prosecuting these intermediate nodes. We proposed the following contributions: 1. A social model that captures users' interests in the users' profiles, and a metric function that calculates the similarity between users, queries and documents. This model represents profiles as vectors of a social space. Document profiles are created by inspecting the contents of the document. Then, user profiles are calculated as an aggregation of the profiles of the documents that the user owns. Finally, queries are a constrained view of a user profile. This way, all profiles are contained in the same social space, and the similarity metric can be used on any pair of them.
2. Two mechanisms to protect the personal information that the user profiles contain. The first mechanism takes advantage of the Johnson-Lindenstrauss and Undecomposability of random matrices theorems to project profiles into social spaces of fewer dimensions. Even if the information about the user is reduced in the projected social space, under certain circumstances the distances between the original profiles are maintained. The second approach uses a zero-knowledge protocol to answer the question of whether or not two profiles are similar without leaking any information if they are not. 3. A distributed system on a cloud that protects merchants, customers and indexers against legal attacks by providing plausible deniability and oblivious routing to all the participants of the system. We use the term DocCloud to refer to this system. DocCloud organizes databases in a tree-shaped structure over a cloud system and provides a Private Information Retrieval protocol that prevents any participant or observer of the process from identifying the recommender. This way, customers, intermediate nodes and even databases are not aware of the specific database that answered the query. 4. A social P2P network where users link together according to their similarity and provide recommendations to other users in their neighborhood. We defined an epidemic protocol where links are established based on the neighbors' similarity, clustering and randomness. Additionally, we proposed some mechanisms, such as the use of SoftDHT, to aid in the identification of similar users and speed up the creation of clusters of similar users. 5. A document distribution system that provides the recommended documents at the end of the process. In our view of a recommender system, the recommendation is a complete process that ends when the customer receives the recommended document.
We proposed SCFS, a distributed and secure filesystem where merchants, documents and users are protected. This document explores how to locate documents of interest to the user in large distributed networks by means of recommender systems. A recommender system is defined as an automatic system that, given a customer model and a set of available documents, is able to select and offer the documents that are most interesting to the customer. The desirable characteristics of a recommender system are that it be (i) fast, (ii) distributed and (iii) secure. A fast recommender system improves the customer's shopping experience, since a recommendation is not useful if it arrives too late. A distributed recommender system avoids the creation of centralized databases holding sensitive information and improves the availability of documents. Finally, a secure recommender system protects all the participants of the system: users, content providers, recommenders and intermediate nodes. From the point of view of security, there are two main problems that recommender systems must face: (i) protection of the users' privacy and (ii) protection of the other participants in the recommendation process. Recommenders are able to issue personalized recommendations by taking into account not only the profile of the documents, but also the private information that customers send to the recommender. Hence, user profiles include personal and highly sensitive information, such as their likes and dislikes. In order to develop a useful recommender system and improve its effectiveness, we believe that users should not be afraid of expressing their preferences. To that end, the personal information included in user profiles must be protected and the user's privacy guaranteed.
The second challenge from the point of view of security involves a new kind of attack. Since preventing the illegal distribution of copyrighted documents by technical means has not been effective, copyright holders shifted their targets to attack document providers and any other participant that aids in the process of distributing documents. In addition, treaties and laws such as ACTA, the SOPA law in the US or the "Sinde-Wert" law in Spain show the interest of states all over the world in controlling and prosecuting these intermediate nodes. Recent trials such as MegaUpload, PirateBay or the case against Mr. Pablo Soto in Spain show that these threats are real.
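
The profile-projection idea in the abstract above can be illustrated with a minimal sketch. It assumes that profiles are plain numeric vectors and that the projection is a Gaussian random matrix in the style of the Johnson-Lindenstrauss lemma; the function names, dimensions and seeds are hypothetical and are not taken from the thesis. The sketch only shows that the distance between two projected profiles stays close to the distance between the originals.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian matrix scaled by 1/sqrt(d_out) (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, profile):
    """Map a profile into the lower-dimensional space (matrix-vector product)."""
    return [sum(m * p for m, p in zip(row, profile)) for row in matrix]


def distance(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two made-up 500-dimensional profiles, projected down to 100 dimensions.
    alice = [rng.random() for _ in range(500)]
    bob = [rng.random() for _ in range(500)]
    matrix = random_projection_matrix(500, 100)
    print("original distance: ", distance(alice, bob))
    print("projected distance:", distance(project(matrix, alice),
                                          project(matrix, bob)))
```

Both parties must use the same matrix for the projected distances to be comparable. How that matrix is agreed on or kept secret is outside the scope of this sketch.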

    Enabling Internet-Scale Publish/Subscribe In Overlay Networks

    As the amount of data in today's Internet grows, users are exposed to too much information, which becomes increasingly difficult to comprehend. Publish/subscribe systems alleviate this problem by providing loosely coupled communication between producers and consumers of data in a network. Data consumers, i.e., subscribers, are provided with a subscription mechanism to express their interest in a subset of the data, so that they are notified only when data matching their subscription is generated by the producers, i.e., publishers. Most publish/subscribe systems today are based on the client/server architectural model. However, to provide the publish/subscribe service at large scale, companies either have to invest huge amounts of money in over-provisioning resources or are prone to frequent service failures. Peer-to-peer overlay networks are an attractive alternative for building Internet-scale publish/subscribe systems. However, scalability comes at a cost: a published message often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. We refer to this undesirable traffic as relay overhead. Without careful consideration, the relay overhead might sharply increase resource consumption for the relay nodes (in terms of bandwidth, transmission cost, CPU, etc.) and could ultimately lead to rapid deterioration of the system's performance once the relay nodes start dropping messages or choose to permanently abandon the system. To mitigate this problem, some solutions use an unbounded number of connections per node, while others limit the expressiveness of the subscription scheme. In this thesis work, we introduce two systems called Vitis and Vinifera, for topic-based and content-based publish/subscribe models, respectively. Both systems are gossip-based and significantly decrease the relay overhead.
We utilize novel techniques to cluster together nodes that exhibit similar subscriptions. In the topic-based model, distinct clusters are constructed for each topic, while clusters in the content-based model are fuzzy and do not have explicit boundaries. We augment these clustered overlays with links that facilitate routing in the network. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structures resemble navigable small-world networks, which span clusters of nodes that have similar subscriptions. The properties of such overlays make them an ideal platform for efficient data dissemination in large-scale systems. The systems require only a bounded node degree and, as we show through simulations, they scale well with the number of nodes and subscriptions and remain efficient under highly complex subscription patterns, high publication rates, and even in the presence of failures in the network. We also compare both systems against some state-of-the-art publish/subscribe systems. Our measurements show that both Vitis and Vinifera significantly outperform their counterparts on various subscription and churn scenarios, under both synthetic workloads and real-world traces.
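
The idea of clustering nodes with similar subscriptions under a bounded node degree can be sketched as follows. The abstract does not say which similarity measure Vitis or Vinifera use, so the Jaccard overlap of topic sets, the function names and the sample data are all assumptions made for illustration. The sketch covers one selection step only: given some candidate peers (for example, learned through gossip), a node keeps the `max_degree` most similar ones.

```python
def jaccard(a, b):
    """Jaccard overlap of two topic sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_neighbors(node, subscriptions, candidates, max_degree):
    """Keep at most max_degree candidates, most similar subscriptions first."""
    own = subscriptions[node]
    ranked = sorted(
        (c for c in candidates if c != node),
        key=lambda c: jaccard(own, subscriptions[c]),
        reverse=True,
    )
    return ranked[:max_degree]


if __name__ == "__main__":
    # Hypothetical nodes and topic subscriptions.
    subscriptions = {
        "a": {"sports", "news"},
        "b": {"sports", "news"},
        "c": {"news"},
        "d": {"music"},
    }
    print(select_neighbors("a", subscriptions, list(subscriptions), 2))
```

A real overlay would repeat this step every gossip round and would also keep some random or long-range links for routing, which this sketch leaves out.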

    Privacy-preserving friend recommendations in online social networks

    Online social networks, such as Facebook and Google+, have emerged as a new communication service that lets users stay in touch and share information with family members and friends over the Internet. Since users are generating huge amounts of data on social network sites, an interesting question is how to mine this data to retrieve useful information. Along this direction, social network analysis has emerged as an important tool for many business intelligence applications, such as identifying potential customers and promoting items based on their interests. In particular, since users are often interested in making new friends, a friend recommendation application provides a medium for users to expand their social connections and share information of interest with more friends. Besides this, it also helps to enhance the development of the entire network structure. The existing friend recommendation methods utilize social network structure and/or user profile information. However, these methods are no longer applicable if the privacy of users is taken into consideration. This work introduces a set of privacy-preserving friend recommendation protocols based on different existing similarity metrics in the literature. Briefly, depending on the underlying similarity metric used, the proposed protocols guarantee the privacy of a user's personal information, such as friend lists. These protocols are the first to make the friend recommendation process possible in privacy-enhanced social networking environments. This work also considers the case of outsourced social networks, where users' profile data are encrypted and outsourced to third-party cloud providers who provide social networking services to the users. Under such an environment, this work proposes novel protocols for the cloud to do friend recommendations in a privacy-preserving manner. --Abstract, page iii

    SDSF : social-networking trust based distributed data storage and co-operative information fusion.

    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. This data can be stored on external hard drives, on unused space in peer-to-peer (P2P) networks or, in the currently more popular approach, in the Cloud. When users store their data in the Cloud, all of it is exposed to the administrators of the services, who can view and possibly misuse it. With the growing popularity and usage of Cloud storage services such as Google Drive and Dropbox, concerns about privacy and security are increasing. Searching for content or documents in this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the user's query, combine the data and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature and the trustworthiness of the documents varies. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system, using a P2P-like approach. This framework allows users to share storage resources among friends and acquaintances without compromising security or privacy, while enjoying all the benefits that Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner's friends' resources. The system thus gives the user centralized control over the selection of peers that store the data.
Secondly, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data, in order to improve the accuracy and efficiency of information fusion when combining heterogeneous, distributed and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and through the introduction of redundancy resolves issues resulting from device failures. Second, since a social network consists of friends, family and acquaintances, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes and build a secure network where the fragmented data should be stored. The trust between friends and the availability of the devices allow the user to make an informed choice about where the information should be stored using 'k' optimal paths. Thirdly, for the purpose of retrieving this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context and not just a bag of words, while also considering the trust score) and Map Reduce (NSM) Algorithms. Lastly, we evaluate the performance of the distributed storage of SDSF using erasure coding, identify the social storage providers based on trust and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems.
Thus, a system using the SDSF framework implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encryption ensures that all other users, including the system administrators, cannot decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion, since a large number of genuine documents are retrieved for fusion.
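
The erasure-coding step mentioned in the abstract above (fragment the data, encode it, add redundancy so that device failures can be tolerated) can be sketched with the simplest possible code: k data fragments plus one XOR parity fragment, which survives the loss of any single fragment. The abstract does not say which erasure code SDSF uses, so the scheme, the function names and the zero-byte padding are assumptions for illustration. Encryption and the trust-based choice of peers are not shown.

```python
from typing import List, Optional


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytearray(frag_len)
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            parity[i] ^= byte
    return fragments + [bytes(parity)]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        rebuilt = bytearray(len(present[0]))
        # XOR of all surviving fragments equals the lost one.
        for fragment in present:
            for i, byte in enumerate(fragment):
                rebuilt[i] ^= byte
        fragments[missing[0]] = bytes(rebuilt)
    # Drop the parity fragment and the padding.
    return b"".join(fragments[:-1])[:length]


if __name__ == "__main__":
    original = b"social storage"
    stored = encode(original, 4)   # 4 data fragments + 1 parity, one per friend
    stored[2] = None               # one friend's device is unavailable
    print(decode(stored, len(original)))
```

Tolerating more than one failure needs a stronger code such as Reed-Solomon, which is beyond this sketch.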

    Privacy-preserving collaboration in an integrated social environment

    Privacy and security of data have been a critical concern at the state, organization and individual levels since time immemorial. New and innovative methods for data storage, retrieval and analysis have given rise to greater challenges on these fronts. Online social networks (OSNs) are at the forefront of individual privacy concerns due to their ubiquity, popularity and possession of a large collection of users' personal data. These OSNs use recommender systems along with their integration partners (IPs) to offer an enriching user experience and to grow. However, the recommender systems provided by these OSNs inadvertently leak private user information. In this work, we develop solutions that address existing, real-world privacy issues for recommender systems deployed across multiple OSNs. Specifically, we identify the various ways in which privacy leaks can occur in a friend recommendation system (FRS), and propose a comprehensive solution that integrates both Differential Privacy and Secure Multi-Party Computation (MPC) to provide a holistic privacy guarantee. We model a privacy-preserving similarity computation framework and library named Lucene-P2. It includes an efficient privacy-preserving Latent Semantic Indexing (LSI) extension. OSNs can use the Lucene-P2 framework to evaluate similarity scores for their private inputs without sharing them. Security proofs are provided under semi-honest and malicious adversary models. We analyze the computation and communication complexities of the proposed protocols and empirically test them on real-world datasets. These solutions provide functional efficiency and data utility for practical applications to an extent. Includes bibliographical references.
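
The Differential Privacy ingredient mentioned in the abstract above can be illustrated with the standard Laplace mechanism applied to a common-friends count, a typical friend-recommendation score. This is a generic sketch and not the Lucene-P2 protocol. The function names, the choice of score and the sensitivity of 1 (adding or removing one friendship changes the count by at most one) are assumptions, and the Secure Multi-Party Computation part is not shown.

```python
import random


def common_friends(friends_a, friends_b):
    """Number of friends two users share (the non-private score)."""
    return len(friends_a & friends_b)


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_common_friends(friends_a, friends_b, epsilon, rng=None):
    """Common-friends count released with epsilon-differential privacy."""
    rng = rng or random.Random()
    sensitivity = 1.0  # assumed: one friendship changes the count by at most 1
    noise = laplace_noise(sensitivity / epsilon, rng)
    return common_friends(friends_a, friends_b) + noise


if __name__ == "__main__":
    # Hypothetical friend lists identified by user id.
    alice = {1, 2, 3, 4}
    bob = {3, 4, 5}
    print(private_common_friends(alice, bob, epsilon=0.5))
```

A smaller `epsilon` adds more noise and gives stronger privacy. Each released score consumes privacy budget, so repeated queries on the same pair would need the budget to be accounted for.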