5 research outputs found

    Contributions to Lifelogging Protection In Streaming Environments

    Every day, more than five billion people generate some kind of data over the Internet. To access that information, we rely on search services, either Web Search Engines or Personal Assistants. In each interaction with them, the record of our actions (the logs) is used to offer a more useful experience. Logs are also very valuable to companies, since they offer a way to monetize the service. Monetization is achieved by selling data to third parties; however, query logs could expose sensitive user information (identifiers, diseases, sexual tendencies, religious beliefs) or be used for what is called "life-logging": a continuous record of one's daily activities. Current regulations oblige companies to protect this personal information. Protection systems for closed data sets have previously been proposed, most of them working with atomic files or structured data. Unfortunately, those systems do not fit the growing real-time, unstructured data environment posed by Internet services. This thesis aims to design techniques to protect the user's sensitive information in an unstructured real-time streaming environment, guaranteeing a trade-off between data utility and protection. To this end, three proposals for efficient log protection are made. The first is a new method to anonymize query logs, based on probabilistic k-anonymity, together with some de-anonymization tools to determine possible data leaks. The second improves that method by adding a configurable trade-off between privacy and usability, achieving a great improvement in data utility. The final contribution concerns Internet-based Personal Assistants. The information generated by these devices can be considered life-logging, and it can increase the user's privacy risks. The proposal is a protection scheme that combines log anonymization and sanitizable signatures.
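
As a rough illustration of the k-anonymity idea mentioned in this abstract, the sketch below suppresses any query that was issued by fewer than k distinct users before a log is released. It is a plain, deterministic, batch filter and not the probabilistic k-anonymity method of the thesis, which the abstract does not detail. The function name `k_anonymous_queries` and the `(user_id, query)` log format are assumptions made for the example.

```python
from collections import defaultdict


def k_anonymous_queries(log, k):
    """Release only queries issued by at least k distinct users.

    `log` is an iterable of (user_id, query) pairs (a hypothetical format).
    User identifiers are dropped from the released list.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    log = list(log)
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    # Rare queries are suppressed entirely; common ones lose their user id.
    return [query for _, query in log if len(users_per_query[query]) >= k]


sample_log = [
    ("u1", "weather barcelona"),
    ("u2", "weather barcelona"),
    ("u3", "weather barcelona"),
    ("u1", "rare disease clinic near my street"),
]
released = k_anonymous_queries(sample_log, k=3)
```

With `k=3`, the three common queries are kept and the single rare one is dropped. A streaming setting, as targeted by the thesis, would need to make this decision incrementally instead of over a closed log.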

    Client-side privacy-enhancing technologies in web search

    Web search engines (WSEs) are tools that allow users to locate specific information on the Internet. One of the objectives of WSEs is to return the results that best match the interests of each user. For this purpose, WSEs collect and analyze users' search history in order to build profiles. Consequently, a profiled user who submits a certain query will receive the results that are most interesting to her in the first positions. Although WSEs offer a very useful service, they also represent a threat to their users' privacy. Profiles are built from past queries and other related data that may contain private and personal information. To avoid this privacy threat, it is necessary to provide privacy-preserving mechanisms that protect users. Several existing solutions aim to provide privacy in this field. One of the goals of this work is to survey the current solutions, analyzing their differences and highlighting the advantages and disadvantages of each approach. Then, based on the current state of the art, we present new proposals that protect users' privacy. More specifically, this dissertation proposes three different privacy-preserving multi-party protocols for web search. A multi-party protocol for web search arranges users into groups where they exchange their queries. This serves as an obfuscation method to hide the real queries of each user. The first multi-party protocol that we propose focuses on reducing the query delay, that is, the time that every group member has to wait in order to receive the query results. The second proposed multi-party protocol improves on the current literature because it is resilient against internal attacks, and it outperforms similar proposals in terms of computation and communication. The third proposal is a P2P protocol in which users are grouped according to their preferences. This obfuscates users' profiles while conserving their general interests. Consequently, the WSE is able to better rank the results of their queries.
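
The group-based obfuscation described in this abstract can be pictured with a toy sketch: the queries of a group are shuffled so that each member submits a query that is probably not their own. This only shows the exchange idea. It includes none of the cryptographic machinery a real multi-party protocol would need to keep members from learning who wrote which query, and it is not one of the three protocols proposed in the dissertation. The name `exchange_queries` and the dictionary format are assumptions.

```python
import random


def exchange_queries(group_queries, rng):
    """Reassign the group's queries to its members at random.

    `group_queries` maps each member to their real query (hypothetical format).
    Returns a mapping from each member to the query they will submit.
    A member may occasionally be assigned their own query.
    """
    members = list(group_queries)
    queries = [group_queries[member] for member in members]
    rng.shuffle(queries)  # in-place uniform shuffle
    return dict(zip(members, queries))


group = {
    "alice": "symptoms of flu",
    "bob": "cheap flights to rome",
    "carol": "python tutorial",
}
submitted = exchange_queries(group, random.Random(7))
```

From the search engine's point of view, each member's profile now mixes in queries from the rest of the group, while every query is still submitted exactly once.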

    Online Privacy in Mobile and Web Platforms: Risk Quantification and Obfuscation Techniques

    The widespread use of web and mobile platforms and their deep involvement in human lives pose serious threats to the privacy and confidentiality of users. A number of research works have demonstrated that devices such as desktops, mobile devices, and web browsers contain subtle information and measurable variation that allow them to be fingerprinted. Moreover, behavioural tracking is another form of privacy threat, induced by the collection and monitoring of users' gestures and behaviour such as touch, motion, GPS, search queries, writing patterns, and more. The success of these methods is a clear indication that obfuscation techniques to protect the privacy of individuals are, in reality, not successful if the collected data contains potentially unique combinations of attributes relating to specific individuals. With this in view, this thesis focuses on understanding the privacy risks across web and mobile platforms by identifying and quantifying the privacy leakages and then designing privacy-preserving frameworks against the identified threats. We first investigate the potential of using touch-based gestures to track mobile device users. For this purpose, we propose and develop an analytical framework that quantifies the amount of information carried by user touch gestures. We then quantify users' privacy risk in web data using a probabilistic method that incorporates all key privacy aspects, namely the uniqueness, uniformity, and linkability of the web data. We also perform a large-scale study of dependency chains on the web and find that a large proportion of the websites under study load resources from suspicious third parties that are known to mishandle user data and risk privacy leaks. The second half of the thesis addresses the identified privacy risks by designing and developing privacy-preserving frameworks for web and mobile platforms. We propose an on-device privacy-preserving framework that minimizes privacy leakages by reducing the trackability and distinguishability of mobile users while preserving the functionality of existing apps and services. We finally propose a privacy-aware obfuscation framework for web data with high predicted risk. Using differentially private noise addition, our proposed framework is resilient against an adversary who has knowledge of the obfuscation mechanism, the HMM probabilities, and the training dataset.
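
The abstract mentions differentially private noise addition without naming the noise distribution. As a minimal sketch under that caveat, the example below uses the standard Laplace mechanism, which adds Laplace noise with scale `sensitivity / epsilon` to each value. The choice of Laplace noise, the function names, and the sample counts are assumptions for illustration and are not taken from the thesis.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise to each count (the Laplace mechanism).

    `sensitivity` is the L1 sensitivity of the whole count vector, i.e. how
    much one individual can change it. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [count + laplace_noise(scale, rng) for count in counts]


# Hypothetical per-category visit counts for one user.
visit_counts = [12, 0, 3, 7]
noisy_counts = privatize_counts(visit_counts, epsilon=0.5, rng=random.Random(42))
```

The seeded generator is only there to make the example reproducible; a deployment would not fix the seed, and it would also need to account for the total privacy budget spent across repeated releases.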

    Privacy protection of user profiles in personalized information systems

    In recent times we are witnessing the emergence of a wide variety of information systems that tailor the information-exchange functionality to meet the specific interests of their users. Most of these personalized information systems capitalize on, or lend themselves to, the construction of profiles, either directly declared by a user or inferred from past activity. The ability of these systems to profile users is therefore what enables such intelligent functionality, but at the same time it is the source of serious privacy concerns. Although there exists a broad range of privacy-enhancing technologies aimed at mitigating many of those concerns, their use is far from widespread. The main reason is that there is a certain ambiguity about these technologies and their effectiveness in terms of privacy protection. Besides, since these technologies normally come at the expense of system functionality and utility, it is challenging to assess whether the gain in privacy compensates for the costs in utility. Assessing the privacy provided by a privacy-enhancing technology is thus crucial to determine its overall benefit, to compare its effectiveness with other technologies, and ultimately to optimize it in terms of the privacy-utility trade-off posed. Considerable effort has consequently been devoted to investigating both privacy and utility metrics. However, most of these metrics are specific to concrete systems and adversary models, and hence are difficult to generalize or translate to other contexts. Moreover, in applications involving user profiles, there are few proposals for the evaluation of privacy, and the existing ones do not appropriately justify their choice of metric. The first part of this thesis approaches the fundamental problem of quantifying user privacy. Firstly, we present a theoretical framework for privacy-preserving systems, endowed with a unifying view of privacy in terms of the estimation error incurred by an attacker who aims to disclose the private information that the system is designed to conceal. Our theoretical analysis shows that numerous privacy metrics emerging from a broad spectrum of applications are bijectively related to this estimation error, which permits interpreting and comparing these metrics under a common perspective. Secondly, we tackle the issue of measuring privacy in the application of personalized information systems. Specifically, we propose two information-theoretic quantities as measures of the privacy of user profiles, and justify these metrics by building on Jaynes' rationale behind entropy-maximization methods and fundamental results from the method of types and hypothesis testing. Equipped with quantifiable measures of privacy and utility, the second part of this thesis investigates privacy-enhancing, data-perturbative mechanisms and architectures for two important classes of personalized information systems. In particular, we study the elimination of tags in semantic-Web applications, and the combination of the forgery and the suppression of ratings in personalized recommendation systems. We design such mechanisms to achieve the optimal privacy-utility trade-off, in the sense of maximizing privacy for a desired utility, or vice versa. We proceed in a systematic fashion by drawing upon the methodology of multiobjective optimization. Our theoretical analysis finds a closed-form solution to the problem of optimal tag suppression, and to the problem of optimal forgery and suppression of ratings. In addition, we provide an extensive theoretical characterization of the trade-off between the contrasting aspects of privacy and utility. Experimental results in real-world applications show the effectiveness of our mechanisms in terms of privacy protection, system functionality and data utility.
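
The abstract refers to two information-theoretic measures of the privacy of a user profile but does not name them. As a hedged illustration, the sketch below computes two common candidates for a profile expressed as a probability distribution over interest categories: its Shannon entropy, and its Kullback-Leibler divergence from a reference (population) profile. Treating these as the two quantities is an assumption, as are the function names and the sample profiles.

```python
import math


def shannon_entropy(profile):
    """Entropy in bits of a profile given as category probabilities.

    A flatter (higher-entropy) profile reveals less about specific interests.
    """
    return -sum(p * math.log2(p) for p in profile if p > 0)


def kl_divergence(profile, reference):
    """KL divergence in bits of `profile` from `reference`.

    A profile close to the population's (low divergence) stands out less.
    Returns infinity if the profile uses a category the reference never does.
    """
    if len(profile) != len(reference):
        raise ValueError("profiles must cover the same categories")
    total = 0.0
    for p, q in zip(profile, reference):
        if p > 0:
            if q == 0:
                return math.inf
            total += p * math.log2(p / q)
    return total


# Hypothetical profiles over four interest categories.
population = [0.25, 0.25, 0.25, 0.25]
focused_user = [0.7, 0.1, 0.1, 0.1]

user_entropy = shannon_entropy(focused_user)
user_divergence = kl_divergence(focused_user, population)
```

Under either measure, a mechanism such as tag suppression or rating forgery would aim to move the observed profile toward higher entropy or lower divergence, at some cost in utility.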