70 research outputs found

    Exposing Inconsistent Web Search Results with Bobble

    Given their critical role as gateways to Web content, the search results a Web search engine provides to its users have an outsized impact on the way each user views the Web. Previous studies have shown that popular Web search engines like Google employ sophisticated personalization engines that can occasionally provide dramatically inconsistent views of the Web to different users. Unfortunately, even if users are aware of this potential, it is not straightforward for them to determine the extent to which a particular set of search results differs from those returned to other users, nor the factors that contribute to this personalization. We present the design and implementation of Bobble, a Web browser extension that contemporaneously executes a user’s Google search query from a variety of different world-wide vantage points under a range of different conditions, alerting the user to the extent of inconsistency present in the set of search results returned to them by Google. Using more than 75,000 real search queries issued by over 170 users during a nine-month period, we explore the frequency and nature of inconsistencies that arise in Google search queries. In contrast to previously published results, we find that 98% of all Google search results display some inconsistency, with a user’s geographic location being the dominant factor influencing the nature of the inconsistency.
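
    To make the measurement concrete, here is a minimal sketch (ours, not Bobble's actual code) of how a Bobble-like tool could quantify the inconsistency between two ranked result lists fetched for the same query from different vantage points; the metrics, function names, and data are illustrative.

```python
# Illustrative sketch (not Bobble's implementation): quantify how much two
# ranked result lists diverge, as a Bobble-like tool might after fetching
# the same query from several vantage points.

def jaccard(a, b):
    """Set overlap of two result-URL lists, ignoring rank (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def rank_displacement(a, b):
    """Average absolute rank shift of URLs common to both lists."""
    common = set(a) & set(b)
    if not common:
        return float("inf")
    return sum(abs(a.index(u) - b.index(u)) for u in common) / len(common)

# Hypothetical result lists for one query from two vantage points.
results_us = ["example.com/1", "example.com/2", "example.com/3"]
results_de = ["example.com/2", "example.com/1", "example.com/4"]

print(jaccard(results_us, results_de))            # 0.5
print(rank_displacement(results_us, results_de))  # 1.0
```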

    XRay: Enhancing the Web's Transparency with Differential Correlation

    Today's Web services, such as Google, Amazon, and Facebook, leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used; hence, they cannot make informed choices about the services they use. To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay's core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically and through experiments on Gmail, Amazon, and YouTube that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts. Comment: Extended version of a paper presented at the 23rd USENIX Security Symposium (USENIX Security 14).
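
    The differential-correlation insight lends itself to a compact illustration. Below is a hedged sketch (ours, not XRay's implementation): each shadow account holds a subset of the audited account's data items, and an output such as an ad is attributed to the item whose presence best predicts the accounts where the output appeared. All names and data are hypothetical.

```python
# Illustrative sketch of the differential-correlation idea (not the actual
# XRay system): each shadow account holds a subset of the user's data
# items; an output (e.g. an ad) is attributed to the item whose
# presence/absence best matches where the output was observed.

def attribute(accounts, sightings):
    """accounts: {account_id: set of data items}.
    sightings: set of account_ids where the output was observed.
    Returns (best_item, score), score = fraction of accounts whose
    sighting status the item's presence correctly predicts."""
    items = set().union(*accounts.values())
    best_item, best_score = None, -1.0
    for item in items:
        agree = sum(
            (item in data) == (acct in sightings)
            for acct, data in accounts.items()
        )
        score = agree / len(accounts)
        if score > best_score:
            best_item, best_score = item, score
    return best_item, best_score

# Hypothetical Gmail-style setup: four accounts with overlapping email topics.
accounts = {
    "A": {"flights", "yoga"},
    "B": {"flights", "loans"},
    "C": {"yoga", "loans"},
    "D": {"yoga"},
}
ad_seen_in = {"A", "B"}  # the audited ad appeared only in accounts A and B
print(attribute(accounts, ad_seen_in))  # ('flights', 1.0)
```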

    Web Transparency for Complex Targeting: Algorithms, Limits, and Tradeoffs

    Big Data promises important societal progress but exacerbates the need for due process and accountability. Companies and institutions can now discriminate between users at an individual level using collected data or past behavior. Worse, today they can do so in near-perfect opacity. The nascent field of web transparency aims to develop the tools and methods necessary to reveal how information is used; today, however, it lacks robust tools that let users and investigators identify targeting using multiple inputs. Here, we formalize for the first time the problem of detecting and identifying targeting on combinations of inputs, and we provide the first algorithm that is asymptotically exact. This algorithm is designed to serve as a theoretical foundational block for building future scalable and robust web transparency tools. It offers three key properties. First, our algorithm is service agnostic and applies to a variety of settings under a broad set of assumptions. Second, our algorithm's analysis delineates a theoretical detection limit that characterizes which forms of targeting can be distinguished from noise and which cannot. Third, our algorithm establishes fundamental tradeoffs that lead the way to new metrics for the science of web transparency. Understanding the tradeoff between effective targeting and targeting concealment lets us determine under which conditions predatory targeting can be made unprofitable by transparency tools.
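
    As a toy illustration of what "targeting on combinations of inputs" means, the single-input scoring idea can be extended to conjunctions of inputs, as in the sketch below. This is only our reading of the problem setup, not the paper's asymptotically exact algorithm, which additionally handles noise and characterizes detection limits; all data and names are hypothetical.

```python
# Toy sketch of scoring input *combinations* (singletons and pairs here)
# against observed outputs. It only illustrates the enlarged hypothesis
# space; the paper's algorithm is more subtle and comes with guarantees.
from itertools import combinations

def best_combination(accounts, sightings, max_size=2):
    """accounts: {account_id: set of inputs}; sightings: account_ids where
    the output appeared. Returns the conjunction of inputs whose presence
    best explains the sighting pattern."""
    items = sorted(set().union(*accounts.values()))
    candidates = [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(items, size)
    ]

    def score(combo):
        needed = set(combo)
        # An account "matches" the hypothesis if it holds every input in
        # the combination; a perfect conjunction matches exactly the
        # accounts where the output was sighted.
        return sum(
            (needed <= data) == (acct in sightings)
            for acct, data in accounts.items()
        ) / len(accounts)

    return max(candidates, key=score)

# Hypothetical setup: the ad is shown only where BOTH inputs are present.
accounts = {
    "A": {"flights", "yoga"},
    "B": {"flights", "loans"},
    "C": {"flights"},
    "D": {"yoga", "loans"},
}
print(best_combination(accounts, sightings={"A"}))  # ('flights', 'yoga')
```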

    Vers une plus grande transparence du Web [Towards Greater Web Transparency]

    The Web giants (Amazon, Google, and Twitter foremost among them) increasingly draw on the windfall of "Big Data": they collect myriad data that they exploit for their personalized recommendation algorithms and their advertising campaigns. Such methods can considerably improve the services rendered to their users, but their opacity is a matter of debate. Indeed, no sufficiently robust tool exists today that can trace, across the Web, how online services use a user's data and personal information. Motivated by this lack of transparency, we developed a prototype named XRay, which can predict which piece of data, among all those present in a user account, is responsible for the receipt of a given advertisement. In this article, we present its principle as well as the results of our first experiments. At the same time, we introduce the very first theoretical model for the problem of Web transparency, and we interpret XRay's performance in light of the results obtained in this model. In particular, we prove that Θ(log N) auxiliary user accounts, populated by a randomized procedure, suffice to determine which of the N data items present caused the receipt of an advertisement. We briefly discuss possible extensions and some open problems.
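
    The Θ(log N) claim has a simple back-of-the-envelope justification under the random-population model sketched above (our reading of the abstract, not the paper's exact argument): if each of the N data items is placed in each of m auxiliary accounts independently with probability 1/2, an item's presence pattern across the accounts is a uniform m-bit signature, and a union bound over item pairs shows that m = Θ(log N) accounts make all signatures distinct with high probability, so the targeted item can be read off from the set of accounts in which the ad appears.

```latex
% Union bound over pairs of items: two fixed items share the same
% m-bit presence signature with probability 2^{-m}.
\[
  \Pr[\text{two of the } N \text{ items share a signature}]
  \;\le\; \binom{N}{2}\, 2^{-m}
  \;\le\; \frac{N^{2}}{2^{\,m+1}}
  \;<\; \delta
  \quad\text{as soon as}\quad
  m \;>\; 2\log_{2} N + \log_{2}\tfrac{1}{\delta} - 1
  \;=\; \Theta(\log N).
\]
```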

    Measuring the Importance of User-Generated Content to Search Engines

    Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g. most popular, trending, expensive advertising), we observe that Wikipedia appears in over 80% of results pages for some query types and is by far the most prevalent individual content source across all query types. More generally, our results provide empirical information to inform a nascent but rapidly growing debate surrounding a highly consequential question: do users provide enough value to intelligent technologies that they should receive more of the economic benefits from intelligent technologies? Comment: This version includes a bibliography entry that was missing from the first version of the text due to a processing error. This is a preprint of a paper accepted at ICWSM 2019. Please cite that version instead.
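
    The audit's core measurement is easy to picture: for each query category, count the fraction of result pages on which a given source appears. The sketch below is our reconstruction under assumed inputs, not the authors' code; the data layout, function, and category names are hypothetical.

```python
# Illustrative sketch (not the authors' audit code): given scraped result
# pages grouped by query category, compute how often a domain appears.
from urllib.parse import urlparse

def domain_prevalence(serps_by_category, domain="wikipedia.org"):
    """serps_by_category: {category: list of result-URL lists (one per SERP)}.
    Returns {category: fraction of SERPs containing `domain`}."""
    prevalence = {}
    for category, serps in serps_by_category.items():
        hits = sum(
            any(urlparse(url).netloc.endswith(domain) for url in serp)
            for serp in serps
        )
        prevalence[category] = hits / len(serps)
    return prevalence

# Hypothetical scraped data for two query categories.
serps = {
    "most_popular": [
        ["https://en.wikipedia.org/wiki/Foo", "https://example.com/a"],
        ["https://example.com/b"],
    ],
    "trending": [
        ["https://news.example.com/x", "https://en.wikipedia.org/wiki/Bar"],
    ],
}
print(domain_prevalence(serps))  # {'most_popular': 0.5, 'trending': 1.0}
```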

    Mecanismos de busca e as implicações nos aspectos de privacidade [Search Engines and the Implications for Privacy]

    Search engines use strategies to cope with information overload and deliver more effective results, such as personalized search, which, by collecting a wide range of data about its participants, can pose threats to privacy. This work aims to make privacy issues explicit, considering aspects of user awareness and control over the data-collection process carried out by search engines. We conducted an exploratory analysis of the privacy policies of Google and Bing to identify the data these engines may collect. The data were grouped into categories and analyzed both with respect to their mention in the policies and to the possibility of user control. Data collection specifically by the search engines is scarcely made explicit, and although control is possible, users must interpret exhaustive privacy policies and settings to become aware of the process. We conclude that the privacy-benefit trade-off can affect control over data collection, and that the pursuit of result relevance overrides privacy guarantees: when users adjust their settings and limit data collection, results become less relevant, and when they opt into personalization, they open doors to their data.