    LAYERED APPROACH FOR PERSONALIZED SEARCH ENGINE LOGS PRIVACY PRESERVING

    In this paper we examine the problem of protecting privacy when publishing search engine logs. Search engines play a vital role in navigating the enormity of the Web. Privacy-preserving data publishing (PPDP) provides techniques and tools for publishing useful information while preserving data privacy. Recently, PPDP has received significant attention in research communities, and several approaches have been proposed for different data publishing scenarios. In this paper we study privacy preservation for the publication of search engine query logs. We consider the problem that, even after removing all personal attributes of the searcher that could link the data to his identity, publishing such data is still subject to privacy attacks from adversaries who have partial knowledge of the data set. Our results show that the query log can be appropriately anonymized against this particular attack while retaining a significant volume of useful data. We also examine the problems with search logs, why a published log is not secure, and how to make it secure using data mining algorithms and techniques such as generalization, suppression, and quasi-identifiers.
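
    The abstract names generalization, suppression, and quasi-identifiers without describing the layered approach itself. The following is a minimal sketch of the two generic operations, not the authors' algorithm: it assumes a log of (user_id, query) pairs, treats age as a hypothetical quasi-identifier, and uses illustrative function names and an illustrative threshold k.

```python
from collections import defaultdict


def suppress_rare_queries(log, k):
    """Suppress (drop) every (user_id, query) record whose query string was
    issued by fewer than k distinct users, so each published query is shared
    by at least k users."""
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    return [(u, q) for u, q in log if len(users_per_query[q]) >= k]


def generalize_age(age, width=10):
    """Generalize an exact age (a hypothetical quasi-identifier) to a range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


log = [("u1", "weather"), ("u2", "weather"), ("u3", "weather"),
       ("u1", "rare disease clinic")]
print(suppress_rare_queries(log, k=2))
# [('u1', 'weather'), ('u2', 'weather'), ('u3', 'weather')]
print(generalize_age(34))  # 30-39
```

    Suppression removes the rare, potentially identifying query outright, while generalization keeps a coarser version of the attribute; which one to apply to which field is the kind of trade-off between privacy and retained utility the abstract refers to.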

    Privacy-preserving Targeted Advertising

    Recommendation systems form the centerpiece of a rapidly growing trillion-dollar online advertisement industry. Even with numerous optimizations and approximations, collaborative filtering (CF) based approaches require real-time computations involving very large vectors. Curating and storing such profile information vectors on web portals seriously breaches the user's privacy. Modifying such systems to achieve private recommendations further requires communicating long encrypted vectors, making the whole process inefficient. We present a more efficient alternative recommendation system in which user profiles are maintained entirely on the user's device, and appropriate recommendations are fetched from web portals in an efficient, privacy-preserving manner. We base this approach on association rules. Comment: A preliminary version was presented at the 11th INFORMS Workshop on Data Mining and Decision Analytics (2016)
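
    To illustrate how association rules can drive recommendations from a profile that never leaves the device, here is a minimal sketch under stated assumptions: it mines only single-item rules a -> b with support and confidence thresholds, the item names are hypothetical, and it does not model how the paper fetches rules from web portals privately.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return {(a, b): confidence} for single-item rules a -> b."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = set(t)
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs (a, b)
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def recommend_on_device(local_profile, rules):
    """Rank items not yet in the local profile by the best matching rule;
    the profile itself is only read locally."""
    scores = {}
    for (a, b), conf in rules.items():
        if a in local_profile and b not in local_profile:
            scores[b] = max(scores.get(b, 0.0), conf)
    return sorted(scores, key=lambda item: (-scores[item], item))


transactions = [{"camera", "tripod"}, {"camera", "tripod", "bag"},
                {"camera", "bag"}, {"phone", "case"}]
rules = mine_pair_rules(transactions, min_support=0.25, min_confidence=0.6)
print(recommend_on_device({"camera"}, rules))  # ['bag', 'tripod']
```

    In this sketch only the rule set would need to travel to the device, which is far smaller than the per-user vectors CF approaches exchange; the matching against the user's history happens locally.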

    Privacy in Search Logs

    Search engine companies collect the "database of intentions", the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured by ε-differential privacy unfortunately does not provide any utility for this problem. We then propose an algorithm, ZEALOUS, and show how to set its parameters to achieve (ε, δ)-probabilistic privacy. We also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17] that achieves (ε′, δ′)-indistinguishability. Our paper concludes with a large experimental study using real applications where we compare ZEALOUS and previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields comparable utility to k-anonymity while at the same time achieving much stronger privacy guarantees.
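
    The abstract names ZEALOUS but not its steps. A minimal sketch, assuming the commonly described two-threshold structure: each user contributes at most m distinct items, items with an exact count below τ are dropped, Laplace noise of scale λ is added, and only items whose noisy count exceeds τ′ are published. How to choose m, τ, τ′ and λ for (ε, δ)-probabilistic privacy is the paper's contribution and is not reproduced; the parameter values below are arbitrary.

```python
import random
from collections import Counter


def zealous(user_logs, m, tau, tau_prime, lam, rng=None):
    """user_logs: dict user_id -> list of items (keywords, queries or clicks).
    Returns {item: noisy_count} for the released items."""
    rng = rng or random.Random()
    counts = Counter()
    for items in user_logs.values():
        distinct = sorted(set(items))  # sorted so a seeded rng is reproducible
        # Step 1: keep at most m distinct items per user, chosen at random.
        counts.update(rng.sample(distinct, min(m, len(distinct))))
    released = {}
    for item, count in counts.items():
        # Step 2: drop items whose exact count is below the first threshold tau.
        if count < tau:
            continue
        # Step 3: add Laplace(0, lam) noise, sampled as the difference of two
        # exponential variables with mean lam.
        noisy = count + rng.expovariate(1 / lam) - rng.expovariate(1 / lam)
        # Step 4: publish only items whose noisy count exceeds tau_prime.
        if noisy > tau_prime:
            released[item] = noisy
    return released


user_logs = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a", "c"]}
print(zealous(user_logs, m=5, tau=2, tau_prime=1.5, lam=1.0, rng=random.Random(0)))
```

    Capping each user at m items bounds how much any single user can change the histogram, which is what lets the noise scale λ be fixed in advance; the first threshold τ removes rare items before any noisy value is computed for them.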

    Privacy Violation and Detection Using Pattern Mining Techniques

    Privacy, its violations, and techniques to avoid privacy violations have taken centre stage in both academia and industry in recent months. Corporations worldwide have become conscious of the implications of privacy violations and their impact on them and on other stakeholders. Moreover, nations across the world are enacting privacy protection legislation to prevent data privacy violations. Such legislation, however, exposes organizations to the risk of intentional or unintentional violations involving private data. A violation by either malicious external hackers or internal employees can expose organizations to costly litigation. In this paper, we propose PRIVDAM, a data-mining-based intelligent architecture for a Privacy Violation Detection and Monitoring system whose purpose is to detect possible privacy violations and to prevent them in the future. Experimental evaluations show that our approach is scalable and robust and that it can detect privacy violations, or chances of violations, quite accurately.

    Analysing Parallel and Passive Web Browsing Behavior and its Effects on Website Metrics

    Getting deeper insights into the online browsing behavior of Web users has been a major research topic since the advent of the WWW. It provides useful information to optimize website design, Web browser design, search engine offerings, and online advertisement. We argue that new technologies and new services continue to have significant effects on the way people browse the Web. For example, listening to music clips on YouTube or to a radio station on Last.fm does not require users to sit in front of their computer. Social media and networking sites like Facebook or micro-blogging sites like Twitter have attracted new types of users that previously were less inclined to go online. These changes in how people browse the Web exhibit new characteristics that are not yet well understood. In this paper, we provide novel insights by presenting the first results of DOBBS, our long-term effort to create a comprehensive and representative dataset capturing online user behavior. We first investigate the concepts of parallel browsing and passive browsing, showing that browsing the Web is no longer a dedicated task for many users. Based on these results, we then analyze their impact on the calculation of a user's dwell time (i.e., the time the user spends on a webpage), which has become an important metric to quantify the popularity of websites. Comment: 22 pages, 11 figures, 3 tables, 29 references. arXiv admin note: text overlap with arXiv:1307.154
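
    To make the dwell-time concern concrete, the sketch below contrasts two hypothetical ways of computing dwell time from a session: the gap between consecutive page loads, and the time a page's tab held focus. The event formats, page names, and timestamps are assumptions for illustration and are not the DOBBS data model.

```python
def naive_dwell_times(page_loads):
    """page_loads: (timestamp_s, url) pairs sorted by time. Each page's dwell
    time is the gap until the next page load; the last page has none."""
    return [(url, page_loads[i + 1][0] - t)
            for i, (t, url) in enumerate(page_loads[:-1])]


def focus_dwell_times(focus_events, end_time):
    """focus_events: (timestamp_s, url) pairs sorted by time, each meaning the
    tab showing url gained focus. Sums the focused seconds per url."""
    totals = {}
    for i, (t, url) in enumerate(focus_events):
        next_t = focus_events[i + 1][0] if i + 1 < len(focus_events) else end_time
        totals[url] = totals.get(url, 0) + (next_t - t)
    return totals


# Hypothetical session: a news page, a music tab opened at t=10 and left
# playing in the background, and an article opened from the news page at t=15.
loads = [(0, "news"), (10, "music"), (15, "article")]
focus = [(0, "news"), (10, "music"), (12, "news"), (15, "article")]
print(naive_dwell_times(loads))       # [('news', 10), ('music', 5)]
print(focus_dwell_times(focus, 60))   # {'news': 13, 'music': 2, 'article': 45}
```

    Neither measure credits the music tab with the 50 seconds it keeps playing in the background, which is the kind of passive browsing the abstract argues makes dwell time harder to interpret.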

    Internet Advertising: An Interplay among Advertisers, Online Publishers, Ad Exchanges and Web Users

    Internet advertising is a fast-growing business that has proved to be significantly important in the digital economy. It is vitally important for both web search engines and online content providers and publishers because web advertising provides them with major sources of revenue. Its presence is increasingly important for the whole media industry due to the influence of the Web. For advertisers, it is a smarter alternative to traditional marketing media such as TV and newspapers. As the web evolves and data collection continues, the design of methods for more targeted, interactive, and friendly advertising may have a major impact on the way our digital economy evolves and may aid societal development. Towards this goal, mathematically well-grounded Computational Advertising methods are becoming necessary and will continue to develop as a fundamental tool for the Web. As a vibrant new discipline, Internet advertising requires effort from different research domains, including Information Retrieval, Machine Learning, Data Mining and Analytics, Statistics, Economics, and even Psychology, to predict and understand user behaviours. In this paper, we provide a comprehensive survey of Internet advertising, discussing and classifying the research issues, identifying recent technologies, and suggesting future directions. To give a comprehensive picture, we start with a brief history, introduction, and classification of the industry and present a schematic view of the new advertising ecosystem. We then introduce four major participants, namely advertisers, online publishers, ad exchanges, and web users; by analysing and discussing the major research problems and existing solutions from each of their perspectives, we discover and aggregate the fundamental problems that characterise the newly formed research field and capture its potential future prospects. Comment: 44 pages, 7 figures, 6 tables. Submitted to Information Processing and Management

    Mobile Information Retrieval

    Mobile Information Retrieval (Mobile IR) is a relatively recent branch of Information Retrieval (IR) concerned with enabling users to carry out, using a mobile device, all the classical IR operations they were used to carrying out on a desktop. This includes finding content available in local repositories or on the web in response to a user query, interacting with the system in an explicit or implicit way, reformulating the query and/or visualising the content of the retrieved documents, as well as providing relevance judgments to improve the retrieval process. This book is structured as follows. Chapter 2 provides a very brief overview of IR and of Mobile IR, briefly outlining what in Mobile IR is different from IR. Chapter 3 provides the foundations of Mobile IR, looking at the characteristics of mobile devices and what they bring to IR, but also at how the concept of relevance changed from standard IR to Mobile IR. Chapter 4 presents an overview of the document collections that are searchable by a Mobile IR system, that differ in some way from classical IR ones, and that are available for experimentation, including collections of data that have become complementary to Mobile IR. Similarly, Chapter 5 reviews studies of mobile information needs and user log analysis. Chapter 6 reviews studies aimed at adapting and improving the user interface to the needs of Mobile IR. Chapter 7, instead, reviews work on context awareness, which studies the many aspects of the user context that Mobile IR employs. Chapter 8 reviews some of the evaluation work done in Mobile IR, highlighting the differences from classical IR from the perspectives of the two main IR evaluation methodologies: user studies and test collections. Finally, Chapter 9 reports the conclusions of this review, briefly highlighting some trends in Mobile IR that we believe will drive research in the next few years. Comment: 116 pages, published in 201

    Enabling Semantic Analysis of User Browsing Patterns in the Web of Data

    A useful step towards better interpretation and analysis of usage patterns is to formalize the semantics of the resources that users access on the Web. We focus on this problem and present an approach for the semantic formalization of usage logs, which lays the basis for effective techniques for querying expressive usage patterns. We also present a query answering approach, which is useful for finding expressive patterns of usage behavior in the logs via the formulation of semantic and temporal constraints. We have processed over 30 thousand user browsing sessions extracted from usage logs of DBpedia and Semantic Web Dog Food. All these events are formalized semantically using the respective domain ontologies and RDF representations of the Web resources being accessed. We show the effectiveness of our approach through experimental results, thereby providing an exploratory analysis of the way users browse the Web of Data. Comment: 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD2012) at the 21st International World Wide Web Conference (WWW2012), Lyon, France, April 17th, 2012

    Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

    Deep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets, and for most IR tasks such data contains sensitive information from users. Privacy and confidentiality concerns prevent many data owners from sharing the data, so today the research community can benefit from research on large-scale datasets only in a limited manner. In this paper, we discuss privacy-preserving mimic learning, i.e., using predictions from a privacy-preserving trained model, instead of labels from the original sensitive training data, as a supervision signal. We present the results of preliminary experiments in which we apply the idea of mimic learning and privacy-preserving mimic learning to the task of document re-ranking, one of the core IR tasks. This research is a step toward laying the groundwork for enabling researchers from data-rich environments to share knowledge learned from actual users' data, which should facilitate research collaborations. Comment: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7–11, 2017, Shinjuku, Tokyo, Japan
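
    The core idea of mimic learning, replacing sensitive labels with a teacher model's predictions as the supervision signal, can be shown with a deliberately tiny sketch. Here a one-feature linear model stands in for the neural ranking models the abstract discusses, the data are hypothetical, and the privacy-preserving training of the teacher (for example with a differentially private optimizer) is assumed rather than shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mimic_learning(private_xs, private_labels, public_xs):
    # Teacher: trained on the sensitive labeled data, never shared.
    ta, tb = fit_line(private_xs, private_labels)
    # Teacher predictions on public, unlabeled inputs act as the supervision signal.
    pseudo_labels = [ta * x + tb for x in public_xs]
    # Student: trained only on public inputs and teacher predictions; this is
    # the model that would be shared.
    return fit_line(public_xs, pseudo_labels)


# Hypothetical data: one relevance feature per query-document pair.
student = mimic_learning([0, 1, 2, 3], [1, 3, 5, 7], [10, 20, 30])
print(student)  # (2.0, 1.0)
```

    The student never sees a private label, yet it recovers the teacher's scoring function; in the privacy-preserving variant, the guarantee comes from how the teacher itself was trained.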

    Who you gonna call? Analyzing Web Requests in Android Applications

    Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or metadata. These requests collectively reveal common practices of mobile application development, such as which external services are used and how, and they point to possible negative effects such as security and privacy violations or impacts on battery life. In this paper, we assess different ways to analyze which web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30,000 Android applications and compare its performance with a simpler constant extraction analysis. Finally, we discuss the advantages and limitations of dynamic and static analyses when extracting URLs, comparing the data extracted by Stringoid from the same 20 applications with the dynamically collected data.
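
    The difference between estimating URLs from string concatenations and merely extracting string constants can be sketched on a toy expression tree. This is not Stringoid, which analyzes Android application code; the tuple-based expression format, the variable names, and the example URL are hypothetical.

```python
def estimate_string(expr, known_values):
    """Evaluate a tiny concatenation expression tree to a URL estimate.
    expr is ("const", text), ("var", name) or ("concat", left, right);
    variables with no statically known value become the wildcard "*"."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return known_values.get(expr[1], "*")
    if kind == "concat":
        return estimate_string(expr[1], known_values) + estimate_string(expr[2], known_values)
    raise ValueError(f"unknown expression kind: {kind}")


def extract_constants(expr):
    """Baseline: collect string literals only, ignoring how they are combined."""
    if expr[0] == "const":
        return [expr[1]]
    if expr[0] == "concat":
        return extract_constants(expr[1]) + extract_constants(expr[2])
    return []


# Hypothetical app code: BASE_URL + "/v1/weather?city=" + cityName
expr = ("concat",
        ("concat", ("var", "BASE_URL"), ("const", "/v1/weather?city=")),
        ("var", "cityName"))
known = {"BASE_URL": "https://api.example.com"}  # e.g. a static final field
print(estimate_string(expr, known))  # https://api.example.com/v1/weather?city=*
print(extract_constants(expr))       # ['/v1/weather?city=']
```

    The constant-only baseline sees a path fragment but not the host, while following the concatenation yields a complete URL template with a wildcard for the runtime value.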