LAYERED APPROACH FOR PERSONALIZED SEARCH ENGINE LOGS PRIVACY PRESERVING
ABSTRACT In this paper we examine the problem of protecting privacy when publishing search engine logs. Search engines play a vital role in navigating the enormity of the Web. Privacy-preserving data publishing (PPDP) provides techniques and tools for publishing useful information while preserving data privacy. Recently, PPDP has received significant attention in research communities, and several approaches have been proposed for different data publishing scenarios. In this paper we study privacy preservation for the publication of search engine query logs. Even after all personal characteristics of the searcher that can serve as links to his or her identity have been removed, the published data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Our preliminary results show that the query log can be appropriately anonymized against this particular attack while retaining a significant volume of useful data. We discuss the privacy problems in search logs, why such logs are not secure, and how to secure them using data mining techniques such as generalization, suppression, and quasi-identifiers.
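As a rough illustration of the suppression idea named in this abstract, the sketch below drops every query that fewer than k distinct users issued. It is not the paper's layered method. The log format, the function name `suppress_rare_queries`, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def suppress_rare_queries(log, k):
    """Keep only queries issued by at least k distinct users.

    log: iterable of (user_id, query) pairs (hypothetical format).
    Returns a dict mapping each retained query to its distinct-user count;
    user identifiers are dropped from the output.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        # Normalise case and whitespace so trivially different spellings merge.
        users_per_query[query.strip().lower()].add(user_id)
    return {q: len(u) for q, u in users_per_query.items() if len(u) >= k}


# Hypothetical sample log: the last query is unique to one user and is suppressed.
log = [
    ("u1", "flu symptoms"),
    ("u2", "Flu symptoms"),
    ("u3", "flu symptoms"),
    ("u1", "john smith 12 elm street"),
]
published = suppress_rare_queries(log, k=3)
print(published)
```

Suppression alone removes rare, potentially identifying queries but says nothing about adversaries who hold background knowledge, which is the attack model the abstract is concerned with.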
Privacy-preserving Targeted Advertising
Recommendation systems form the centerpiece of a rapidly growing trillion-dollar
online advertisement industry. Even with numerous optimizations and
approximations, collaborative filtering (CF) based approaches require real-time
computations involving very large vectors. Curating and storing such related
profile information vectors on web portals seriously breaches the user's
privacy. Modifying such systems to achieve private recommendations further
requires communication of long encrypted vectors, making the whole process
inefficient. We present a more efficient recommendation system alternative, in
which user profiles are maintained entirely on their device, and appropriate
recommendations are fetched from web portals in an efficient privacy preserving
manner. We base this approach on association rules.
Comment: A preliminary version was presented at the 11th INFORMS Workshop on
Data Mining and Decision Analytics (2016)
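The on-device idea in this abstract can be pictured with a minimal sketch: the profile stays local, and association rules are matched against it on the client. The abstract does not describe the paper's privacy-preserving fetching protocol, and the sketch does not model it. The rule format (antecedent set, consequent item, confidence), the function name `match_rules`, and the sample data are assumptions for illustration only.

```python
def match_rules(profile, rules, min_confidence=0.5):
    """Return recommended items, best first, using only the local profile.

    profile: set of items the user has interacted with (kept on the device).
    rules:   iterable of (antecedent_frozenset, consequent_item, confidence).
    A rule fires when its antecedent is a subset of the profile and its
    consequent is not already in the profile.
    """
    best = {}
    for antecedent, consequent, confidence in rules:
        if confidence < min_confidence:
            continue
        if antecedent <= profile and consequent not in profile:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    # Sort by descending confidence, then by name for a stable order.
    return sorted(best, key=lambda item: (-best[item], item))


# Hypothetical profile and rule set.
profile = {"tent", "sleeping bag"}
rules = [
    (frozenset({"tent"}), "camp stove", 0.6),
    (frozenset({"tent", "sleeping bag"}), "headlamp", 0.8),
    (frozenset({"laptop"}), "mouse", 0.9),
    (frozenset({"tent"}), "sleeping bag", 0.7),
]
recommendations = match_rules(profile, rules)
print(recommendations)
```

Because matching happens on the device, the portal in this sketch never receives the profile itself. Which rules the client requests could still leak information, and that is the part a real protocol would have to address.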
Privacy in Search Logs
Search engine companies collect the "database of intentions", the histories
of their users' search queries. These search logs are a gold mine for
researchers. Search engine companies, however, are wary of publishing search
logs in order not to disclose sensitive information. In this paper we analyze
algorithms for publishing frequent keywords, queries and clicks of a search
log. We first show how methods that achieve variants of k-anonymity are
vulnerable to active attacks. We then demonstrate that the stronger guarantee
ensured by ε-differential privacy unfortunately does not provide any
utility for this problem. We then propose an algorithm ZEALOUS and show how to
set its parameters to achieve (ε, δ)-probabilistic privacy. We
also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17]
that achieves (ε', δ')-indistinguishability. Our paper concludes
with a large experimental study using real applications where we compare
ZEALOUS and previous work that achieves k-anonymity in search log publishing.
Our results show that ZEALOUS yields comparable utility to k-anonymity while
at the same time achieving much stronger privacy guarantees
Privacy Violation and Detection Using Pattern Mining Techniques
Privacy, its violations, and techniques to prevent privacy violations have taken centre stage in both academia and industry in recent months. Corporations worldwide have become conscious of the implications of privacy violations and their impact on them and on other stakeholders. Moreover, nations across the world are introducing privacy-protecting legislation to prevent data privacy violations. Such legislation, however, exposes organizations to the issue of intentional or unintentional violations of data privacy. A violation by either malicious external hackers or internal employees can expose organizations to costly litigation. In this paper, we propose PRIVDAM, a data-mining-based intelligent architecture for a Privacy Violation Detection and Monitoring system whose purpose is to detect possible privacy violations and to prevent them in the future. Experimental evaluations show that our approach is scalable and robust and that it can detect privacy violations, or the likelihood of violations, quite accurately.
Analysing Parallel and Passive Web Browsing Behavior and its Effects on Website Metrics
Getting deeper insights into the online browsing behavior of Web users has
been a major research topic since the advent of the WWW. It provides useful
information to optimize website design, Web browser design, search engine
offerings, and online advertisement. We argue that new technologies and new
services continue to have significant effects on the way people browse the
Web. For example, listening to music clips on YouTube or to a radio station on
Last.fm does not require users to sit in front of their computer. Social media
and networking sites like Facebook or micro-blogging sites like Twitter have
attracted new types of users that previously were less inclined to go online.
These changes in how people browse the Web feature new characteristics which
are not well understood so far. In this paper, we provide novel and unique
insights by presenting first results of DOBBS, our long-term effort to create a
comprehensive and representative dataset capturing online user behavior. We
firstly investigate the concepts of parallel browsing and passive browsing,
showing that browsing the Web is no longer a dedicated task for many users.
Based on these results, we then analyze their impact on the calculation of a
user's dwell time -- i.e., the time the user spends on a webpage -- which has
become an important metric to quantify the popularity of websites.
Comment: 22 pages, 11 figures, 3 tables, 29 references. arXiv admin note: text
overlap with arXiv:1307.154
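To make the dwell-time issue concrete, the sketch below contrasts two ways of computing it. The naive version takes the gap between consecutive page loads. The focus-based version counts only the intervals during which a page was actually in the foreground, which matters once users browse in parallel tabs. The event formats, function names, and sample values are assumptions for illustration and are not taken from the DOBBS dataset.

```python
def naive_dwell_times(page_loads):
    """Dwell time as the gap until the next page load.

    page_loads: list of (timestamp_seconds, url), sorted by timestamp.
    The last page has no successor, so it gets no dwell time.
    """
    return [
        (url, next_ts - ts)
        for (ts, url), (next_ts, _next_url) in zip(page_loads, page_loads[1:])
    ]


def focused_dwell_times(focus_intervals):
    """Dwell time as the total time a page was in the foreground.

    focus_intervals: list of (url, focus_start_seconds, focus_end_seconds).
    """
    totals = {}
    for url, start, end in focus_intervals:
        totals[url] = totals.get(url, 0) + (end - start)
    return totals


# Hypothetical session: the user opens b.example in a second tab at t=10
# but keeps switching back to a.example.
loads = [(0, "a.example"), (10, "b.example"), (70, "c.example")]
focus = [
    ("a.example", 0, 10),
    ("b.example", 10, 25),
    ("a.example", 25, 60),
    ("b.example", 60, 70),
]
naive = naive_dwell_times(loads)
focused = focused_dwell_times(focus)
print(naive)
print(focused)
```

In this made-up session the naive method credits b.example with 60 seconds and a.example with 10, while the focus-based method gives a.example 45 seconds and b.example 25. This is the kind of discrepancy that parallel browsing can introduce.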
Internet Advertising: An Interplay among Advertisers, Online Publishers, Ad Exchanges and Web Users
Internet advertising is a fast growing business which has proved to be
significantly important in digital economics. It is vitally important for both
web search engines and online content providers and publishers because web
advertising provides them with major sources of revenue. Its presence is
increasingly important for the whole media industry due to the influence of the
Web. For advertisers, it is a smarter alternative to traditional marketing
media such as TVs and newspapers. As the web evolves and data collection
continues, the design of methods for more targeted, interactive, and friendly
advertising may have a major impact on the way our digital economy evolves and
may aid societal development.
Towards this goal, mathematically well-grounded Computational Advertising
methods are becoming necessary and will continue to develop as a fundamental
tool towards the Web. As a vibrant new discipline, Internet advertising
requires effort from different research domains including Information
Retrieval, Machine Learning, Data Mining and Analytics, Statistics, Economics,
and even Psychology to predict and understand user behaviours. In this paper,
we provide a comprehensive survey on Internet advertising, discussing and
classifying the research issues, identifying the recent technologies, and
suggesting its future directions. To have a comprehensive picture, we first
start with a brief history, introduction, and classification of the industry
and present a schematic view of the new advertising ecosystem. We then
introduce four major participants, namely advertisers, online publishers, ad
exchanges and web users; and through analysing and discussing the major
research problems and existing solutions from their perspectives respectively,
we discover and aggregate the fundamental problems that characterise the
newly-formed research field and capture its potential future prospects.
Comment: 44 pages, 7 figures, 6 tables. Submitted to Information Processing
and Management
Mobile Information Retrieval
Mobile Information Retrieval (Mobile IR) is a relatively recent branch of
Information Retrieval (IR) that is concerned with enabling users to carry out,
using a mobile device, all the classical IR operations that they were used to
carrying out on a desktop. This includes finding content available on local
repositories or on the web in response to a user query, interacting with the
system in an explicit or implicit way, reformulating the query and/or visualising
the content of the retrieved documents, as well as providing relevance
judgments to improve the retrieval process.
This book is structured as follows. Chapter 2 provides a very brief overview
of IR and of Mobile IR, briefly outlining what in Mobile IR is different from
IR. Chapter 3 provides the foundations of Mobile IR, looking at the
characteristics of mobile devices and what they bring to IR, but also looking
at how the concept of relevance changed from standard IR to Mobile IR. Chapter
4 presents an overview of the document collections that are searchable by a
Mobile IR system and available for experimentation, which differ somewhat from
classical IR ones, including collections of data that have become
complementary to Mobile IR. Similarly, Chapter 5 reviews mobile information
needs studies and user log analysis. Chapter 6 reviews studies aimed at
adapting and improving the user interface to the needs of Mobile IR. Chapter
7, instead, reviews work on context awareness, which studies the many aspects
of the user context that Mobile IR employs. Chapter 8 reviews some of the
evaluation work done in Mobile IR, highlighting the distinctions with classical
IR from the perspectives of two main IR evaluation methodologies: user studies
and test collections. Finally, Chapter 9 reports the conclusions of this
review, highlighting briefly some trends in Mobile IR that we believe will
drive research in the next few years.
Comment: 116 pages, published in 201
Enabling Semantic Analysis of User Browsing Patterns in the Web of Data
A useful step towards better interpretation and analysis of the usage
patterns is to formalize the semantics of the resources that users are
accessing in the Web. We focus on this problem and present an approach for the
semantic formalization of usage logs, which lays the basis for effective
techniques of querying expressive usage patterns. We also present a query
answering approach, which is useful to find in the logs expressive patterns of
usage behavior via formulation of semantic and temporal-based constraints. We
have processed over 30 thousand user browsing sessions extracted from usage
logs of DBPedia and Semantic Web Dog Food. All these events are formalized
semantically using respective domain ontologies and RDF representations of the
Web resources being accessed. We show the effectiveness of our approach through
experimental results, providing in this way an exploratory analysis of the way
users browse the Web of Data.
Comment: 2nd International Workshop on Usage Analysis and the Web of Data
(USEWOD2012) in the 21st International World Wide Web Conference (WWW2012),
Lyon, France, April 17th, 2012
Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking
Deep neural networks have become a primary tool for solving problems in many
fields. They are also used for addressing information retrieval problems and
show strong performance in several tasks. Training these models requires large,
representative datasets and for most IR tasks, such data contains sensitive
information from users. Privacy and confidentiality concerns prevent many data
owners from sharing the data, thus today the research community can only
benefit from research on large-scale datasets in a limited manner. In this
paper, we discuss privacy preserving mimic learning, i.e., using predictions
from a privacy preserving trained model instead of labels from the original
sensitive training data as a supervision signal. We present the results of
preliminary experiments in which we apply the idea of mimic learning and
privacy preserving mimic learning for the task of document re-ranking as one of
the core IR tasks. This research is a step toward laying the ground for
enabling researchers from data-rich environments to share knowledge learned
from actual users' data, which should facilitate research collaborations.
Comment: SIGIR 2017 Workshop on Neural Information Retrieval
(Neu-IR'17), August 7-11, 2017, Shinjuku, Tokyo, Japan
Who you gonna call? Analyzing Web Requests in Android Applications
Relying on ubiquitous Internet connectivity, applications on mobile devices
frequently perform web requests during their execution. They fetch data for
users to interact with, invoke remote functionalities, or send user-generated
content or meta-data. These requests collectively reveal common practices of
mobile application development, like what external services are used and how,
and they point to possible negative effects like security and privacy
violations, or impacts on battery life. In this paper, we assess different ways
to analyze what web requests Android applications make. We start by presenting
dynamic data collected from running 20 randomly selected Android applications
and observing their network activity. Next, we present a static analysis tool,
Stringoid, that analyzes string concatenations in Android applications to
estimate constructed URL strings. Using Stringoid, we extract URLs from 30,000
Android applications, and compare the performance with a simpler constant
extraction analysis. Finally, we present a discussion of the advantages and
limitations of dynamic and static analyses when extracting URLs, as we compare
the data extracted by Stringoid from the same 20 applications with the
dynamically collected data