Profiling user activities with minimal traffic traces
Understanding user behavior is essential to personalize and enrich a user's
online experience. While there are significant benefits to be accrued from the
pursuit of personalized services based on a fine-grained behavioral analysis,
care must be taken to address user privacy concerns. In this paper, we consider
the use of web traces with truncated URLs - each URL is trimmed to only contain
the web domain - for this purpose. While such truncation removes the
fine-grained sensitive information, it also strips the data of many features
that are crucial to the profiling of user activity. We show how to overcome this
loss of features and still filter the URLs that represent user activity out of
the noisy network traffic trace (including advertisements, spam, analytics, and
web scripts) with high accuracy. This
activity profiling with truncated URLs enables the network operators to provide
personalized services while mitigating privacy concerns by storing and sharing
only truncated traffic traces.
In order to offset the accuracy loss due to truncation, our statistical
methodology leverages specialized features extracted from a group of
consecutive URLs that represent a micro user action like web click, chat reply,
etc., which we call bursts. These bursts, in turn, are detected by a novel
algorithm which is based on our observed characteristics of the inter-arrival
time of HTTP records. We present an extensive experimental evaluation on a real
dataset of mobile web traces, consisting of more than 130 million records,
representing the browsing activities of 10,000 users over a period of 30 days.
Our results show that the proposed methodology achieves around 90% accuracy in
segregating URLs representing user activities from non-representative URLs.
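
The burst idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single fixed inter-arrival threshold (the hypothetical parameter `max_gap`, in seconds), whereas the paper derives its detector from observed inter-arrival characteristics. The function names `truncate_to_domain` and `split_into_bursts` are also assumptions made for this example.

```python
from urllib.parse import urlsplit


def truncate_to_domain(url):
    """Keep only the host part of a URL, dropping path and query string."""
    return urlsplit(url).hostname or ""


def split_into_bursts(records, max_gap=1.0):
    """Group (timestamp, url) records into bursts of consecutive requests.

    A new burst starts whenever the gap to the previous record exceeds
    max_gap seconds. The fixed threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    bursts = []
    previous_time = None
    for timestamp, url in sorted(records):
        if previous_time is None or timestamp - previous_time > max_gap:
            bursts.append([])  # gap too large: open a new burst
        bursts[-1].append((timestamp, truncate_to_domain(url)))
        previous_time = timestamp
    return bursts
```

Each resulting burst is a list of (timestamp, domain) pairs from which per-burst features could then be computed.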
Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)
To retrieve personalized campaigns and creatives while protecting user
privacy, digital advertising is shifting from member-based identity to
cohort-based identity. Under such an identity regime, an accurate and efficient
cohort building algorithm is desired to group users with similar
characteristics. In this paper, we propose a scalable K-anonymous cohort
building algorithm called consecutive consistent weighted sampling
(CCWS). The proposed method combines the spirit of the (-powered) consistent
weighted sampling and hierarchical clustering, so that K-anonymity is
ensured by enforcing a lower bound on the size of cohorts. Evaluations on a
LinkedIn dataset consisting of M users and ads campaigns demonstrate that
CCWS achieves substantial improvements over several hashing-based methods
including sign random projections (SignRP), minwise hashing (MinHash), as well
as the vanilla CWS.
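
The lower bound on cohort size mentioned in the abstract above can be sketched as follows. This is not CCWS: as a simplified stand-in, it orders users by a single MinHash value and cuts the ordering into consecutive chunks of at least `k` users. The names `min_hash` and `build_cohorts` are hypothetical, and every user is assumed to have a non-empty item set.

```python
import hashlib


def min_hash(items, seed=0):
    """Smallest hash value over a user's item set (one MinHash sample)."""
    return min(
        int.from_bytes(
            hashlib.sha256(f"{seed}:{item}".encode()).digest()[:8], "big"
        )
        for item in items
    )


def build_cohorts(user_items, k):
    """Group users into cohorts of at least k members each.

    Users are ordered by their MinHash value, so users whose item sets
    share the minimum-hash item end up adjacent. The ordering is cut
    into chunks of k; a trailing chunk smaller than k is merged into
    the previous cohort. If there are fewer than k users in total, the
    single cohort returned is smaller than k.
    """
    ordered = sorted(user_items, key=lambda u: (min_hash(user_items[u]), u))
    cohorts = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(cohorts) > 1 and len(cohorts[-1]) < k:
        tail = cohorts.pop()
        cohorts[-1].extend(tail)
    return cohorts
```

Merging the undersized tail into its neighbour keeps every cohort at or above the size bound, at the cost of one slightly larger cohort.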
Venue2Vec: An efficient embedding model for fine-grained user location prediction in geo-social networks
Geo-Social Networks (GSN) significantly improve the location-aware capability of services by offering geo-located content based on the huge volumes of data generated in the GSN. The problem of user location prediction based on user-generated data in GSN has been extensively studied. However, existing studies either predict users' next check-in location or predict their future check-in location at a given time with coarse granularity. A unified model that can handle both scenarios with fine granularity is rare. Also, due to the heterogeneity of the multiple factors associated with both locations and users, how to efficiently incorporate this information remains challenging. Inspired by the recent success of word embedding in natural language processing, in this paper we propose a novel embedding model called Venue2Vec, which automatically incorporates temporal-spatial context, semantic information, and sequential relations for fine-grained user location prediction. Locations of the same type, and those that are geographically close or often visited successively by users, will be situated closer within the embedding space. Based on our proposed Venue2Vec model, we design techniques that allow for predicting a user's next check-in location, and also their future check-in location at a given time. We conduct experiments on three real-world GSN datasets to verify the performance of the proposed model. Experimental results on both tasks show that the Venue2Vec model outperforms several state-of-the-art models on various evaluation metrics. Furthermore, we show how the Venue2Vec model can be more time-efficient due to being parallelizable.
PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES
Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations.
The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation.
The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves comparable accuracy to traditional POI recommender systems that do not consider privacy while providing significant privacy guarantees for users.
The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems and offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
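
The differential-privacy perturbation step named in the abstract above can be illustrated with the standard Laplace mechanism. The abstract does not say which mechanism the dissertation uses, so this is only a sketch under that assumption. The function names, the per-POI visit-count data layout, and the default sensitivity of 1 are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to each count.

    counts maps a POI identifier to a visit count. A smaller epsilon
    means more noise and therefore stronger privacy.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {poi: value + laplace_noise(scale, rng) for poi, value in counts.items()}
```

For example, `perturb_counts({"cafe": 12, "museum": 3}, epsilon=0.5)` returns the same keys with noisy real-valued counts; the noise differs on every call unless a seeded `random.Random` is passed in.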
A Cascade Framework for Privacy-Preserving Point-of-Interest Recommender System
Point-of-interest (POI) recommender systems (RSes) have gained significant popularity in recent years due to the prosperity of location-based social networks (LBSN). However, in the interest of personalization services, various sensitive contextual information is collected, causing potential privacy concerns. This paper proposes a cascaded privacy-preserving POI recommendation (CRS) framework that protects contextual information such as user comments and locations. We demonstrate a minimized trade-off between the privacy-preserving feature and prediction accuracy by applying a semi-decentralized model to real-world datasets.
Comparing Predictions of Object Movements
Estimating the future location of moving objects using different estimation models, such as linear or probabilistic models, has been investigated extensively. However, the location estimations of those models are generally not comparable. For instance, one model might return a position for some object, another one a Gaussian probability distribution, and a third one a uniform distribution. Similar issues arise for query answers. In this paper, we examine the question of how estimations of different models can be compared. To do so, we propose a general model based on the central limit theorem. This allows handling different PDF-based approaches as well as models from the other groups (i.e., linear estimations) in a unified manner. Furthermore, we show how to inject privacy into the general model, a fundamental prerequisite for user acceptance. Thus, we support well-known approaches like k-anonymity and spatial obfuscation. Based on our general model, we conduct a comprehensive experimental study considering a real-world road network, comparing models from different groups for the first time. Our results, for instance, reveal that estimation models based on individual velocity profiles are not necessarily better than models that estimate the future location of objects based only on their direction. In more abstract terms, our general model allows comparison of estimation models that could not be compared before and paves the way for building models that solve the privacy-accuracy challenge.
Big privacy: challenges and opportunities of privacy study in the age of big data
One of the biggest concerns of big data is privacy. However, the study of big data privacy is still at a very early stage. We believe the forthcoming solutions and theories of big data privacy will be rooted in the existing research output of the privacy discipline. Motivated by these factors, we extensively survey the existing research outputs and achievements of the privacy field from both application and theoretical angles, aiming to lay a solid starting ground for interested readers to address the challenges in the big data case. We first present an overview of the battleground by defining the roles and operations of privacy systems. Second, we review the milestones of the current two major research categories of privacy: data clustering and privacy frameworks. Third, we discuss the effort of privacy study from the perspectives of different disciplines. Fourth, the mathematical description, measurement, and modeling of privacy are presented. We summarize the challenges and opportunities of this promising topic at the end of this paper, hoping to shed light on this exciting and almost uncharted land.