Profiling user activities with minimal traffic traces

Author: J Zhang
L Sweeney
T Fawcett
TTT Nguyen
Publication date: 07/04/2015

Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on a fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs - each URL is trimmed to only contain the web domain - for this purpose. While such truncation removes the fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome the severe handicap of lack of crucial features for the purpose of filtering out the URLs representing a user activity from the noisy network traffic trace (including advertisement, spam, analytics, webscripts) with high accuracy. This activity profiling with truncated URLs enables the network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. In order to offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from a group of consecutive URLs that represent a micro user action like web click, chat reply, etc., which we call bursts. These bursts, in turn, are detected by a novel algorithm which is based on our observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records, representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs
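The burst idea in the abstract above can be illustrated with a minimal sketch. The abstract only says that bursts are detected from the inter-arrival time of HTTP records; the fixed gap threshold `max_gap` and the function name `split_into_bursts` below are assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Sequence


def split_into_bursts(timestamps: Sequence[float], max_gap: float = 1.0) -> List[List[float]]:
    """Group request timestamps (in seconds) into bursts.

    A new burst starts whenever the gap to the previous request exceeds
    max_gap. The fixed threshold is an illustrative assumption.
    """
    bursts: List[List[float]] = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= max_gap:
            # Close enough to the previous record: same micro user action.
            bursts[-1].append(t)
        else:
            # Large gap (or first record): open a new burst.
            bursts.append([t])
    return bursts


if __name__ == "__main__":
    trace = [0.0, 0.2, 0.5, 3.0, 3.1, 10.0]
    for burst in split_into_bursts(trace, max_gap=1.0):
        print(burst)
```

Features such as burst length or the position of a domain within a burst could then be computed per group; which features the paper actually uses is not stated in the abstract.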

arXiv.org e-Print Archive

Crossref

Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)

Author: Li Ping
Li Xiaoyun
Zhao Weijie
Zheng Xinyi
Publication date: 26/04/2023

To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such identity regime, an accurate and efficient cohort building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable K-anonymous cohort building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of the (p-powered) consistent weighted sampling and hierarchical clustering, so that the K-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of >70M users and ads campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as the vanilla CWS
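The lower bound on cohort size can be illustrated with a small sketch. This is not CCWS itself: it assumes each user already has a sequence of hash values (for example, from some consistent hashing scheme), groups users by the full sequence, and backs off to shorter prefixes for groups smaller than K. The names `build_cohorts` and `signatures` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Signature = Sequence[Hashable]
CohortKey = Tuple[Hashable, ...]


def build_cohorts(
    signatures: Dict[str, Signature], k: int
) -> Tuple[Dict[CohortKey, List[str]], List[str]]:
    """Group users by hash prefix so that every cohort has at least k members.

    Users are first grouped by their full signature; groups smaller than k
    are retried with a prefix one element shorter. Users that cannot be
    placed even with the empty prefix are returned as unassigned.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pending = dict(signatures)
    cohorts: Dict[CohortKey, List[str]] = {}
    max_depth = max((len(sig) for sig in pending.values()), default=0)
    for depth in range(max_depth, -1, -1):
        groups: Dict[CohortKey, List[str]] = defaultdict(list)
        for user, sig in pending.items():
            groups[tuple(sig[:depth])].append(user)
        for key, users in groups.items():
            if len(users) >= k:
                # Large enough to satisfy the size lower bound: finalize.
                cohorts.setdefault(key, []).extend(users)
                for user in users:
                    del pending[user]
    return cohorts, list(pending)


if __name__ == "__main__":
    sigs = {"u1": (1, 2), "u2": (1, 2), "u3": (1, 3), "u4": (1, 4), "u5": (9, 9)}
    print(build_cohorts(sigs, k=2))
```

In this sketch, longer prefixes give finer cohorts and shorter prefixes give coarser ones, so the back-off trades cohort precision for the size guarantee.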

arXiv.org e-Print Archive

Venue2Vec: An efficient embedding model for fine-grained user location prediction in geo-social networks

Author: Cao Jiuxin
Legg Phil
Li Shancang
Liu Bo
Xu Shuai
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2020

Geo-Social Networks (GSN) significantly improve the location-aware capability of services by offering geo-located content based on the huge volumes of data generated in the GSN. The problem of user location prediction based on user-generated data in GSN has been extensively studied. However, existing studies concern either predicting users' next check-in location or predicting their future check-in location at a given time with coarse granularity. A unified model that can predict both scenarios with fine granularity is quite rare. Also, due to the heterogeneity of multiple factors associated with both locations and users, how to efficiently incorporate this information remains challenging. Inspired by the recent success of word embedding in natural language processing, in this paper, we propose a novel embedding model called Venue2Vec which automatically incorporates temporal-spatial context, semantic information, and sequential relations for fine-grained user location prediction. Locations of the same type, and those that are geographically close or often visited successively by users, will be situated closer within the embedding space. Based on our proposed Venue2Vec model, we design techniques that allow for predicting a user's next check-in location, and also their future check-in location at a given time. We conduct experiments on three real-world GSN datasets to verify the performance of the proposed model. Experimental results on both tasks show that the Venue2Vec model outperforms several state-of-the-art models on various evaluation metrics. Furthermore, we show how the Venue2Vec model can be more time-efficient due to being parallelizable

UWE Bristol Research Repository

Unsupervised learning on social data

Author: Borutta Felix
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 11/03/2020

Digitale Hochschulschriften der LMU

PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

Author: Cui Longyin
Publication venue: UKnowledge
Publication date: 01/01/2023

Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves comparable accuracy to traditional POI recommender systems that do not consider privacy while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems and offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems
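The abstract does not say which differential privacy mechanism is used to perturb user data. As one common possibility, the following minimal sketch shows the Laplace mechanism applied to a single numeric value; the function names and the choice of mechanism are assumptions, not the dissertation's method.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale sensitivity / epsilon to a numeric value."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical example: a visit count whose value changes by at most 1
    # when one check-in is added or removed (sensitivity 1).
    print(perturb(12.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off the abstract refers to.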

University of Kentucky

A Cascade Framework for Privacy-Preserving Point-of-Interest Recommender System

Author: Cui Longyin
Wang Xiwei
Publication venue: MDPI AG
Publication date: 01/04/2022

Point-of-interest (POI) recommender systems (RSes) have gained significant popularity in recent years due to the prosperity of location-based social networks (LBSN). However, in the interest of personalization services, various sensitive contextual information is collected, causing potential privacy concerns. This paper proposes a cascaded privacy-preserving POI recommendation (CRS) framework that protects contextual information such as user comments and locations. We demonstrate a minimized trade-off between the privacy-preserving feature and prediction accuracy by applying a semi-decentralized model to real-world datasets

Directory of Open Access Journals

NEIU Digital Commons (Northeastern Illinois University)

Comparing Predictions of Object Movements

Author: Böhm Klemens
Schäler Martin
Taghizadeh Saeed
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2018

Estimating the future location of moving objects using different estimation models, such as linear or probabilistic models, has been investigated extensively. However, the location estimations of those models are generally not comparable. For instance, one model might return a position for some object, another one a Gaussian probability distribution, and a third one a uniform distribution. Similar issues arise for query answers. In this paper, we examine the question of how estimations of different models can be compared. To do so, we propose a general model based on the central limit theorem. This allows handling different PDF-based approaches as well as models from the other groups (i.e., linear estimations) in a unified manner. Furthermore, we show how to inject privacy into the general model, a fundamental prerequisite for user acceptance. Thus, we support well-known approaches like k-anonymity and spatial obfuscation. Based on our general model, we conduct a comprehensive experimental study considering a real-world road network, comparing models from different groups for the first time. Our results, for instance, reveal that estimation models based on individual velocity profiles are not necessarily better than models that estimate the future location of objects based only on their direction. In more abstract terms, our general model allows comparison of estimation models that could not be compared before and paves the way for building models that solve the privacy-accuracy challenge

KITopen

Big privacy: challenges and opportunities of privacy study in the age of big data

Author: Yu Shui
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

One of the biggest concerns of big data is privacy. However, the study of big data privacy is still at a very early stage. We believe the forthcoming solutions and theories of big data privacy are rooted in the existing research output of the privacy discipline. Motivated by these factors, we extensively survey the existing research outputs and achievements of the privacy field from both application and theoretical angles, aiming to provide a solid starting ground for interested readers to address the challenges in the big data case. We first present an overview of the battleground by defining the roles and operations of privacy systems. Second, we review the milestones of the current two major research categories of privacy: data clustering and privacy frameworks. Third, we discuss the effort of privacy study from the perspectives of different disciplines. Fourth, the mathematical description, measurement, and modeling of privacy are presented. We summarize the challenges and opportunities of this promising topic at the end of this paper, hoping to shed light on this exciting and almost uncharted land

Deakin Research Online