2,624 research outputs found
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
Recommender systems constitute the core engine of most social network
platforms nowadays, aiming to maximize user satisfaction along with other key
business objectives. Twitter is no exception. Despite the fact that Twitter
data has been extensively used to understand socioeconomic and political
phenomena and user behaviour, the implicit feedback provided by users on Tweets
through their engagements on the Home Timeline has only been explored to a
limited extent. At the same time, there is a lack of large-scale public social
network datasets that would enable the scientific community to both benchmark
and build more powerful and comprehensive models that tailor content to user
interests. By releasing an original dataset of 160 million Tweets along with
engagement information, Twitter aims to address exactly that. During this
release, special attention is drawn on maintaining compliance with existing
privacy laws. Apart from user privacy, this paper touches on the key challenges
faced by researchers and professionals striving to predict user engagements. It
further describes the key aspects of the RecSys 2020 Challenge that was
organized by ACM RecSys in partnership with Twitter using this dataset.Comment: 16 pages, 2 table
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Differentially Private Link Prediction With Protected Connections
Link prediction (LP) algorithms propose to each node a ranked list of nodes
that are currently non-neighbors, as the most likely candidates for future
linkage. Owing to increasing concerns about privacy, users (nodes) may prefer
to keep some of their connections protected or private. Motivated by this
observation, our goal is to design a differentially private LP algorithm, which
trades off between privacy of the protected node-pairs and the link prediction
accuracy. More specifically, we first propose a form of differential privacy on
graphs, which models the privacy loss only of those node-pairs which are marked
as protected. Next, we develop DPLP , a learning to rank algorithm, which
applies a monotone transform to base scores from a non-private LP system, and
then adds noise. DPLP is trained with a privacy induced ranking loss, which
optimizes the ranking utility for a given maximum allowed level of privacy
leakage of the protected node-pairs. Under a recently-introduced latent node
embedding model, we present a formal trade-off between privacy and LP utility.
Extensive experiments with several real-life graphs and several LP heuristics
show that DPLP can trade off between privacy and predictive performance more
effectively than several alternatives
Towards Data Privacy and Utility in the Applications of Graph Neural Networks
Graph Neural Networks (GNNs) are essential for handling graph-structured data, often containing sensitive information. It’s vital to maintain a balance between data privacy and usability. To address this, this dissertation introduces three studies aimed at enhancing privacy and utility in GNN applications, particularly in node classification, link prediction, and graph classification. The first work tackles celebrity privacy in social networks. We develop a novel framework using adversarial learning for link-privacy preserved graph embedding, which effectively safeguards sensitive links without compromising the graph’s structure and node attributes. This approach is validated using real social network data. In the second work, we confront challenges in federated graph learning with non-independent and identically distributed (non-IID) data. We introduce PPFL-GNN, a privacy-preserving federated graph neural network framework that mitigates overfitting on the client side and inefficient aggregation on the server side. It leverages local graph data for embeddings and employs embedding alignment techniques for enhanced privacy, addressing the hurdles in federated learning on non-IID graph data. The third work explores Few-Shot graph classification, which aims to classify novel graph types with limited labeled data. We propose a unique framework combining Meta-learning and contrastive learning to better utilize graph structures in molecular and social network datasets. Additionally, we offer benchmark graph datasets with extensive node-attribute dimensions for future research. These studies collectively advance the field of graph-based machine learning by addressing critical issues of data privacy and utility in GNN applications
- …