Budgeted Embedding Table For Recommender Systems
At the heart of contemporary recommender systems (RSs) are latent factor
models that provide quality recommendation experience to users. These models
use embedding vectors, which are typically of a uniform and fixed size, to
represent users and items. As the number of users and items continues to grow,
this design becomes inefficient and hard to scale. Recent lightweight embedding
methods have enabled different users and items to have diverse embedding sizes,
but are commonly subject to two major drawbacks. Firstly, they limit the
embedding size search to optimizing a heuristic balancing the recommendation
quality and the memory complexity, where the trade-off coefficient needs to be
manually tuned for every memory budget requested. The implicitly enforced
memory complexity term can even fail to cap the parameter usage, making the
resultant embedding table fail to meet the memory budget strictly. Secondly,
most solutions, especially reinforcement learning based ones, derive and
optimize the embedding size for each user/item on an instance-by-instance
basis, which impedes the search efficiency. In this paper, we propose Budgeted
Embedding Table (BET), a novel method that generates table-level actions (i.e.,
embedding sizes for all users and items) that are guaranteed to meet
pre-specified memory budgets. Furthermore, by leveraging a set-based action
formulation and engaging set representation learning, we present an innovative
action search strategy powered by an action fitness predictor that efficiently
evaluates each table-level action. Experiments have shown state-of-the-art
performance on two real-world datasets when BET is paired with three popular
recommender models under different memory budgets. Comment: Accepted by WSDM 202
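To make the notion of a table-level action that strictly meets a memory budget concrete, here is a minimal sketch. It is not BET's search strategy or fitness predictor, which the abstract does not detail. It only assumes that an action is a list of per-entity embedding sizes and that the budget is a cap on their sum; the function name `fit_to_budget` and the greedy "shrink the largest size first" repair rule are hypothetical choices made for illustration.

```python
def fit_to_budget(sizes, budget):
    """Shrink a table-level action until its parameter count fits the budget.

    sizes  : per-entity embedding sizes (non-negative ints), one per user/item.
    budget : maximum total number of embedding parameters allowed.
    The greedy repair rule below is an illustrative assumption, not BET itself.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    sizes = list(sizes)
    excess = sum(sizes) - budget
    while excess > 0:
        # Take one dimension away from the currently largest embedding.
        largest = max(range(len(sizes)), key=sizes.__getitem__)
        sizes[largest] -= 1
        excess -= 1
    return sizes


action = fit_to_budget([8, 4, 2], budget=10)
print(action, sum(action))
```

Because the cap is enforced on the action itself rather than through a penalty term, the resulting table cannot exceed the budget, which is the property the abstract contrasts with heuristic trade-off objectives.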
Decentralized Collaborative Learning Framework for Next POI Recommendation
Next Point-of-Interest (POI) recommendation has become an indispensable
functionality in Location-based Social Networks (LBSNs) due to its
effectiveness in helping people decide the next POI to visit. However, accurate
recommendation requires a vast amount of historical check-in data, thus
threatening user privacy as the location-sensitive data needs to be handled by
cloud servers. Although there have been several on-device frameworks for
privacy-preserving POI recommendations, they are still resource-intensive when
it comes to storage and computation, and show limited robustness to the high
sparsity of user-POI interactions. On this basis, we propose a novel
decentralized collaborative learning framework for POI recommendation (DCLR),
which allows users to train their personalized models locally in a
collaborative manner. DCLR significantly reduces the local models' dependence
on the cloud for training, and can be used to expand arbitrary centralized
recommendation models. To counteract the sparsity of on-device user data when
learning each local model, we design two self-supervision signals to pretrain
the POI representations on the server with geographical and categorical
correlations of POIs. To facilitate collaborative learning, we innovatively
propose to incorporate knowledge from either geographically or semantically
similar users into each local model with attentive aggregation and mutual
information maximization. The collaborative learning process makes use of
communications between devices while requiring only minor engagement from the
central server for identifying user groups, and is compatible with common
privacy preservation mechanisms like differential privacy. We evaluate DCLR
with two real-world datasets, where the results show that DCLR outperforms
state-of-the-art on-device frameworks and yields competitive results compared
with centralized counterparts. Comment: 21 Pages, 3 figures, 4 tables
Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures
Federated Recommender Systems (FedRecs) are considered privacy-preserving
techniques to collaboratively learn a recommendation model without sharing user
data. Since all participants can directly influence the systems by uploading
gradients, FedRecs are vulnerable to poisoning attacks of malicious clients.
However, most existing poisoning attacks on FedRecs either rely on some
prior knowledge or are less effective. To reveal the real vulnerability of
FedRecs, in this paper, we present a new poisoning attack method to manipulate
target items' ranks and exposure rates effectively in the top-K
recommendation without relying on any prior knowledge. Specifically, our attack
manipulates target items' exposure rate by a group of synthetic malicious users
who upload poisoned gradients considering target items' alternative products.
We conduct extensive experiments with two widely used FedRecs (Fed-NCF and
Fed-LightGCN) on two real-world recommendation datasets. The experimental
results show that our attack can significantly improve the exposure rate of
unpopular target items with far fewer malicious users and fewer global
epochs than state-of-the-art attacks. In addition to disclosing the security
hole, we design a novel countermeasure for poisoning attacks on FedRecs.
Specifically, we propose a hierarchical gradient clipping with sparsified
updating to defend against existing poisoning attacks. The empirical results
demonstrate that the proposed defending mechanism improves the robustness of
FedRecs. Comment: This paper has been accepted by SIGIR202
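The two ingredients named in the countermeasure, gradient clipping and sparsified updating, can be illustrated generically. The sketch below is an assumption-laden illustration and not the paper's hierarchical scheme, whose details the abstract does not give: it applies plain L2-norm clipping followed by top-k magnitude sparsification to a single uploaded gradient. The function names and the parameters `max_norm` and `k` are hypothetical.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def sparsify_top_k(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


# A client upload is first bounded in magnitude, then reduced to a few entries.
upload = sparsify_top_k(clip_by_norm([3.0, 4.0, -0.5], max_norm=1.0), k=1)
print(upload)
```

Clipping bounds how far any single client can move the shared model, and sparsification limits how many parameters one upload can touch; both restrict the influence available to a small group of malicious users.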
Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation
Latent factor models are the dominant backbones of contemporary recommender
systems (RSs) given their performance advantages, where a unique vector
embedding with a fixed dimensionality (e.g., 128) is required to represent each
entity (commonly a user/item). Due to the large number of users and items on
e-commerce sites, the embedding table is arguably the least memory-efficient
component of RSs. For any lightweight recommender that aims to efficiently
scale with the growing size of users/items or to remain applicable in
resource-constrained settings, existing solutions either reduce the number of
embeddings needed via hashing, or sparsify the full embedding table to switch
off selected embedding dimensions. However, as hash collision arises or
embeddings become overly sparse, especially when adapting to a tighter memory
budget, those lightweight recommenders inevitably have to compromise their
accuracy. To this end, we propose a novel compact embedding framework for RSs,
namely Compositional Embedding with Regularized Pruning (CERP). Specifically,
CERP represents each entity by combining a pair of embeddings from two
independent, substantially smaller meta-embedding tables, which are then
jointly pruned via a learnable element-wise threshold. In addition, we
innovatively design a regularized pruning mechanism in CERP, such that the two
sparsified meta-embedding tables are encouraged to encode information that is
mutually complementary. Since CERP is agnostic to the underlying latent factor
models, we pair CERP with two popular recommendation models for extensive
experiments, where results on two real-world datasets under different memory
budgets demonstrate its superiority against state-of-the-art baselines. The
codebase of CERP is available at https://github.com/xurong-liang/CERP. Comment: Accepted by ICDM'2
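The idea of composing each entity's embedding from two small meta-embedding tables and pruning them with element-wise thresholds can be sketched as follows. This is an illustration under stated assumptions, not the CERP implementation: the quotient/remainder index mapping, the element-wise sum used to combine the pair, and the soft-thresholding form of pruning are all choices made for this example, and every name in it is hypothetical.

```python
def soft_threshold(vec, thresholds):
    """Shrink each element toward zero by its threshold; small entries become 0."""
    out = []
    for x, t in zip(vec, thresholds):
        magnitude = max(abs(x) - t, 0.0)
        out.append(magnitude if x >= 0 else -magnitude)
    return out


def compose(entity_id, table_q, table_r, thr_q, thr_r):
    """Build one entity embedding from two pruned meta-embedding rows.

    Assumes entity_id < len(table_q) * len(table_r); the pair of rows is
    chosen by quotient and remainder, and the pruned rows are summed.
    """
    q, r = divmod(entity_id, len(table_r))
    pruned_q = soft_threshold(table_q[q], thr_q[q])
    pruned_r = soft_threshold(table_r[r], thr_r[r])
    return [a + b for a, b in zip(pruned_q, pruned_r)]


# Two tables with 2 and 3 rows cover 2 * 3 = 6 entities using only 5 rows.
table_q = [[1.0, -0.2], [0.5, 0.5]]
table_r = [[0.1, 0.3], [-1.0, 0.05], [0.0, 0.0]]
thr_q = [[0.1, 0.1], [0.1, 0.1]]
thr_r = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
print(compose(4, table_q, table_r, thr_q, thr_r))
```

In a trained model the thresholds would be learned alongside the tables; here they are fixed constants purely to show how entries below the threshold are switched off.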
An Evaluation of Model-Based Approaches to Sensor Data Compression
As the volumes of sensor data being accumulated are likely to soar, data compression has become essential in a wide range of sensor-data applications. This has led to a plethora of data compression techniques for sensor data; in particular, model-based approaches have been spotlighted due to their significant compression performance. These methods, however, have never been compared and analyzed under the same setting, rendering a "right" choice of compression technique for a particular application very difficult. Addressing this problem, this paper presents a benchmark that offers a comprehensive empirical study on the performance comparison of the model-based compression techniques. Specifically, we re-implemented several state-of-the-art methods in a comparable manner, and measured various performance factors with our benchmark, including compression ratio, computation time, model maintenance cost, approximation quality, and robustness to noisy data. We then provide in-depth analysis of the benchmark results, obtained by using 11 different real datasets consisting of 346 heterogeneous sensor data signals. We believe that the findings from the benchmark will be able to serve as a practical guideline for applications that need to compress sensor data.
Result Selection and Summarization for Web Table Search
The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model that enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements of up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.
Robust and Hierarchical Stop Discovery in Sparse and Diverse Trajectories
The advance of GPS tracking techniques has produced a large amount of trajectory data. To better understand such mobility data, semantic models like "stop/move" (or inferring "activity" or "transportation mode") have recently become a hot topic in trajectory data analysis. Stops are important parts of trajectories, such as "working at the office", "shopping in a mall", or "waiting for the bus". Several methods based on velocity, clustering, and density have been designed to discover stops. However, existing works focus on well-defined trajectories such as the movement of vehicles and taxis, and do not work well for heterogeneous cases like diverse and sparse trajectories. In contrast, our paper addresses three main challenges: (1) provide a robust clustering-based method to discover stops; (2) discover both shared stops and personalized stops, where shared stops are the common places where many trajectories pass and stay for a while (e.g. a shopping mall), whilst personalized stops are individual places where a user stays for his/her own purpose (e.g. home, office); (3) further build a stop hierarchy (e.g. a big stop like the EPFL campus and a small stop like an office building). We evaluate our approach on several diverse and sparse real-life GPS datasets, compare it with other methods, and show its better data abstraction on trajectories.
Chemical components and biological properties from acetone extracts of Conamomum vietnamense
Conamomum vietnamense is an endemic and rare species from Vietnam. The aim of this study is to determine, for the first time, the chemical composition and the antibacterial and antioxidant properties of the acetone extracts obtained from different organs of this species. A total of 82 components were identified from the acetone extracts of the leaf, flower, and rhizome of C. vietnamense using the gas chromatography–mass spectrometry (GC/MS) technique. Furthermore, the agar disk-diffusion method was used to determine the antibacterial activity of the C. vietnamense extracts. Accordingly, the leaf extract was found to be effective against eight out of nine bacterial strains, while the flower and rhizome extracts displayed activity against four out of nine tested bacteria. In addition, the three organs of C. vietnamense also possessed high DPPH scavenging activity. The results of this study indicate that C. vietnamense extracts have the potential to be developed into pharmaceutical products in the future.