
    Budgeted Embedding Table For Recommender Systems

    At the heart of contemporary recommender systems (RSs) are latent factor models that provide quality recommendation experience to users. These models use embedding vectors, which are typically of a uniform and fixed size, to represent users and items. As the number of users and items continues to grow, this design becomes inefficient and hard to scale. Recent lightweight embedding methods have enabled different users and items to have diverse embedding sizes, but are commonly subject to two major drawbacks. Firstly, they limit the embedding size search to optimizing a heuristic balancing the recommendation quality and the memory complexity, where the trade-off coefficient needs to be manually tuned for every memory budget requested. The implicitly enforced memory complexity term can even fail to cap the parameter usage, making the resultant embedding table fail to meet the memory budget strictly. Secondly, most solutions, especially reinforcement learning based ones, derive and optimize the embedding size for each user/item on an instance-by-instance basis, which impedes the search efficiency. In this paper, we propose Budgeted Embedding Table (BET), a novel method that generates table-level actions (i.e., embedding sizes for all users and items) that are guaranteed to meet pre-specified memory budgets. Furthermore, by leveraging a set-based action formulation and engaging set representation learning, we present an innovative action search strategy powered by an action fitness predictor that efficiently evaluates each table-level action. Experiments have shown state-of-the-art performance on two real-world datasets when BET is paired with three popular recommender models under different memory budgets. Comment: Accepted by WSDM 202

    Decentralized Collaborative Learning Framework for Next POI Recommendation

    Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy. We evaluate DCLR with two real-world datasets, where the results show that DCLR outperforms state-of-the-art on-device frameworks and yields competitive results compared with centralized counterparts. Comment: 21 pages, 3 figures, 4 tables

    Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures

    Federated Recommender Systems (FedRecs) are considered privacy-preserving techniques to collaboratively learn a recommendation model without sharing user data. Since all participants can directly influence the systems by uploading gradients, FedRecs are vulnerable to poisoning attacks by malicious clients. However, most existing poisoning attacks on FedRecs either rely on some prior knowledge or are less effective. To reveal the real vulnerability of FedRecs, in this paper, we present a new poisoning attack method to manipulate target items' ranks and exposure rates effectively in the top-K recommendation without relying on any prior knowledge. Specifically, our attack manipulates target items' exposure rate by a group of synthetic malicious users who upload poisoned gradients considering target items' alternative products. We conduct extensive experiments with two widely used FedRecs (Fed-NCF and Fed-LightGCN) on two real-world recommendation datasets. The experimental results show that our attack can significantly improve the exposure rate of unpopular target items with far fewer malicious users and fewer global epochs than state-of-the-art attacks. In addition to disclosing the security hole, we design a novel countermeasure for poisoning attacks on FedRecs. Specifically, we propose a hierarchical gradient clipping with sparsified updating to defend against existing poisoning attacks. The empirical results demonstrate that the proposed defending mechanism improves the robustness of FedRecs. Comment: This paper has been accepted by SIGIR202

    Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation

    Latent factor models are the dominant backbones of contemporary recommender systems (RSs) given their performance advantages, where a unique vector embedding with a fixed dimensionality (e.g., 128) is required to represent each entity (commonly a user/item). Due to the large number of users and items on e-commerce sites, the embedding table is arguably the least memory-efficient component of RSs. For any lightweight recommender that aims to efficiently scale with the growing number of users/items or to remain applicable in resource-constrained settings, existing solutions either reduce the number of embeddings needed via hashing, or sparsify the full embedding table to switch off selected embedding dimensions. However, as hash collisions arise or embeddings become overly sparse, especially when adapting to a tighter memory budget, those lightweight recommenders inevitably have to compromise their accuracy. To this end, we propose a novel compact embedding framework for RSs, namely Compositional Embedding with Regularized Pruning (CERP). Specifically, CERP represents each entity by combining a pair of embeddings from two independent, substantially smaller meta-embedding tables, which are then jointly pruned via a learnable element-wise threshold. In addition, we innovatively design a regularized pruning mechanism in CERP, such that the two sparsified meta-embedding tables are encouraged to encode information that is mutually complementary. Given the compatibility with agnostic latent factor models, we pair CERP with two popular recommendation models for extensive experiments, where results on two real-world datasets under different memory budgets demonstrate its superiority against state-of-the-art baselines. The codebase of CERP is available at https://github.com/xurong-liang/CERP. Comment: Accepted by ICDM'2

    An Evaluation of Model-Based Approaches to Sensor Data Compression

    As the volumes of sensor data being accumulated are likely to soar, data compression has become essential in a wide range of sensor-data applications. This has led to a plethora of data compression techniques for sensor data; in particular, model-based approaches have been spotlighted due to their significant compression performance. These methods, however, have never been compared and analyzed under the same setting, rendering a ‘right’ choice of compression technique for a particular application very difficult. Addressing this problem, this paper presents a benchmark that offers a comprehensive empirical study on the performance comparison of the model-based compression techniques. Specifically, we re-implemented several state-of-the-art methods in a comparable manner, and measured various performance factors with our benchmark, including compression ratio, computation time, model maintenance cost, approximation quality, and robustness to noisy data. We then provide in-depth analysis of the benchmark results, obtained by using 11 different real datasets consisting of 346 heterogeneous sensor data signals. We believe that the findings from the benchmark will be able to serve as a practical guideline for applications that need to compress sensor data.

    Result Selection and Summarization for Web Table Search

    The amount of information available on the Web has been growing dramatically, raising the importance of techniques for searching the Web. Recently, Web Tables emerged as a model, which enables users to search for information in a structured way. However, effective presentation of results for Web Table search requires (1) selecting a ranking of tables that acknowledges the diversity within the search result; and (2) summarizing the information content of the selected tables concisely but meaningfully. In this paper, we formalize these requirements as the diversified table selection problem and the structured table summarization problem. We show that both problems are computationally intractable and, thus, present heuristic algorithms to solve them. For these algorithms, we prove salient performance guarantees, such as near-optimality, stability, and fairness. Our experiments with real-world collections of thousands of Web Tables highlight the scalability of our techniques. We achieve improvements up to 50% in diversity and 10% in relevance over baselines for Web Table selection, and reduce the information loss induced by table summarization by up to 50%. In a user study, we observed that our techniques are preferred over alternative solutions.

    Robust and Hierarchical Stop Discovery in Sparse and Diverse Trajectories

    Advances in GPS tracking techniques have produced a large amount of trajectory data. To better understand such mobility data, semantic models like “stop/move” (or inferring “activity”, “transportation mode”) have recently become a hot topic for trajectory data analysis. Stops are important parts of trajectories, such as “working at office”, “shopping in a mall”, “waiting for the bus”. Several methods, such as velocity, clustering, and density algorithms, have been designed to discover stops. However, existing works focus on well-defined trajectories such as the movement of vehicles and taxis, and do not work well for heterogeneous cases like diverse and sparse trajectories. In contrast, our paper addresses three main challenges: (1) provide a robust clustering-based method to discover stops; (2) discover both shared stops and personalized stops, where shared stops are the common places where many trajectories pass and stay for a while (e.g. shopping mall), whilst personalized stops are individual places where a user stays for his/her own purpose (e.g. home, office); (3) further build a stop hierarchy (e.g. a big stop like EPFL campus and a small stop like an office building). We evaluate our approach with several diverse and sparse real-life GPS datasets, compare it with other methods, and show its better data abstraction on trajectories.

    Chemical components and biological properties from acetone extracts of Conamomum vietnamense

    Conamomum vietnamense is an endemic and rare species from Vietnam. The aim of this study is to determine, for the first time, the chemical composition and the antibacterial and antioxidant properties of the acetone extracts obtained from the different organs of this species. A total of 82 components were identified from the acetone extracts of leaf, flower, and rhizome of C. vietnamense using gas chromatography–mass spectrometry (GC/MS). Furthermore, the agar disk-diffusion method was used to determine the antibacterial activity of the C. vietnamense extracts. Accordingly, the leaf extract was found to be effective against eight out of nine bacterial strains, while the flower and rhizome extracts displayed activity against four out of nine tested bacteria. In addition, the three organs of C. vietnamense also possessed high DPPH scavenging properties. The results of this study indicate that C. vietnamense extracts have the potential to be developed into pharmaceutical products in the future.