
    How to Retrain Recommender System? A Sequential Meta-Learning Method

    Practical recommender systems need to be periodically retrained to refresh the model with new interaction data. To pursue high model fidelity, it is usually desirable to retrain the model on both historical and new data, since this accounts for both long-term and short-term user preference. However, a full model retraining can be very time-consuming and memory-costly, especially when the scale of historical data is large. In this work, we study the model retraining mechanism for recommender systems, a topic of high practical value that has been relatively little explored in the research community. Our first belief is that retraining the model on historical data is unnecessary, since the model has already been trained on it. Nevertheless, training on new data alone may easily cause overfitting and forgetting issues, since the new data is of a smaller scale and contains less information on long-term user preference. To address this dilemma, we propose a new training method that aims to abandon the historical data during retraining by learning to transfer the past training experience. Specifically, we design a neural network-based transfer component, which transforms the old model into a new model that is tailored for future recommendations. To learn the transfer component well, we optimize the "future performance", i.e., the recommendation accuracy evaluated in the next time period. Our Sequential Meta-Learning (SML) method offers a general training paradigm that is applicable to any differentiable model. We demonstrate SML on matrix factorization and conduct experiments on two real-world datasets. Empirical results show that SML not only achieves significant speed-up, but also outperforms full model retraining in recommendation accuracy, validating the effectiveness of our proposals. We release our code at https://github.com/zyang1580/SML. Comment: Appears in SIGIR 2020.
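
    The abstract above describes a transfer component that maps the old model to a new one and is trained against next-period performance. Below is a minimal, hypothetical PyTorch sketch of that idea; the TransferNet module, the residual combination, and the toy "future" loss are illustrative assumptions, not the released SML implementation (see the linked repository for the actual code).

        # Hypothetical sketch of the transfer idea behind SML (not the authors' code).
        import torch
        import torch.nn as nn

        class TransferNet(nn.Module):
            """Maps old model parameters toward new ones (hypothetical component)."""
            def __init__(self, dim):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

            def forward(self, old_params, new_params):
                # Combine knowledge carried by the old model with parameters
                # fitted on the new data; a residual form keeps the update stable.
                return new_params + self.net(old_params)

        # One retraining period, sketched:
        # 1) fit W_new on the new interactions only (cheap),
        # 2) combine W_old and W_new through the transfer net,
        # 3) update the transfer net so the combined model does well on the *next* period.
        dim = 32
        transfer = TransferNet(dim)
        opt = torch.optim.Adam(transfer.parameters(), lr=1e-3)

        W_old = torch.randn(100, dim)       # embeddings from the previous period
        W_new = torch.randn(100, dim)       # embeddings fitted on new data only
        W_next = transfer(W_old, W_new)     # model used for future recommendations

        # "Future performance" objective: any differentiable recommendation loss
        # evaluated on next-period interactions; here a toy stand-in that just
        # regularizes the embedding norms toward 1.
        future_loss = ((W_next @ W_next.t()).diagonal() - 1.0).pow(2).mean()
        future_loss.backward()
        opt.step()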

    Objective Optimization for Multilevel Neutral-Point-Clamped Converters with Zero-Sequence Signal Control


    Bilinear Graph Neural Network with Neighbor Interactions

    Graph Neural Network (GNN) is a powerful model for learning representations and making predictions on graph data. Existing efforts on GNN have largely defined the graph convolution as a weighted sum of the features of the connected nodes to form the representation of the target node. Nevertheless, the weighted-sum operation assumes the neighbor nodes are independent of each other, and ignores the possible interactions between them. When such interactions exist, for example when the co-occurrence of two neighbor nodes is a strong signal of the target node's characteristics, existing GNN models may fail to capture that signal. In this work, we argue for the importance of modeling the interactions between neighbor nodes in GNN. We propose a new graph convolution operator, which augments the weighted sum with pairwise interactions of the representations of neighbor nodes. We term this framework the Bilinear Graph Neural Network (BGNN), which improves GNN representation ability with bilinear interactions between neighbor nodes. In particular, we specify two BGNN models named BGCN and BGAT, based on the well-known GCN and GAT, respectively. Empirical results on three public benchmarks of semi-supervised node classification verify the effectiveness of BGNN: BGCN (BGAT) outperforms GCN (GAT) by 1.6% (1.5%) in classification accuracy. Code is available at https://github.com/zhuhm1996/bgnn. Comment: Accepted by IJCAI 2020. Sole copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved.
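
    A rough PyTorch sketch of the bilinear neighbor aggregation described above. The function names, the square-of-sum trick, and the mixing weight alpha are illustrative assumptions rather than the released BGCN/BGAT code.

        # Illustrative sketch of a bilinear neighbor aggregator in the spirit of BGNN.
        import torch

        def bilinear_aggregate(h_neighbors: torch.Tensor) -> torch.Tensor:
            """Pairwise element-wise interactions of neighbor representations.

            h_neighbors: (num_neighbors, dim) features of a target node's neighbors.
            Returns a (dim,) vector summing h_i * h_j over all pairs i < j, computed
            with the square-of-sum identity so the cost stays linear in the
            neighborhood size.
            """
            sum_h = h_neighbors.sum(dim=0)            # sum_i h_i
            sum_sq = (h_neighbors ** 2).sum(dim=0)    # sum_i h_i^2
            return 0.5 * (sum_h ** 2 - sum_sq)        # sum_{i<j} h_i * h_j

        def bgnn_layer(h_neighbors, weighted_sum, alpha=0.5):
            """Combine the usual weighted-sum aggregation with the bilinear term."""
            return (1 - alpha) * weighted_sum + alpha * bilinear_aggregate(h_neighbors)

        # Example: 4 neighbors with 8-dimensional features.
        h = torch.randn(4, 8)
        weighted_sum = h.mean(dim=0)   # stand-in for a GCN/GAT-style aggregation
        out = bgnn_layer(h, weighted_sum)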

    Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

    Existing autoregressive models follow a two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook. However, existing codebook learning simply models all local region information of images without distinguishing their different perceptual importance. This introduces redundancy into the learned codebook, which not only limits the next-stage autoregressive model's ability to model important structure but also results in high training cost and slow generation. In this study, we borrow the idea of importance perception from classical image coding theory and propose a novel two-stage framework, consisting of Masked Quantization VAE (MQ-VAE) and Stackformer, to relieve the model from modeling redundancy. Specifically, MQ-VAE incorporates an adaptive mask module for masking redundant region features before quantization and an adaptive de-mask module for recovering the original grid image feature map, so that the original images can be faithfully reconstructed after quantization. Stackformer then learns to predict the combination of the next code and its position in the feature map. Comprehensive experiments on various image generation tasks validate the effectiveness and efficiency of our approach. Code will be released at https://github.com/CrossmodalGroup/MaskedVectorQuantization. Comment: Accepted by CVPR 2023.
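
    The sketch below illustrates, under stated assumptions, the mask-then-quantize idea from the abstract: an adaptive mask module keeps only high-importance region features, and a de-mask module scatters the (quantized) codes back onto the full grid. The top-k scoring, learned placeholder, and module names are hypothetical, not the authors' MQ-VAE implementation.

        # Hypothetical sketch of adaptive masking and de-masking around quantization.
        import torch
        import torch.nn as nn

        class AdaptiveMask(nn.Module):
            """Scores region features and keeps only the most important ones (assumed top-k)."""
            def __init__(self, dim, keep_ratio=0.5):
                super().__init__()
                self.scorer = nn.Linear(dim, 1)
                self.keep_ratio = keep_ratio

            def forward(self, feats):                     # feats: (batch, num_regions, dim)
                scores = self.scorer(feats).squeeze(-1)   # importance score per region
                k = max(1, int(feats.size(1) * self.keep_ratio))
                idx = scores.topk(k, dim=1).indices       # indices of kept regions
                kept = torch.gather(feats, 1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1)))
                return kept, idx                          # only `kept` goes to quantization

        class AdaptiveDemask(nn.Module):
            """Scatters (quantized) codes back onto the full grid; masked positions
            receive a learned placeholder so the decoder sees a complete feature map."""
            def __init__(self, dim):
                super().__init__()
                self.placeholder = nn.Parameter(torch.zeros(dim))

            def forward(self, quantized, idx, num_regions):
                b, _, d = quantized.shape
                grid = self.placeholder.repeat(b, num_regions, 1)
                return grid.scatter(1, idx.unsqueeze(-1).expand(-1, -1, d), quantized)

        feats = torch.randn(2, 16, 32)                    # 2 images, a 4x4 grid, 32-dim features
        mask, demask = AdaptiveMask(32), AdaptiveDemask(32)
        kept, idx = mask(feats)                           # kept: (2, 8, 32)
        recovered = demask(kept, idx, num_regions=16)     # (2, 16, 32), fed to the decoder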

    Kalman-filter-based state estimation for system information exchange in a multi-bus islanded microgrid


    MCDAN: a Multi-scale Context-enhanced Dynamic Attention Network for Diffusion Prediction

    Information diffusion prediction aims at predicting the target users in the information diffusion path on social networks. Prior works mainly focus on the observed structure or sequence of cascades, trying to predict which users the cascade will passively infect. In this study, we argue that understanding user intent is also a key part of information diffusion prediction. We thereby propose a novel Multi-scale Context-enhanced Dynamic Attention Network (MCDAN) to predict which user will most likely join the observed current cascades. Specifically, to consider the global interactive relationships among users, we take full advantage of user friendships and global cascading relationships, which are extracted from the social network and historical cascades, respectively. To refine the model's ability to understand a user's preference for the current cascade, we propose a multi-scale sequential hypergraph attention module to capture the dynamic preferences of users at different time scales. Moreover, we design a contextual attention enhancement module to strengthen the interaction of user representations within the current cascade. Finally, to account for each user's own susceptibility, we construct a susceptibility label for each user based on user susceptibility analysis and use the rank of this label for auxiliary prediction. We conduct experiments on four widely used datasets and show that MCDAN significantly outperforms state-of-the-art models, with average improvements of up to 10.61% in Hits@100 and 9.71% in MAP@100.
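
    As one concrete reference point for the contextual attention enhancement module mentioned above, here is a hypothetical PyTorch sketch in which users already in a cascade attend to one another; the self-attention form, residual connection, and layer norm are assumptions for illustration, not the MCDAN implementation.

        # Illustrative sketch: users within a cascade attend to each other.
        import torch
        import torch.nn as nn

        class ContextualAttentionEnhancement(nn.Module):
            def __init__(self, dim, num_heads=4):
                super().__init__()
                self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
                self.norm = nn.LayerNorm(dim)

            def forward(self, cascade_users):     # (batch, cascade_len, dim)
                # Each user representation attends to every other user in the
                # cascade; a residual connection keeps the original user information.
                enhanced, _ = self.attn(cascade_users, cascade_users, cascade_users)
                return self.norm(cascade_users + enhanced)

        users = torch.randn(8, 20, 64)            # 8 cascades, 20 users each, 64-dim
        enhanced = ContextualAttentionEnhancement(64)(users)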

    Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition

    In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP. Different from previous CLIP-based methods that mainly consider feature generalization on visual encoding, we propose a symmetrical distillation strategy (SDS) that further captures the linguistic knowledge in the CLIP text encoder. By cascading the CLIP image encoder with the reversed CLIP text encoder, a symmetrical structure is built with an image-to-text feature flow that covers not only visual but also linguistic information for distillation. Benefiting from the natural alignment in CLIP, this guidance flow provides a progressive optimization objective from vision to language, which can supervise the STR feature-forwarding process layer by layer. In addition, a new Linguistic Consistency Loss (LCL) is proposed to enhance the linguistic capability by considering second-order statistics during optimization. Overall, CLIP-OCR is the first to design a smooth transition between image and text for the STR task. Extensive experiments demonstrate the effectiveness of CLIP-OCR, with 93.8% average accuracy on six popular STR benchmarks. Code will be available at https://github.com/wzx99/CLIPOCR. Comment: Accepted by ACM MM 2023.
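
    To make the second-order idea behind the LCL concrete, here is a hedged PyTorch sketch that matches Gram matrices (pairwise feature correlations) between student STR features and CLIP-derived teacher features. The exact loss in CLIP-OCR may differ; the function name, normalization, and shapes are assumptions.

        # Hypothetical second-order consistency term (Gram-matrix matching).
        import torch
        import torch.nn.functional as F

        def second_order_consistency(student_feats, teacher_feats):
            """Match intra-sequence feature correlations between the STR student
            features and the CLIP-derived teacher features.

            Both inputs: (batch, seq_len, dim).
            """
            s = F.normalize(student_feats, dim=-1)
            t = F.normalize(teacher_feats, dim=-1)
            gram_s = s @ s.transpose(1, 2)        # (batch, seq_len, seq_len)
            gram_t = t @ t.transpose(1, 2)
            return F.mse_loss(gram_s, gram_t)

        # First-order distillation (direct feature matching) plus the second-order term.
        student = torch.randn(4, 25, 512)
        teacher = torch.randn(4, 25, 512)
        loss = F.mse_loss(student, teacher) + second_order_consistency(student, teacher)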