7,149 research outputs found
Collaborative Deep Learning for Recommender Systems
Collaborative filtering (CF) is a successful approach commonly used by many
recommender systems. Conventional CF-based methods use the ratings given to
items by users as the sole source of information for learning to make
recommendation. However, the ratings are often very sparse in many
applications, causing CF-based methods to degrade significantly in their
recommendation performance. To address this sparsity problem, auxiliary
information such as item content may be utilized. Collaborative topic
regression (CTR) is an appealing recent method that takes this approach,
tightly coupling the two components that learn from the two different sources of
information. Nevertheless, the latent representation learned by CTR may not be
very effective when the auxiliary information is very sparse. To address this
problem, we generalize recent advances in deep learning from i.i.d. input to
non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian
model called collaborative deep learning (CDL), which jointly performs deep
representation learning for the content information and collaborative filtering
for the ratings (feedback) matrix. Extensive experiments on three real-world
datasets from different domains show that CDL can significantly advance the
state of the art.
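The coupling the abstract describes can be sketched as a joint objective: a CF reconstruction term over the ratings matrix plus a term tying the item factors to a content encoder. The sketch below is a minimal, illustrative stand-in — a single linear layer replaces CDL's stacked denoising autoencoder, plain gradient descent replaces the paper's Bayesian treatment, and all shapes and hyperparameters (`lam_v`, `lam_w`, `lr`) are assumptions, not CDL's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes (illustrative only): 4 users, 5 items, 6 content features, rank 3.
n_users, n_items, n_feat, k = 4, 5, 6, 3
R = rng.integers(0, 2, size=(n_users, n_items)).astype(float)  # sparse ratings
X = rng.random((n_items, n_feat))                              # item content

U = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
V = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
W = rng.normal(scale=0.1, size=(n_feat, k))   # 1-layer "encoder" (SDAE stand-in)
lam_v, lam_w, lr = 1.0, 0.1, 0.02

def joint_loss(U, V, W):
    enc = X @ W                               # content-derived item representation
    rating_err = np.sum((R - U @ V.T) ** 2)   # CF reconstruction term
    tie = lam_v * np.sum((V - enc) ** 2)      # couples item factors to content
    return rating_err + tie + lam_w * np.sum(W ** 2)

loss0 = joint_loss(U, V, W)
for _ in range(300):                          # plain gradient descent
    enc = X @ W
    E = U @ V.T - R
    gU = 2 * E @ V
    gV = 2 * E.T @ U + 2 * lam_v * (V - enc)
    gW = -2 * lam_v * X.T @ (V - enc) + 2 * lam_w * W
    U, V, W = U - lr * gU, V - lr * gV, W - lr * gW
loss_final = joint_loss(U, V, W)

print(loss0, loss_final)
```

The `tie` term is what makes the learning "collaborative": gradients from the ratings flow into the content encoder through `V`, so content helps where ratings are sparse and vice versa.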
VST++: Efficient and Stronger Visual Saliency Transformer
While previous CNN-based models have exhibited promising results for salient
object detection (SOD), their ability to explore global long-range dependencies
is restricted. Our previous work, the Visual Saliency Transformer (VST),
addressed this constraint from a transformer-based sequence-to-sequence
perspective, to unify RGB and RGB-D SOD. In VST, we developed a multi-task
transformer decoder that concurrently predicts saliency and boundary outcomes
in a pure transformer architecture. Moreover, we introduced a novel token
upsampling method called reverse T2T for predicting a high-resolution saliency
map effortlessly within transformer-based structures. Building upon the VST
model, we further propose an efficient and stronger VST version in this work,
i.e., VST++. To mitigate the computational cost of the VST model, we propose a
Select-Integrate Attention (SIA) module, partitioning the foreground into
fine-grained segments and aggregating background information into a single
coarse-grained token. To incorporate 3D depth information with low cost, we
design a novel depth position encoding method tailored for depth maps.
Furthermore, we introduce a token-supervised prediction loss to provide
straightforward guidance for the task-related tokens. We evaluate our VST++
model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD
benchmark datasets. Experimental results show that our model outperforms
existing methods while achieving a 25% reduction in computational costs without
significant performance compromise. The demonstrated strong ability for
generalization, enhanced performance, and heightened efficiency of our VST++
model highlight its potential
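The token-reduction idea behind SIA can be illustrated with a minimal sketch: keep the tokens a coarse foreground mask selects at fine granularity, mean-pool everything else into one background token, and run attention over the reduced set. This is a simplified assumption-laden toy (single-head attention with no learned projections, a random mask standing in for the predicted foreground), not the paper's actual module.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_integrate(tokens, fg_mask):
    """Sketch of the SIA reduction: fine-grained foreground tokens are kept,
    all background tokens collapse into a single mean-pooled coarse token."""
    fg = tokens[fg_mask]
    bg = tokens[~fg_mask].mean(axis=0, keepdims=True)
    return np.concatenate([fg, bg], axis=0)

def self_attention(x):
    # Plain single-head scaled dot-product attention, no learned weights;
    # only here to show where the token reduction cuts the quadratic cost.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

tokens = rng.random((64, 8))     # 64 patch tokens of dimension 8
fg_mask = rng.random(64) < 0.25  # ~25% of tokens "predicted" foreground
reduced = select_integrate(tokens, fg_mask)
out = self_attention(reduced)

# Attention cost drops from O(64^2) pairs to O(len(reduced)^2).
print(tokens.shape[0], reduced.shape[0])
```

Since attention is quadratic in token count, shrinking 64 tokens to roughly a quarter of that (plus one background token) is where the reported cost reduction comes from in spirit.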
Transfer Learning via Contextual Invariants for One-to-Many Cross-Domain Recommendation
The rapid proliferation of new users and items on the social web has
aggravated the gray-sheep user/long-tail item challenge in recommender systems.
Historically, cross-domain co-clustering methods have successfully leveraged
shared users and items across dense and sparse domains to improve inference
quality. However, they rely on shared rating data and cannot scale to multiple
sparse target domains (i.e., the one-to-many transfer setting). This, combined
with the increasing adoption of neural recommender architectures, motivates us
to develop scalable neural layer-transfer approaches for cross-domain learning.
Our key intuition is to guide neural collaborative filtering with
domain-invariant components shared across the dense and sparse domains,
improving the user and item representations learned in the sparse domains. We
leverage contextual invariances across domains to develop these shared modules,
and demonstrate that with user-item interaction context, we can learn-to-learn
informative representation spaces even with sparse interaction data. We show
the effectiveness and scalability of our approach on two public datasets and a
massive transaction dataset from Visa, a global payments technology company
(19% Item Recall, 3x faster vs. training separate models for each domain). Our
approach is applicable to both implicit and explicit feedback settings.
Comment: SIGIR 202
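The one-to-many layer-transfer scheme can be sketched as follows: a context-conditioned module is learned once (on the dense source domain) and reused frozen across sparse target domains, each of which only gets fresh user/item embeddings. Everything below — the gating form `tanh(ctx @ W)`, the dimensions, the random "pretrained" weights — is a hypothetical stand-in for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
k, c = 4, 6  # embedding dim, context-feature dim (illustrative)

# Hypothetical shared, domain-invariant module: maps interaction context
# to a k-dim gate. In practice this would be pretrained on the dense domain.
W_shared = rng.normal(scale=0.1, size=(c, k))

def score(u_emb, i_emb, ctx, W):
    """Context-conditioned CF score: the shared layer W turns context into a
    gate modulating the user-item interaction."""
    gate = np.tanh(ctx @ W)
    return float((u_emb * gate) @ i_emb)

def new_sparse_domain():
    # Each target domain trains only its own embeddings; W_shared stays frozen.
    return (rng.normal(scale=0.1, size=(3, k)),
            rng.normal(scale=0.1, size=(5, k)))

users_a, items_a = new_sparse_domain()  # sparse target domain A
users_b, items_b = new_sparse_domain()  # sparse target domain B
ctx = rng.random(c)                     # one interaction context vector

s_a = score(users_a[0], items_a[2], ctx, W_shared)
s_b = score(users_b[0], items_b[2], ctx, W_shared)
print(s_a, s_b)
```

Because the expensive contextual module is shared and frozen, adding a new sparse domain only costs its embedding tables, which is what makes the one-to-many setting scale.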