Signed Distance-based Deep Memory Recommender
Personalized recommendation algorithms learn a user's preference for an item
by measuring a distance/similarity between them. However, some of the existing
recommendation models (e.g., matrix factorization) assume a linear relationship
between the user and item. This approach limits the capacity of recommender
systems, since the interactions between users and items in real-world
applications are much more complex than the linear relationship. To overcome
this limitation, in this paper, we design and propose a deep learning framework
called Signed Distance-based Deep Memory Recommender, which captures non-linear
relationships between users and items explicitly and implicitly, and works well
in both the general recommendation task and the shopping basket-based recommendation
task. Through an extensive empirical study on six real-world datasets in the
two recommendation tasks, our proposed approach achieved significant
improvement over ten state-of-the-art recommendation models.
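To illustrate the contrast the abstract draws between inner-product scoring (as in matrix factorization) and distance-based scoring, here is a minimal toy sketch. It is not the Signed Distance-based Deep Memory Recommender itself; the function names and the two-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math

def dot_score(user, item):
    """Matrix-factorization-style score: inner product (higher is better)."""
    return sum(u * i for u, i in zip(user, item))

def distance_score(user, item):
    """Distance-style score: negative Euclidean distance (higher is better)."""
    return -math.sqrt(sum((u - i) ** 2 for u, i in zip(user, item)))

# Hypothetical embeddings: one user and two candidate items.
user = [1.0, 0.0]
near_item = [1.0, 0.1]      # close to the user in embedding space
far_big_item = [3.0, 3.0]   # far away, but with a large norm

# The inner product favors the large-norm item even though it is far away,
# while the distance score favors the item that is actually nearby.
prefers_far = dot_score(user, far_big_item) > dot_score(user, near_item)
prefers_near = distance_score(user, near_item) > distance_score(user, far_big_item)
```

The two scores rank the same pair of items in opposite orders, which is the kind of mismatch that motivates distance-based formulations.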
General highlight detection in sport videos
Attention is a psychological measure of human response to stimuli. We propose a general framework for highlight detection that compares attention intensity during the watching of sports videos. Three steps are involved: adaptive selection of salient features, unified attention estimation, and highlight identification. Adaptive selection computes feature correlation to decide an optimal set of salient features. Unified estimation combines these features using the multi-resolution autoregressive (MAR) technique and thus creates a temporal curve of attention intensity. We rank the intensity of attention to discriminate the boundaries of highlights. Such a framework alleviates semantic uncertainty around sports highlights and leads to efficient and effective highlight detection. The advantages are as follows: (1) the capability of using data at coarse temporal resolutions; (2) robustness against noise caused by modality asynchronism, perception uncertainty, and feature mismatch; (3) the employment of Markovian constraints on content presentation; and (4) multi-resolution estimation of attention intensity, which enables the precise allocation of event boundaries.
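The final step described above, ranking a temporal attention curve to find highlight boundaries, can be sketched roughly as follows. This is an assumption-laden illustration: a plain moving average stands in for the MAR estimation, the threshold rule is hypothetical, and none of the function names come from the paper.

```python
def moving_average(values, window):
    """Smooth a curve with a centered moving average (a simple stand-in for MAR)."""
    half = window // 2
    smoothed = []
    for t in range(len(values)):
        lo, hi = max(0, t - half), min(len(values), t + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def highlight_segments(curve, threshold):
    """Return (start, end) index pairs, end exclusive, where curve >= threshold."""
    segments, start = [], None
    for t, value in enumerate(curve):
        if value >= threshold and start is None:
            start = t                      # a candidate highlight begins
        elif value < threshold and start is not None:
            segments.append((start, t))    # the candidate highlight ends
            start = None
    if start is not None:
        segments.append((start, len(curve)))
    return segments

def rank_highlights(curve, threshold):
    """Order the segments by peak attention intensity, strongest first."""
    segments = highlight_segments(curve, threshold)
    return sorted(segments, key=lambda s: max(curve[s[0]:s[1]]), reverse=True)

# Hypothetical per-frame attention intensities.
attention = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.6, 0.7, 0.1]
ranked = rank_highlights(attention, threshold=0.5)
```

In practice the curve would first be smoothed (here with `moving_average`) before thresholding, so that isolated noisy frames do not split or create segments.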
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, numerous local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve the efficiency. In terms of application
scenarios and paradigms, local mechanisms have different characteristics. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
Categorization of local mechanisms in each field is summarized. Then,
advantages and disadvantages for every category are analyzed deeply, leaving
room for exploration. Finally, future research directions for local
mechanisms that may benefit future work are discussed. To the best of
our knowledge, this is the first survey of local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
Multimodal recommendation aims to model user and item representations
comprehensively with the involvement of multimedia content for effective
recommendations. Existing research has shown that it is beneficial for
recommendation performance to combine (user- and item-) ID embeddings with
multimodal salient features, indicating the value of IDs. However, there is a
lack of a thorough analysis of the ID embeddings in terms of feature semantics
in the literature. In this paper, we revisit the value of ID embeddings for
multimodal recommendation and conduct a thorough study regarding their semantics,
which we recognize as subtle features of content and structures. Then, we
propose a novel recommendation model by incorporating ID embeddings to enhance
the semantic features of both content and structures. Specifically, we put
forward a hierarchical attention mechanism to incorporate ID embeddings in
modality fusing, coupled with contrastive learning, to enhance content
representations. Meanwhile, we propose a lightweight graph convolutional
network for each modality to amalgamate neighborhood and ID embeddings for
improving structural representations. Finally, the content and structure
representations are combined to form the ultimate item embedding for
recommendation. Extensive experiments on three real-world datasets (Baby,
Sports, and Clothing) demonstrate the superiority of our method over
state-of-the-art multimodal recommendation methods and the effectiveness of
fine-grained ID embeddings.
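As a rough illustration of what combining neighborhood information with ID embeddings might look like, the sketch below performs one parameter-free propagation step over an item graph and then adds an ID embedding to the result. This is not the authors' model: the averaging rule, the additive fusion, and all names and values are assumptions made for the example.

```python
def propagate(embeddings, neighbors):
    """One lightweight propagation step: average each node with its neighbors."""
    updated = {}
    for node, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated

def fuse(id_vec, structure_vec):
    """Hypothetical fusion: element-wise sum of ID and structural embeddings."""
    return [a + b for a, b in zip(id_vec, structure_vec)]

# Hypothetical modality features for three items and their graph neighbors.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"]}   # item "c" has no listed neighbors
id_embeddings = {"a": [0.5, 0.5], "b": [0.0, 0.0], "c": [1.0, -1.0]}

structural = propagate(features, neighbors)
item_vectors = {k: fuse(id_embeddings[k], structural[k]) for k in features}
```

Items without neighbors keep their own features unchanged, so the ID embedding is the only additional signal they receive in this toy setup.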