Parameterized Neural Network Language Models for Information Retrieval
Information Retrieval (IR) models need to deal with two difficult issues: vocabulary mismatch and term dependencies. Vocabulary mismatch refers to the difficulty of retrieving relevant documents that do not contain the exact query terms but only semantically related ones. Term dependencies refer to the need to consider the relationships between the words of a query when estimating the relevance of a document. A multitude of solutions has been proposed for each of these two problems, but no principled model solves both. In parallel, over the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks such as emotion and paraphrase detection. Although they handle both term dependencies and vocabulary mismatch well, thanks to the distributed word representations they are built upon, such models cannot be used readily in IR, where one language model must be estimated per document (or query); this is both computationally infeasible and prone to over-fitting. Building on recent work that proposed learning a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models adapted to ad-hoc IR tasks. Within the language-model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is better adapted to the document at hand and has the potential to be used as a full document language model. We experiment with such models and analyze their results on the TREC-1 to TREC-8 datasets.
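As a rough illustration of how a neural language model can act as the smoothing component in the query-likelihood framework described above, the sketch below interpolates a document's maximum-likelihood term estimate with a neural estimate. The function names, the interpolation weight `lam`, and the toy data are assumptions for illustration, not the paper's exact estimator.

```python
import math

def score_query_likelihood(query_terms, doc_term_counts, doc_length, p_nn, lam=0.5):
    """Score a document for a query under a smoothed query-likelihood model.

    p_ml(w|d) is the maximum-likelihood estimate from document counts;
    p_nn(w) stands in for the probability assigned by a (generic or
    document-adapted) neural language model used as the smoothing component.
    """
    score = 0.0
    for w in query_terms:
        p_ml = doc_term_counts.get(w, 0) / max(doc_length, 1)
        p_smooth = (1.0 - lam) * p_ml + lam * p_nn(w)
        # Guard against log(0) when both components assign zero mass.
        score += math.log(max(p_smooth, 1e-12))
    return score

# Illustrative usage with a toy neural-LM stand-in.
doc = {"neural": 3, "retrieval": 2, "model": 5}
query = ["neural", "ranking"]
print(score_query_likelihood(query, doc, doc_length=10, p_nn=lambda w: 1e-4))
```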
Enhancing Information Retrieval Through Concept-Based Language Modeling and Semantic Smoothing.
Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independence assumption seems unrealistic from a natural language point of view, which considers that terms are related to each other. Therefore, such an assumption leads to two well-known problems in information retrieval (IR), namely polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-gram or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
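A common way to write the kind of semantic smoothing described here, where a concept model is enriched with related sub-concepts, is a translation-style mixture. The formula below is a generic sketch under that assumption, not necessarily the paper's exact estimator.

```latex
P(c \mid d) \;=\; (1-\lambda)\, P_{\mathrm{ml}}(c \mid d)
           \;+\; \lambda \sum_{c'} P(c \mid c')\, P_{\mathrm{ml}}(c' \mid d)
```

Here $P_{\mathrm{ml}}(c \mid d)$ is the maximum-likelihood concept estimate from the document, $P(c \mid c')$ captures the semantic relation between a concept and its sub-concepts or related concepts (e.g., from an ontology), and $\lambda$ controls the strength of the smoothing.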
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks
Text-video retrieval is a challenging task that aims to identify relevant
videos given textual queries. Compared to conventional textual retrieval, the
main obstacle for text-video retrieval is the semantic gap between the textual
nature of queries and the visual richness of video content. Previous works
primarily focus on aligning the query and the video by finely aggregating
word-frame matching signals. Inspired by the human cognitive process of modularly judging the relevance between text and video, we argue that such judgment requires high-order matching signals, owing to the consecutive and complex nature of video content. In this paper, we propose chunk-level text-video matching, where the
query chunks are extracted to describe a specific retrieval unit, and the video
chunks are segmented into distinct clips from videos. We formulate the
chunk-level matching as n-ary correlations modeling between words of the query
and frames of the video and introduce a multi-modal hypergraph for n-ary
correlation modeling. By representing textual units and video frames as nodes
and using hyperedges to depict their relationships, a multi-modal hypergraph is
constructed. In this way, the query and the video can be aligned in a
high-order semantic space. In addition, to enhance the model's generalization
ability, the extracted features are fed into a variational inference component
for computation, obtaining the variational representation under the Gaussian
distribution. The incorporation of hypergraphs and variational inference allows
our model to capture complex, n-ary interactions among textual and visual
contents. Experimental results demonstrate that our proposed method achieves
state-of-the-art performance on the text-video retrieval task.
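The "variational representation under the Gaussian distribution" mentioned above typically amounts to a reparameterized Gaussian latent variable. Below is a minimal PyTorch sketch of that idea; the module name, dimensions, and KL term are illustrative assumptions, not the authors' exact component.

```python
import torch
import torch.nn as nn

class GaussianVariationalHead(nn.Module):
    """Map a fused multi-modal feature to a Gaussian latent representation."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x: torch.Tensor):
        mu, logvar = self.mu(x), self.logvar(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterized Gaussian sample
        # KL divergence to a standard normal prior, averaged over the batch.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

# Illustrative usage on a batch of fused query/video features.
head = GaussianVariationalHead(in_dim=512, latent_dim=128)
z, kl = head(torch.randn(4, 512))
```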
Recommending on graphs: a comprehensive review from a data perspective
Recent advances in graph-based learning approaches have demonstrated their
effectiveness in modelling users' preferences and items' characteristics for
Recommender Systems (RSs). Most of the data in RSs can be organized into graphs
where various objects (e.g., users, items, and attributes) are explicitly or
implicitly connected and influence each other via various relations. Such a
graph-based organization brings benefits to exploiting potential properties in
graph learning (e.g., random walk and network embedding) techniques to enrich
the representations of the user and item nodes, which is an essential factor
for successful recommendations. In this paper, we provide a comprehensive
survey of Graph Learning-based Recommender Systems (GLRSs). Specifically, we
start from a data-driven perspective to systematically categorize various
graphs in GLRSs and analyze their characteristics. Then, we discuss the
state-of-the-art frameworks with a focus on the graph learning module and how
they address practical recommendation challenges such as scalability, fairness,
diversity, explainability and so on. Finally, we share some potential research
directions in this rapidly growing area.
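As a minimal illustration of one graph learning technique the survey mentions, random walks for enriching user and item representations, the sketch below generates walks over a user-item bipartite graph; the walks could then feed a skip-gram style embedding model as in DeepWalk or node2vec. The function name and toy data are hypothetical.

```python
import random
from collections import defaultdict

def random_walks(interactions, walk_length=5, walks_per_node=2, seed=0):
    """Generate simple random walks over a user-item bipartite graph.

    `interactions` is a list of (user, item) pairs; each walk alternates
    between user and item nodes, since edges only connect the two sides.
    """
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, i in interactions:
        adj[("user", u)].append(("item", i))
        adj[("item", i)].append(("user", u))

    walks = []
    for node in adj:
        for _ in range(walks_per_node):
            walk = [node]
            for _ in range(walk_length - 1):
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Illustrative usage on a tiny interaction log.
print(random_walks([("u1", "i1"), ("u1", "i2"), ("u2", "i2")])[0])
```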
Hypergraph Transformer for Skeleton-based Action Recognition
Skeleton-based action recognition aims to predict human actions given human
joint coordinates with skeletal interconnections. To model such off-grid data
points and their co-occurrences, Transformer-based formulations would be a
natural choice. However, Transformers still lag behind state-of-the-art methods
using graph convolutional networks (GCNs). Transformers assume that the input
is permutation-invariant and homogeneous (partially alleviated by positional
encoding), which ignores an important characteristic of skeleton data, i.e.,
bone connectivity. Furthermore, each type of body joint has a clear physical
meaning in human motion, i.e., motion retains an intrinsic relationship
regardless of the joint coordinates, which is not explored in Transformers. In
fact, certain re-occurring groups of body joints are often involved in specific
actions, such as the subconscious hand movement for keeping balance. Vanilla
attention is incapable of describing such underlying relations that are
persistent and beyond pair-wise. In this work, we aim to exploit these unique
aspects of skeleton data to close the performance gap between Transformers and
GCNs. Specifically, we propose a new self-attention (SA) extension, named
Hypergraph Self-Attention (HyperSA), to incorporate inherently higher-order
relations into the model. The K-hop relative positional embeddings are also
employed to take bone connectivity into account. We name the resulting model
Hyperformer, and it achieves accuracy and efficiency comparable to or better than state-of-the-art GCN architectures on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. On the largest NTU RGB+D 120 dataset, the significantly improved performance reached by our Hyperformer demonstrates the underestimated potential of Transformer models in this field.
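To make the higher-order relations concrete, the sketch below shows generic node-to-hyperedge-to-node aggregation over a joint/hyperedge incidence matrix, the kind of structure a hypergraph self-attention layer can build on. It is not the paper's exact HyperSA formulation; all names, shapes, and the toy grouping are illustrative.

```python
import torch

def hypergraph_aggregate(x: torch.Tensor, incidence: torch.Tensor) -> torch.Tensor:
    """Two-stage node -> hyperedge -> node aggregation on a hypergraph.

    x:         (num_joints, dim) joint features
    incidence: (num_joints, num_hyperedges) binary matrix; entry (j, e) is 1
               if joint j belongs to hyperedge e (e.g., a body-part group).
    """
    # Average joint features within each hyperedge.
    edge_deg = incidence.sum(dim=0).clamp(min=1).unsqueeze(1)   # (E, 1)
    edge_feat = incidence.t() @ x / edge_deg                    # (E, dim)
    # Send hyperedge summaries back to their member joints.
    node_deg = incidence.sum(dim=1).clamp(min=1).unsqueeze(1)   # (J, 1)
    return incidence @ edge_feat / node_deg                     # (J, dim)

# Toy example: 4 joints in 2 groups (e.g., left arm and right arm).
x = torch.randn(4, 8)
H = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
print(hypergraph_aggregate(x, H).shape)  # torch.Size([4, 8])
```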
- …