RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Transformer structure has achieved great success in multiple applied machine
learning communities, such as natural language processing (NLP), computer
vision (CV) and information retrieval (IR). The core mechanism of the
Transformer architecture, attention, requires O(n^2) time complexity in training
and O(n) time complexity in inference. Many works have been proposed to improve the
attention mechanism's scalability, such as Flash Attention and Multi-query
Attention. A different line of work aims to design new mechanisms to replace
attention. Recently, a notable model structure -- Mamba, which is based on
state space models, has achieved transformer-equivalent performance in multiple
sequence modeling tasks.
In this work, we examine Mamba's efficacy through the lens of a classical IR
task -- document ranking. A reranker model takes a query and a document as
input, and predicts a scalar relevance score. This task demands the language
model's ability to comprehend lengthy contextual inputs and to capture the
interaction between query and document tokens. We find that (1) Mamba models
achieve competitive performance compared to transformer-based models with the
same training recipe; (2) however, they have lower training throughput than
efficient transformer implementations such as Flash Attention. We
hope this study can serve as a starting point to explore Mamba models in other
classical IR tasks. Our code implementation and trained checkpoints are made
public to facilitate reproducibility
(https://github.com/zhichaoxu-shufe/RankMamba).
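To make the scaling contrast in the abstract above concrete, here is a minimal sketch in plain Python. It is not the RankMamba code: `attention_pair_count` simply counts the query-key score computations a causal self-attention layer performs over n tokens, and `ssm_scan` is a toy scalar, time-invariant linear recurrence (Mamba's actual selective scan uses input-dependent parameters and vector-valued states). The function names and the parameters `a`, `b`, `c` are assumptions chosen for illustration.

```python
def attention_pair_count(n):
    """Count query-key score computations for causal self-attention.

    Token t attends to tokens 1..t, so the total is n * (n + 1) / 2,
    which grows quadratically with sequence length n.
    """
    return n * (n + 1) // 2


def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state-space recurrence (illustrative only).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    One constant-cost update per token, so the total cost is linear in
    the sequence length and the state carried between steps has fixed size.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    for n in (128, 1024, 8192):
        print(n, attention_pair_count(n), "attention scores vs", n, "recurrence steps")
```

The recurrence keeps only a fixed-size state between steps, which is the property that motivates state space models as an attention replacement; whether that translates into higher throughput in practice depends on the implementation, as the abstract's second finding notes.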
Context-aware Decoding Reduces Hallucination in Query-focused Summarization
Query-focused summarization (QFS) aims to provide a summary of a single
document or multiple documents that can satisfy the information needs of a given
query. It is useful for various real-world applications, such as abstractive
snippet generation or more recent retrieval augmented generation (RAG). A
prototypical QFS pipeline consists of a retriever (sparse or dense retrieval)
and a generator (usually a large language model). However, applying large
language models (LLMs) can lead to hallucinations, especially when the
evidence contradicts the prior belief of LLMs. There has been growing interest
in developing new decoding methods to improve generation quality and reduce
hallucination. In this work, we conduct a large-scale reproducibility study on
one recently proposed decoding method -- Context-aware Decoding (CAD). In
addition to replicating CAD's experiments on news summarization datasets, we
include experiments on QFS datasets, and conduct more rigorous analysis on
computational complexity and hyperparameter sensitivity. Experiments with eight
different language models show that CAD improves QFS quality
by (1) reducing factuality errors/hallucinations while (2) mostly retaining the
match of lexical patterns, as measured by ROUGE scores, at the cost of
increased inference-time FLOPs and reduced decoding speed. The code
implementation, based on the Hugging Face library, is available at
https://github.com/zhichaoxu-shufe/context-aware-decoding-qfs
Comment: technical report
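As a rough illustration of the decoding method studied above, the sketch below applies the logit adjustment from the original Context-aware Decoding formulation: the next-token logits computed with the retrieved context are amplified and the logits computed without it are subtracted, weighted by a hyperparameter alpha. This is not the linked repository's code; the function names are hypothetical, and the logits are plain lists rather than model outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cad_adjust(logits_with_context, logits_without_context, alpha=0.5):
    """Contrast context-conditioned logits against context-free logits.

    adjusted = (1 + alpha) * logit(y | context, query)
               - alpha * logit(y | query)

    alpha = 0 recovers ordinary decoding with the context.
    """
    return [
        (1 + alpha) * with_c - alpha * without_c
        for with_c, without_c in zip(logits_with_context, logits_without_context)
    ]


if __name__ == "__main__":
    # Toy vocabulary of three tokens; the context favors token 0,
    # while the model's prior (no context) favors token 1.
    with_context = [2.0, 1.5, 0.1]
    without_context = [0.5, 2.0, 0.1]
    print(softmax(with_context))
    print(softmax(cad_adjust(with_context, without_context, alpha=0.5)))
```

Each decoding step needs two sets of logits, one with and one without the context, which is consistent with the increased inference-time FLOPs and reduced decoding speed reported in the abstract.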
Towards Efficient Path Query on Social Network with Hybrid RDF Management
The scalability and flexibility of the Resource Description Framework (RDF) model
make it ideally suited for representing online social networks (OSNs). One basic
operation in OSNs is to find chains of relations, such as k-hop friends. Property
path queries in SPARQL can express this type of operation, but their implementation
suffers from performance problems given the ever-growing data size and
complexity of OSNs. In this paper, we present a main memory/disk based hybrid RDF
data management framework for efficient property path query. In this hybrid
framework, we realize an efficient in-memory algebra operator for property path
query using graph traversal, and estimate the cost of this operator to
cooperate with existing cost-based optimization. Experiments on benchmark and
real datasets demonstrate that our approach achieves a good tradeoff between
data load expense and online query performance.
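The k-hop friends operation described above can be sketched as a breadth-first traversal over an in-memory adjacency index built from RDF-style triples. This is only an illustration of the general idea, not the paper's operator or its cost model; the `knows` predicate, the sample triples, and the function names are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(triples, predicate):
    """Index (subject, predicate, object) triples by subject for one predicate."""
    adjacency = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            adjacency[s].add(o)
    return adjacency


def k_hop(adjacency, start, k):
    """Return nodes reachable from `start` in 1..k hops (start excluded).

    Breadth-first traversal: each node is expanded at most once, so the
    work is bounded by the edges visited within k hops.
    """
    visited = {start}
    frontier = {start}
    reached = set()
    for _ in range(k):
        next_frontier = set()
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        reached |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return reached


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("carol", "knows", "dave"),
        ("alice", "likes", "pizza"),
    ]
    adjacency = build_adjacency(triples, "knows")
    print(sorted(k_hop(adjacency, "alice", 2)))
```

Keeping the adjacency index in memory trades data load time for faster traversal at query time, which is the kind of tradeoff the abstract refers to.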