2,242 research outputs found
Multilingual adaptive search for digital libraries
This paper describes a framework for Adaptive Multilingual Information Retrieval (AMIR) which allows multilingual resource discovery and delivery using on-the-fly machine translation of documents and queries. Result documents
are presented to the user in a contextualised manner. Challenges and affordances of both Adaptive and Multilingual IR, with a particular focus on Digital Libraries, are detailed. The framework components are motivated by a series of results from experiments on query logs and documents from The European Library. We conclude that factoring adaptivity and multilinguality aspects into the search process can enhance the user’s experience with online Digital Libraries.
Cross-language Information Retrieval
Two key assumptions shape the usual view of ranked retrieval: (1) that the
searcher can choose words for their query that might appear in the documents
that they wish to see, and (2) that ranking retrieved documents will suffice
because the searcher will be able to recognize those which they wished to find.
When the documents to be searched are in a language not known by the searcher,
neither assumption is true. In such cases, Cross-Language Information Retrieval
(CLIR) is needed. This chapter reviews the state of the art for CLIR and
outlines some open research questions.
Comment: 49 pages, 0 figures
Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech
Speech emotion recognition has evolved from research to practical
applications. Previous studies of emotion recognition from speech have focused
on developing models on certain datasets like IEMOCAP. The scarcity of data in
the domain of emotion modeling makes it challenging to evaluate models on other
datasets, as well as to evaluate speech emotion recognition models in a
multilingual setting. This paper proposes an ensemble learning approach that
fuses the results of pre-trained models for emotion share recognition from
speech.
The models were chosen to accommodate multilingual data from English and
Spanish. The results show that ensemble learning improves on both the
single-model baseline and the previous best late-fusion model. Performance is
measured using the Spearman rank correlation
coefficient since the task is a regression problem with ranking values. A
Spearman rank correlation coefficient of 0.537 is reported for the test set,
while for the development set the score is 0.524. These scores are higher than
those of a previous study using a fusion method on monolingual data, which
achieved 0.476 on the test set and 0.470 on the development set.
Comment: 4 pages, 6 tables, accepted in APSIPA-ASC 202
Ad Hoc Table Retrieval using Semantic Similarity
We introduce and address the problem of ad hoc table retrieval: answering a
keyword query with a ranked list of tables. This task is not only interesting
on its own account, but is also being used as a core component in many other
table-based information access scenarios, such as table completion or table
mining. The main novel contribution of this work is a method for performing
semantic matching between queries and tables. Specifically, we (i) represent
queries and tables in multiple semantic spaces (both discrete sparse and
continuous dense vector representations) and (ii) introduce various similarity
measures for matching those semantic representations. We consider all possible
combinations of semantic representations and similarity measures and use these
as features in a supervised learning model. Using a purpose-built test
collection based on Wikipedia tables, we demonstrate significant and
substantial improvements over a state-of-the-art baseline.
Comment: The Web Conference 2018 (WWW'18)
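The matching scheme described above (multiple semantic representations combined with multiple similarity measures, used as features for a supervised ranker) can be sketched roughly as follows; the toy term embeddings and the particular early/late fusion measures here are illustrative stand-ins, not the paper's exact configuration:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matching_features(query_vecs, table_vecs):
    """Early fusion: cosine of centroids. Late fusion: aggregated pairwise cosines."""
    pairwise = [cosine(q, t) for q in query_vecs for t in table_vecs]
    return {
        "early": cosine(centroid(query_vecs), centroid(table_vecs)),
        "late_max": max(pairwise),
        "late_avg": sum(pairwise) / len(pairwise),
    }

# Hypothetical 2-d embeddings for query terms and table terms.
feats = matching_features([[1.0, 0.0], [0.7, 0.7]], [[0.9, 0.1], [0.0, 1.0]])
```

Each entry in `feats` would become one feature in the supervised learning model that produces the final ranking.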
Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.
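The core idea of learning cross-lingual term associations directly from relevance rankings can be illustrated with a much simpler pairwise-ranking update; this perceptron-style sketch is a stand-in for the paper's boosting algorithm, with features being query-term × document-term pairs and entirely hypothetical toy data:

```python
from collections import defaultdict

def score(weights, query, doc):
    # Score a document as the sum of learned association weights over all term pairs.
    return sum(weights[(q, d)] for q in query for d in doc)

def train(pairs, epochs=5, lr=0.1):
    """pairs: list of (query_terms, relevant_doc_terms, irrelevant_doc_terms)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for query, rel, irr in pairs:
            # Pairwise update whenever the relevant document fails to outrank
            # the irrelevant one under the current weights.
            if score(weights, query, rel) <= score(weights, query, irr):
                for q in query:
                    for d in rel:
                        weights[(q, d)] += lr
                    for d in irr:
                        weights[(q, d)] -= lr
    return weights

# Toy Japanese query term vs. two English documents (hypothetical data).
data = [(["tokkyo"], ["patent", "claim"], ["music", "album"])]
w = train(data)
```

After training, the relevant document outranks the irrelevant one, and the learned weights act as bilingual term associations; the paper's boosting algorithm additionally scales this idea to very large cross-product feature spaces.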
Dense Text Retrieval based on Pretrained Language Models: A Survey
Text retrieval is a long-standing research topic in information seeking,
where a system is required to return relevant information resources for users'
queries in natural language. From classic retrieval methods to learning-based
ranking functions, the underlying retrieval models have continually evolved
with ongoing technical innovation. To design effective
retrieval models, a key point lies in how to learn the text representation and
model the relevance matching. The recent success of pretrained language models
(PLMs) sheds light on developing more capable text retrieval approaches by
leveraging the excellent modeling capacity of PLMs. With powerful PLMs, we can
effectively learn the representations of queries and texts in the latent
representation space, and further construct the semantic matching function
between the dense vectors for relevance modeling. Such a retrieval approach is
referred to as dense retrieval, since it employs dense vectors (a.k.a.,
embeddings) to represent the texts. Considering the rapid progress on dense
retrieval, in this survey, we systematically review the recent advances on
PLM-based dense retrieval. Different from previous surveys on dense retrieval,
we take a new perspective to organize the related work by four major aspects,
including architecture, training, indexing and integration, and summarize the
mainstream techniques for each aspect. We thoroughly survey the literature, and
include 300+ related reference papers on dense retrieval. To support our
survey, we create a website providing useful resources and release a code
repository and toolkit for implementing dense retrieval models. This survey aims
to provide a comprehensive, practical reference focused on the major progress
for dense text retrieval
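Once a PLM encoder has produced embeddings, the dense-retrieval step itself reduces to nearest-neighbor search under an inner-product (or cosine) score; a minimal sketch, with made-up low-dimensional vectors standing in for real PLM outputs:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dense_retrieve(query_vec, doc_vecs, k=2):
    """Rank documents by inner product with the query embedding, return top-k ids."""
    scored = [(dot(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical 3-d embeddings standing in for PLM encoder outputs.
docs = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.4, 0.4, 0.2]]
query = [0.9, 0.0, 0.1]
top = dense_retrieve(query, docs, k=2)
```

Real systems replace this exhaustive scan with an approximate nearest-neighbor index (e.g. FAISS), which is exactly the "indexing" aspect the survey covers.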
Document-Level Language Models for Machine Translation
Despite the known limitations, most machine translation systems today still
operate on the sentence level. One reason for this is that most parallel
training data is only sentence-level aligned, without document-level meta
information available. In this work, we set out to build context-aware
translation systems utilizing document-level monolingual data instead. This can
be achieved by combining any existing sentence-level translation model with a
document-level language model. We improve existing approaches by leveraging
recent advancements in model combination. Additionally, we propose novel
weighting techniques that make the system combination more flexible and
significantly reduce computational overhead. In a comprehensive evaluation on
four diverse translation tasks, we show that our extensions improve
document-targeted scores substantially and are also computationally more
efficient. However, we also find that in most scenarios, back-translation gives
even better results, at the cost of having to re-train the translation system.
Finally, we explore language model fusion in the light of recent advancements
in large language models. Our findings suggest that there might be strong
potential in utilizing large language models via model combination.
Comment: accepted at WMT 202
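Combining a sentence-level translation model with a document-level language model can be sketched as a log-linear (shallow-fusion) rescoring of hypotheses; the interpolation weight `lam` and the subtraction of a sentence-level LM score (one common correction in LM fusion) are illustrative assumptions, not the paper's exact weighting scheme:

```python
def fused_score(log_p_mt, log_p_doclm, log_p_sentlm, lam=0.3):
    """Log-linear fusion: add document-LM evidence and subtract the
    sentence-level LM already implicit in the translation model."""
    return log_p_mt + lam * (log_p_doclm - log_p_sentlm)

def pick_hypothesis(hyps, lam=0.3):
    """hyps: list of (text, log_p_mt, log_p_doclm, log_p_sentlm) tuples."""
    return max(hyps, key=lambda h: fused_score(h[1], h[2], h[3], lam))[0]

# Two hypothetical translations of an ambiguous word; the second fits
# the surrounding document context better, so the document LM prefers it.
hyps = [("bank (finance)", -1.0, -5.0, -2.0),
        ("bank (river)",   -1.2, -2.0, -2.5)]
best = pick_hypothesis(hyps)
```

Here the document LM overrides the slightly higher sentence-level translation score, which is the context-awareness the abstract describes; the paper's weighting techniques make such a combination more flexible and cheaper to compute.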