
    Dense Text Retrieval based on Pretrained Language Models: A Survey

    Full text link
    Text retrieval is a long-standing research topic in information seeking, where a system is required to return relevant information resources in response to users' natural-language queries. From classic retrieval methods to learning-based ranking functions, the underlying retrieval models have continually evolved alongside ongoing technical innovation. To design effective retrieval models, a key point lies in how to learn text representations and how to model relevance matching. The recent success of pretrained language models (PLMs) sheds light on developing more capable text retrieval approaches by leveraging the excellent modeling capacity of PLMs. With powerful PLMs, we can effectively learn the representations of queries and texts in a latent representation space, and further construct a semantic matching function between the dense vectors for relevance modeling. Such a retrieval approach is referred to as dense retrieval, since it employs dense vectors (a.k.a. embeddings) to represent the texts. Given the rapid progress in dense retrieval, in this survey we systematically review the recent advances in PLM-based dense retrieval. Different from previous surveys on dense retrieval, we take a new perspective and organize the related work around four major aspects, namely architecture, training, indexing, and integration, and summarize the mainstream techniques for each aspect. We thoroughly survey the literature and include 300+ related reference papers on dense retrieval. To support the survey, we provide a website with useful resources and release a code repository and toolkit for implementing dense retrieval models. This survey aims to provide a comprehensive, practical reference focused on the major progress in dense text retrieval.
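    As a rough illustration of the bi-encoder pattern this abstract describes (encode queries and documents into a shared latent space, then score relevance with a vector similarity), here is a minimal Python sketch. The toy bag-of-words encoder, the vocabulary, and the 4-dimensional latent space are placeholder assumptions standing in for a pretrained language model; this is not the survey's implementation.

```python
import numpy as np

# Toy stand-in for a PLM encoder: a fixed random projection of a
# bag-of-words vector into a small latent space. In a real dense
# retriever this would be a pretrained language model (a bi-encoder).
rng = np.random.default_rng(0)
VOCAB = {w: i for i, w in enumerate(
    "dense retrieval query document embedding match relevance".split())}
W = rng.normal(size=(len(VOCAB), 4))  # hypothetical 4-dim latent space

def encode(text: str) -> np.ndarray:
    """Map text to a unit-length dense vector (embedding)."""
    v = np.zeros(len(VOCAB))
    for w in text.lower().split():
        if w in VOCAB:
            v[VOCAB[w]] += 1.0
    e = v @ W
    return e / (np.linalg.norm(e) + 1e-9)

# Index: embed every document once, ahead of query time.
docs = ["dense retrieval embedding", "relevance match document"]
doc_emb = np.stack([encode(d) for d in docs])

# Query time: relevance is a dot product between dense vectors.
query = "dense embedding retrieval"
scores = doc_emb @ encode(query)
print(docs[int(np.argmax(scores))])  # best-matching document
```

    In practice the precomputed document embeddings would be stored in an approximate nearest-neighbor index rather than scored exhaustively, which is the indexing aspect the survey covers.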

    Exquisitor: Interactive Learning for Multimedia

    Get PDF

    Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

    Get PDF
    CACIC’15 was the 21st Congress in the CACIC series. It was organized by the School of Technology at the UNNOBA (North-West of Buenos Aires National University) in Junín, Buenos Aires. The Congress included 13 Workshops with 131 accepted papers, 4 Conferences, 2 invited tutorials, different meetings related to Computer Science Education (Professors, PhD students, Curricula), and an International School with 6 courses. CACIC 2015 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 202 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 495 review reports involving about 191 different reviewers. A total of 131 full papers, involving 404 authors and 75 Universities, were accepted, and 24 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI)

    Efficient Nearest Neighbor Search on Metric Time Series

    Get PDF
    While deep-learning approaches beat nearest-neighbor classifiers in an increasing number of areas, searching existing, uncertain data remains a task for which similarity search is the method of choice. Numerous specific solutions exist for different types of data and queries. This thesis aims to find fast and general solutions for searching and indexing arbitrarily typed time series. A time series is considered a sequence of elements where the order of the elements matters but their actual time stamps do not. Since this thesis focuses on measuring distances between time series, the metric space is the most appropriate abstraction for the spaces from which the time series' elements are drawn. Hence, this thesis mainly considers metric time series as its data type. Simple examples include time series in Euclidean vector spaces or graphs.

    For general similarity-search solutions on time series, two primitive comparison semantics need to be distinguished. The first compares the time series' trajectories while ignoring time warping. A ubiquitous example of such a distance function is the Dynamic Time Warping distance (DTW), developed in the area of speech recognition. The Dog Keeper distance (DK) is another time-warping distance that, as opposed to DTW, is truly invariant under time warping and yields a metric space. After canonically extending DTW to accept multi-dimensional time series, this thesis contributes a new algorithm for computing DK that outperforms DTW on time series in high-dimensional vector spaces by more than one order of magnitude. An analytical study of both distance functions reveals the reasons for the superiority of DK over DTW in high-dimensional spaces.

    The second comparison semantic compares time series in Euclidean vector spaces regardless of their position or orientation. This thesis proposes the Congruence distance, which is the Euclidean distance minimized over all isometric transformations; it is thus invariant under translation, rotation, and reflection of the time series and therefore disregards their position and orientation. A proof contributed in this thesis shows that there can be no efficient algorithm computing this distance function (unless P = NP). Therefore, this thesis contributes the Delta distance, a metric distance function serving as a lower bound for the Congruence distance. While the Delta distance has quadratic time complexity, the provided evaluation shows a speedup of more than two orders of magnitude over the Congruence distance. Furthermore, the Delta distance is shown to be tight on random time series, although its tightness can be arbitrarily bad in corner cases.

    Orthogonal to the previously mentioned comparison semantics, similarity search on time series consists of two different types of queries: whole-sequence matching and subsequence search. Metric index structures (e.g., the M-Tree) natively support only whole-matching queries. This thesis contributes the concept of metric subset spaces and the SuperM-Tree for indexing metric subset spaces as a generic solution for subsequence search; examples of metric subset spaces include subsequence search with respect to the distance functions from the comparison semantics mentioned above. The provided evaluation shows that the SuperM-Tree outperforms a linear search by multiple orders of magnitude.
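    For context on the first comparison semantic, here is a minimal Python sketch of the classic DTW dynamic-programming recurrence that the abstract builds on. The element-wise distance and the example sequences are illustrative assumptions, not taken from the thesis; the thesis's DK algorithm and multi-dimensional DTW extension are not shown here.

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic Time Warping distance between sequences a and b.

    `dist` is the element-wise distance on the underlying metric space;
    the default assumes one-dimensional numeric time series.
    """
    n, m = len(a), len(b)
    # D[i][j] = cost of the cheapest warping path aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: match, stretch a, stretch b
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# Two sequences tracing similar trajectories at different speeds
# end up close under DTW despite the misaligned time axes.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0]))
```

    Note that DTW computed this way costs O(nm) time and, because the min can reuse an element repeatedly, it violates the triangle inequality, which is the metric deficiency the Dog Keeper distance addresses.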