Use of Solr and Xapian in the Invenio document repository software
Invenio is a free comprehensive web-based document repository and digital
library software suite originally developed at CERN. It can serve a variety of
use cases from an institutional repository or digital library to a web journal.
To make full use of full-text documents for efficient search and ranking,
Solr was integrated into Invenio through a generic bridge. Solr indexes the
extracted full text and the most relevant metadata, so Invenio can take
advantage of Solr's efficient search and word similarity ranking capabilities.
In this paper, we first give an overview of Invenio, its capabilities and
features. We then present our open source Solr integration as well as
scalability challenges that arose for an Invenio-based multi-million record
repository: the CERN Document Server. We also compare our Solr adapter to an
alternative Xapian adapter using the same generic bridge. Both integrations are
distributed with the Invenio package and ready to be used by the institutions
using or adopting Invenio.
Enhancing Invenio Digital Library With An External Relevance Ranking Engine
Invenio is a comprehensive, free, web-based digital library software suite originally developed at CERN. The goal of this thesis is to improve Invenio's information retrieval and word similarity ranking capabilities by bridging it with modern external information retrieval systems. The first part compares various information retrieval systems such as Solr and Xapian. The second part designs and implements a system-independent bridge for word similarity ranking; Solr and Xapian are then integrated into Invenio via adapters to the bridge. The third part reports scalability tests. Finally, a future outlook is briefly discussed.
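The bridge-and-adapter arrangement described in the two abstracts above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the class names `RankingAdapter`, `InMemoryAdapter` and `RankingBridge`, the method signatures, and the term-frequency scoring are hypothetical and are not taken from Invenio, Solr or Xapian. A real adapter would delegate to the external engine instead of scoring in memory.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter


class RankingAdapter(ABC):
    """Hypothetical interface that each external engine adapter implements."""

    @abstractmethod
    def index(self, record_id: int, text: str) -> None: ...

    @abstractmethod
    def rank(self, query: str, record_ids: list[int]) -> list[tuple[int, float]]: ...


class InMemoryAdapter(RankingAdapter):
    """Toy stand-in for an engine adapter: scores records by term frequency."""

    def __init__(self) -> None:
        self._terms: dict[int, Counter] = {}

    def index(self, record_id: int, text: str) -> None:
        self._terms[record_id] = Counter(text.lower().split())

    def rank(self, query: str, record_ids: list[int]) -> list[tuple[int, float]]:
        words = query.lower().split()
        scored = [
            (rid, float(sum(self._terms.get(rid, Counter())[w] for w in words)))
            for rid in record_ids
        ]
        # Highest score first; ties are broken by record id for determinism.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


class RankingBridge:
    """Engine-independent entry point; callers never see the engine."""

    def __init__(self, adapter: RankingAdapter) -> None:
        self._adapter = adapter

    def index(self, record_id: int, text: str) -> None:
        self._adapter.index(record_id, text)

    def rank(self, query: str, record_ids: list[int]) -> list[int]:
        return [rid for rid, _ in self._adapter.rank(query, record_ids)]


bridge = RankingBridge(InMemoryAdapter())
bridge.index(1, "higgs boson search at the LHC")
bridge.index(2, "detector calibration notes")
bridge.index(3, "higgs boson higgs decay channels")
ranked = bridge.rank("higgs boson", [1, 2, 3])
```

Because the bridge depends only on the abstract interface, swapping one engine for another in this sketch means passing a different adapter to `RankingBridge`; the calling code stays unchanged.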
Personalized Search
As the volume of electronically available information grows, relevant items
become harder to find. This work presents an approach to personalizing search
results in scientific publication databases. It focuses on re-ranking search
results from existing search engines such as Solr or ElasticSearch, and
includes the development of Obelix, a new recommendation system used to
re-rank search results. The project was proposed and performed at CERN,
using the scientific publications available on the CERN Document Server (CDS).
The experiments use offline and online evaluation of users and documents in
CDS. They conclude that personalized search results outperform both
latest-first and word-similarity ranking in terms of click position in the
search results for global search in CDS.
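Re-ranking an existing engine's results with per-user recommendation scores can be sketched as below. This is only an illustration under stated assumptions: the function `rerank`, the linear blend, the `weight` parameter and the sample scores are hypothetical and do not describe how Obelix itself computes or combines scores.

```python
def rerank(results, user_scores, weight=0.5):
    """Blend the engine's order with per-user recommendation scores.

    results: record ids in the order returned by the search engine.
    user_scores: hypothetical recommender output in [0, 1], keyed by record id.
    weight: share given to the recommender (an assumed tuning parameter).
    """
    n = len(results)
    blended = {}
    for position, rid in enumerate(results):
        # 1.0 for the engine's top hit, decreasing linearly down the list.
        base = 1.0 - position / n
        blended[rid] = (1 - weight) * base + weight * user_scores.get(rid, 0.0)
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=lambda rid: -blended[rid])


engine_order = [10, 20, 30, 40]
recommended = {30: 1.0, 40: 0.2}
personalized = rerank(engine_order, recommended)
```

With `weight=0`, the function returns the engine's order unchanged, which gives a simple baseline when comparing click positions between personalized and non-personalized rankings.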