3 research outputs found

    Use of Solr and Xapian in the Invenio document repository software

    Invenio is a free, comprehensive, web-based document repository and digital library software suite originally developed at CERN. It can serve a variety of use cases, from an institutional repository or digital library to a web journal. To make full use of full-text documents for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes the extracted full texts and the most relevant metadata. Consequently, Invenio takes advantage of Solr's efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio, its capabilities and features. We then present our open source Solr integration, as well as the scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter to an alternative Xapian adapter that uses the same generic bridge. Both integrations are distributed with the Invenio package and are ready to be used by institutions using or adopting Invenio.

    Enhancing Invenio Digital Library With An External Relevance Ranking Engine

    Invenio is a comprehensive, web-based, free digital library software suite originally developed at CERN. The goal of this thesis is to improve its information retrieval and word similarity ranking capabilities by bridging it with modern external information retrieval systems. In the first part, various information retrieval systems, such as Solr and Xapian, are compared. In the second part, a system-independent bridge for word similarity ranking is designed and implemented. Subsequently, Solr and Xapian are integrated into Invenio via adapters to the bridge. In the third part, scalability tests are performed. Finally, a future outlook is briefly discussed.

    Personalized Search

    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. It focuses on re-ranking search results from existing search engines such as Solr or ElasticSearch, and includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). The work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that personalized search results outperform both latest-first and word similarity ranking in terms of click position in the search results for global search in CDS.