    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. It focuses on re-ranking the results returned by existing search engines such as Solr or Elasticsearch, and includes the development of Obelix, a new recommendation system used to perform the re-ranking. The project was proposed and carried out at CERN, using the scientific publications available on the CERN Document Server (CDS). Re-ranking is evaluated both offline and online, using the users and documents in CDS. The experiments conclude that personalized search results outperform both latest-first and word-similarity ranking in terms of click position in the result list for global search in CDS.
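
    The re-ranking idea can be illustrated with a minimal sketch. It is not the Obelix algorithm: the function name `rerank`, the linear position-based base score, and the `weight` parameter are all assumptions made for illustration. The sketch blends the search engine's original order with per-user recommendation scores.

```python
from typing import Dict, List


def rerank(results: List[str], user_scores: Dict[str, float], weight: float = 0.5) -> List[str]:
    """Blend the engine's original order with per-user recommendation scores.

    Hypothetical sketch: `results` is the engine's ranked list of document
    ids, `user_scores` maps document ids to a score in [0, 1] produced by
    some recommender, and `weight` sets how much personalization counts.
    """
    n = len(results)
    if n == 0:
        return []

    def blended(position: int, doc_id: str) -> float:
        # Base score falls linearly from 1.0 (top hit) toward 0.0 (last hit).
        base = 1.0 - position / n
        personal = user_scores.get(doc_id, 0.0)
        return (1.0 - weight) * base + weight * personal

    scored = [(blended(i, doc), i, doc) for i, doc in enumerate(results)]
    # Sort by blended score, highest first; ties keep the engine's order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored]
```

    With `weight=0.0` the engine's order is returned unchanged, which gives a simple baseline to compare against.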

    BlogForever D5.2: Implementation of Case Studies

    This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects identified during testing. The case studies provide feedback on the sustainability of the platform in terms of potential users’ needs, as well as information on its possible long-term impact.

    BlogForever: D3.1 Preservation Strategy Report

    This report describes the preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. We start by discussing why weblogs should be preserved in the first place and what exactly is to be preserved. We then review past and present work and highlight why current web archiving practices do not adequately address the needs of weblog preservation. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository; b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities, and consider how this affects previous strategies; c) we propose a sustainability plan that draws upon community knowledge through innovative repository design.

    Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections - Développement d'un logiciel flexible pour la comparaison de notices bibliographiques et application à différentes collections

    With the multiplication of digital bibliographic catalogues (open repositories, library and bookseller catalogues), information specialists face the challenge of mass-processing huge amounts of metadata for various purposes. Among the many possible applications, determining the similarity between records is an important issue. Such similarity can be of interest from a bibliographic point of view (do the records describe the same document? The answer is useful for deduplication and for collection overlap studies) as well as from a thematic point of view (suggesting documents to the user, managing content within the framework of a library policy, classifying documents automatically, and so on). To meet these varied needs, we propose a flexible, open-source, multiplatform software tool that supports the implementation of multiple strategies for record comparison. In a second step, we study the relevance and performance of several algorithms applied to a selection of collections that differ in size, origin and document types.
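
    A minimal sketch of what pluggable comparison strategies could look like follows. It is not the tool described above: the names `normalize`, `edit_ratio`, `jaccard` and `compare`, the field names, and the simple averaging are assumptions for illustration. Each field of a record is paired with its own similarity function.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict

Record = Dict[str, str]
Strategy = Callable[[str, str], float]


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"\w+", text.lower()))


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity, suited to near-duplicate titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def jaccard(a: str, b: str) -> float:
    """Word-set overlap, suited to thematic similarity."""
    tokens_a, tokens_b = set(normalize(a).split()), set(normalize(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def compare(r1: Record, r2: Record, strategies: Dict[str, Strategy]) -> float:
    """Average the per-field similarities over fields present in both records."""
    scores = [fn(r1[f], r2[f]) for f, fn in strategies.items() if f in r1 and f in r2]
    return sum(scores) / len(scores) if scores else 0.0
```

    A deduplication run might use `{"title": edit_ratio, "author": edit_ratio}`, while a thematic comparison might use `{"abstract": jaccard}`; swapping the dictionary changes the strategy without touching `compare`.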

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    This report is a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and of approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component of the BlogForever software, the discussion has been extended to include observations on the historical, social and practical value of spam, and proposals for dealing with spam within the repository without necessarily removing it. The report contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, methods that have been suggested for preventing the introduction of spam into a blog, and spam research that focuses on the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.

    How to improve the visibility and search engine ranking of a digital repository using Schema.org.

    Master's Degree in Library and Information Services Management (Màster Universitari de Gestió i Direcció de Biblioteques i Serveis d'Informació). Facultat d'Informació i Mitjans Audiovisuals. Universitat de Barcelona. Academic year: 2018-2019. Supervisor: Rubén Alcaraz. Web 3.0 has transformed the way knowledge is accessed and shared. The Semantic Web opens a new space of relationship between the information published on the Web and the understanding that machines can extract from it in order to give users better answers to their searches. In this context, search engine optimization (SEO) becomes a crucial method for improving the visibility of a website or web page in a search engine's results. On this path toward better semantic interoperability, Google, Yahoo and Bing introduced Schema.org in 2011, a vocabulary created to make web content understandable to web crawlers and other machines. The vocabulary makes it possible to describe the information contained in web pages through a set of properties embedded in the HTML code, giving the content semantics and making its data readable and interpretable by software applications. Digital repositories face the challenge of getting users to find their content in an environment as large as the Internet. Given that the way researchers discover, read and use scholarly literature has changed considerably with the advance of the Web, the approach digital repositories take to visibility and interoperability with the Web needs to include semantic improvements, so that they do not become invisible repositories that Internet search engines cannot retrieve properly.
    This work introduces the basic concepts and the different technologies, standards and recommendations that make up the Semantic Web environment, and takes on the challenge of implementing a tool such as Schema.org in the institutional repository of the Universitat de València (RODERIC) as a means of improving the understanding between the Web, the information it contains and search engines.
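
    As a rough illustration of Schema.org markup for a repository record, the sketch below builds a JSON-LD description and wraps it in a script element for embedding in a record's HTML page. It is not the RODERIC implementation: the function name `scholarly_article_jsonld`, the choice of the JSON-LD syntax, and the selection of properties are assumptions. `ScholarlyArticle`, `Person`, `name`, `author`, `datePublished` and `url` are terms from the Schema.org vocabulary.

```python
import json
from typing import List


def scholarly_article_jsonld(title: str, authors: List[str], date_published: str, url: str) -> str:
    """Return a <script> element carrying Schema.org JSON-LD for one record."""
    data = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": author} for author in authors],
        "datePublished": date_published,  # ISO 8601 date, e.g. "2019-06-30"
        "url": url,
    }
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a metadata value cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data parses back unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

    The returned string would typically be placed in the head of the record's landing page; the same properties could also be expressed with Microdata or RDFa attributes on existing HTML elements.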

    Digital Libraries: Advanced Methods and Technologies, Digital Collections

    Digital libraries are a field of research and development aimed at advancing the theory and practice of processing, disseminating, storing, analyzing and searching digital data of various kinds. The main goal of the RCDL conference series is to build a community of specialists in Russia who conduct research and development in digital libraries and related fields. The 2009 All-Russian Scientific Conference (RCDL'2009) is the eleventh conference on this subject (1999: St. Petersburg; 2000: Protvino; 2001: Petrozavodsk; 2002: Dubna; 2003: St. Petersburg; 2004: Pushchino; 2005: Yaroslavl; 2006: Suzdal; 2007: Pereslavl-Zalessky; 2008: Dubna). This volume contains the texts of the papers, short papers and posters selected by the RCDL'2009 Program Committee through peer review.

    The Future of Information Sciences : INFuture2009 : Digital Resources and Knowledge Sharing

