Personalized Search
As the volume of electronically available information grows, relevant items
become harder to find. This work presents an approach to personalizing search
results in scientific publication databases by re-ranking the results returned
by existing search engines such as Solr or ElasticSearch. It also covers the
development of Obelix, a new recommendation system used to re-rank search
results. The project was proposed and carried out at CERN, using the scientific
publications available on the CERN Document Server (CDS). Re-ranking is
evaluated both offline and online on users and documents in CDS. The
experiments conclude that personalized search results outperform both
latest-first and word-similarity ranking in terms of click position in the
search results for global search in CDS.
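A minimal sketch of the re-ranking idea described above, not Obelix's actual algorithm: it assumes the search engine returns (document id, relevance score) pairs and that a recommender supplies a per-user score between 0 and 1 for some documents. The function name `rerank`, the `weight` parameter, and the linear blend are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def rerank(results: List[Tuple[str, float]],
           user_scores: Dict[str, float],
           weight: float = 0.5) -> List[str]:
    """Re-order engine results by blending engine and per-user scores.

    results     -- (doc_id, engine_score) pairs in the engine's order (assumed input shape)
    user_scores -- hypothetical recommender output, doc_id -> score in [0, 1]
    weight      -- share of the final score given to personalization
    """
    if not results:
        return []
    # Normalize engine scores to [0, 1]; fall back to 1.0 if the maximum is 0.
    max_engine = max(score for _, score in results) or 1.0
    blended = []
    for doc_id, engine_score in results:
        personal = user_scores.get(doc_id, 0.0)  # unseen documents get no boost
        combined = (1 - weight) * (engine_score / max_engine) + weight * personal
        blended.append((combined, doc_id))
    # Python's sort is stable, so ties keep the engine's original order.
    blended.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in blended]
```

With `weight=0` the engine's order is returned unchanged, which gives a simple baseline to compare click positions against.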
BlogForever D5.2: Implementation of Case Studies
This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects obtained within the testing processes. The case studies provide relevant feedback for the sustainability of the platform in terms of potential users’ needs and relevant information on the possible long-term impact.
BlogForever: D3.1 Preservation Strategy Report
This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what exactly we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design.
Development of a flexible tool for the automatic comparison of bibliographic records. Application to sample collections - Développement d'un logiciel flexible pour la comparaison de notices bibliographiques et application à différentes collections
Due to the multiplication of digital bibliographic catalogues (open repositories, library and bookseller catalogues), information specialists are facing the challenge of mass-processing huge amounts of metadata for various purposes. Among the many possible applications, determining the similarity between records is an important issue. Such a similarity can be interesting from a bibliographic point of view (i.e., do the records describe the same document, the answer to which can be useful for deduplication or for collection overlap studies) as well as from a thematic point of view (suggestion of documents to the user, as well as content management within the framework of a library policy, automatic classification of documents, and so on). In order to fulfil such various needs, we propose a flexible, open-source, multiplatform software tool supporting the implementation of multiple strategies for record comparisons. In a second step, we study the relevance and performance of several algorithms applied to a selection of collections (varying in size, origin, document types, and so on).
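As an illustration of one possible comparison strategy (not the tool described above), the following sketch scores two bibliographic records with a weighted mix of fuzzy title similarity and an exact year match. The record layout (`title`, `year` keys), the weights, and the function names are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def record_similarity(a: Dict, b: Dict,
                      weights: Optional[Dict[str, float]] = None) -> float:
    """Return a similarity in [0, 1] for two records (hypothetical dict layout)."""
    weights = weights or {"title": 0.7, "year": 0.3}
    # Character-level similarity of the normalized titles.
    title_sim = SequenceMatcher(
        None, normalize(a.get("title", "")), normalize(b.get("title", ""))
    ).ratio()
    # Years either match exactly or contribute nothing.
    same_year = a.get("year") is not None and a.get("year") == b.get("year")
    year_sim = 1.0 if same_year else 0.0
    return weights["title"] * title_sim + weights["year"] * year_sim
```

A deduplication pass might treat pairs above a chosen threshold as candidate duplicates, while a thematic comparison would likely swap in other fields (subjects, abstracts) and other weights.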
BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology
This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on work that appears in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.
How to improve the visibility and search engine ranking of a digital repository using Schema.org
Màster Universitari de Gestió i Direcció de Biblioteques i Serveis d'Informació. Facultat d'Informació i Mitjans Audiovisuals. Universitat de Barcelona. Academic year: 2018-2019. Supervisor: Rubén Alcaraz.
Web 3.0 has transformed the way knowledge is accessed and shared. The Semantic Web opens a new space of relationships between the information published on the Web and the understanding that machines can extract from it in order to give users better answers to their searches. In this context, search engine optimization (SEO) becomes a crucial factor as a method for improving the visibility of a website or web page in a search engine's results.
On this path toward better semantic interoperability, Google, Yahoo and Bing introduced Schema.org in 2011, a vocabulary created to make web content understandable to web crawlers and other machines. The vocabulary makes it possible to describe the information that web pages contain through a set of properties inserted into the HTML code, adding semantics to the content and making its data readable and interpretable by software applications.
Digital repositories face the challenge of getting users to find their content in an environment as large as the Internet. Given that researchers' behaviour in discovering, reading and using academic literature has changed considerably with the advance of the Web, digital repositories' approach to visibility and interoperability with the web needs to embrace semantic improvements so that they do not become invisible repositories that search engines cannot retrieve properly.
This work introduces the basic concepts and the various technologies, standards and recommendations that make up the Semantic Web environment, and takes on the challenge of implementing a tool such as Schema.org in the institutional repository of the Universitat de València (RODERIC) as a means of improving the understanding between the web, the information it contains and search engines.
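To make the idea of embedding Schema.org properties in a repository page concrete, here is a minimal sketch that serializes a record as JSON-LD inside a script tag. It is not taken from the RODERIC implementation: the record layout, the function names, and the choice of the `ScholarlyArticle` type with the `name`, `author`, `datePublished` and `url` properties are assumptions for illustration.

```python
import json
from typing import Dict


def to_jsonld(record: Dict) -> str:
    """Serialize a repository record (hypothetical dict layout) as Schema.org JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": record["title"],
        "author": [{"@type": "Person", "name": n} for n in record.get("authors", [])],
        "datePublished": record.get("date"),
        "url": record.get("url"),
    }
    # Drop properties the record does not provide instead of emitting nulls.
    data = {k: v for k, v in data.items() if v not in (None, [], "")}
    return json.dumps(data, ensure_ascii=False, indent=2)


def to_script_tag(record: Dict) -> str:
    """Wrap the JSON-LD in a script element for insertion into the page's HTML."""
    # Escape "<" so record text can never close the script element early.
    body = to_jsonld(record).replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

JSON-LD is only one of the syntaxes Schema.org supports; microdata or RDFa attributes placed directly on the HTML elements would carry the same properties.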
Digital Libraries: Advanced Methods and Technologies, Digital Collections
Digital libraries are a field of research and development aimed at advancing the theory and practice of processing, disseminating, storing, analysing and searching digital data of various kinds. The main goal of the RCDL conference series is to build a community of specialists in Russia who conduct research and development in digital libraries and related fields. The 2009 All-Russian Scientific Conference (RCDL'2009) is the eleventh conference on this topic (1999: St. Petersburg, 2000: Protvino, 2001: Petrozavodsk, 2002: Dubna, 2003: St. Petersburg, 2004: Pushchino, 2005: Yaroslavl, 2006: Suzdal, 2007: Pereslavl-Zalessky, 2008: Dubna).
This volume contains the texts of the papers, short communications and poster presentations selected by the RCDL'2009 Programme Committee through peer review.
A model of scientists' information seeking and a user-interface design
Information systems that are available today do not optimally address the information-seeking behaviour of scholars, particularly those who belong to scientific communities; as a result, scholarly discovery is often cumbersome and incomplete. The hypothesis of this study is that an information-seeking system that is designed to address the nature of scholarly materials and the information seeking behaviour of scholars, particularly the members of one scientific community, will increase the effectiveness of the scholars’ searches and enable them to find and obtain relevant materials with greater ease and precision than current practices do.
The information-seeking behaviour and search practices deployed by high-energy physics (HEP) researchers are explored through a series of interviews and observations. More than 2,100 responses obtained from a HEP survey are also examined; in particular, the participants’ open-ended responses are analysed. On the basis of qualitative and quantitative research regarding the characteristics of HEP scientists and their information-seeking practices, a set of six personas, representing typical members of the HEP community, is constructed.
An original model is developed that leverages existing models of information behaviour, information seeking, and information searching and reflects the full spectrum of active information-seeking and information-searching practices of HEP scholars and the nature of the data that these researchers seek. The model is then evaluated by means of seven scenarios involving the personas constructed earlier.
On the basis of the information-seeking model, a software user interface is designed as the future interface for the HEP INSPIRE information system. The user-interface design is corroborated through the model, and the personas are used to evaluate the design. Methods are suggested for long-term quantitative and qualitative monitoring of the ways in which this design supports HEP researchers. It is argued that the proposed user interface, which provides an information environment that accommodates the information-seeking practices of the HEP community in a friendly and efficient manner, will support HEP academic research—and research of other scholarly communities that share some of the HEP community’s characteristics—by shortening the search process and improving the findability of quality materials.
This thesis contributes to the body of information-science knowledge in the novel modelling of information-seeking behaviour of a well-defined scientific community, the use of personas for the modelling, and the concretization of the model into a new user-interface design.