30 research outputs found

    On Suggesting Entities as Web Search Queries

    The Web of Data is growing in popularity and dimension, and named entity exploitation is gaining importance in many research fields. In this paper, we explore the use of entities that can be extracted from a query log to enhance query recommendation. In particular, we extend a state-of-the-art recommendation algorithm to take into account the semantic information associated with submitted queries. Our novel method generates highly related and diversified suggestions that we assess by means of a new evaluation technique. The manually annotated dataset used for performance comparisons has been made available to the research community to favor the repeatability of experiments.

    An Approach for Curating Collections of Historical Documents with the Use of Topic Detection Technologies

    Digital curation of materials available in large online repositories is required to enable the reuse of Cultural Heritage resources in specific activities like education or scientific research. The digitization of such valuable objects is an important task for making them accessible through digital platforms such as Europeana; therefore, ensuring the success of transcription campaigns via the Transcribathon platform is highly important for this goal. Based on impact assessment results, people are more engaged in the transcription process if the content is oriented to specific themes, such as the First World War. Currently, efforts to group related documents into thematic collections are generally hand-crafted and, due to the large ingestion of new material, difficult to maintain and update. The current solutions based on text retrieval are not able to support the discovery of related content, since the existing collections are multilingual and contain heterogeneous items like postcards, letters, journals, photographs, etc. Technological advances in natural language understanding and in data management have led to the automation of document categorization via automatic topic detection. To use existing topic detection technologies on Europeana collections, there are several challenges to be addressed: (1) ensure representative and high-quality training data, (2) ensure the quality of the learned topics, and (3) provide efficient and scalable solutions for searching related content based on the automatically detected topics, and for suggesting the most relevant topics for new items. This paper describes each of these challenges and the proposed solutions in more detail, thus offering a novel perspective on how digital curation practices can be enhanced with the help of machine learning technologies.

    Discovering Europeana users’ search behavior

    Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve the performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of Europeana users.

    Linking subject labels in Cultural Heritage Metadata to MIMO vocabulary using CultuurLink

    The Europeana Sounds project (http://www.europeanasounds.eu/) aims to increase the amount of cultural audio content in Europeana. It also strongly focuses on enriching the metadata records that are aggregated by Europeana. To provide metadata to Europeana, Data Providers are asked to convert their records from the format and model they use internally to a specific profile of the Europeana Data Model (EDM) for sound resources. These metadata include subjects, which typically use a vocabulary internal to each partner. The problem is that the values in subject fields too often come as simple literals (strings) that are specific to one (or a couple of) language(s), namely the one(s) of the Data Provider. For Europeana to take full advantage of subjects from these vocabularies for purposes such as cross-lingual search, it is essential that they are connected with richer, multilingual data. A first solution to this problem is to semantically enrich metadata for individual cultural objects with links to concepts from a (multilingual) vocabulary (say, 'vocM'). Such new object-vocM links can be used to later provide more semantics and labels in multiple languages for search indexes or display functions. A second option is to perform alignment at the level of vocabularies, linking the elements of an original ...

    Aggregating a Knowledge Base of File Formats from Linked Open Data: Poster - iPRES 2012 - Digital Curation Institute, iSchool, Toronto

    This paper presents an approach for the semi-automatic aggregation of knowledge on computer file formats, used to support planning for long-term preservation. Our goal is to create a solid knowledge base from linked open data repositories, which forms the foundation of the DiPRec recommender system. The ontology mapping approach is employed for collecting the information and integrating it in a common domain model. Furthermore, we employ expert rules for inferring explicit knowledge on the nature and preservation friendliness of the file formats.

    On Enhancing the FFMA Knowledge Base: Paper - iPres 2013 - Lisbon

    Ensuring long-term access to digitized content is a major concern of digital libraries. Document migration and summarization are key activities employed to reach this goal. Evaluating preservation friendliness and making recommendations for long-term preservation require deep domain knowledge, which is currently not available in any integrated knowledge base. In this paper we present an approach for enhancing the automatically aggregated knowledge on computer file formats. A clustering algorithm is employed to identify related file formats and to predict missing semantic associations between file formats and software tools. This is used to improve the discovery of software tools supporting the less popular file formats.

    A Risk Analysis of File Formats for Preservation Planning: Paper - iPres 2013 - Lisbon

    This paper presents an approach for the automatic estimation of preservation risks for file formats. The main contribution of this work is the definition of risk factors with associated severity levels and their automatic computation. Our goal is to make use of a solid knowledge base automatically aggregated from linked open data repositories as the basis for a risk analysis in the digital preservation domain. This method is meant to facilitate decision making with regard to preservation of digital content in libraries and archives. We have developed a tool for aggregating rich and trusted file format descriptions. It exploits available linked data resources and uses expert models to infer knowledge regarding the long-term preservation of digital content. The ontology mapping technique is employed for collecting the information from the web of linked data and integrating it in a common representation. Furthermore, we employ AI techniques (i.e. expert rules, clustering) for inferring explicit knowledge on the nature and preservation-friendliness of the file formats. A statistical analysis of the aggregated information and the qualitative analysis of the aggregated knowledge are presented in the evaluation part of the paper. A Web service is created to support programmatic access to format and risk analysis reports.