thesis

A semantic metadata enrichment software ecosystem (SMESE) : its prototypes for digital libraries, metadata enrichments and assisted literature reviews

Abstract

Contribution 1: Initial design of a semantic metadata enrichment ecosystem (SMESE) for Digital Libraries The Semantic Metadata Enrichments Software Ecosystem (SMESE V1) for Digital Libraries (DLs) proposed in this paper implements a Software Product Line Engineering (SPLE) process using a metadata-based software architecture approach. It integrates a components-based ecosystem, including metadata harvesting, text and data mining and machine learning models. SMESE V1 is based on a generic model for standardizing meta-entity metadata and a mapping ontology to support the harvesting of various types of documents and their metadata from the web, databases and linked open data. SMESE V1 supports a dynamic metadata-based configuration model using multiple thesauri. The proposed model defines rules-based crosswalks that create pathways to different sources of data and metadata. Each pathway checks the metadata source structure and performs data and metadata harvesting. SMESE V1 proposes a metadata model in six categories of metadata instead of the four currently proposed in the literature for DLs; this makes it possible to describe content by defined entity, thus increasing usability. In addition, to tackle the issue of varying degrees of depth, the proposed metadata model describes the most elementary aspects of a harvested entity. A mapping ontology model has been prototyped in SMESE V1 to identify specific text segments based on thesauri in order to enrich content metadata with topics and emotions; this mapping ontology also allows interoperability between existing metadata models. Contribution 2: Metadata enrichments ecosystem based on topics and interests The second contribution extends the original SMESE V1 proposed in Contribution 1. Contribution 2 proposes a set of topic- and interest-based content semantic enrichments. The improved prototype, SMESE V3 (see following figure), uses text analysis approaches for sentiment and emotion detection and provides machine learning models to create a semantically enriched repository, thus enabling topic- and interest-based search and discovery. SMESE V3 has been designed to find short descriptions in terms of topics, sentiments and emotions. It allows efficient processing of large collections while keeping the semantic and statistical relationships that are useful for tasks such as: 1. topic detection, 2. contents classification, 3. novelty detection, 4. text summarization, 5. similarity detection. Contribution 3: Metadata-based scientific assisted literature review The third contribution proposes an assisted literature review (ALR) prototype, STELLAR V1 (Semantic Topics Ecosystem Learning-based Literature Assisted Review), based on machine learning models and a semantic metadata ecosystem. Its purpose is to identify, rank and recommend relevant papers for a literature review (LR). This third prototype can assist researchers, in an iterative process, in finding, evaluating and annotating relevant papers harvested from different sources and input into the SMESE V3 platform, available at any time. The key elements and concepts of this prototype are: 1. text and data mining, 2. machine learning models, 3. classification models, 4. researchers annotations, 5. semantically enriched metadata. STELLAR V1 helps the researcher to build a list of relevant papers according to a selection of metadata related to the subject of the ALR. The following figure presents the model, the related machine learning models and the metadata ecosystem used to assist the researcher in the task of producing an ALR on a specific topic

    Similar works