948 research outputs found
Ranking Archived Documents for Structured Queries on Semantic Layers
Archived collections of documents (like newspaper and web archives) serve as
important information sources in a variety of disciplines, including Digital
Humanities, Historical Science, and Journalism. However, the absence of
efficient and meaningful exploration methods still remains a major hurdle in
the way of turning them into usable sources of information. A semantic layer is
an RDF graph that describes metadata and semantic information about a
collection of archived documents, which in turn can be queried through a
semantic query language (SPARQL). This allows running advanced queries by
combining metadata of the documents (like publication date) and content-based
semantic information (like entities mentioned in the documents). However, the
results returned by such structured queries can be numerous and moreover they
all equally match the query. In this paper, we deal with this problem and
formalize the task of "ranking archived documents for structured queries on
semantic layers". Then, we propose two ranking models for the problem at hand
which jointly consider: i) the relativeness of documents to entities, ii) the
timeliness of documents, and iii) the temporal relations among the entities.
The experimental results on a new evaluation dataset show the effectiveness of
the proposed models and allow us to understand their limitation
LODE: Linking Digital Humanities Content to the Web of Data
Numerous digital humanities projects maintain their data collections in the
form of text, images, and metadata. While data may be stored in many formats,
from plain text to XML to relational databases, the use of the resource
description framework (RDF) as a standardized representation has gained
considerable traction during the last five years. Almost every digital
humanities meeting has at least one session concerned with the topic of digital
humanities, RDF, and linked data. While most existing work in linked data has
focused on improving algorithms for entity matching, the aim of the
LinkedHumanities project is to build digital humanities tools that work "out of
the box," enabling their use by humanities scholars, computer scientists,
librarians, and information scientists alike. With this paper, we report on the
Linked Open Data Enhancer (LODE) framework developed as part of the
LinkedHumanities project. With LODE we support non-technical users to enrich a
local RDF repository with high-quality data from the Linked Open Data cloud.
LODE links and enhances the local RDF repository without compromising the
quality of the data. In particular, LODE supports the user in the enhancement
and linking process by providing intuitive user-interfaces and by suggesting
high-quality linking candidates using tailored matching algorithms. We hope
that the LODE framework will be useful to digital humanities scholars
complementing other digital humanities tools
Linked Data Supported Information Retrieval
Um Inhalte im World Wide Web ausfindig zu machen, sind Suchmaschienen nicht mehr wegzudenken. Semantic Web und Linked Data Technologien ermöglichen ein detaillierteres und eindeutiges Strukturieren der Inhalte und erlauben vollkommen neue Herangehensweisen an die Lösung von Information Retrieval Problemen. Diese Arbeit befasst sich mit den Möglichkeiten, wie Information Retrieval Anwendungen von der Einbeziehung von Linked Data profitieren können. Neue Methoden der computer-gestützten semantischen Textanalyse, semantischen Suche, Informationspriorisierung und -visualisierung werden vorgestellt und umfassend evaluiert. Dabei werden Linked Data Ressourcen und ihre Beziehungen in die Verfahren integriert, um eine Steigerung der Effektivität der Verfahren bzw. ihrer Benutzerfreundlichkeit zu erzielen. Zunächst wird eine Einführung in die Grundlagen des Information Retrieval und Linked Data gegeben. Anschließend werden neue manuelle und automatisierte Verfahren zum semantischen Annotieren von Dokumenten durch deren Verknüpfung mit Linked Data Ressourcen vorgestellt (Entity Linking). Eine umfassende Evaluation der Verfahren wird durchgeführt und das zu Grunde liegende Evaluationssystem umfangreich verbessert. Aufbauend auf den Annotationsverfahren werden zwei neue Retrievalmodelle zur semantischen Suche vorgestellt und evaluiert. Die Verfahren basieren auf dem generalisierten Vektorraummodell und beziehen die semantische Ähnlichkeit anhand von taxonomie-basierten Beziehungen der Linked Data Ressourcen in Dokumenten und Suchanfragen in die Berechnung der Suchergebnisrangfolge ein. Mit dem Ziel die Berechnung von semantischer Ähnlichkeit weiter zu verfeinern, wird ein Verfahren zur Priorisierung von Linked Data Ressourcen vorgestellt und evaluiert. Darauf aufbauend werden Visualisierungstechniken aufgezeigt mit dem Ziel, die Explorierbarkeit und Navigierbarkeit innerhalb eines semantisch annotierten Dokumentenkorpus zu verbessern. Hierfür werden zwei Anwendungen präsentiert. Zum einen eine Linked Data basierte explorative Erweiterung als Ergänzung zu einer traditionellen schlüsselwort-basierten Suchmaschine, zum anderen ein Linked Data basiertes Empfehlungssystem
Recommender Systems based on Linked Data
Backgrounds: The increase in the amount of structured data published using the principles of Linked Data, means that now it is more likely to find resources in the Web of Data that describe real life concepts. However, discovering resources related to any given resource is still an open research area. This thesis studies Recommender Systems (RS) that use Linked Data as a source for generating recommendations exploiting the large amount of available resources and the relationships among them.
Aims: The main objective of this study was to propose a recommendation tech- nique for resources considering semantic relationships between concepts from Linked Data. The specific objectives were: (i) Define semantic relationships derived from resources taking into account the knowledge found in Linked Data datasets. (ii) Determine semantic similarity measures based on the semantic relationships derived from resources. (iii) Propose an algorithm to dynami- cally generate automatic rankings of resources according to defined similarity measures.
Methodology: It was based on the recommendations of the Project management Institute and the Integral Model for Engineering Professionals (Universidad del Cauca). The first one for managing the project, and the second one for developing the experimental prototype. Accordingly, the main phases were: (i) Conceptual base generation for identifying the main problems, objectives and the project scope. A Systematic Literature Review was conducted for this phase, which highlighted the relationships and similarity measures among resources in Linked Data, and the main issues, features, and types of RS based on Linked Data. (ii) Solution development is about designing and developing the experimental prototype for testing the algorithms studied in this thesis.
Results: The main results obtained were: (i) The first Systematic Literature Re- view on RS based on Linked Data. (ii) A framework to execute and an- alyze recommendation algorithms based on Linked Data. (iii) A dynamic algorithm for resource recommendation based on on the knowledge of Linked Data relationships. (iv) A comparative study of algorithms for RS based on Linked Data. (v) Two implementations of the proposed framework. One with graph-based algorithms and other with machine learning algorithms. (vi) The application of the framework to various scenarios to demonstrate its feasibility within the context of real applications.
Conclusions: (i) The proposed framework demonstrated to be useful for develop- ing and evaluating different configurations of algorithms to create novel RS based on Linked Data suitable to users’ requirements, applications, domains and contexts. (ii) The layered architecture of the proposed framework is also useful towards the reproducibility of the results for the research community. (iii) Linked data based RS are useful to present explanations of the recommen- dations, because of the graph structure of the datasets. (iv) Graph-based algo- rithms take advantage of intrinsic relationships among resources from Linked Data. Nevertheless, their execution time is still an open issue. Machine Learn- ing algorithms are also suitable, they provide functions useful to deal with large amounts of data, so they can help to improve the performance (execution time) of the RS. However most of them need a training phase that require to know a priory the application domain in order to obtain reliable results. (v) A log- ical evolution of RS based on Linked Data is the combination of graph-based with machine learning algorithms to obtain accurate results while keeping low execution times. However, research and experimentation is still needed to ex- plore more techniques from the vast amount of machine learning algorithms to determine the most suitable ones to deal with Linked Data
CONTEXT-BASED AUTOSUGGEST ON GRAPH DATA
Autosuggest is an important feature in any search applications. Currently, most applications only suggest a single term based on how frequent that term appears in the indexed documents or how often it is searched upon. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the search text box as the context to autosuggest their next incomplete keyword. This context-based approach uses the relationships between entities in the graph data that the user is searching on and therefore would provide more meaningful suggestions
- …