948 research outputs found

    Ranking Archived Documents for Structured Queries on Semantic Layers

    Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, which in turn can be queried through a semantic query language (SPARQL). This allows running advanced queries by combining metadata of the documents (like publication date) and content-based semantic information (like entities mentioned in the documents). However, the results returned by such structured queries can be numerous and, moreover, they all match the query equally. In this paper, we deal with this problem and formalize the task of "ranking archived documents for structured queries on semantic layers". Then, we propose two ranking models for the problem at hand which jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. The experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.

    LODE: Linking Digital Humanities Content to the Web of Data

    Numerous digital humanities projects maintain their data collections in the form of text, images, and metadata. While data may be stored in many formats, from plain text to XML to relational databases, the use of the Resource Description Framework (RDF) as a standardized representation has gained considerable traction during the last five years. Almost every digital humanities meeting has at least one session concerned with the topic of digital humanities, RDF, and linked data. While most existing work in linked data has focused on improving algorithms for entity matching, the aim of the LinkedHumanities project is to build digital humanities tools that work "out of the box," enabling their use by humanities scholars, computer scientists, librarians, and information scientists alike. With this paper, we report on the Linked Open Data Enhancer (LODE) framework developed as part of the LinkedHumanities project. With LODE we help non-technical users enrich a local RDF repository with high-quality data from the Linked Open Data cloud. LODE links and enhances the local RDF repository without compromising the quality of the data. In particular, LODE supports the user in the enhancement and linking process by providing intuitive user interfaces and by suggesting high-quality linking candidates using tailored matching algorithms. We hope that the LODE framework will be useful to digital humanities scholars, complementing other digital humanities tools.

    Linked Data Supported Information Retrieval

    Search engines have become indispensable for finding content on the World Wide Web. Semantic Web and Linked Data technologies enable a more detailed and unambiguous structuring of content and allow entirely new approaches to solving information retrieval problems. This thesis examines how information retrieval applications can benefit from the inclusion of Linked Data. New methods for computer-assisted semantic text analysis, semantic search, information prioritization, and information visualization are presented and comprehensively evaluated. Linked Data resources and their relationships are integrated into these methods in order to increase their effectiveness or their usability. First, an introduction to the foundations of information retrieval and Linked Data is given. Subsequently, new manual and automated methods for semantically annotating documents by linking them to Linked Data resources (entity linking) are presented. A comprehensive evaluation of the methods is carried out, and the underlying evaluation system is substantially improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. The methods are based on the generalized vector space model and incorporate semantic similarity, derived from taxonomy-based relationships between the Linked Data resources in documents and queries, into the computation of the search result ranking. With the aim of further refining the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented that aim to improve the explorability and navigability of a semantically annotated document corpus. Two applications are presented for this purpose: a Linked Data based exploratory extension that complements a traditional keyword-based search engine, and a Linked Data based recommender system.
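
The retrieval idea in this abstract (a generalized vector space model that scores concept pairs by taxonomy-based similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy taxonomy `PARENT`, the Wu-Palmer-style measure in `taxonomy_similarity`, and the function names are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT = {
    "Thing": None,
    "Place": "Thing",
    "City": "Place",
    "Country": "Place",
    "Person": "Thing",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the taxonomy; the root has depth 1."""
    return len(ancestors(concept))


def taxonomy_similarity(a, b):
    """Wu-Palmer-style similarity (an assumed choice, not the thesis's measure)."""
    ancestors_b = set(ancestors(b))
    # First ancestor of `a` that is also an ancestor of `b`: the deepest shared one.
    common = next(c for c in ancestors(a) if c in ancestors_b)
    return 2 * depth(common) / (depth(a) + depth(b))


def bilinear(u, v):
    """Sum of u_i * v_j * sim(i, j) over all concept pairs."""
    return sum(
        wu * wv * taxonomy_similarity(cu, cv)
        for cu, wu in u.items()
        for cv, wv in v.items()
    )


def gvsm_score(query, doc):
    """Generalized cosine: related concepts contribute, not only identical ones."""
    denom = math.sqrt(bilinear(query, query) * bilinear(doc, doc))
    return bilinear(query, doc) / denom if denom else 0.0


query = {"City": 1.0}
doc_city = {"City": 1.0}
doc_country = {"Country": 1.0}
doc_person = {"Person": 1.0}
```

In a plain vector space model, `doc_country` and `doc_person` would both score zero against `query` because they share no annotation with it. In this sketch they receive non-zero scores, and the taxonomically closer `Country` document ranks above the `Person` document.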

    Recommender Systems based on Linked Data

    Background: The increase in the amount of structured data published using the principles of Linked Data means that it is now more likely to find resources in the Web of Data that describe real-life concepts. However, discovering resources related to any given resource is still an open research area. This thesis studies Recommender Systems (RS) that use Linked Data as a source for generating recommendations, exploiting the large amount of available resources and the relationships among them. Aims: The main objective of this study was to propose a recommendation technique for resources considering semantic relationships between concepts from Linked Data. The specific objectives were: (i) Define semantic relationships derived from resources, taking into account the knowledge found in Linked Data datasets. (ii) Determine semantic similarity measures based on the semantic relationships derived from resources. (iii) Propose an algorithm to dynamically generate automatic rankings of resources according to defined similarity measures. Methodology: It was based on the recommendations of the Project Management Institute and the Integral Model for Engineering Professionals (Universidad del Cauca), the first for managing the project and the second for developing the experimental prototype. Accordingly, the main phases were: (i) Conceptual base generation, for identifying the main problems, objectives, and the project scope. A Systematic Literature Review was conducted for this phase, which highlighted the relationships and similarity measures among resources in Linked Data, and the main issues, features, and types of RS based on Linked Data. (ii) Solution development, which consists of designing and developing the experimental prototype for testing the algorithms studied in this thesis. Results: The main results obtained were: (i) The first Systematic Literature Review on RS based on Linked Data. (ii) A framework to execute and analyze recommendation algorithms based on Linked Data. (iii) A dynamic algorithm for resource recommendation based on the knowledge of Linked Data relationships. (iv) A comparative study of algorithms for RS based on Linked Data. (v) Two implementations of the proposed framework, one with graph-based algorithms and the other with machine learning algorithms. (vi) The application of the framework to various scenarios to demonstrate its feasibility within the context of real applications. Conclusions: (i) The proposed framework proved useful for developing and evaluating different configurations of algorithms to create novel RS based on Linked Data suited to users' requirements, applications, domains, and contexts. (ii) The layered architecture of the proposed framework also supports the reproducibility of the results for the research community. (iii) Linked Data based RS are useful for presenting explanations of the recommendations, because of the graph structure of the datasets. (iv) Graph-based algorithms take advantage of intrinsic relationships among resources from Linked Data. Nevertheless, their execution time is still an open issue. Machine learning algorithms are also suitable: they provide functions for dealing with large amounts of data, so they can help to improve the performance (execution time) of the RS. However, most of them need a training phase that requires knowing the application domain a priori in order to obtain reliable results. (v) A logical evolution of RS based on Linked Data is the combination of graph-based and machine learning algorithms to obtain accurate results while keeping execution times low. However, research and experimentation are still needed to explore more techniques from the vast number of machine learning algorithms and to determine the most suitable ones for dealing with Linked Data.
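
As a rough illustration of ranking resources by a similarity measure derived from their Linked Data relationships, the sketch below compares resources by the (predicate, object) pairs they share. This is not the algorithm proposed in the thesis: the Jaccard measure is a swapped-in generic technique, and the `ex:` triples and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real dataset content.
TRIPLES = [
    ("ex:FilmA", "ex:director", "ex:Director1"),
    ("ex:FilmA", "ex:genre", "ex:Genre1"),
    ("ex:FilmB", "ex:director", "ex:Director1"),
    ("ex:FilmB", "ex:genre", "ex:Genre1"),
    ("ex:FilmC", "ex:director", "ex:Director1"),
    ("ex:FilmC", "ex:genre", "ex:Genre2"),
    ("ex:FilmD", "ex:genre", "ex:Genre3"),
]


def build_features(triples):
    """Map each subject to the set of (predicate, object) pairs describing it."""
    features = defaultdict(set)
    for subject, predicate, obj in triples:
        features[subject].add((predicate, obj))
    return features


def jaccard(a, b):
    """Share of (predicate, object) pairs that two resources have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(seed, triples=TRIPLES, k=3):
    """Rank other resources by similarity to `seed`, dropping unrelated ones."""
    features = build_features(triples)
    seed_features = features.get(seed, set())
    scored = [
        (resource, jaccard(seed_features, feats))
        for resource, feats in features.items()
        if resource != seed
    ]
    scored = [(resource, score) for resource, score in scored if score > 0]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

Calling `recommend("ex:FilmA")` ranks `ex:FilmB` first because it shares both director and genre, then `ex:FilmC`, which shares only the director. The shared pairs also serve as a simple explanation of why a resource was recommended, which matches the abstract's remark about graph structure supporting explanations.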

    CONTEXT-BASED AUTOSUGGEST ON GRAPH DATA

    Autosuggest is an important feature in any search application. Currently, most applications only suggest a single term based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the search text box as the context to autosuggest their next incomplete keyword. This context-based approach uses the relationships between entities in the graph data that the user is searching and would therefore provide more meaningful suggestions.
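
The context-based idea can be sketched in a few lines, independently of Solr. The following is a minimal illustration under stated assumptions, not the Smart Solr Suggester plugin itself: the toy graph in `EDGES`, the scoring rule (count of already-chosen keywords directly linked to a candidate), and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph of related search keywords (undirected edges).
EDGES = [
    ("solr", "lucene"),
    ("solr", "search"),
    ("lucene", "java"),
    ("spark", "scala"),
    ("spark", "java"),
    ("scala", "java"),
]


def build_graph(edges):
    """Build an adjacency map: keyword -> set of directly related keywords."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


GRAPH = build_graph(EDGES)


def suggest(context, prefix, graph=GRAPH, limit=5):
    """Complete `prefix`, ranking candidates by their links to the chosen keywords."""
    candidates = [
        term for term in graph
        if term.startswith(prefix) and term not in context
    ]

    def context_score(term):
        # Number of already-chosen keywords that are neighbours of this candidate.
        return sum(1 for keyword in context if term in graph.get(keyword, set()))

    # Strongest connection to the context first; ties broken alphabetically.
    candidates.sort(key=lambda term: (-context_score(term), term))
    return candidates[:limit]
```

With the same prefix `"s"`, `suggest(["lucene"], "s")` puts `solr` first, while `suggest(["java"], "s")` puts `scala` and `spark` first. A purely frequency-based suggester would return the same order in both cases.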