    Building blocks for semantic data organization on the desktop

    The organization of (multimedia) data on current desktop systems is done to a large part by arranging files in hierarchical file systems, but also by specialized applications (e.g., music or photo organizing software) that make use of file-related metadata for this task. These metadata are predominantly stored in embedded file headers, using a multitude of mainly proprietary formats. Generally, metadata and links play the key roles in advanced data organization concepts. Their limited support in prevalent file system implementations, however, hinders the adoption of such concepts on the desktop: first, non-uniform access interfaces require metadata-consuming applications to understand both a file’s format and its metadata scheme; second, separate data/metadata access is not possible; and third, metadata cannot be attached to multiple files or to file folders, although the latter are the primary constructs for file organization. As a consequence, current desktops suffer, inter alia, from (i) limited data organization possibilities, (ii) limited navigability, (iii) limited data findability, and (iv) metadata fragmentation. Although there have been attempts to improve this situation, e.g., by introducing semantic file systems, most of these issues were addressed and solved in the Web, and in particular in the Semantic Web; reusing these solutions on the desktop, a central hub of data and metadata manipulation, is clearly desirable.
    In this thesis a novel, backwards-compatible metadata model that addresses the above-mentioned issues is introduced. This model is based on stable file identifiers and external, file-related, semantic metadata descriptions that are represented using the generic RDF graph model. Descriptions are accessible via a uniform Linked Data interface and can be linked with other descriptions and resources. In particular, this model enables semantic linking between local file system objects and remote resources on the Web or the emerging Web of Data, thereby enabling the integration of these data spaces. As the model crucially relies on the stability of these links, we contribute two algorithms that preserve their integrity in local and in remote environments. This means that links between file system objects, metadata descriptions, and remote resources do not break even if their addresses change, e.g., when files are moved or Linked Data resources are re-published under different URIs. Finally, we contribute a prototypical implementation of the proposed metadata model that demonstrates how these building blocks combine into a metadata layer that may act as a foundation for semantic data organization on the desktop.
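
    The core of the proposed model can be illustrated in a few lines. The following is a minimal sketch, not the thesis implementation, written with the rdflib Python library; the identifier scheme, namespace, and property names are illustrative assumptions. It shows an external RDF description of a file, keyed by a stable identifier rather than a path, and semantically linked to a Web of Data resource.

        # Minimal sketch (illustrative identifier scheme and vocabulary): an external
        # RDF description of a local file, keyed by a stable identifier rather than
        # the file's path, so the description survives moves and renames.
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDFS

        EX = Namespace("http://example.org/meta/")  # hypothetical vocabulary

        # Stable identifier for the file, independent of its current location.
        file_id = URIRef("http://example.org/files/a1b2c3")

        g = Graph()
        g.bind("ex", EX)
        g.add((file_id, EX.currentPath, Literal("/home/alice/photos/trip.jpg")))
        g.add((file_id, RDFS.label, Literal("Trip photo")))
        # Semantic link from a local file to a resource on the Web of Data.
        g.add((file_id, EX.depicts, URIRef("http://dbpedia.org/resource/Vienna")))

        # A Linked Data interface could return this document for a GET on file_id.
        print(g.serialize(format="turtle"))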

    Interest-based RDF Update Propagation

    Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large numbers of requests from diverse applications. Many data products and services rely on full or partial local LOD replications to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication with all the associated undesirable consequences. Given the evolving nature of the original and authoritative datasets, frequent replacements are required at great cost to keep replicas consistent and up to date. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally without the need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition of graph-pattern-based interest expressions that is used to filter the interesting parts of updates at the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach. (16 pages. Keywords: Change Propagation, Dataset Dynamics, Linked Data, Replication)
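
    The filtering idea can be sketched in a few lines. The following is a minimal illustration of interest-based filtering, not the iRap implementation: an interest expression is modelled here as triple patterns with None as a wildcard, and only the matching parts of a changeset are forwarded to the subscribing replica. All names and data are illustrative.

        # Minimal sketch (illustrative names and data): an interest expression as a
        # list of triple patterns, None acting as a wildcard; only matching parts of
        # a changeset are propagated to the subscribing replica.

        def matches(triple, pattern):
            """A triple matches a pattern if every bound pattern term is equal."""
            return all(p is None or p == t for t, p in zip(triple, pattern))

        def filter_update(changeset, interest):
            """Keep only the added/removed triples that match some interest pattern."""
            return {
                op: [t for t in triples if any(matches(t, pat) for pat in interest)]
                for op, triples in changeset.items()
            }

        # A DBpedia-Live-style changeset and an interest in population values only.
        changeset = {
            "added":   [("dbr:Leipzig", "dbo:populationTotal", "620000"),
                        ("dbr:Leipzig", "rdfs:comment", "Leipzig is a city ...")],
            "removed": [("dbr:Leipzig", "dbo:populationTotal", "590000")],
        }
        interest = [(None, "dbo:populationTotal", None)]
        print(filter_update(changeset, interest))
        # Only the dbo:populationTotal triples reach the target dataset.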

    DELTA-R: a change detection approach for RDF datasets

    This paper presents the DELTA-R approach, which detects and classifies the changes between two versions of a linked dataset. It contributes to the state of the art, first, by proposing a more granular classification of resource-level changes and, second, by automatically selecting the appropriate resource properties to identify the same resources in different versions of a linked dataset that have different URIs but similar representations. The paper also presents the DELTA-R change model to represent the changes detected by the DELTA-R approach. This model bridges the gap between resource-centric and triple-centric views of changes in linked datasets. As a result, a single change detection mechanism is able to support use cases such as interlink maintenance and dataset or replica synchronization. Additionally, the paper describes an experiment conducted to examine the accuracy of the DELTA-R approach in detecting the changes between two versions of a linked dataset. The results indicate that the DELTA-R approach outperforms state-of-the-art approaches in accuracy by up to 4%. It is demonstrated that the proposed more granular classification of changes helped to identify up to 1,529 additional updated resources compared to X. By means of a case study, we demonstrate the support of the DELTA-R approach and change model for an interlink maintenance use case. The results show that 100% of the broken interlinks between DBpedia person snapshot 3.7 and Freebase were repaired.
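
    The basic resource-level diff underlying such change detection can be sketched as follows. This is a deliberate simplification, not the DELTA-R algorithm; in particular it omits DELTA-R's property-based matching of resources whose URIs differ between versions, and all data is illustrative.

        # Minimal sketch (illustrative data): classify each affected resource as
        # added, removed, or updated by diffing the triple sets of two versions.

        def classify_changes(v1, v2):
            """v1, v2: sets of (s, p, o) triples; returns {subject: change kind}."""
            subjects1 = {s for s, _, _ in v1}
            subjects2 = {s for s, _, _ in v2}
            touched = {s for s, _, _ in v1 ^ v2}  # subjects with a differing triple
            changes = {}
            for s in touched:
                if s not in subjects1:
                    changes[s] = "added"     # resource exists only in the new version
                elif s not in subjects2:
                    changes[s] = "removed"   # resource exists only in the old version
                else:
                    changes[s] = "updated"   # present in both, but its triples differ
            return changes

        v1 = {("ex:Bob", "ex:age", "30"), ("ex:Ann", "ex:age", "25")}
        v2 = {("ex:Bob", "ex:age", "31"), ("ex:Carl", "ex:age", "40")}
        print(classify_changes(v1, v2))
        # e.g. {'ex:Bob': 'updated', 'ex:Ann': 'removed', 'ex:Carl': 'added'}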

    Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review

    Within the scientific community there is a certain consensus that defines "Big Data" as a global set, formed through a complex integration that embraces several dimensions: research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered across different sources, a mix that responds to diverse philosophies, a great diversity of structures, different denominations, and so on. Managing them faces great technological and methodological challenges: the discovery and selection of data, its extraction and final processing, preservation, visualization, possibility of access, and its greater or lesser degree of structuring, among other aspects, which together reveal a huge domain of study at the level of analysis and implementation in different knowledge domains. However, given the availability of data and its possible opening: what problems does opening the data face? This paper presents a literature review of these security aspects.

    Gestión de fondos de archivos con datos enlazados y consultas federadas

    In this paper the major technologies of the Semantic Web that may be useful for archives management are summarized. Several local and international projects that generate ontologies from standardized descriptions based on ISAD-G are examined, as is LIAM (Linked Archival Metadata), which facilitates the transformation of archive records into RDF (Resource Description Framework) format. Furthermore, we analyze how Linked Data enables interoperability between information systems and faceted search over OWL (Ontology Web Language), SKOS (Simple Knowledge Organization System), and Dublin Core records. The authors propose the use of a CMS (Content Management System) compatible with SIOC (Semantically-Interlinked Online Communities) and OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) for archive records, to improve the exchange and retrieval of information. We specifically describe the technologies used to develop CoroArchivo, a system assessed by an experiment that automatically generates ontologies from ISAD-G records stored in DSpace. The evaluation tool lets users perform federated queries based on the disjoint and equivalent classes of the OWL vocabulary.
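
    As an illustration of what such a federated query might look like, here is a minimal sketch; the CoroArchivo internals are not reproduced here, so the endpoint URLs, class bridge, and query shape are assumptions. It uses the SPARQLWrapper Python library and the SPARQL 1.1 SERVICE keyword to join local records with a remote endpoint via owl:equivalentClass.

        # Minimal sketch (hypothetical endpoints and classes): a federated SPARQL
        # query that uses owl:equivalentClass bridges declared in a local ontology
        # to retrieve matching records from a remote archive endpoint.
        from SPARQLWrapper import SPARQLWrapper, JSON

        sparql = SPARQLWrapper("http://localhost:3030/archive/sparql")  # hypothetical
        sparql.setQuery("""
            PREFIX owl: <http://www.w3.org/2002/07/owl#>
            SELECT ?record ?remote WHERE {
              ?localClass owl:equivalentClass ?remoteClass .  # ontology bridge
              ?record a ?localClass .
              SERVICE <http://example.org/other-archive/sparql> {  # hypothetical
                ?remote a ?remoteClass .
              }
            }
        """)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            print(row["record"]["value"], "<->", row["remote"]["value"])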

    Expanding the Usage of Web Archives by Recommending Archived Webpages Using Only the URI

    Web archives are a window to view past versions of webpages. When a user requests a webpage on the live Web, such as http://tripadvisor.com/where_to_travel/, the webpage may not be found, which results in a HyperText Transfer Protocol (HTTP) 404 response. The user then may search for the webpage in a Web archive, such as the Internet Archive. Unfortunately, if this page had never been archived, the user will not be able to view the page, nor will the user gain any information on other webpages in the archive that have similar content, such as the archived webpage http://classy-travel.net. Similarly, if the user requests the webpage http://hokiesports.com/football/ from the Internet Archive, the user will only find the requested webpage and will not gain any information on other webpages in the archive that have similar content, such as the archived webpage http://techsideline.com. In this research, we build a model for selecting and ranking possible recommended webpages at a Web archive. This enhances both HTTP 404 and HTTP 200 responses by surfacing webpages in the archive that the user may not know existed. First, we detect semantics in the requested Uniform Resource Identifier (URI). Next, we classify the URI using an ontology, such as DMOZ or any website directory. Finally, we filter and rank candidates based on several features, such as archival quality, webpage popularity, temporal similarity, and content similarity. We measure the performance of each step using different techniques, including calculating the F1 measure for the different tokenization methods and for the classification. We tested the model using human evaluation to determine whether we could classify and find recommendations for a sample of requests from the Internet Archive’s Wayback Machine access log. Overall, when the full categorization was used, reviewers agreed with 80.3% of the recommendations, much more often than they chose “do not agree” or “I do not know”; this indicates that reviewers are more likely to agree with the recommendations when the full categorization is used. When only the first level was used, reviewers agreed with only 25.5% of the recommendations. This indicates that deep-level categorization improves the performance of finding relevant recommendations.
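
    The first step, detecting semantics in the requested URI, can be sketched as follows. This is a simplified illustration, not the exact tokenizer evaluated in the research; the splitting rules and stop list are assumptions.

        # Minimal sketch (illustrative rules): extract candidate terms from a URI by
        # splitting on non-alphanumeric characters and then on camelCase boundaries.
        import re
        from urllib.parse import urlparse

        STOP = {"www", "com", "org", "net", "http", "https"}

        def tokenize_uri(uri):
            """Extract candidate query terms from a URI's host and path."""
            parsed = urlparse(uri)
            chunks = re.split(r"[^A-Za-z0-9]+", parsed.netloc + parsed.path)
            tokens = []
            for chunk in chunks:
                # Break camelCase and letter/digit runs into separate terms.
                tokens += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", chunk)
            return [t.lower() for t in tokens if t.lower() not in STOP]

        print(tokenize_uri("http://tripadvisor.com/where_to_travel/"))
        # ['tripadvisor', 'where', 'to', 'travel']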