151 research outputs found

    Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

    The current data deluge is flooding the web with large volumes of data represented in RDF, giving rise to the so-called 'Web of Data'. In this thesis we propose, first, an in-depth study of those texts that allow us to gain a global understanding of the real structure of RDF datasets, and HDT, which addresses the efficient representation of large volumes of RDF data through structures optimized for storage and network transmission. HDT efficiently represents an RDF dataset by dividing it into three components: the header (Header), the dictionary (Dictionary), and the structure of RDF statements (Triples). Next, we focus on providing efficient structures for these components, which occupy compressed space while allowing direct access to any data…
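The three-way split into Header, Dictionary, and Triples described in the abstract above can be illustrated with a small sketch. This is not the HDT binary format: it only shows the general idea of dictionary encoding, and the function names, the single shared ID space, and the header fields are assumptions made for brevity.

```python
def encode(triples):
    """Split a list of (s, p, o) string triples into three parts.

    Assumption: one shared ID space for all terms, chosen for brevity.
    """
    # Dictionary: each distinct term gets a small integer ID.
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i for i, term in enumerate(terms, start=1)}
    # Triples: the graph structure, expressed only with integer IDs.
    id_triples = sorted(tuple(dictionary[t] for t in triple) for triple in triples)
    # Header: descriptive metadata about the dataset (hypothetical fields).
    header = {"triples": len(id_triples), "terms": len(terms)}
    return header, dictionary, id_triples


def decode(dictionary, id_triples):
    """Map ID triples back to their string terms."""
    inverse = {i: term for term, i in dictionary.items()}
    return [tuple(inverse[i] for i in triple) for triple in id_triples]


sample = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:alice", "foaf:name", "Alice"),
]
header, dictionary, id_triples = encode(sample)
```

Long IRIs are stored once in the dictionary, so repeated terms cost only an integer in the triples component, which is where the space saving in this sketch comes from.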

    Compacting Frequent Star Patterns in RDF Graphs

    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes demonstrate the efficiency of the proposed approach and show that our techniques are able to reduce the execution time of the baseline approach by at least three orders of magnitude while reducing the RDF graph size by up to 66.56%

    Compacting frequent star patterns in RDF graphs

    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. Since the edges between the entities and the objects in the frequent star pattern are replaced by edges between these entities and the surrogate entity of the compact RDF molecule, the size of the RDF graph is reduced. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes demonstrate the efficiency of the proposed approach and show that our techniques are able to reduce the execution time of the baseline approach by at least three orders of magnitude. Additionally, RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved
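The surrogate-entity idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: the star pattern is given as input rather than detected, and the linking property `ex:hasMolecule`, the function names, and the sample data are all hypothetical.

```python
LINK = "ex:hasMolecule"  # hypothetical property linking an entity to its surrogate


def factorize(triples, pattern, surrogate, link=LINK):
    """Replace one star pattern, given as (property, object) pairs, by a compact molecule."""
    triples = set(triples)
    pattern = set(pattern)
    subjects = {s for s, _, _ in triples}
    # Entities that carry every (property, object) pair of the star pattern.
    matching = {s for s in subjects
                if all((s, p, o) in triples for p, o in pattern)}
    if len(matching) < 2:
        return triples  # nothing is shared, so nothing to factorize
    removed = {(s, p, o) for s in matching for p, o in pattern}
    molecule = {(surrogate, p, o) for p, o in pattern}
    links = {(s, link, surrogate) for s in matching}
    return (triples - removed) | molecule | links


def expand(triples, surrogate, link=LINK):
    """Undo factorize, showing that the original data is preserved."""
    triples = set(triples)
    molecule = {(p, o) for s, p, o in triples if s == surrogate}
    members = {s for s, p, o in triples if p == link and o == surrogate}
    restored = {(s, p, o) for s in members for p, o in molecule}
    kept = {t for t in triples
            if t[0] != surrogate and not (t[1] == link and t[2] == surrogate)}
    return kept | restored


pattern = {("rdf:type", "ex:Observation"),
           ("ex:unit", "ex:Celsius"),
           ("ex:sensor", "ex:S1")}
graph = set()
for i in range(1, 4):
    entity = f"ex:obs{i}"
    graph |= {(entity, p, o) for p, o in pattern}
    graph.add((entity, "ex:value", str(20 + i)))

compact = factorize(graph, pattern, "ex:m1")
```

In this toy graph, three entities sharing three property-object pairs go from 12 triples to 9. The saving grows with the number of matching entities and with the size of the pattern, since each entity keeps only one edge to the surrogate.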

    Storing and querying evolving knowledge graphs on the web


    Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion

    Every year, approximately 2.5 million new scientific papers are published. With the rapidly growing publication trends, it is increasingly difficult to manually sort through and keep track of the relevant research, a problem that is only more acute in a multidisciplinary setting. The Open Research Knowledge Graph (ORKG) is a next-generation scholarly communication platform that aims to address this issue by making knowledge about scholarly contributions machine-actionable, thus enabling completely new ways of human-machine assistance in comprehending research progress. As such, the ORKG is powered by a diverse spectrum of NLP services to assist expert users in structuring scholarly contributions and searching for the most relevant contributions. For a prospective recommendation service, this thesis examines the task of automated ORKG completion as an object extraction task from a given paper abstract for a query ORKG predicate. As a main contribution of this thesis, automated ORKG completion is formulated as an extractive Question Answering (QA) machine learning objective under an open world assumption. Specifically, the task attempted in this work is fixed-prompt Language Model (LM) tuning (LMT) for few-shot ORKG object prediction formulated as the well-known SQuAD extractive QA objective. Three variants of BERT-based transformer LMs are evaluated. To support the novel LMT task, this thesis introduces a scholarly QA dataset, akin in characteristics to the SQuAD QA dataset, generated semi-automatically from the ORKG knowledge base. As a result, the BERT model variants, when tested in a vanilla setting versus after LMT, show a positive, significant performance uplift for automated ORKG completion as an object completion task. This thesis offers a strong empirical basis for future research aiming at a production-ready automated ORKG completion model
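Casting object extraction as SQuAD-style extractive QA, as the abstract above describes, amounts to pairing a paper abstract (the context) with a question derived from a predicate and marking the answer span. The sketch below only illustrates that record shape; the question template, the function name, and the sample sentence are assumptions, not the dataset-generation procedure used in the thesis.

```python
def make_example(example_id, context, predicate, obj):
    """Build one SQuAD-style record for a (predicate, object) pair.

    Assumption: the question is produced by a fixed, hypothetical template.
    """
    start = context.find(obj)
    found = start != -1
    return {
        "id": example_id,
        "context": context,
        "question": f"What is the {predicate}?",
        # SQuAD stores answers as parallel lists of texts and character offsets;
        # both lists stay empty when the object does not occur in the context.
        "answers": {
            "text": [obj] if found else [],
            "answer_start": [start] if found else [],
        },
    }


record = make_example(
    "ex-1",
    "We fine-tune BERT on 500 abstracts.",
    "model",
    "BERT",
)
```

An extractive QA model trained on such records predicts a start and end position inside `context`, so only objects that appear verbatim in the abstract can be recovered this way.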

    On construction, performance, and diversification for structured queries on the semantic desktop

    [no abstract]

    Linked Data Entity Summarization

    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements and it becomes difficult to comprehend the data, unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion

    Statistical Methodologies

    Statistical practices have recently been questioned by numerous independent authors, to the extent that a significant fraction of accepted research findings can be questioned. This suggests that statistical methodologies may have gone too far into an engineering practice, with minimal concern for their foundation, interpretation, assumptions, and limitations, which may be jeopardized in the current context. Disguised by overwhelming data sets, advanced processing, and stunning presentations, the basic approach is often intractable to anyone but the analyst. The hierarchical nature of statistical inference, exemplified by Bayesian aggregation of prior and derived knowledge, may also be challenging. Conceptual simplified studies of the kind presented in this book could therefore provide valuable guidance when developing statistical methodologies, and also when applying the state of the art with greater confidence

    The use of data-mining for the automatic formation of tactics

    This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques

    An infrastructure for the development of Semantic Desktop applications

    The extent to which our daily lives are digitized is continuously growing. Many of our everyday activities manifest themselves in digital form: either explicitly, when we actively use digital information for work or leisure, or implicitly, when information is indirectly created or manipulated as a consequence of our actions. A large fraction of these data volumes can be considered personal information, that is, information that has a certain relationship to us as human beings. The storage and processing capacity of the devices that we use to interact with these data has increased enormously in recent years, and we can expect this development to continue. However, while the power of physical data storage keeps increasing, the logical data organization capabilities of personal devices have stagnated since the invention of the first personal computers. Hierarchical file systems are still the de facto standard for data organization on personal devices. File systems represent information as a set of discrete data units (files) that are arranged as leaves on a tree of labeled nodes (directories). On the one hand, this structure can be easily understood by humans, since the separation into small information units supports the manual manageability of the personal data space, in comparison to systems that employ continuous data structures. On the other hand, hierarchical structures suffer from a number of deficiencies that have a negative impact on the quality of personal information management, and they lack expressive mechanisms that would help to improve information retrieval according to user needs. Significant research effort has been invested in improving the mechanisms for personal information management. The resulting works represent potential alternatives or supplements for systems in place, but sometimes run the risk of over-formalizing information management, a problem that is especially apparent in situations where a non-expert end user is the direct consumer of such services. The contribution of this thesis is to present an alternative organizational model for the management of personal data that strikes a balance between the unstructured nature of file systems and the highly formal characteristics of logic-based systems. After a comparative analysis of the current situation and recent research effort in the areas of the Semantic Desktop and Personal Information Management, it describes this organizational model on three levels. First, on a conceptual level, it discusses an abstract data model, a corresponding query algebra, and a set of abstract operations on this data model. This formal framework is suitable for representing common data structures and usage patterns found in personal information management, but at the same time does not enforce a complete paradigm shift away from established systems. Second, on a representation level, it discusses how this model can be efficiently processed, stored, and exchanged between different systems. Third, on an implementation level, it describes how concrete realizations of this data model can be built and used in various application scenarios