
    Linking the Semantics Ecosystem with Semantics Derivation Rules for Multimedia Content

    Multimedia content exhibits multiple semantics that are influenced by different factors such as time, context of use, and personal background. The semantics ecosystem provides an elegant, high-level description of the different factors that influence the semantics of multimedia content. Semantics derivation rules, in contrast, are a concrete means to extract and derive the semantics of multimedia content while it is being authored; they are directly applicable in concrete applications and domains. Thus, there is a gap between the high-level ecosystem and the concrete semantics derivation rules. In this position paper, we propose an ontology-based description of events to combine the high-level description of the semantics ecosystem with the concrete method of semantics derivation for page-based multimedia presentations.
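
    As a toy illustration of what such a derivation rule could look like, the following Python sketch derives semantic tags for a media item from the event in which it is authored. The event model, tag vocabulary, and all names are our own assumptions for illustration, not the paper's actual ontology-based event description.

        from dataclasses import dataclass, field

        # Hypothetical, minimal event model; the paper's ontology-based
        # description of events is richer than this sketch.
        @dataclass
        class Event:
            name: str
            date: str
            location: str
            participants: list

        @dataclass
        class MediaItem:
            uri: str
            semantics: set = field(default_factory=set)

        def derive_semantics(item: MediaItem, event: Event) -> MediaItem:
            # Toy derivation rule: media authored during an event inherits
            # descriptive tags from the event's context.
            item.semantics.add(f"depicts:{event.name}")
            item.semantics.add(f"takenAt:{event.location}")
            item.semantics.update(f"shows:{p}" for p in event.participants)
            return item

        photo = derive_semantics(
            MediaItem("http://example.org/img1.jpg"),
            Event("keynote", "2012-05-03", "Berlin", ["Alice"]))
        print(photo.semantics)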

    Towards flexible indices for distributed graph data: The formal schema-level index model FLuID

    Graph indices are key to managing huge amounts of distributed graph data. Instance-level indices are available that focus on the fast retrieval of nodes. Furthermore, there are so-called schema-level indices that summarize nodes sharing common characteristics, i.e., the combination of attached types and used property labels. We argue that there is no one-size-fits-all schema-level index. Rather, a parameterized, formal model is needed that allows one to quickly design, tailor, and compare different schema-level indices. We abstract from related work and provide the formal model FLuID, which uses basic building blocks to flexibly define different schema-level indices. The FLuID model provides parameterized simple and complex schema elements together with four parameters. We show that all indices modeled in FLuID can be computed in O(n). Thus, FLuID enables us to efficiently implement, compare, and validate variants of schema-level indices tailored for specific application scenarios.
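
    As a minimal sketch of the idea behind a schema-level index (not the FLuID model itself), the following Python snippet summarizes nodes by the combination of their attached types and used property labels in a single pass over the triples, i.e., in O(n); the toy data and names are our assumptions.

        from collections import defaultdict

        # Toy triples (subject, predicate, object); data is made up.
        triples = [
            ("ex:alice", "rdf:type", "foaf:Person"),
            ("ex:alice", "foaf:name", "Alice"),
            ("ex:bob",   "rdf:type", "foaf:Person"),
            ("ex:bob",   "foaf:name", "Bob"),
            ("ex:kiel",  "rdf:type", "dbo:City"),
        ]

        def schema_level_index(triples):
            # Summarize nodes by the combination of attached types and
            # used property labels; one pass over the triples, i.e., O(n).
            types, props = defaultdict(set), defaultdict(set)
            subjects = set()
            for s, p, o in triples:
                subjects.add(s)
                if p == "rdf:type":
                    types[s].add(o)
                else:
                    props[s].add(p)
            index = defaultdict(list)
            for s in sorted(subjects):
                key = (frozenset(types[s]), frozenset(props[s]))
                index[key].append(s)
            return index

        for (t, p), nodes in schema_level_index(triples).items():
            print(sorted(t), sorted(p), "->", nodes)

    Here ex:alice and ex:bob end up in the same schema element because they share types and property labels, while ex:kiel forms its own element.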

    A systematic comparison of different approaches for unsupervised extraction of text from scholarly figures

    Different approaches have been proposed in the past to address the challenge of extracting text from scholarly figures. However, a comparative evaluation of these approaches has so far not been conducted. Based on an extensive study, we compare the 7 most relevant approaches described in the literature as well as 25 systematic combinations of methods for extracting text from scholarly figures. To this end, we define a generic pipeline consisting of six individual steps. We map the existing approaches to this pipeline and re-implement their methods for each pipeline step. The method-wise re-implementation allows us to freely combine the possible methods for each pipeline step. Overall, we have evaluated 32 different pipeline configurations and systematically compared the different methods and approaches. We evaluate the pipeline configurations over four datasets of scholarly figures of different origins and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance. In addition, we measure the runtime performance. The experimental results show that one approach achieves the best overall text extraction quality on all datasets. Regarding runtime, we observe huge differences, ranging from very fast approaches to those running for several weeks.
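
    The following Python sketch illustrates the general idea of such a generic pipeline with interchangeable methods per step; the step granularity, method names, and placeholder implementations are our own assumptions, not the paper's exact six-step definition.

        from functools import reduce

        # Each pipeline step can be realized by any of several
        # interchangeable methods; the bodies are placeholders.
        def binarize_otsu(state):     return {**state, "binary": True}
        def binarize_adaptive(state): return {**state, "binary": True}
        def find_characters(state):   return {**state, "chars": ["c1", "c2"]}
        def cluster_lines(state):     return {**state, "lines": ["line 1"]}
        def estimate_rotation(state): return {**state, "angle": 0.0}
        def run_ocr(state):           return {**state, "text": "recovered text"}

        def run_pipeline(image, steps):
            # Apply the chosen method for every step in order.
            return reduce(lambda state, step: step(state), steps, image)

        # Two of many configurations, differing only in the first step:
        config_a = [binarize_otsu, find_characters, cluster_lines,
                    estimate_rotation, run_ocr]
        config_b = [binarize_adaptive, find_characters, cluster_lines,
                    estimate_rotation, run_ocr]

        print(run_pipeline({"pixels": "..."}, config_a)["text"])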

    Towards a configurable framework for iterative signing of distributed graph data

    When publishing graph data on the web, such as vocabularies expressed in RDF(S) or OWL, one has only limited means to verify its authenticity and integrity. Today's approaches require a high signature overhead and do not allow for an iterative signing of graph data. This paper presents a configurable framework for signing arbitrary graph data provided in RDF(S), Named Graphs, or OWL. Our framework supports signing graph data at different levels of granularity: minimum self-contained graphs (MSGs), sets of MSGs, and entire graphs. It supports an iterative signing of graph data, e.g., when different parties provide different parts of a common graph, and allows for signing multiple graphs. Both can be done with a constant, low overhead for the signature graph, even when iteratively signing graph data.
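
    The sketch below illustrates only the general idea of iterative signing, where a later signature covers both the graph's canonical form and the previous signature; the framework's actual canonicalization, hash, and signature functions are configurable, and the plain SHA-256 used here merely stands in for a real cryptographic signature.

        import hashlib

        def canonicalize(triples):
            # Naive canonical form: sorted triple serialization. Real graph
            # canonicalization (e.g., blank node handling) is more involved.
            return "\n".join(sorted(" ".join(t) for t in triples))

        def sign(data, key):
            # Placeholder: a real framework would apply an asymmetric
            # signature (e.g., RSA) over the digest instead.
            return hashlib.sha256((key + "\n" + data).encode()).hexdigest()

        def sign_graph(triples, key, previous_signature=None):
            # Iterative signing: a later party signs the graph content
            # together with the existing signature, so each signing step
            # adds only a constant amount of signature data.
            payload = canonicalize(triples)
            if previous_signature is not None:
                payload += "\n" + previous_signature
            return sign(payload, key)

        graph = [("ex:s", "ex:p", "ex:o")]
        sig_alice = sign_graph(graph, key="alice-key")
        sig_bob = sign_graph(graph, key="bob-key",
                             previous_signature=sig_alice)
        print(sig_alice, sig_bob, sep="\n")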

    Virtual Laboratories as Preparation to a Practical Laboratory Course at the Example of Genetics

    A virtual laboratory is an abstraction of a real laboratory that allows for executing experiments in a computer-based simulation. The goal of virtual laboratories is to train the procedural knowledge that students need for conducting experiments in a real laboratory environment. Students can train comfortably in a secure computer-based environment without wasting precious resources such as substances and devices. Different aspects of virtual laboratories in the field of genetics have been evaluated in the past. However, to the best of our knowledge, no evaluation has so far investigated the impact of training with a virtual laboratory on a real-world laboratory course. To address this gap, we have conducted a comparative study using the photorealistic virtual laboratory GenLab for genetics and genetic engineering. One group of students (n=18) received training with GenLab prior to the real-world laboratory experiments, while the other group (n=14) did not. We recorded the students' own assessments of the experiments' complexity and comprehensibility. For the two experiments that the treatment group had trained on in GenLab, we recorded more detailed information. In addition, we measured the time the students needed to conduct the experiments in the real laboratory course. The results show significant differences for the more complex experiment tasks, but not for the less complex ones. The differences might be explained by the proportion of repetitive and rather simple tasks versus tasks that are also repetitive but require higher concentration to avoid mistakes. Furthermore, the more complex experiment was reproduced more closely in the virtual lab. This indicates that procedural knowledge is best acquired when the experiment can be reenacted virtually step by step. Overall, working with the virtual lab was perceived positively by the students. Hence, its integration into the genetics curriculum is considered beneficial for the students' motivation and their preparedness for the real-world lab.

    Formalization and preliminary evaluation of a pipeline for text extraction from infographics

    We propose a pipeline for text extraction from infographics that makes use of a novel combination of data mining and computer vision techniques. The pipeline defines a sequence of steps to identify characters, cluster them into text lines, determine their rotation angle, and apply state-of-the-art OCR to recognize the text. In this paper, we formally define the pipeline and present its current implementation. In addition, we have conducted preliminary evaluations over a data corpus of 121 manually annotated infographics covering a broad range of illustration types such as bar charts, pie charts, line charts, maps, and others. We assess the results of our text extraction pipeline by comparing it with two baselines. Finally, we sketch an outline for future work and possibilities for improving the pipeline.
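
    One step that lends itself to a small worked example is determining the rotation angle of a text line from the centroids of its clustered characters. The Python sketch below fits a straight line through hypothetical centroids and derives the angle; the coordinates are made up, and the paper's actual method may differ.

        import math

        # Hypothetical character centroids (x, y) of one clustered text
        # line; in a real pipeline they come from character identification.
        centroids = [(10, 52), (22, 49), (35, 46), (47, 43), (60, 40)]

        def rotation_angle(points):
            # Least-squares line fit through the centroids; the slope of
            # that line gives the text line's rotation angle in degrees.
            n = len(points)
            mx = sum(x for x, _ in points) / n
            my = sum(y for _, y in points) / n
            cov = sum((x - mx) * (y - my) for x, y in points)
            var = sum((x - mx) ** 2 for x, _ in points)
            return math.degrees(math.atan2(cov, var))

        angle = rotation_angle(centroids)
        print(f"rotate by {-angle:.1f} degrees before applying OCR")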

    Will linked data benefit from inverse link traversal?

    Query execution using link traversal is a promising approach for retrieving and accessing data on the web. However, this approach reaches its limits with query patterns such as ?s rdf:type ex:Employee, where the subject URI is unknown. Such queries are quite useful for many applications. In this paper, we conduct an empirical analysis of the use of such patterns in SPARQL query logs. We present different solution approaches to extend the current Linked Open Data principles with the ability for inverse link traversal, and we discuss the advantages and disadvantages of the different approaches.
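
    The limitation can be made concrete with a toy link-traversal evaluator over a simulated web of documents (all data and URIs below are made up): traversal works when a subject URI is known, but for a pattern like ?s rdf:type ex:Employee there is no URI to dereference, which is precisely what inverse link traversal would address.

        # Simulated web: dereferencing a URI yields the triples
        # published in the document at that URI.
        WEB = {
            "ex:alice": [("ex:alice", "rdf:type", "ex:Employee"),
                         ("ex:alice", "ex:knows", "ex:bob")],
            "ex:bob":   [("ex:bob", "rdf:type", "ex:Employee")],
        }

        def traverse(seed_uris):
            # Forward link traversal: dereference known URIs and follow
            # the URIs mentioned in the retrieved triples.
            seen, frontier, triples = set(), list(seed_uris), []
            while frontier:
                uri = frontier.pop()
                if uri in seen or uri not in WEB:
                    continue
                seen.add(uri)
                for s, p, o in WEB[uri]:
                    triples.append((s, p, o))
                    frontier += [s, o]
            return triples

        # Works: the query names a subject URI to start from.
        print(traverse(["ex:alice"]))
        # For (?s, rdf:type, ex:Employee) there is no seed subject URI;
        # dereferencing ex:Employee would only help if its document linked
        # back to all instances -- the inverse links discussed above.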

    Information-theoretic analysis of entity dynamics on the linked open data cloud

    The Linked Open Data (LOD) cloud is expanding continuously. Entities appear, change, and disappear over time. However, relatively little is known about the dynamics of these entities, i.e., the characteristics of their temporal evolution. In this paper, we employ clustering techniques over the dynamics of entities to determine common temporal patterns. We define an entity as an RDF resource together with its attached RDF types and properties. The quality of the clusterings is evaluated using entity features such as the entities' properties, RDF types, and pay-level domain. In addition, we investigate to what extent entities that share a feature value change together over time. As dataset, we use weekly LOD snapshots over a period of more than three years provided by the Dynamic Linked Data Observatory. Insights into the dynamics of entities on the LOD cloud have strong practical implications for any application requiring fresh caches of LOD. Applications range from determining crawling strategies for LOD and caching SPARQL queries to programming against LOD and recommending vocabularies for reuse.
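
    A minimal sketch of the clustering step might look as follows, assuming each entity is represented by a binary vector indicating in which weekly snapshot it changed; the change matrix, the choice of k-means, and k=2 are our assumptions for illustration.

        import numpy as np
        from sklearn.cluster import KMeans

        # Hypothetical change matrix: one row per entity, one column per
        # weekly snapshot; 1 means the entity's data changed that week.
        changes = np.array([
            [1, 1, 1, 1, 1, 1],   # changes every week
            [1, 0, 1, 1, 0, 1],   # changes often
            [0, 0, 1, 0, 0, 0],   # nearly static
            [0, 0, 0, 0, 0, 1],   # nearly static
        ])

        # Group entities with similar temporal change patterns.
        labels = KMeans(n_clusters=2, n_init=10,
                        random_state=0).fit_predict(changes)
        print(labels)  # entities with the same label evolve similarly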

    The Split Matters: Flat Minima Methods for Improving the Performance of GNNs

    When training a neural network, it is optimized using the available training data with the hope that it generalizes well to new or unseen test data. At the same loss value, a flat minimum in the loss landscape is presumed to generalize better than a sharp minimum. Methods for determining flat minima have mostly been researched for independent and identically distributed (i.i.d.) data such as images. Graphs are inherently non-i.i.d., since their vertices are connected by edges. We investigate flat minima methods and combinations of those methods for training graph neural networks (GNNs). We use GCN and GAT, and extend Graph-MLP to work with more layers and larger graphs. We conduct experiments on small and large citation, co-purchase, and protein datasets with different train-test splits in both transductive and inductive training procedures. The results show that flat minima methods can improve the performance of GNN models by over 2 points if the train-test split is randomized. Following Shchur et al., randomized splits are essential for a fair evaluation of GNNs, as other (fixed) splits like 'Planetoid' are biased. Overall, we provide important insights for improving and fairly evaluating flat minima methods on GNNs. We recommend that practitioners always use weight averaging techniques, in particular EWA when using early stopping. While weight averaging techniques are only sometimes the best-performing method, they are less sensitive to hyperparameters, need no additional training, and keep the original model unchanged. All source code is available at https://github.com/Foisunt/FMMs-in-GNNs.
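
    As a hedged sketch of the recommended weight averaging, the following PyTorch snippet maintains an exponential moving average of the model parameters, which is the common form of such techniques; the decay value, the dummy model, and the training loop are our assumptions, not the paper's exact EWA setup.

        import copy
        import torch

        def make_avg_model(model):
            # Separate copy whose parameters hold the running average;
            # the trained model itself stays unchanged.
            avg = copy.deepcopy(model)
            for p in avg.parameters():
                p.requires_grad_(False)
            return avg

        @torch.no_grad()
        def ewa_update(avg_model, model, decay=0.99):
            # decay=0.99 is an assumed hyperparameter.
            for p_avg, p in zip(avg_model.parameters(), model.parameters()):
                p_avg.mul_(decay).add_(p, alpha=1.0 - decay)

        model = torch.nn.Linear(16, 2)       # stand-in for a GNN
        avg_model = make_avg_model(model)
        optimizer = torch.optim.Adam(model.parameters())
        for step in range(100):
            loss = model(torch.randn(8, 16)).pow(2).mean()  # dummy loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ewa_update(avg_model, model)
        # Evaluate with avg_model, e.g., together with early stopping.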