62,835 research outputs found

    What are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web

    Get PDF
    Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable and reusable datasets. We argue that while as such, Linked Data may be viewed as a basis for instantiating the FAIR principles, there are still a number of open issues that cause significant data quality issues even when knowledge graphs are published as Linked Data. Firstly, in order to define boundaries of single coherent knowledge graphs within Linked Data, a principled notion of what a dataset is, or, respectively, what links within and between datasets are, has been missing. Secondly, we argue that in order to enable FAIR knowledge graphs, Linked Data misses standardised findability and accessability mechanism, via a single entry link. In order to address the first issue, we (i) propose a rigorous definition of a naming authority for a Linked Data dataset (ii) define different link types for data in Linked datasets, (iii) provide an empirical analysis of linkage among the datasets of the Linked Open Data cloud, and (iv) analyse the dereferenceability of those links. We base our analyses and link computations on a scalable mechanism implemented on top of the HDT format, which allows us to analyse quantity and quality of different link types at scale.Series: Working Papers on Information Systems, Information Business and Operation

    NEFI: Network Extraction From Images

    Full text link
    Networks and network-like structures are amongst the central building blocks of many technological and biological systems. Given a mathematical graph representation of a network, methods from graph theory enable a precise investigation of its properties. Software for the analysis of graphs is widely available and has been applied to graphs describing large scale networks such as social networks, protein-interaction networks, etc. In these applications, graph acquisition, i.e., the extraction of a mathematical graph from a network, is relatively simple. However, for many network-like structures, e.g. leaf venations, slime molds and mud cracks, data collection relies on images where graph extraction requires domain-specific solutions or even manual. Here we introduce Network Extraction From Images, NEFI, a software tool that automatically extracts accurate graphs from images of a wide range of networks originating in various domains. While there is previous work on graph extraction from images, theoretical results are fully accessible only to an expert audience and ready-to-use implementations for non-experts are rarely available or insufficiently documented. NEFI provides a novel platform allowing practitioners from many disciplines to easily extract graph representations from images by supplying flexible tools from image processing, computer vision and graph theory bundled in a convenient package. Thus, NEFI constitutes a scalable alternative to tedious and error-prone manual graph extraction and special purpose tools. We anticipate NEFI to enable the collection of larger datasets by reducing the time spent on graph extraction. The analysis of these new datasets may open up the possibility to gain new insights into the structure and function of various types of networks. NEFI is open source and available http://nefi.mpi-inf.mpg.de

    Towards Scalable Visual Exploration of Very Large RDF Graphs

    Full text link
    In this paper, we outline our work on developing a disk-based infrastructure for efficient visualization and graph exploration operations over very large graphs. The proposed platform, called graphVizdb, is based on a novel technique for indexing and storing the graph. Particularly, the graph layout is indexed with a spatial data structure, i.e., an R-tree, and stored in a database. In runtime, user operations are translated into efficient spatial operations (i.e., window queries) in the backend.Comment: 12th Extended Semantic Web Conference (ESWC 2015

    GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics

    Full text link
    Large scale graph processing is a major research area for Big Data exploration. Vertex centric programming models like Pregel are gaining traction due to their simple abstraction that allows for scalable execution on distributed systems naturally. However, there are limitations to this approach which cause vertex centric algorithms to under-perform due to poor compute to communication overhead ratio and slow convergence of iterative superstep. In this paper we introduce GoFFish a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex centric approach with the flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex centric implementation.Comment: Under review by a conference, 201

    Scalable Facility Location for Massive Graphs on Pregel-like Systems

    Full text link
    We propose a new scalable algorithm for facility location. Facility location is a classic problem, where the goal is to select a subset of facilities to open, from a set of candidate facilities F , in order to serve a set of clients C. The objective is to minimize the total cost of opening facilities plus the cost of serving each client from the facility it is assigned to. In this work, we are interested in the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on the graph. This setting allows to model natural problems arising in the Web and in social media applications. It also allows to leverage the inherent sparsity of such graphs, as the input is much smaller than the full pairwise distances between all vertices. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our solution makes use of a recent result to build sketches for massive graphs, and of a fast parallel algorithm to find maximal independent sets, as building blocks. In so doing, we show how these problems can be solved on a Pregel-like architecture, and we investigate the properties of these algorithms. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while obtaining values of the objective function that are competitive with a state-of-the-art sequential algorithm
    • …
    corecore