What are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web
Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable and reusable datasets. We argue that although Linked Data may thus be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. Firstly, in order to define boundaries of single coherent knowledge graphs within Linked Data, a principled notion of what a dataset is, or, respectively, what links within and between datasets are, has been missing. Secondly, we argue that in order to enable FAIR knowledge graphs, Linked Data misses standardised findability and accessibility mechanisms, via a single entry link. In order to address the first issue, we (i) propose a rigorous definition of a naming authority for a Linked Data dataset, (ii) define different link types for data in Linked datasets, (iii) provide an empirical analysis of linkage among the datasets of the Linked Open Data cloud, and (iv) analyse the dereferenceability of those links. We base our analyses and link computations on a scalable mechanism implemented on top of the HDT format, which allows us to analyse quantity and quality of different link types at scale. Series: Working Papers on Information Systems, Information Business and Operation
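As a rough illustration of how links within and between datasets might be told apart, the following sketch classifies a link by comparing the naming authorities of its subject and object IRIs. The abstract does not give the paper's actual definition of a naming authority, so the host name is used here purely as a hypothetical stand-in, and the function names and the labels "internal", "external" and "unknown" are likewise assumptions.

```python
from urllib.parse import urlparse


def naming_authority(iri):
    """Return a stand-in naming authority for an IRI.

    Assumption: the host name plays the role of the naming authority.
    The paper's rigorous definition may differ.
    """
    return urlparse(iri).hostname  # None when the string has no host part


def classify_link(subject_iri, object_iri):
    """Label a link by whether both ends share a naming authority."""
    s = naming_authority(subject_iri)
    o = naming_authority(object_iri)
    if s is None or o is None:
        return "unknown"  # e.g. a literal or a blank node on one side
    return "internal" if s == o else "external"


if __name__ == "__main__":
    print(classify_link("http://dbpedia.org/resource/Vienna",
                        "http://www.wikidata.org/entity/Q1741"))
```

Under this assumption, a link whose subject and object live on the same host counts as a link within a dataset, and any other resolvable pair counts as a link between datasets.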
NEFI: Network Extraction From Images
Networks and network-like structures are amongst the central building blocks
of many technological and biological systems. Given a mathematical graph
representation of a network, methods from graph theory enable a precise
investigation of its properties. Software for the analysis of graphs is widely
available and has been applied to graphs describing large scale networks such
as social networks, protein-interaction networks, etc. In these applications,
graph acquisition, i.e., the extraction of a mathematical graph from a network,
is relatively simple. However, for many network-like structures, e.g. leaf
venations, slime molds and mud cracks, data collection relies on images where
graph extraction requires domain-specific solutions or even manual work. Here we
introduce Network Extraction From Images, NEFI, a software tool that
automatically extracts accurate graphs from images of a wide range of networks
originating in various domains. While there is previous work on graph
extraction from images, theoretical results are fully accessible only to an
expert audience and ready-to-use implementations for non-experts are rarely
available or insufficiently documented. NEFI provides a novel platform allowing
practitioners from many disciplines to easily extract graph representations
from images by supplying flexible tools from image processing, computer vision
and graph theory bundled in a convenient package. Thus, NEFI constitutes a
scalable alternative to tedious and error-prone manual graph extraction and
special purpose tools. We anticipate NEFI to enable the collection of larger
datasets by reducing the time spent on graph extraction. The analysis of these
new datasets may open up the possibility to gain new insights into the
structure and function of various types of networks. NEFI is open source and
available at http://nefi.mpi-inf.mpg.de
Towards Scalable Visual Exploration of Very Large RDF Graphs
In this paper, we outline our work on developing a disk-based infrastructure
for efficient visualization and graph exploration operations over very large
graphs. The proposed platform, called graphVizdb, is based on a novel technique
for indexing and storing the graph. In particular, the graph layout is indexed
with a spatial data structure, i.e., an R-tree, and stored in a database. At
runtime, user operations are translated into efficient spatial operations
(i.e., window queries) in the backend. Comment: 12th Extended Semantic Web Conference (ESWC 2015)
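To make the idea of a window query over a stored graph layout concrete, here is a minimal sketch. It is not graphVizdb's implementation: the R-tree and the database are replaced by a plain linear scan over an in-memory dictionary, and the names window_query and visible_edges are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def window_query(layout: Dict[str, Point], window: Window) -> List[str]:
    """Return the nodes whose layout position falls inside the window.

    A spatial index such as an R-tree would answer this without
    scanning every node; the linear scan only shows the semantics.
    """
    min_x, min_y, max_x, max_y = window
    return sorted(
        node
        for node, (x, y) in layout.items()
        if min_x <= x <= max_x and min_y <= y <= max_y
    )


def visible_edges(edges: List[Tuple[str, str]], visible: List[str]) -> List[Tuple[str, str]]:
    """Keep only the edges with both endpoints in the visible node set."""
    shown = set(visible)
    return [(u, v) for u, v in edges if u in shown and v in shown]


if __name__ == "__main__":
    layout = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (20.0, 20.0)}
    nodes = window_query(layout, (0.0, 0.0, 10.0, 10.0))
    print(nodes, visible_edges([("a", "b"), ("b", "c")], nodes))
```

A pan or zoom in the user interface would then correspond to issuing the same query with a shifted or resized window.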
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
Large scale graph processing is a major research area for Big Data
exploration. Vertex centric programming models like Pregel are gaining traction
due to their simple abstraction that allows for scalable execution on
distributed systems naturally. However, there are limitations to this approach
which cause vertex centric algorithms to under-perform due to poor compute to
communication overhead ratio and slow convergence of iterative supersteps. In
this paper we introduce GoFFish, a scalable sub-graph centric framework
co-designed with a distributed persistent graph storage for large scale graph
analytics on commodity clusters. We introduce a sub-graph centric programming
abstraction that combines the scalability of a vertex centric approach with the
flexibility of shared memory sub-graph computation. We map Connected
Components, SSSP and PageRank algorithms to this model to illustrate its
flexibility. Further, we empirically analyze GoFFish using several real world
graphs and demonstrate its significant performance improvement, orders of
magnitude in some cases, compared to Apache Giraph, the leading open source
vertex centric implementation. Comment: Under review by a conference, 201
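The sub-graph centric idea can be sketched for Connected Components as follows. This is a single-process simulation under stated assumptions, not GoFFish's API: each partition first merges its own vertices in shared memory (here with union-find), and only the resulting sub-graphs exchange minimum labels in synchronous supersteps. The function names and the integer vertex ids are hypothetical.

```python
from collections import defaultdict


def local_subgraphs(vertices, edges, partition_of):
    """Merge vertices joined by edges that stay inside one partition."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if partition_of[u] == partition_of[v]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)  # smallest id stays root
    return {v: find(v) for v in vertices}


def connected_components(vertices, edges, partition_of):
    """Return (vertex -> component label, number of supersteps)."""
    sub_of = local_subgraphs(vertices, edges, partition_of)

    # Edges crossing partitions connect sub-graphs to each other.
    neighbours = defaultdict(set)
    for u, v in edges:
        su, sv = sub_of[u], sub_of[v]
        if su != sv:
            neighbours[su].add(sv)
            neighbours[sv].add(su)

    label = {s: s for s in set(sub_of.values())}
    supersteps = 0
    changed = True
    while changed:
        changed = False
        supersteps += 1
        inbox = defaultdict(list)
        for s, nbrs in neighbours.items():  # send phase
            for t in nbrs:
                inbox[t].append(label[s])
        for s, msgs in inbox.items():  # receive phase
            best = min(msgs)
            if best < label[s]:
                label[s] = best
                changed = True
    return {v: label[sub_of[v]] for v in vertices}, supersteps


if __name__ == "__main__":
    parts = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
    print(connected_components(list(parts), [(1, 2), (2, 3), (3, 4), (4, 5)], parts))
```

In a vertex centric version of the same computation, the label would have to travel one vertex per superstep; merging locally first means the number of supersteps depends on the distance between sub-graphs rather than between vertices.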
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows modeling natural problems arising in the Web and in social media
applications. It also allows us to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm.
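The objective described in the abstract above can be evaluated for a given set of open facilities with a short sketch. It only computes the cost of a candidate solution and is not the paper's parallel algorithm: a sequential multi-source Dijkstra search stands in for the distance computation, each client is assumed to be served by its nearest open facility, and the function name and the adjacency-list input format are hypothetical.

```python
import heapq


def facility_location_cost(adj, opening_cost, open_facilities, clients):
    """Opening cost of the chosen facilities plus total service cost.

    adj maps a vertex to a list of (neighbour, edge_weight) pairs.
    Service cost of a client is its shortest-path distance to the
    nearest open facility (infinite if none is reachable).
    """
    inf = float("inf")
    dist = {f: 0.0 for f in open_facilities}
    heap = [(0.0, f) for f in open_facilities]
    heapq.heapify(heap)

    # Multi-source Dijkstra: every open facility starts at distance 0.
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))

    service = sum(dist.get(c, inf) for c in clients)
    opening = sum(opening_cost[f] for f in open_facilities)
    return opening + service


if __name__ == "__main__":
    # Path graph a - b - c - d with edge weights 1, 2, 1.
    adj = {
        "a": [("b", 1.0)],
        "b": [("a", 1.0), ("c", 2.0)],
        "c": [("b", 2.0), ("d", 1.0)],
        "d": [("c", 1.0)],
    }
    cost = {"a": 3.0, "d": 4.0}
    for chosen in (["a"], ["d"], ["a", "d"]):
        print(chosen, facility_location_cost(adj, cost, chosen, ["b", "c"]))
```

Because only the sparse adjacency lists are stored, the input stays far smaller than a full table of pairwise distances, which is the sparsity advantage the abstract points to.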