Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation
Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood, leading to a great deal of variation in the supported functionalities and the achieved performance. In this work, we provide a comprehensive study of the existing graph database systems. We introduce a novel microbenchmarking framework that provides insights into their performance beyond what macro-benchmarks can offer. The framework includes the largest set of queries and operators considered so far. The graph database systems are evaluated on synthetic and real data, from different domains, and at scales much larger than in any previous work. The framework is materialized as an open-source suite and is easily extended to new datasets, systems, and queries.
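As an illustration of the microbenchmarking idea, the sketch below times a single query in isolation and reports basic statistics. The `run_query` callable and the warm-up/repetition counts are assumptions for this example, not the paper's actual harness.

```python
import statistics
import time

def microbenchmark(run_query, query, warmup=3, repetitions=10):
    """Time one query in isolation, as a microbenchmark would.

    `run_query` is a hypothetical callable that sends `query` to the
    graph database under test and blocks until the result is consumed.
    """
    for _ in range(warmup):              # warm caches before measuring
        run_query(query)
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "stdev_s": statistics.stdev(timings),
    }

# Example: benchmark a single operator (here, a one-hop traversal count).
# result = microbenchmark(my_driver_run, "MATCH (n)-[e]->(m) RETURN count(m)")
```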
Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs
Most knowledge graph completion (KGC) methods learn latent representations of
entities and relations of a given graph by mapping them into a vector space.
Although the majority of these methods focus on static knowledge graphs, a
large number of publicly available KGs contain temporal information stating the
time instant/period over which a certain fact has been true. Such graphs are
often known as temporal knowledge graphs. Furthermore, knowledge graphs may
also contain textual descriptions of entities and relations. Static KGC
methods take neither temporal information nor textual descriptions into
account during representation learning, leveraging only the structural
information of the graph. Recently, some studies have used temporal
information to improve link prediction, yet they do not exploit textual
descriptions and do not support inductive inference (prediction on entities
that have not been seen in training).
We propose a novel framework called TEMT that exploits the power of
pre-trained language models (PLMs) for text-enhanced temporal knowledge graph
completion. The knowledge stored in the parameters of a PLM allows TEMT to
produce rich semantic representations of facts and to generalize on previously
unseen entities. TEMT leverages textual and temporal information available in a
KG, treats them separately, and fuses them to get plausibility scores of facts.
Unlike previous approaches, TEMT effectively captures dependencies across
different time points and enables predictions on unseen entities. To assess the
performance of TEMT, we carried out several experiments including time interval
prediction, both in transductive and inductive settings, and triple
classification. The experimental results show that TEMT is competitive with the
state-of-the-art.
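A minimal sketch of the general idea of PLM-based fact scoring with time, not TEMT's actual architecture: a timed fact is verbalized as text, encoded with a pre-trained language model, and mapped to a plausibility score. The checkpoint name, pooling choice, and scoring head are assumptions for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any encoder-style PLM works here; the checkpoint name is an assumption.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # untrained, illustrative

def fact_plausibility(head_text, relation_text, tail_text, time_text):
    """Verbalize a timed fact, encode it with the PLM, and score it."""
    sentence = f"{head_text} {relation_text} {tail_text} in {time_text}."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    pooled = hidden[:, 0]  # [CLS] pooling; one of several possible choices
    return score_head(pooled).item()

# print(fact_plausibility("Barack Obama", "held position",
#                         "President of the United States", "2010"))
```

Because the fact is scored from its textual form rather than from a trained entity embedding, a model along these lines can score facts about entities never seen during training, which is the inductive setting the abstract refers to.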
Supporting queries spanning across phases of evolving artifacts using Steiner forests
The problem of managing evolving data has attracted considerable research attention. Researchers have focused on the modeling and querying of schema/instance-level structural changes, such as addition, deletion, and modification of attributes. Databases with such functionality are known as temporal databases. A limitation of temporal databases is that they treat changes as independent events, while often the appearance (or elimination) of some structure in the database is the result of an evolution of some existing structure. We claim that maintaining the causal relationship between the two structures is of major importance, since it allows additional reasoning to be performed and answers to be generated for queries that previously had no answers. We present here a novel framework for exploiting the evolution relationships between the structures in the database. In particular, our system combines different structures that are associated through evolution relationships into virtual structures to be used during query answering. The virtual structures define "possible" database instances, in a fashion similar to the possible worlds in probabilistic databases. The framework includes a query answering mechanism that allows queries to be answered over these possible databases without materializing them. Evaluation of such queries raises many interesting technical challenges, since it requires the discovery of Steiner forests on the evolution graphs. For this problem we have designed and implemented a new dynamic programming algorithm with exponential complexity in the size of the input query and polynomial complexity in terms of both the attribute and the evolution data sizes.
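To give an idea of the Steiner forest sub-problem, here is a minimal sketch that approximates one by computing an approximate Steiner tree per group of terminals and taking their union. It uses networkx's approximation routine rather than the paper's dynamic-programming algorithm, and the evolution graph is invented.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def approx_steiner_forest(graph, terminal_groups):
    """Approximate a Steiner forest: one approximate Steiner tree per
    group of terminals that must be mutually connected, then union."""
    forest = nx.Graph()
    for terminals in terminal_groups:
        tree = steiner_tree(graph, terminals, weight="weight")
        forest.add_edges_from(tree.edges(data=True))
    return forest

# Invented evolution graph: nodes are attribute versions, edges evolutions.
G = nx.Graph()
G.add_weighted_edges_from([
    ("name_v1", "name_v2", 1), ("name_v2", "fullname_v1", 1),
    ("addr_v1", "addr_v2", 2), ("addr_v2", "location_v1", 1),
    ("name_v2", "addr_v1", 5),  # bridge so the graph is connected
])
forest = approx_steiner_forest(G, [{"name_v1", "fullname_v1"},
                                   {"addr_v1", "location_v1"}])
print(sorted(forest.edges()))
```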
Discovering Dense Correlated Subgraphs in Dynamic Networks
Given a dynamic network, where edges appear and disappear over time, we are
interested in finding sets of edges that have similar temporal behavior and
form a dense subgraph. Formally, we define the problem as the enumeration of
the maximal subgraphs that satisfy specific density and similarity thresholds.
To measure the similarity of the temporal behavior, we use the correlation
between the binary time series that represent the activity of the edges. For
the density, we study two variants based on the average degree. For these
problem variants we enumerate the maximal subgraphs and compute a compact
subset of subgraphs that have limited overlap. We propose an approximate
algorithm that scales well with the size of the network, while achieving a high
accuracy. We evaluate our framework on both real and synthetic datasets. The
results of the synthetic data demonstrate the high accuracy of the
approximation and show the scalability of the framework.
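As a small illustration of the similarity measure used, the sketch below computes the Pearson correlation between the binary activity time series of edges and keeps edge pairs above a threshold; the snapshot data and the threshold value are invented.

```python
import numpy as np

def edge_correlation(series_a, series_b):
    """Pearson correlation between two binary edge-activity time series
    (1 = edge present in that snapshot, 0 = absent)."""
    return np.corrcoef(series_a, series_b)[0, 1]

# Invented toy data: activity of three edges over 8 snapshots.
activity = {
    ("a", "b"): np.array([1, 1, 0, 0, 1, 1, 0, 0]),
    ("b", "c"): np.array([1, 1, 0, 0, 1, 0, 0, 0]),
    ("c", "d"): np.array([0, 0, 1, 1, 0, 0, 1, 1]),
}
threshold = 0.5  # similarity threshold, chosen arbitrarily here
edges = list(activity)
for i in range(len(edges)):
    for j in range(i + 1, len(edges)):
        r = edge_correlation(activity[edges[i]], activity[edges[j]])
        if r >= threshold:
            print(edges[i], edges[j], round(r, 2))
```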
Mining Dense Subgraphs with Similar Edges
When searching for interesting structures in graphs, it is often important to
take into account not only the graph connectivity, but also the metadata
available, such as node and edge labels, or temporal information. In this paper
we are interested in settings where such metadata is used to define a
similarity between edges. We consider the problem of finding subgraphs that are
dense and whose edges are similar to each other with respect to a given
similarity function. Depending on the application, this function can be, for
example, the Jaccard similarity between the edge label sets, or the temporal
correlation of the edge occurrences in a temporal graph. We formulate a
Lagrangian relaxation-based optimization problem to search for dense subgraphs
with high pairwise edge similarity. We design a novel algorithm to solve the
problem through parametric MinCut, and provide an efficient search scheme to
iterate through the values of the Lagrangian multipliers. Our study is
complemented by an evaluation on real-world datasets, which demonstrates the
usefulness and efficiency of the proposed approach.
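To make the similarity function concrete, here is a minimal sketch of the Jaccard similarity between edge label sets, one of the instantiations the abstract mentions; the label sets are invented.

```python
def jaccard(labels_a, labels_b):
    """Jaccard similarity between two edge label sets."""
    if not labels_a and not labels_b:
        return 1.0
    return len(labels_a & labels_b) / len(labels_a | labels_b)

# Invented labels on two edges of a graph.
e1 = {"co-author", "2019", "data-mining"}
e2 = {"co-author", "2020", "data-mining"}
print(jaccard(e1, e2))  # 0.5: two shared labels out of four distinct
```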
Preface to the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores
Preface to the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores. Summary: There were 6 research papers accepted for this volume. Moreover, 6 poster papers were also presented and included in this volume.
BLADYG: A Graph Processing Framework for Large Dynamic Graphs
Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis, and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, PowerGraph, GraphLab, and Trinity. However, these systems deal only with static graphs and do not consider the issue of processing evolving and dynamic graphs. In this paper, we consider the issues of scale and dynamism in the case of graph processing systems. We present BLADYG, a graph processing framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of the Akka framework. We experimentally evaluate the performance of the proposed framework by applying it to problems such as distributed k-core decomposition and partitioning of large dynamic graphs. The experimental results show that the performance and scalability of BLADYG are satisfactory for large-scale dynamic graphs.
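For reference, k-core decomposition is one of the workloads used in the evaluation. The sketch below is the standard sequential peeling algorithm via networkx, not BLADYG's distributed Akka-based implementation; the graph is invented.

```python
import networkx as nx

# Invented graph: a triangle a-b-c plus a pendant node d. The core number
# of a node is the largest k such that it belongs to a k-core subgraph.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(nx.core_number(G))  # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```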
Enabling entity-based aggregators for web 2.0 data
Selecting and presenting content culled from multiple heterogeneous and physically distributed sources is a challenging task. The exponential growth of web data in modern times has brought new requirements to such integration systems. Data is no longer produced by content providers alone, but also by regular users through the highly popular Web 2.0 social and semantic web applications. The plethora of available web content increased its demand by regular users, who could no longer wait for the development of advanced integration tools: they wanted to be able to build their own specialized integration applications in a short time. Aggregators came to the rescue of these users. They allowed them not only to combine distributed content, but also to process it in ways that generate new services available for further consumption. To cope with the heterogeneous data, the Linked Data initiative aims at the creation and exploitation of correspondences across data values. In this work, although we share the Linked Data community vision, we advocate that for the modern web, linking at the data value level is not enough. Aggregators should base their integration tasks on the concept of an entity, i.e., identifying whether different pieces of information correspond to the same real-world entity, such as an event or a person. We describe our theory, system, and experimental results that illustrate the approach's effectiveness.
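As a toy illustration of entity-level (rather than value-level) matching, the sketch below decides whether two records refer to the same real-world entity by comparing normalized attribute values; the records, attributes, and threshold are invented and far simpler than the paper's system.

```python
def same_entity(rec_a, rec_b, threshold=0.5):
    """Toy entity matcher: fraction of shared normalized attribute values."""
    norm = lambda rec: {(k, str(v).strip().lower()) for k, v in rec.items()}
    a, b = norm(rec_a), norm(rec_b)
    return len(a & b) / len(a | b) >= threshold

# Two records from different Web 2.0 sources describing the same event.
r1 = {"title": "WWW 2009", "city": "Madrid", "year": "2009"}
r2 = {"title": "WWW 2009", "city": "madrid ", "year": 2009}
print(same_entity(r1, r2))  # True: all three values match after normalization
```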
Support of part-whole relations in query answering
Part-whole relations are ubiquitous in our world, yet they do not get "first-class" treatment in the data management systems most commonly used today. One aspect of part-whole relations that is particularly important is that of attribute transitivity: some attributes of a whole are also attributes of its parts, and vice versa. We propose an extension to a generic entity-centric data model to support part-whole relations and attribute transitivity, and thereby provide more meaningful answers to certain types of queries. We describe how this model can be implemented using an RDF repository, along with three approaches to infer the implicit information necessary for query answering that adheres to the semantics of the model. The first approach is a naive implementation; the other two use indexing to improve performance. We evaluate several aspects of our implementations in a series of experiments whose results show that the two approaches that use indexing are far superior to the naive approach, and exhibit some advantages and disadvantages when compared to each other.
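A minimal sketch of attribute transitivity over part-whole relations: attributes marked as downward-propagating flow from a whole to its parts along partOf edges. The data layout, the propagation direction, and the example entities are assumptions for illustration, not the paper's model.

```python
# partOf: part -> whole. All data here is an invented toy example.
part_of = {"engine": "car", "piston": "engine"}
attributes = {"car": {"owner": "alice"}, "engine": {}, "piston": {}}
downward = {"owner"}  # attributes assumed to propagate whole -> part

def effective_attributes(entity):
    """Attributes of `entity`, including those inherited transitively
    from every whole it is (directly or indirectly) a part of."""
    attrs = dict(attributes.get(entity, {}))
    whole = part_of.get(entity)
    while whole is not None:                     # walk up the partOf chain
        for key, value in attributes.get(whole, {}).items():
            if key in downward and key not in attrs:
                attrs[key] = value
        whole = part_of.get(whole)
    return attrs

print(effective_attributes("piston"))  # {'owner': 'alice'}
```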