Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation
Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood, leading to a great deal of variation in the supported functionalities and the achieved performance. In this work, we provide a comprehensive study of the existing graph database systems. We introduce a novel microbenchmarking framework that provides insights into their performance beyond what macro-benchmarks can offer. The framework includes the largest set of queries and operators considered so far. The graph database systems are evaluated on synthetic and real data, from different domains, and at scales much larger than in any previous work. The framework is materialized as an open-source suite and is easily extended to new datasets, systems, and queries.
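As an illustration of the microbenchmarking idea, the sketch below times a single query in isolation and reports basic statistics. The `run_query` callable and the warm-up/repetition counts are assumptions for this example, not the paper's actual harness.

```python
import statistics
import time

def microbenchmark(run_query, query, warmup=3, repetitions=10):
    """Time one query in isolation, as a microbenchmark would.

    `run_query` is a hypothetical callable that sends `query` to the
    graph database under test and blocks until the result is consumed.
    """
    for _ in range(warmup):              # warm caches before measuring
        run_query(query)
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "stdev_s": statistics.stdev(timings),
    }

# Example: benchmark a single operator (here, a one-hop traversal count).
# result = microbenchmark(my_driver_run, "MATCH (n)-[e]->(m) RETURN count(m)")
```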
Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs
Most knowledge graph completion (KGC) methods learn latent representations of
entities and relations of a given graph by mapping them into a vector space.
Although the majority of these methods focus on static knowledge graphs, a
large number of publicly available KGs contain temporal information stating the
time instant/period over which a certain fact has been true. Such graphs are
often known as temporal knowledge graphs. Furthermore, knowledge graphs may
also contain textual descriptions of entities and relations. Static KGC
methods take neither temporal information nor textual descriptions into
account during representation learning, leveraging only the structural
information of the graph. Recently, some studies have used temporal
information to improve link prediction, yet they do not exploit textual
descriptions and do not support inductive inference (prediction on entities
that have not been seen in training).
We propose a novel framework called TEMT that exploits the power of
pre-trained language models (PLMs) for text-enhanced temporal knowledge graph
completion. The knowledge stored in the parameters of a PLM allows TEMT to
produce rich semantic representations of facts and to generalize on previously
unseen entities. TEMT leverages textual and temporal information available in a
KG, treats them separately, and fuses them to get plausibility scores of facts.
Unlike previous approaches, TEMT effectively captures dependencies across
different time points and enables predictions on unseen entities. To assess the
performance of TEMT, we carried out several experiments including time interval
prediction, both in transductive and inductive settings, and triple
classification. The experimental results show that TEMT is competitive with the
state-of-the-art.
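A minimal sketch of the general idea of PLM-based fact scoring with time, not TEMT's actual architecture: a timed fact is verbalized as text, encoded with a pre-trained language model, and mapped to a plausibility score. The checkpoint name, pooling choice, and scoring head are assumptions for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any encoder-style PLM works here; the checkpoint name is an assumption.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # untrained, illustrative

def fact_plausibility(head_text, relation_text, tail_text, time_text):
    """Verbalize a timed fact, encode it with the PLM, and score it."""
    sentence = f"{head_text} {relation_text} {tail_text} in {time_text}."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    pooled = hidden[:, 0]  # [CLS] pooling; one of several possible choices
    return score_head(pooled).item()

# print(fact_plausibility("Barack Obama", "held position",
#                         "President of the United States", "2010"))
```

Because the fact is scored from its textual form rather than from a trained entity embedding, a model along these lines can score facts about entities never seen during training, which is the inductive setting the abstract refers to.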
Supporting queries spanning across phases of evolving artifacts using Steiner forests
The problem of managing evolving data has attracted considerable research attention. Researchers have focused on the modeling and querying of schema/instance-level structural changes, such as addition, deletion, and modification of attributes. Databases with such functionality are known as temporal databases. A limitation of temporal databases is that they treat changes as independent events, while often the appearance (or elimination) of some structure in the database is the result of an evolution of some existing structure. We claim that maintaining the causal relationship between the two structures is of major importance, since it allows additional reasoning to be performed and answers to be generated for queries that previously had no answers. We present here a novel framework for exploiting the evolution relationships between the structures in the database. In particular, our system combines different structures that are associated through evolution relationships into virtual structures to be used during query answering. The virtual structures define "possible" database instances, in a fashion similar to the possible worlds in probabilistic databases. The framework includes a query answering mechanism that allows queries to be answered over these possible databases without materializing them. Evaluation of such queries raises many interesting technical challenges, since it requires the discovery of Steiner forests on the evolution graphs. For this problem we have designed and implemented a new dynamic programming algorithm with exponential complexity in the size of the input query and polynomial complexity in terms of both the attribute and the evolution data sizes.
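To give an idea of the Steiner forest sub-problem, here is a minimal sketch that approximates one by computing an approximate Steiner tree per group of terminals and taking their union. It uses networkx's approximation routine rather than the paper's dynamic-programming algorithm, and the evolution graph is invented.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def approx_steiner_forest(graph, terminal_groups):
    """Approximate a Steiner forest: one approximate Steiner tree per
    group of terminals that must be mutually connected, then union."""
    forest = nx.Graph()
    for terminals in terminal_groups:
        tree = steiner_tree(graph, terminals, weight="weight")
        forest.add_edges_from(tree.edges(data=True))
    return forest

# Invented evolution graph: nodes are attribute versions, edges evolutions.
G = nx.Graph()
G.add_weighted_edges_from([
    ("name_v1", "name_v2", 1), ("name_v2", "fullname_v1", 1),
    ("addr_v1", "addr_v2", 2), ("addr_v2", "location_v1", 1),
    ("name_v2", "addr_v1", 5),  # bridge so the graph is connected
])
forest = approx_steiner_forest(G, [{"name_v1", "fullname_v1"},
                                   {"addr_v1", "location_v1"}])
print(sorted(forest.edges()))
```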
Discovering Dense Correlated Subgraphs in Dynamic Networks
Given a dynamic network, where edges appear and disappear over time, we are
interested in finding sets of edges that have similar temporal behavior and
form a dense subgraph. Formally, we define the problem as the enumeration of
the maximal subgraphs that satisfy specific density and similarity thresholds.
To measure the similarity of the temporal behavior, we use the correlation
between the binary time series that represent the activity of the edges. For
the density, we study two variants based on the average degree. For these
problem variants we enumerate the maximal subgraphs and compute a compact
subset of subgraphs that have limited overlap. We propose an approximate
algorithm that scales well with the size of the network, while achieving a high
accuracy. We evaluate our framework on both real and synthetic datasets. The
results of the synthetic data demonstrate the high accuracy of the
approximation and show the scalability of the framework.
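As a small illustration of the similarity measure used, the sketch below computes the Pearson correlation between the binary activity time series of edges and keeps edge pairs above a threshold; the snapshot data and the threshold value are invented.

```python
import numpy as np

def edge_correlation(series_a, series_b):
    """Pearson correlation between two binary edge-activity time series
    (1 = edge present in that snapshot, 0 = absent)."""
    return np.corrcoef(series_a, series_b)[0, 1]

# Invented toy data: activity of three edges over 8 snapshots.
activity = {
    ("a", "b"): np.array([1, 1, 0, 0, 1, 1, 0, 0]),
    ("b", "c"): np.array([1, 1, 0, 0, 1, 0, 0, 0]),
    ("c", "d"): np.array([0, 0, 1, 1, 0, 0, 1, 1]),
}
threshold = 0.5  # similarity threshold, chosen arbitrarily here
edges = list(activity)
for i in range(len(edges)):
    for j in range(i + 1, len(edges)):
        r = edge_correlation(activity[edges[i]], activity[edges[j]])
        if r >= threshold:
            print(edges[i], edges[j], round(r, 2))
```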
Mining Dense Subgraphs with Similar Edges
When searching for interesting structures in graphs, it is often important to
take into account not only the graph connectivity, but also the metadata
available, such as node and edge labels, or temporal information. In this paper
we are interested in settings where such metadata is used to define a
similarity between edges. We consider the problem of finding subgraphs that are
dense and whose edges are similar to each other with respect to a given
similarity function. Depending on the application, this function can be, for
example, the Jaccard similarity between the edge label sets, or the temporal
correlation of the edge occurrences in a temporal graph. We formulate a
Lagrangian relaxation-based optimization problem to search for dense subgraphs
with high pairwise edge similarity. We design a novel algorithm to solve the
problem through parametric MinCut, and provide an efficient search scheme to
iterate through the values of the Lagrangian multipliers. Our study is
complemented by an evaluation on real-world datasets, which demonstrates the
usefulness and efficiency of the proposed approach.
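To make the similarity function concrete, here is a minimal sketch of the Jaccard similarity between edge label sets, one of the instantiations the abstract mentions; the label sets are invented.

```python
def jaccard(labels_a, labels_b):
    """Jaccard similarity between two edge label sets."""
    if not labels_a and not labels_b:
        return 1.0
    return len(labels_a & labels_b) / len(labels_a | labels_b)

# Invented labels on two edges of a graph.
e1 = {"co-author", "2019", "data-mining"}
e2 = {"co-author", "2020", "data-mining"}
print(jaccard(e1, e2))  # 0.5: two shared labels out of four distinct
```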
Preface to the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores
Preface to the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores. Summary: There were 6 research papers accepted for this volume. Moreover, 6 poster papers were also presented and included in this volume.
BLADYG: A Graph Processing Framework for Large Dynamic Graphs
Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis, and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, PowerGraph, GraphLab, and Trinity. However, these systems deal only with static graphs and do not consider the issue of processing evolving and dynamic graphs. In this paper, we consider the issues of scale and dynamism in the case of graph processing systems. We present BLADYG, a graph processing framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of the Akka framework. We experimentally evaluate the performance of the proposed framework by applying it to problems such as distributed k-core decomposition and partitioning of large dynamic graphs. The experimental results show that the performance and scalability of BLADYG are satisfactory for large-scale dynamic graphs.
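For reference, k-core decomposition is one of the workloads used in the evaluation. The sketch below is the standard sequential peeling algorithm via networkx, not BLADYG's distributed Akka-based implementation; the graph is invented.

```python
import networkx as nx

# Invented graph: a triangle a-b-c plus a pendant node d. The core number
# of a node is the largest k such that it belongs to a k-core subgraph.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(nx.core_number(G))  # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```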
Enabling entity-based aggregators for web 2.0 data
Selecting and presenting content culled from multiple heterogeneous and physically distributed sources is a challenging task. The exponential growth of web data in modern times has brought new requirements to such integration systems. Data is no longer produced by content providers alone, but also by regular users through the highly popular Web 2.0 social and semantic web applications. The plethora of available web content increased its demand by regular users, who could no longer wait for the development of advanced integration tools: they wanted to be able to build their own specialized integration applications in a short time. Aggregators came to the rescue of these users. They allowed them not only to combine distributed content, but also to process it in ways that generate new services available for further consumption. To cope with the heterogeneous data, the Linked Data initiative aims at the creation and exploitation of correspondences across data values. In this work, although we share the Linked Data community vision, we advocate that for the modern web, linking at the data value level is not enough. Aggregators should base their integration tasks on the concept of an entity, i.e., identifying whether different pieces of information correspond to the same real-world entity, such as an event or a person. We describe our theory, system, and experimental results that illustrate the approach's effectiveness.
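As a toy illustration of entity-level (rather than value-level) matching, the sketch below decides whether two records refer to the same real-world entity by comparing normalized attribute values; the records, attributes, and threshold are invented and far simpler than the paper's system.

```python
def same_entity(rec_a, rec_b, threshold=0.5):
    """Toy entity matcher: fraction of shared normalized attribute values."""
    norm = lambda rec: {(k, str(v).strip().lower()) for k, v in rec.items()}
    a, b = norm(rec_a), norm(rec_b)
    return len(a & b) / len(a | b) >= threshold

# Two records from different Web 2.0 sources describing the same event.
r1 = {"title": "WWW 2009", "city": "Madrid", "year": "2009"}
r2 = {"title": "WWW 2009", "city": "madrid ", "year": 2009}
print(same_entity(r1, r2))  # True: all three values match after normalization
```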
Support of part-whole relations in query answering
Part-whole relations are ubiquitous in our world, yet they do not get "first-class" treatment in the data management systems most commonly used today. One aspect of part-whole relations that is particularly important is that of attribute transitivity: some attributes of a whole are also attributes of its parts, and vice versa. We propose an extension to a generic entity-centric data model to support part-whole relations and attribute transitivity, and thereby provide more meaningful answers to certain types of queries. We describe how this model can be implemented using an RDF repository, along with three approaches to infer the implicit information necessary for query answering that adheres to the semantics of the model. The first approach is a naive implementation; the other two use indexing to improve performance. We evaluate several aspects of our implementations in a series of experiments whose results show that the two approaches that use indexing are far superior to the naive approach, and exhibit some advantages and disadvantages when compared to each other.
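A minimal sketch of attribute transitivity over part-whole relations: attributes marked as downward-propagating flow from a whole to its parts along partOf edges. The data layout, the propagation direction, and the example entities are assumptions for illustration, not the paper's model.

```python
# partOf: part -> whole. All data here is an invented toy example.
part_of = {"engine": "car", "piston": "engine"}
attributes = {"car": {"owner": "alice"}, "engine": {}, "piston": {}}
downward = {"owner"}  # attributes assumed to propagate whole -> part

def effective_attributes(entity):
    """Attributes of `entity`, including those inherited transitively
    from every whole it is (directly or indirectly) a part of."""
    attrs = dict(attributes.get(entity, {}))
    whole = part_of.get(entity)
    while whole is not None:                     # walk up the partOf chain
        for key, value in attributes.get(whole, {}).items():
            if key in downward and key not in attrs:
                attrs[key] = value
        whole = part_of.get(whole)
    return attrs

print(effective_attributes("piston"))  # {'owner': 'alice'}
```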