Manipulation Detection in Cryptocurrency Markets: An Anomaly and Change Detection Based Approach

Authors: Kampers Olaf, Mathur Swati, Qahtan Abdulhakim, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/04/2022

As a financial asset, cryptocurrencies have brought innovation to the financial industry in several ways. However, the lack of regulation and transparency in cryptocurrency markets is preventing the industry from reaching its full potential. Extensive technical analysis of cryptocurrency market data is needed to detect possible market manipulation attempts. Anomaly detection techniques can reveal abnormal activity in the market and provide insights into manipulation attempts. In this study, a robust unsupervised anomaly detection tool (ADT) is developed for this purpose. Experiments show that ADT outperforms a set of competing methods in detecting anomalies in features extracted from cryptocurrency exchange data, as well as on a set of benchmark datasets.
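The abstract does not describe how ADT works internally. The sketch below is a generic illustration of unsupervised anomaly detection on a single market feature. It swaps in a textbook technique, the median/MAD modified z-score with the conventional 3.5 cutoff, and is not ADT. The function names and the trading-volume series are hypothetical.

```python
import statistics


def robust_zscores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half of the points sit on the median, so there is no scale.
        return [0.0] * len(values)
    # 0.6745 makes the MAD comparable to a standard deviation for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the indices whose |modified z-score| exceeds the threshold."""
    return [i for i, z in enumerate(robust_zscores(values)) if abs(z) > threshold]


# Hypothetical hourly trading volumes with one suspicious spike.
volumes = [100, 102, 98, 101, 99, 103, 500, 100, 97]
print(flag_anomalies(volumes))  # → [6]
```

The sketch uses the median and the MAD rather than the mean and the standard deviation. A single large spike would inflate the latter statistics and partly mask itself.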

Mining patterns in graphs with multiple weights

Authors: Lissandrini Matteo, Mottin Davide, Preti Giulia, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signifies importance. In real life, many graphs carry weights on their nodes and/or edges. For such graphs, it is reasonable for the importance (score) of a pattern to depend not only on the number of its appearances but also on the weights of the nodes/edges in those appearances. Scoring functions based on weights do not, in general, satisfy the apriori property. This property guarantees that the number of appearances of a pattern cannot be larger than the frequency of any of its sub-patterns, and hence allows faster pruning. Existing approaches therefore employ other, less efficient, pruning strategies. The problem becomes even more challenging with multiple weighting functions that assign different weights to the same nodes/edges. In this work we propose a new family of scoring functions that respects the apriori property and can thus rely on effective pruning strategies. We provide efficient and effective techniques for mining patterns in multi-weighted graphs, and we devise both an exact and an approximate solution. In addition, we propose a distributed version of our approach, which distributes the appearances of the patterns to be examined among multiple workers. Extensive experiments on both real and synthetic datasets show that the presence of edge weights and the choice of scoring function affect the patterns mined and the quality of the results returned to the user. Moreover, we show that the approximate algorithm performs well and returns results of fairly good quality, even when the performance of the exact algorithm degrades as the number of weighting functions increases. Finally, the distributed algorithm proves to be the best choice for mining large and rich input graphs.
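The apriori property can be illustrated with the minimum-image-based (MNI) support that is commonly used for single-graph mining. The sketch pairs it with a hypothetical weighted variant. For each pattern node, that variant sums the non-negative weights of the distinct graph nodes the pattern node is matched to, then takes the minimum over pattern nodes. This is not the scoring family proposed in the paper. The function names, the appearance encoding (a dict from pattern node to graph node) and the toy data are all assumptions. Restricting appearances to a sub-pattern can only keep or enlarge each node's image set, so a sub-pattern never scores lower than its super-pattern. That is what makes apriori-style pruning safe.

```python
from collections import defaultdict


def images(appearances):
    """Map each pattern node to the set of distinct graph nodes it is matched to."""
    img = defaultdict(set)
    for emb in appearances:
        for p_node, g_node in emb.items():
            img[p_node].add(g_node)
    return img


def mni_support(appearances):
    """Minimum-image-based support: size of the smallest image set."""
    return min(len(s) for s in images(appearances).values())


def weighted_mni(appearances, weight):
    """Hypothetical weighted score: smallest total weight of an image set.
    It stays anti-monotone as long as all weights are non-negative."""
    return min(sum(weight[g] for g in s) for s in images(appearances).values())


def restrict(appearances, sub_nodes):
    """Project pattern appearances onto a sub-pattern's nodes, removing duplicates."""
    seen = {tuple((p, emb[p]) for p in sub_nodes) for emb in appearances}
    return [dict(t) for t in seen]


# Toy appearances of a path pattern x-y-z and hypothetical node weights.
apps = [{"x": 1, "y": 2, "z": 3}, {"x": 1, "y": 2, "z": 4}, {"x": 5, "y": 2, "z": 3}]
w = {1: 5, 2: 20, 3: 10, 4: 2, 5: 7}
sub = restrict(apps, ["x", "y"])  # appearances of the sub-pattern x-y

print(mni_support(apps), weighted_mni(apps, w))  # → 1 12
print(weighted_mni(sub, w) >= weighted_mni(apps, w))  # → True
```

The projected appearances are only a subset of all appearances of x-y in a real graph. The full sub-pattern score can therefore only be higher still.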

The KEYSTONE IC1302 COST Action

Authors: Breslin John G., Cardoso Jorge, Guerra Francesco, Szymanski Julian, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Keyword search is becoming a valuable alternative to traditional SQL queries. More and more data is becoming available on the Web, its complexity is increasing, and the Web’s user base is shifting towards a more general, non-technical population. Against this background, keyword search appeals mainly because of its simplicity and the lower effort and expertise it requires. Existing approaches, however, suffer from several limitations in multi-source scenarios. These scenarios require some form of query planning, lack direct access to database instances, and involve frequent updates that preclude any effective use of data indexes. Typical examples include Deep Web databases, virtual data integration systems and data on the Web. Effective keyword search techniques can therefore have an extensive impact, since they allow non-professional users to access large amounts of information stored in structured repositories through simple keyword-based query interfaces. This revolutionises the paradigm of searching for data, since users can access structured data in the same way they already search documents. To build a successful, unified and effective solution, the action “semantic KEYword-based Search on sTructured data sOurcEs” (KEYSTONE) promoted synergies across several disciplines. These include semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning, user interaction, interface design, and natural language processing. This paper describes the main achievements of this COST Action.

Keymantic: Semantic keyword-based searching in data integration systems

Authors: Bergamaschi Sonia, Domnori Elton, Guerra Francesco, Lado Raquel Trillo, Orsini Mirko, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: VLDB Endowment
Publication date: 01/09/2010

We propose a demonstration of Keymantic, a system for keyword-based search in relational databases. Keymantic does not require a priori knowledge of the instances held in a database. It is applicable in many situations where traditional keyword-based search techniques cannot be used, because the database contents are not available for building the required indexes.

Mining Dense Subgraphs with Similar Edges

Authors: Gionis Aristides, Hutter Frank, Kersting Kristian, Lijffijt Jefrey, Preti Giulia, Rozenshtein Polina, Valera Isabel, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity, but also the metadata available, such as node and edge labels, or temporal information. In this paper we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric min-cut [15, 17], and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.
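The two quantities traded off in this problem can be made concrete for a candidate node set S. The first is the edge density |E(S)|/|S|. The second is the average pairwise Jaccard similarity of the label sets of the edges inside S. The sketch below only evaluates a hypothetical combined score (density plus λ times mean similarity) on a toy labeled graph. It is not the paper's Lagrangian formulation and does not implement the parametric min-cut algorithm. All names and data are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two label sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def induced_edges(nodes, edges):
    """Edges with both endpoints in `nodes`."""
    return [e for e in edges if e[0] in nodes and e[1] in nodes]


def density(nodes, edges):
    """Edge density |E(S)| / |S| of the induced subgraph."""
    return len(induced_edges(nodes, edges)) / len(nodes)


def mean_edge_similarity(nodes, edges, labels):
    """Average Jaccard similarity over all pairs of induced edges."""
    pairs = list(combinations(induced_edges(nodes, edges), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(labels[e], labels[f]) for e, f in pairs) / len(pairs)


def combined_score(nodes, edges, labels, lam=1.0):
    """Hypothetical trade-off between density and edge similarity."""
    return density(nodes, edges) + lam * mean_edge_similarity(nodes, edges, labels)


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
labels = {(1, 2): {"a", "b"}, (2, 3): {"a", "b"}, (1, 3): {"a"}, (3, 4): {"c"}}

for s in ({1, 2, 3}, {1, 2, 3, 4}):
    print(sorted(s), density(s, edges), round(mean_edge_similarity(s, edges, labels), 3))
```

Both node sets have the same density, 1.0. Adding node 4, however, brings in an edge whose labels share nothing with the others. The mean similarity therefore drops from about 0.667 to 0.333, and the triangle {1, 2, 3} scores higher.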

Graph-Query Suggestions for Knowledge Graph Exploration

Authors: Huang Yennun, King Irwin, Lissandrini Matteo, Liu Tie-Yan, Mottin Davide, Palpanas Themis, van Steen Maarten, Velegrakis Yannis
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/04/2020

We consider the task of exploratory search through graph queries on knowledge graphs. We propose to assist the user by expanding the query with intuitive suggestions. The result is a more informative (full) query that can retrieve more detailed and relevant answers. To achieve this, we propose a model that bridges graph search paradigms with well-established information retrieval techniques. Our approach does not require any additional knowledge from the user and builds on principled language modelling approaches. We empirically demonstrate the effectiveness and efficiency of our approach on a large knowledge graph, and show how our suggestions help build more complete and informative queries.

Are knowledge graph embedding models biased, or is it the data that they are trained on?

Authors: Chekol Mel, Radstok Wessel, Schaefer Mirko
Affiliation: ICON - Media and Performance Studies; Sub Data Intensive Systems
Publication date: 01/10/2021

Recent studies on bias analysis of knowledge graph (KG) embedding models focus primarily on altering the models so that sensitive features are treated differently from other features. The underlying implication is that the models cause bias, or that it is their task to solve it. In this paper we argue that the problem is caused not by the models but by the data. It is the responsibility of the expert to ensure that the data is representative of the intended goal. To support this claim, we experiment with two different knowledge graphs and show that the bias is present not only in the models but also in the data. Next, we show that adding new samples to balance the distribution of facts with regard to specific sensitive features reduces the bias in the models.
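The abstract does not say how the new samples are chosen. As one hedged illustration of the balancing idea, the sketch below takes a given relation and counts how many facts each sensitive group has per object value. It then reports how many facts each (value, group) cell would need to match the best-represented group for that value. The triple format, the function name `balance_plan` and the toy data are hypothetical. Generating the actual new facts is left out.

```python
from collections import Counter


def balance_plan(triples, sensitive, relation, groups):
    """For each object value of `relation`, count facts per sensitive group and
    return how many extra facts each under-represented (value, group) cell needs."""
    counts = Counter((t, sensitive[h]) for h, r, t in triples if r == relation)
    plan = {}
    for value in sorted({t for t, _ in counts}):
        target = max(counts[(value, g)] for g in groups)
        for g in groups:
            missing = target - counts[(value, g)]  # Counter returns 0 for absent cells
            if missing > 0:
                plan[(value, g)] = missing
    return plan


# Hypothetical (head, relation, tail) facts and a sensitive attribute per head entity.
triples = [
    ("alice", "profession", "nurse"),
    ("carol", "profession", "nurse"),
    ("bob", "profession", "engineer"),
    ("dave", "profession", "engineer"),
    ("erin", "profession", "engineer"),
    ("fay", "profession", "engineer"),
]
gender = {"alice": "F", "carol": "F", "bob": "M", "dave": "M", "erin": "M", "fay": "F"}

print(balance_plan(triples, gender, "profession", ["F", "M"]))
# → {('engineer', 'F'): 2, ('nurse', 'M'): 2}
```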

Personalized page rank on knowledge graphs: Particle Filtering is all you need!

Authors: Bohm Alexander, Bonifati Angela, Fletcher George, Gallo Denis, Khan Arijit, Lissandrini Matteo, Olteanu Dan, Vaz Salles Marcos Antonio, Velegrakis Yannis, Yang Bin, Zhou Yongluan
Affiliation: Data Intensive Systems; Sub Data Intensive Systems
Publication date: 01/01/2020

Graphs are everywhere. Personalized Page Rank (PPR) is a particularly important task for supporting search and exploration within such datasets. PPR computes the proximity between query nodes and the other nodes in the graph. It is used, among other things, for entity exploration, query expansion, and product recommendation. Graph databases are used to store knowledge graphs. Unfortunately, the exact computation of PPR is computationally expensive. Different solutions have been proposed to compute PPR values with high precision, but they are extremely complex to implement and in some cases require heavy preprocessing. In this work, we argue that a better approach exists: particle filtering. Particle filtering methods produce rankings with sufficient precision while exploiting what graph database architectures are already optimized for: navigating local connections. We present the implementation of such an approach in a popular commercial database and show that it outperforms the already implemented functionality. With this, we aim to motivate future research to optimize and improve upon this research direction.
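The idea of spreading particles from the query node along local connections fits in a few lines. The version below is a simplified, deterministic variant built on several assumptions. At each step, a node keeps a fraction `alpha` of the particles it holds as score and splits the rest evenly among its out-neighbours. Nodes holding fewer than `min_particles` particles are then filtered out. Dangling nodes simply drop their remaining particles, and the final scores are renormalized to sum to 1. The parameter names and defaults are hypothetical, and this is not the implementation in the commercial database mentioned in the abstract.

```python
from collections import defaultdict


def ppr_particle_filtering(graph, source, alpha=0.15, n_particles=10_000,
                           min_particles=1.0, max_steps=50):
    """Approximate Personalized PageRank scores with respect to `source`.

    graph: dict mapping each node to a list of its out-neighbours.
    """
    scores = defaultdict(float)
    particles = {source: float(n_particles)}
    for _ in range(max_steps):
        if not particles:
            break
        nxt = defaultdict(float)
        for node, p in particles.items():
            scores[node] += alpha * p  # this share of the particles stops here
            neighbours = graph.get(node, [])
            if not neighbours:
                continue  # dangling node: the remaining particles are dropped
            share = (1 - alpha) * p / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        # Filtering step: only nodes holding enough particles keep propagating.
        particles = {n: p for n, p in nxt.items() if p >= min_particles}
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = ppr_particle_filtering(graph, "a")
print({n: round(s, 3) for n, s in sorted(scores.items())})
```

Each step touches only the neighbours of nodes that still hold particles, so the work stays local. For this toy graph, solving the PPR equations by hand with the same `alpha` gives about 0.452, 0.192 and 0.356 for a, b and c. The sketch lands close to those values.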