17 research outputs found

Hercules Against Data Series Similarity Search

Author: Benbrahim Houda
Echihabi Karima
Fatourou Panagiota
Palpanas Themis
Zoumpatianos Kostas
Publication venue: VLDB Endowment
Publication date: 26/12/2022

We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including the hard query workloads on disk-based datasets. This paper was published in the Proceedings of the VLDB Endowment, Volume 15, Number 10, June 2022.

arXiv.org e-Print Archive

The periodic table of data structures

Author: Athanassoulis Manos
Dayan Niv
Guo Demi
Hentschel Brian
Idreos Stratos
Kester Michael S.
Maas Lukas M.
Qin Wilson
Sun Yiyou
Wasay Abdul
Zoumpatianos Kostas
Publication date: 01/01/2018

Published version: http://sites.computer.org/debull/A18sept/p64.pdf

Boston University Institutional Repository (OpenBU)

Indexing for Very Large Data Series Collections

Author: Zoumpatianos Konstantinos
Publication venue: University of Trento
Publication date: 28/11/2016

Data series are a prevalent data type that has attracted lots of interest in recent years. Specifically, there has been an explosive interest towards the analysis of large volumes of data series in many different domains. This is the case both in businesses (e.g., in mobile applications) and in the sciences (e.g., in biology). In several time-critical scenarios, analysts need to be able to explore these data as soon as they become available, which is not currently possible for very large data series collections. In this thesis, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections. The main idea is that instead of building the complete index over the complete data set up-front and querying only later, we interactively and adaptively build parts of the index, only for the parts of the data on which the users pose queries. The contents and the resolution of the index are purely driven by query patterns; the more queries that arrive, the more data series are indexed and at a higher resolution. Adaptive indexing significantly outperforms previous solutions, gracefully handling large data series collections, reducing the data-to-query delay: by the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), our method has already answered 3 * 10^5 queries. At the same time, we present novel algorithms for both full indexing of data series collections, as well as for efficient exact query answering. Our algorithms perform efficient skip-sequential scans of the data, avoiding the need for costly random accesses on the disk. Moreover, up to this point very little attention has been paid to properly evaluating data series index structures, with most previous work relying solely on randomly selected data series to use as queries (with/without adding noise).
In this thesis, we show that random workloads are inherently not suitable for the task at hand and we argue that there is a need for carefully generating a query workload. We define measures that capture the characteristics of queries, and we propose a method for generating workloads with the desired properties, that is, effectively evaluating and comparing data series summarizations and indexes. In our experimental evaluation, with carefully controlled query workloads, we shed light on key factors affecting the performance of nearest neighbor search in large data series collections. Finally, apart from ad hoc data exploration, we also investigate methods for the systematic analysis of very large data series collections, supporting business intelligence applications. We present techniques, which borrow ideas from Strategic Management, for a goal-oriented analysis of large collections of performance indicator data series. Such algorithms can additionally be sped up through the use of the index structures presented in this work.

Unitn-eprints PhD

Enhancing the Collective Knowledge for the Engineering of Ontologies in Open and Socially Constructed Learning Spaces

Author: Kotis Konstantinos
Papasalouros Andreas
Pappas Nikolaos
Vouros George
Zoumpatianos Konstantinos
Publication venue: Journal of Universal Computer Science
Publication date: 01/01/2011

The aim of this paper is to present a novel technological approach for enhancing the collective knowledge of communities of learners on the engineering of ontologies within a collaborative, open and socially constructed environment. The proposed technology aims at shaping information spaces into ontologies in a collaborative, communicative and learner-centered way during the ontology development life-cycle. The paper conjectures that such a collaborative environment can yield educational benefits, thus there is a need to follow principles that apply in the Computer Supported Collaborative Learning (CSCL) paradigm. This work is mainly based on a collaborative and human-centered ontology engineering methodology and on a meta-ontology framework for developing ontologies, namely HCOME and HCOME-3O respectively. The integration of key technologies such as Semantic Wiki and Argumentation models with Ontology Engineering methodologies and tools serves as an enabler of learning spaces construction for different domain-specific information spaces in open settings. Inside these learning spaces innovative conceptualizations (both domain and development) are conceived, described by intertwined ontological meta-models following the HCOME-3O specifications for future reference and tutoring support. Such learning spaces support two types of ontology engineering courses: a) courses related to the know-how of shaping information spaces into ontologies (namely, the development knowledge) and b) courses related to the analysis of the domain itself (namely, the domain knowledge). The paper reports on the evaluation of the approach within a CSCL setting in Ontology Engineering, using the integrated set of tools and the framework that have been developed for the collaborative engineering of ontologies.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints

Data Series Management (Dagstuhl Seminar 19282)

Author: Bagnall Anthony
Cole Richard L.
Palpanas Themis
Zoumpatianos Kostas
Publication venue: Dagstuhl Reports, Volume 9, Issue 7
Publication date: 01/01/2019

We now witness a very strong interest by users across different domains in data series (a.k.a. time series) management. It is not unusual for industrial applications that produce data series to involve numbers of sequences (or subsequences) in the order of billions (i.e., multiple TBs). As a result, analysts are unable to handle the vast amounts of data series that they have to manage and process. The goal of this seminar is to enable researchers and practitioners to exchange ideas and foster collaborations in the topic of data series management and identify the corresponding open research directions. The main questions answered are the following: i) What are the data series management needs across various domains and what are the shortcomings of current systems? ii) How can we use machine learning to optimize our current data systems, and how can these systems help in machine learning pipelines? iii) How can visual analytics assist the process of analyzing big data series collections? The seminar focuses on the following key topics related to data series management: 1) Data series storage and access patterns, 2) Query optimization, 3) Machine learning and data mining for data series, 4) Visualization for data series exploration, 5) Applications in multiple domains.

Dagstuhl Research Online Publication Server

RINSE

Author: Agrawal R.
Berchtold S.
Camerra A.
Chan K.-P.
Idreos S.
Keogh E.
Lin J.
Zoumpatianos K.
Publication venue: VLDB Endowment

Crossref