
    Automatically configuring parallelism for hybrid layouts

    Distributed processing frameworks process data in parallel by dividing it into multiple partitions, each of which is processed in a separate task. The number of tasks is typically derived from the total file size. With hybrid layouts, however, this can launch more tasks than needed, because such layouts allow certain operations (e.g., projection, selection) to read less data. Over-provisioning tasks may increase job execution time and waste significant computing resources, since each task introduces extra overhead (e.g., initialization, garbage collection). To use resources more efficiently and reduce job execution time, we propose a cost-based approach that decides the number of tasks based on the data actually being read. The proposed cost model can also be used in a multi-objective approach that decides both the number of tasks and the number of machines for execution.
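
    The core idea can be sketched in a few lines. This is an illustrative assumption, not the authors' actual cost model: the function name, the 128 MiB target partition size, and the example sizes are all hypothetical, and the paper's model additionally considers machines and multiple objectives.

    ```python
    # Sketch: size the task count from the bytes a hybrid layout actually reads
    # after projection/selection, rather than from the total file size.
    # All names and constants here are illustrative, not the paper's model.

    def tasks_from_bytes_read(bytes_read, partition_bytes=128 * 1024 ** 2):
        """Number of tasks proportional to the data read, at least one."""
        return max(1, -(-bytes_read // partition_bytes))  # ceiling division

    # Suppose a projection on a hybrid layout reads only 1/5 of a 100 GiB file:
    total_file_size = 100 * 1024 ** 3           # 100 GiB on disk
    bytes_actually_read = total_file_size // 5  # 20 GiB after column pruning

    naive = tasks_from_bytes_read(total_file_size)           # sized from file size
    cost_based = tasks_from_bytes_read(bytes_actually_read)  # sized from data read
    print(naive, cost_based)  # → 800 160
    ```

    Sizing from the data read launches 160 tasks instead of 800 here, avoiding the per-task overhead the abstract describes.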

    Databases and Information Systems in the AI Era: Contributions from ADBIS, TPDL and EDA 2020 Workshops and Doctoral Consortium

    Research on database and information technologies has been evolving rapidly over the last couple of years. This evolution has been led by three major forces: Big Data, AI, and the Connected World, which open the door to innovative research directions and challenges across four main areas: (i) computational and storage resource modeling and organization; (ii) new programming models; (iii) processing power; and (iv) new applications emerging in health, environment, education, cultural heritage, banking, etc. The 24th East-European Conference on Advances in Databases and Information Systems (ADBIS 2020), the 24th International Conference on Theory and Practice of Digital Libraries (TPDL 2020), and the 16th Workshop on Business Intelligence and Big Data (EDA 2020), held during August 25–27, 2020, in Lyon, France, together with their associated satellite events, aimed to cover emerging issues in database and information system research in these areas. The aim of this paper is to present these events, their motivations, and topics of interest, and to briefly outline the papers selected for presentation. The selected papers are included in the remainder of this volume.

    Point set signature and algorithm of classifications on its basis

    There are many open problems in automated multi-dimensional data processing, such as classification, clustering, regression, and the control of complex objects, which creates a need to develop the mathematical and algorithmic foundations for solving them. The goal of this research is to develop classification algorithms for point sets based on their spatial distribution. We propose to treat data as points in a multi-dimensional metric space. We review approaches to describing the features of point sets in high-dimensional spaces and propose to describe a point set by its signature: a characterization of how the set fills space, obtained by extending the notion of spatial hashing. The general method for computing a point set's signature is to partition the space occupied by the set into a regular grid using spatial hashing, evaluate geometric characteristics of the set in the resulting cells, and determine the most populated cell along each spatial dimension. We then propose a new signature-based classification approach: signatures are computed for points whose class membership is known, and for each new point the distance from the point's hash to the signature of each known set is computed; the most probable class is the one at the minimal distance. Manhattan and Euclidean distances are used as metrics, and we provide a comparative analysis of their impact on classification accuracy. The advantages of the proposed approach are computational simplicity and high classification accuracy for evenly distributed points.
    The algorithm is implemented as a prototype application in Python using the NumPy library, in order to test it on practical classification tasks. We also consider applying the approach to non-numeric data such as strings and Boolean values; for such data we propose the Hamming distance, and our experiments show that the algorithm remains practical for these data types.
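
    The signature pipeline described above can be sketched as follows. This is a minimal reading of the abstract, not the authors' implementation: the grid cell size, the per-axis reduction, and the toy two-class data are assumptions made for illustration.

    ```python
    import numpy as np

    def spatial_hash(points, cell_size):
        """Map each point to integer grid-cell coordinates (regular grid)."""
        return np.floor(points / cell_size).astype(int)

    def signature(points, cell_size):
        """Per axis, the index of the most populated grid cell."""
        cells = spatial_hash(points, cell_size)
        sig = []
        for d in range(cells.shape[1]):
            lo = cells[:, d].min()
            sig.append(np.bincount(cells[:, d] - lo).argmax() + lo)
        return np.array(sig)

    def classify(point, signatures, cell_size, metric="manhattan"):
        """Assign the class whose signature is closest to the point's hash."""
        h = spatial_hash(point[None, :], cell_size)[0]
        best, best_d = None, np.inf
        for label, sig in signatures.items():
            diff = np.abs(h - sig)
            d = diff.sum() if metric == "manhattan" else np.sqrt((diff ** 2).sum())
            if d < best_d:
                best, best_d = label, d
        return best

    rng = np.random.default_rng(0)
    a = rng.uniform(0, 1, (200, 2))   # class "A": points near the unit square
    b = rng.uniform(5, 6, (200, 2))   # class "B": points shifted far away
    sigs = {"A": signature(a, 0.5), "B": signature(b, 0.5)}
    print(classify(np.array([5.4, 5.6]), sigs, 0.5))  # → B
    ```

    The classification step costs only one hash and one distance per class, which reflects the computational simplicity claimed in the abstract.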

    Distributed Holistic Clustering on Linked Data

    Link discovery is an active field of research supporting data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. Here we propose a distributed holistic approach that links many data sources based on a clustering of entities representing the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains.
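
    The basic shift from pairwise links to entity clusters can be illustrated with a small sketch. This union-find grouping only shows the starting point (connected components over links); the authors' approach additionally repairs erroneous links, fuses entity representations, and runs distributed. The example identifiers are hypothetical.

    ```python
    # Sketch: turn pairwise same-as links into clusters, each standing for
    # one real-world object. Illustrative only; not the authors' algorithm.

    def cluster(links):
        """Group linked entities into connected components via union-find."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in links:
            parent[find(a)] = find(b)

        groups = {}
        for entity in list(parent):
            groups.setdefault(find(entity), set()).add(entity)
        return list(groups.values())

    # Three sources describing the same city (made-up identifiers):
    links = [("dbpedia:Leipzig", "geonames:2879139"),
             ("geonames:2879139", "freebase:m.0dzst")]
    print(cluster(links))  # one cluster containing all three entities
    ```

    A single cluster replaces the quadratic number of pairwise links that would otherwise be needed across many sources, which is where the scalability benefit comes from.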

    Similarity-Based Processing of Motion Capture Data

    Motion capture technologies digitize human movements by tracking the 3D positions of specific skeleton joints over time. Such spatio-temporal data have enormous application potential in many fields, ranging from computer animation through security and sports to medicine, but their computerized processing is a difficult problem. The recorded data can be imprecise and voluminous, and the same movement action can be performed by various subjects in a number of alternatives that vary in speed, timing, or position in space. This requires employing completely different data-processing paradigms than in traditional domains such as attribute, text, or image data. The objective of this tutorial is to explain fundamental principles and technologies designed for similarity comparison, searching, subsequence matching, classification, and action detection in motion capture data. Specifically, we emphasize the importance of similarity as a means to express the degree of accordance between pairs of motion sequences, and also discuss machine-learning approaches able to automatically acquire content-descriptive movement features. We explain how the concept of similarity, together with the learned features, can be employed to search for similar occurrences of actions of interest within a long motion sequence. Assuming a user-provided categorization of example motions, we discuss techniques able to recognize types of specific movement actions and to detect such actions within continuous motion sequences. Selected operations are demonstrated by online web applications.
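
    One common way to realize the similarity notion the tutorial builds on is dynamic time warping (DTW) over per-frame pose distances, which tolerates the speed and timing variations mentioned above. The tutorial covers a range of techniques; this is only a minimal sketch, with a Euclidean per-joint distance and toy zero-valued sequences as assumptions.

    ```python
    import numpy as np

    def pose_distance(p, q):
        """Sum of Euclidean distances between corresponding 3D joints."""
        return np.linalg.norm(p - q, axis=1).sum()

    def dtw(seq_a, seq_b):
        """DTW cost between two (frames, joints, 3) motion sequences."""
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = pose_distance(seq_a[i - 1], seq_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of b
                                     cost[i, j - 1],      # skip a frame of a
                                     cost[i - 1, j - 1])  # match frames
        return cost[n, m]

    walk = np.zeros((4, 31, 3))             # toy 4-frame, 31-joint sequence
    walk_slow = np.repeat(walk, 2, axis=0)  # same motion at half speed
    print(dtw(walk, walk_slow))  # → 0.0 (warping absorbs the speed difference)
    ```

    Because the warping path may repeat frames of either sequence, two performances of the same action at different speeds can still be judged identical, which a frame-by-frame Euclidean comparison could not do.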