
2 research outputs found

Query Rewriting for Incremental Continuous Query Evaluation in HIFUN

Authors: Dimitris Plexousakis, Haridimos Kondylakis, Nicolas Spyratos, Petros Zervoudakis
Publication venue: 'MDPI AG'
Publication date: 08/05/2021

HIFUN is a high-level query language for expressing analytic queries over big datasets. It offers a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of the data, and the physical layer, where queries are evaluated. In this paper, we present a methodology based on the HIFUN language, together with the corresponding algorithms, for the incremental evaluation of continuous queries. In essence, our approach processes only the most recent data batch by exploiting already computed information, without requiring the evaluation of the query over the complete dataset. We present the generic algorithm, which we translated to both SQL and MapReduce using SPARK; it implements various query rewriting methods. We demonstrate the effectiveness of our approach in terms of query answering efficiency. Finally, we show that by exploiting the formal query rewriting methods of HIFUN, we can further reduce the computational cost, adding another layer of query optimization to our implementation.
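
The core idea of the abstract, updating an answer from the newest batch alone, can be illustrated with a minimal Python sketch. This is not the paper's algorithm or its SQL/SPARK translation; it only assumes a sum-per-key query, and the record shape and the names `evaluate_full` and `evaluate_incremental` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (grouping key, measure).
Record = Tuple[str, float]


def evaluate_full(records: Iterable[Record]) -> Dict[str, float]:
    """Evaluate the sum-per-key query over a complete collection of records."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def evaluate_incremental(previous: Dict[str, float],
                         batch: Iterable[Record]) -> Dict[str, float]:
    """Update a previously computed answer using only the newest batch."""
    updated = dict(previous)
    # Aggregate the batch on its own, then merge it into the stored answer.
    for key, value in evaluate_full(batch).items():
        updated[key] = updated.get(key, 0.0) + value
    return updated


history = [("athens", 10.0), ("paris", 4.0), ("athens", 2.5)]
new_batch = [("paris", 1.0), ("rome", 7.0)]

answer = evaluate_full(history)                   # computed once, then stored
answer = evaluate_incremental(answer, new_batch)  # touches only the new batch
```

The merge step works here because summation can be combined across partitions; an operation without that property (a median, for instance) would need extra stored state, which this sketch does not cover.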

Multidisciplinary Digital Publishing Institute

Incremental evaluation of continuous analytic queries based on a high-level query language

Author: Ζερβουδάκης Πέτρος Ν.
Publication date: 27/03/2020

Data analytics has received significant attention in recent years, as huge amounts of data are generated each day from various sources. Analyzing these massive datasets is an interesting but challenging task, and it requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. In addition, besides their ever-increasing volume, datasets change frequently, so the results of continuous queries have to be updated at short intervals. In this thesis, we address the problem of evaluating continuous queries over big data streams that are frequently updated. To this end, we adopt HIFUN, a high-level query language proposed for expressing analytic queries over big datasets. HIFUN offers a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of the data, and the physical layer, where queries are evaluated by encoding them as map-reduce jobs or as SQL group-by queries, thus supporting different types of dataset formats. Using HIFUN, we design an algorithm for the incremental evaluation of continuous queries that processes only the most recent data batch and exploits already computed information, without requiring the evaluation of the query over the complete dataset. Subsequently, we translate the generic algorithm to both SQL and MapReduce using SPARK, exploiting the query rewriting methods provided by HIFUN. Using a synthetic dataset, we demonstrate the effectiveness of our approach in achieving query answering efficiency. Finally, we show that by exploiting the formal query rewriting methods of HIFUN, we can further reduce the computational cost, adding another layer of query optimization to our implementation.
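
The abstract notes that a conceptual query can be encoded as a map-reduce job or as a SQL group-by query. As a rough illustration of the map-reduce side only, the following plain-Python sketch evaluates one grouping query in two phases. It does not use SPARK and is not the thesis implementation; the `deliveries` data, the field names, and the helpers `map_phase` and `reduce_phase` are assumptions made for the example.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def map_phase(records, grouping, measuring):
    """Emit one (group key, measure) pair per input record."""
    return [(grouping(r), measuring(r)) for r in records]


def reduce_phase(pairs, operation):
    """Combine the measures of each group with the given binary operation."""
    # groupby needs its input sorted by the grouping key.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: reduce(operation, (value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical dataset: quantities delivered to branches.
deliveries = [
    {"branch": "b1", "quantity": 200},
    {"branch": "b2", "quantity": 100},
    {"branch": "b1", "quantity": 300},
]

pairs = map_phase(deliveries,
                  grouping=lambda r: r["branch"],
                  measuring=lambda r: r["quantity"])
totals = reduce_phase(pairs, lambda a, b: a + b)
```

In SQL terms, the same query would correspond to a `SELECT branch, SUM(quantity) ... GROUP BY branch` statement over a table with those (assumed) columns.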

E-Locus