Search CORE

18 research outputs found

Scalable, workload aware indexing and query processing over unstructured data

Author: Papailiou Nikolaos
Παπαηλίου Νικόλαος
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 01/01/2016
Field of study

The pace at which data are described, queried and exchanged using unstructured data representations is constantly growing. Semantic Web technologies have emerged as one of the prevalent sources of unstructured data: using the RDF description model, they aim to encode various World Wide Web datasets and make them openly available. The constantly increasing volume of available data therefore calls for efficient and scalable solutions for their management. In this thesis, we devise distributed algorithms and techniques for data management that scale to huge datasets. We introduce H2RDF+, a fully distributed RDF store that combines the MapReduce processing framework with a NoSQL distributed database. Using 6 indexes over HBase tables, H2RDF+ can process complex queries, making adaptive decisions on both the join ordering and the join execution. Joins are executed using either distributed or centralized resources, depending on their cost. Furthermore, we present a novel system that addresses graph-based, workload-adaptive indexing of large RDF graphs by caching SPARQL query results. At the heart of the system lies a canonical labelling algorithm for SPARQL queries, which is used to uniquely index and reference SPARQL query graphs as well as their isomorphic forms. We integrate our canonical labelling algorithm with a dynamic programming planner in order to generate the optimal join execution plan, considering both primitive triple indexes and cached query results. By monitoring cache requests, our system is able to identify and cache SPARQL queries that, even if not explicitly issued, greatly reduce the average response time of a workload. The proposed cache is modular in design, allowing integration with different RDF stores. Another ever-increasing source of unstructured data is Internet traffic. Network datasets collected at large networks such as Internet Exchange Points (IXPs) can reach the order of Terabytes per hour.
To handle analytics over such datasets, we present Datix, a fully decentralized, open-source analytics system for network traffic data that relies on smart partitioning storage schemes to support fast join algorithms and efficient execution of filtering queries. In brief, Datix answers queries within minutes, compared to more than 24 hours of processing when executing existing Python-based code in single-node setups. Datix also achieves nearly 70% speedup compared to baseline query implementations in popular big data analytics engines such as Hive and Shark.

Hellenic National Archive of Doctoral Dissertations
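The cache described in this thesis abstract looks queries up by a canonical label, so that isomorphic SPARQL query graphs map to the same cache entry. The thesis's own labelling algorithm is not reproduced here. The following is a deliberately naive Python sketch, assuming a query is a list of triple patterns whose variables start with `?`: it tries every renaming of the variables to `?v0`, `?v1`, ..., sorts the renamed patterns and keeps the smallest result. This is exponential in the number of variables and only illustrates what a canonical label guarantees; the function names are hypothetical.

```python
from itertools import permutations

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
Triple = tuple[str, str, str]


def variables_of(bgp: list[Triple]) -> list[str]:
    """Distinct variables of a basic graph pattern, in first-seen order."""
    seen: list[str] = []
    for triple in bgp:
        for term in triple:
            if term.startswith("?") and term not in seen:
                seen.append(term)
    return seen


def canonical_label(bgp: list[Triple]) -> str:
    """Brute-force canonical form: try every renaming of the variables to
    ?v0, ?v1, ..., sort the renamed patterns, keep the smallest result."""
    variables = variables_of(bgp)
    best = None
    for order in permutations(range(len(variables))):
        rename = {var: f"?v{i}" for var, i in zip(variables, order)}
        # Constants (IRIs, literals) are left unchanged by the renaming.
        renamed = tuple(sorted(tuple(rename.get(t, t) for t in triple) for triple in bgp))
        if best is None or renamed < best:
            best = renamed
    return " . ".join(" ".join(triple) for triple in best)
```

With illustrative predicates, `?x ub:advisor ?y . ?y ub:teacherOf ?z` and `?b ub:teacherOf ?c . ?a ub:advisor ?b` differ only in variable names and pattern order, so they receive the same label and a result cached under one can serve the other. A practical system would replace the brute-force loop with a proper graph canonical labelling algorithm.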

The arched bridge of Karytaina: documentation, structural assessment & rehabilitation proposals

Author: Papailiou Nikolaos
Παπαηλίου Νικόλαος
Publication venue
Publication date: 10/05/2016
Field of study

National Technical University of Athens. Master's thesis. Interdisciplinary-Interdepartmental Postgraduate Program (D.P.M.S.) "Conservation and Restoration of Historic Buildings and Sites - Protection of Monuments" (Track A)

DSpace at NTUA

Scalable, workload aware indexing and query processing over unstructured data

Author: Papailiou Nikolaos
Παπαηλίου Νικόλαος
Publication venue
Publication date: 01/01/2016
Field of study

DSpace at NTUA

Hellenic National Archive of Doctoral Dissertations

Restoration of the Church of Panagia Pantovassilissa in Triglia

Author: Papailiou Nikolaos Th.
Παπαηλίου Νικόλαος Θ.
Publication venue
Publication date: 30/10/2014
Field of study

86 pp., includes drawings. National Technical University of Athens, Master's thesis. Interdisciplinary-Interdepartmental Postgraduate Program (D.P.M.S.) "Structural Design and Analysis of Structures".
The subject of this thesis is the documentation of the existing condition, the diagnosis of the pathology and the rehabilitation proposal for the church of Panagia Pantovassilissa, located in Triglia, province of Bursa, Turkey. The church dates from the 13th century, with an addition made in 1883. It is a domed basilica with an extensive addition of rectangular plan, covered by barrel vaults and a dome that today rest, through arches, on marble columns. Almost the entire original phase survives, whereas the addition is only partly preserved, together with small parts of buttresses added during later repairs. From a structural point of view, the church is in poor condition. Serious cracks are observed, as well as relatively large, irreversible local deformations. A small part of the vaulting, the horizontal bearing members of the addition and the bell tower have collapsed due to man-made interventions. Long abandonment and exposure to the climate have aggravated the existing structural problems. Apart from the man-made interventions, the general pathology of the building points to failures caused by seismic actions, aggravated by unfavourable foundation conditions.
The calculations largely confirm the observed pathology and its diagnosis. The proposed interventions follow the principles of restoration ethics and comprise reconstruction of the missing parts of the building, repair of the damage and overall strengthening of the load-bearing structure. To study the structural behaviour of the building under horizontal and vertical loads, a finite element model was produced. The capacity checks were performed with spreadsheets, examining exceedances of the bearing capacity in terms of forces and moments. To relate the ground acceleration values used in the assessment to probabilistic quantities, the seismic hazard curve for the site was estimated following the PSHA (probabilistic seismic hazard analysis) methodology. The new structural members are expected to satisfy the requirements of modern codes. The masonry structure is expected to respond without damage to moderate earthquakes, whereas under design earthquakes the critical failure mechanisms are expected to be activated, but with safety against collapse. For such a safety level to be acceptable, a policy of limited access to the monument must be followed.

DSpace at NTUA
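The Pantovassilissa abstract relates design ground accelerations to probabilistic quantities through a PSHA hazard curve. As a hedged illustration of that step, the sketch below uses the standard Poisson occurrence assumption, under which the probability of at least one exceedance in t years is P = 1 - exp(-λt) for an annual exceedance rate λ, and reads a hazard curve by log-log interpolation. The curve passed in is an assumed input (pairs of PGA in g and annual rate), not data from the thesis, and the function names are hypothetical.

```python
import math


def annual_rate_from_probability(p_exceed: float, years: float) -> float:
    """Annual exceedance rate lambda such that P = 1 - exp(-lambda * t) (Poisson model)."""
    return -math.log(1.0 - p_exceed) / years


def probability_from_annual_rate(rate: float, years: float) -> float:
    """Probability of at least one exceedance within `years` (Poisson model)."""
    return 1.0 - math.exp(-rate * years)


def pga_for_rate(curve: list[tuple[float, float]], target_rate: float) -> float:
    """Read a hazard curve [(pga_g, annual_rate), ...] sorted by increasing PGA
    (so decreasing rate), interpolating linearly in log-log space."""
    for (a1, r1), (a2, r2) in zip(curve, curve[1:]):
        if r1 >= target_rate >= r2:
            if r1 == r2:  # flat segment: any point on it matches
                return a1
            t = (math.log(target_rate) - math.log(r1)) / (math.log(r2) - math.log(r1))
            return math.exp(math.log(a1) + t * (math.log(a2) - math.log(a1)))
    raise ValueError("target rate outside the range of the hazard curve")
```

For example, the common design level of a 10% probability of exceedance in 50 years gives λ ≈ 0.0021 per year, that is a return period 1/λ of about 475 years; `pga_for_rate` then returns the acceleration the site's hazard curve associates with that rate.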

Distributed storage and querying of huge RDF data using NoSQL and MapReduce

Author: Papailiou Nikolaos P.
Παπαηλίου Νικόλαος Π.
Publication venue
Publication date: 21/10/2011
Field of study

109 pp.
In recent years, great efforts have been made to realize the vision of the Semantic Web. International organizations have defined standards for all the required functionality. The basic standard for storing and exchanging data is RDF, under which data are stored as subject-predicate-object triples. SPARQL is the basic language for querying and processing an RDF database. The Internet grows continuously, and the data it contains increase every day. Therefore, if we want to achieve the objective of the Semantic Web, we must build systems able to handle the large volume of Internet data. Our work aims to create a system for storing and querying such huge RDF datasets. A modern trend in databases is NoSQL databases, which do not implement the SQL language and are mainly distributed column stores. HBase is one such database: it is distributed and stores data across multiple computers simultaneously. Studies have shown that HBase can store huge tables and provides efficient access to them. MapReduce is a new parallelization technique that has gained enormous ground and is widely used to parallelize many kinds of tasks. In this work, we built a system that stores RDF data in 3 different HBase indexes. This 3-index schema allows us to answer all combinations of SPARQL queries efficiently. To answer SPARQL queries, we used a greedy algorithm to choose the join execution plan. Furthermore, we implemented MapReduce jobs for the distributed execution of SPARQL joins, and we also used MapReduce jobs to load the RDF data into the HBase indexes. Finally, we show that our system is scalable and can meet the challenge of huge RDF datasets.

DSpace at NTUA
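The 2011 thesis abstract states that 3 HBase indexes suffice to answer every combination of SPARQL triple patterns. One common way to achieve this, assumed here rather than taken from the thesis, is to store each triple under the key orders SPO, POS and OSP: for any set of bound positions, one of these keys starts with exactly those positions, so the pattern becomes a single prefix scan. The sketch below stands in for HBase's sorted row keys with in-memory sorted lists and `bisect`; it does not use any HBase API, and all names are hypothetical.

```python
from bisect import bisect_left

# Assumed key orders: positions 0 = subject, 1 = predicate, 2 = object,
# listed in the order they appear in each index's row key.
INDEXES = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def build_indexes(triples):
    """Store each triple once per index as a sorted list of permuted keys,
    standing in for the lexicographically sorted row keys of an HBase table."""
    return {name: sorted(tuple(t[i] for i in order) for t in triples)
            for name, order in INDEXES.items()}


def choose_index(pattern):
    """Pick the index whose key begins with every bound position of the
    pattern (None marks a variable), so the pattern is one prefix scan."""
    bound = {i for i, term in enumerate(pattern) if term is not None}
    for name, order in INDEXES.items():
        if set(order[:len(bound)]) == bound:
            return name
    raise ValueError("no index covers this pattern")  # unreachable with these 3 orders


def match(indexes, pattern):
    """Answer a single triple pattern with a prefix scan, returning (s, p, o) triples."""
    name = choose_index(pattern)
    order = INDEXES[name]
    bound_count = sum(term is not None for term in pattern)
    prefix = tuple(pattern[i] for i in order[:bound_count])
    rows = indexes[name]
    results = []
    pos = bisect_left(rows, prefix)  # first row key >= prefix
    while pos < len(rows) and rows[pos][:bound_count] == prefix:
        key = rows[pos]
        triple = [None, None, None]
        for slot, position in enumerate(order):  # undo the key permutation
            triple[position] = key[slot]
        results.append(tuple(triple))
        pos += 1
    return results
```

For instance, a pattern with bound subject and object, such as `alice ?p carol`, is routed to the OSP index and scanned with the prefix `(carol, alice)`. The price of this design is storing every triple three times in exchange for never needing a full scan on a bound pattern.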

H2RDF: Adaptive Query Processing on RDF Data in the Cloud.

Author: Dimitrios Tsoumakos
Ioannis Konstantinou
Nectarios Koziris
Nikolaos Papailiou
Publication venue
Publication date: 01/01/2012
Field of study

In this work we present H2RDF, a fully distributed RDF store that combines the MapReduce processing framework with a NoSQL distributed data store. Our system features two unique characteristics that enable efficient processing of both simple and multi-join SPARQL queries on a virtually unlimited number of triples: join algorithms that execute joins according to query selectivity to reduce processing, and an adaptive choice between centralized and distributed (MapReduce-based) join execution for fast query responses. Our system efficiently answers both simple joins and complex multivariate queries and easily scales to 3 billion triples using a small cluster of 9 worker nodes. H2RDF outperforms state-of-the-art distributed solutions in multi-join and non-selective queries while achieving performance comparable to centralized solutions in selective queries. In this demonstration we showcase the system's functionality through an interactive GUI. Users will be able to execute predefined or custom-made SPARQL queries on datasets of different sizes, using different join algorithms. Moreover, they can repeat all queries using a different number of cluster resources. Using real-time cluster monitoring and detailed statistics, participants will be able to understand the advantages of different execution schemes relative to the input data, as well as the scalability of H2RDF with respect to both the data size and the available worker resources.

CiteSeerX

DSpace at NTUA
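The H2RDF abstract describes joins ordered by query selectivity and an adaptive choice between centralized and MapReduce execution. The actual cost model is not given in the abstract, so the sketch below assumes the simplest possible version: per-pattern cardinality estimates (assumed to come from index statistics), a greedy most-selective-first order, and a single hypothetical row threshold `CENTRALIZED_LIMIT` deciding where the join runs.

```python
from dataclasses import dataclass


@dataclass
class TriplePattern:
    text: str            # e.g. "?x ub:advisor ?y" (illustrative)
    estimated_rows: int  # cardinality estimate, assumed to come from index statistics


# Hypothetical cut-off: joins whose estimated input fits under it run on one machine.
CENTRALIZED_LIMIT = 1_000_000


def plan_query(patterns, limit=CENTRALIZED_LIMIT):
    """Order patterns by selectivity and choose centralized or MapReduce execution."""
    # Most selective (smallest) patterns first, so intermediate results stay small.
    ordered = sorted(patterns, key=lambda p: p.estimated_rows)
    input_rows = sum(p.estimated_rows for p in patterns)
    mode = "centralized" if input_rows <= limit else "mapreduce"
    return [p.text for p in ordered], mode
```

The intent of such a rule is that selective queries avoid the fixed start-up overhead of a MapReduce job, while non-selective ones are spread over the cluster instead of overwhelming a single node.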

Uncertain Graph Sparsification

Author: Dimitris Papadias
Francesco Bonchi
Nikolaos Papailiou
Panos Parchas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Automatic Scaling of Selective SPARQL Joins Using the TIRAMOLA System

Author: Dimitrios Tsoumakos
Evangelos Angelou
Ioannis Konstantinou
Nectarios Koziris
Nikolaos Papailiou
Publication venue
Publication date: 01/01/2012
Field of study

Modern cloud infrastructures based on virtual hardware provide new opportunities and challenges for developers and system administrators alike. Most notable is the promise of resource elasticity, whereby the infrastructure can grow or shrink based on demand. Using elastic resources, applications can provide better quality of service and reduce cost by paying only for the resources they require. In this work, we extensively study the performance of some popular NoSQL databases over an elastic cloud infrastructure. NoSQL databases focus on analytical processing of large-scale datasets, offering increased scalability over commodity hardware. We then describe TIRAMOLA, a cloud-enabled framework for automatic provisioning of elastic resources on any NoSQL platform. Our system administers cluster resources (VMs) according to user- or application-specified constraints through an expandable monitoring and command-issuing module. Users can easily modify resizing policies based on application-specific metrics and thus fully exploit the elasticity of the underlying infrastructure. As a realistic use case, we apply this framework on top of a fully distributed RDF store backed by an elastic NoSQL database. Letting TIRAMOLA manage the number of committed resources results in automated cluster resize actions and throughput maximization, while application experts need only provide simple elasticity rules.

CiteSeerX

DSpace at NTUA
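The TIRAMOLA abstract says application experts only need to provide simple elasticity rules over application-specific metrics. The sketch below shows one plausible shape for such a rule, a pair of scale-out and scale-in thresholds bounded by minimum and maximum cluster sizes. It is not TIRAMOLA's interface; the class, function and metric names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElasticityRule:
    metric: str             # e.g. "cpu_util" or "query_latency_ms" (hypothetical names)
    scale_out_above: float  # add VMs when the metric exceeds this value
    scale_in_below: float   # remove VMs when the metric falls below this value
    step: int = 1           # VMs added or removed per decision


def decide(rule, metrics, current_vms, min_vms, max_vms):
    """Return the cluster size suggested by one threshold rule, clamped to the allowed range."""
    value = metrics[rule.metric]
    if value > rule.scale_out_above:
        return min(current_vms + rule.step, max_vms)
    if value < rule.scale_in_below:
        return max(current_vms - rule.step, min_vms)
    return current_vms
```

Keeping a gap between the two thresholds gives the rule some hysteresis, so the cluster does not oscillate when the metric hovers around a single cut-off.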

H2RDF+: High-performance Distributed Joins over Large-scale RDF Graphs

Author: Dimitrios Tsoumakos
Ioannis Konstantinou
Nectarios Koziris
Nikolaos Papailiou
Panagiotis Karras
Publication venue
Publication date
Field of study

The proliferation of data in RDF format calls for efficient and scalable solutions for their management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt to the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work we present H2RDF+, an RDF store that efficiently performs distributed Merge and Sort-Merge joins over a multiple-index scheme. H2RDF+ is highly scalable, using distributed MapReduce processing and HBase indexes. Through aggressive byte-level compression and result grouping over fast scans, it can process both complex and selective join queries highly efficiently. Furthermore, it adaptively chooses between single- and multi-machine execution based on join complexity estimated from index statistics. Our extensive evaluation demonstrates that H2RDF+ answers non-selective joins an order of magnitude faster than current state-of-the-art distributed and centralized stores, while being only tenths of a second slower on simple queries, and it scales linearly with the amount of available resources.

CiteSeerX
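The H2RDF+ abstract builds on Merge and Sort-Merge joins over its indexes. The key idea is that when both join inputs come from index scans whose keys start with the join variable, they are already sorted, so a single linear merge pass suffices; a Sort-Merge join first sorts an input that is not. The single-machine Python sketch below assumes inputs are lists of `(join_key, value)` pairs and ignores the distribution, compression and result grouping the paper describes; `merge_join` is a hypothetical name.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key,
    as they would be when both come from scans of key-sorted indexes."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

A Sort-Merge variant is simply `merge_join(sorted(left), sorted(right))` for inputs not already ordered on the join key. Either way, each input is read once after sorting, which is what makes this join attractive for large, non-selective queries.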

Aldosterone synthase deficiency type II: an unusual presentation of the first Greek case reported with confirmed genetic analysis

Author: Maritsi Despoina
Papailiou Stayroula
Sertedaki Amalia
Syggelos Nikolaos
Syggelou Angeliki
Vlachopapadopoulou Elpis Athina
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/07/2020
Field of study

Objective. Aldosterone synthase deficiency (ASD) is a rare, autosomal recessive inherited disease with an overall clinical phenotype of failure to thrive, vomiting, severe dehydration, hyperkalemia, and hyponatremia. Mutations in the CYP11B2 gene encoding aldosterone synthase are responsible for ASD. Defects in the CYP11B2 gene have been reported in only a limited number of cases worldwide. Because the condition is potentially life-threatening, comprehensive hormonal investigation followed by genetic confirmation is essential for the clinical management of affected offspring.

Directory of Open Access Journals