27 research outputs found

Effect of temperature on the strength development of mortar mixes with GGBS and fly ash

Author: Hatzitheodorou Alexandros
Kanavaris Fragkoulis
Kwasny Jacek
Soutsos Marios
Publication venue: 'Thomas Telford Ltd.'
Publication date: 01/08/2017

The concrete mixes used in this study had 28 d mean strengths of 50 and 30 MPa and also had Portland cement (PC) partially replaced with ground granulated blast-furnace slag (GGBS) and fly ash (FA). These mixes were the same as those used in a UK-based project that involved casting of blocks, walls and slabs. The strength development of ‘equivalent’ mortar mixes was determined in the laboratory for curing temperatures of 10, 20, 30, 40 and 50°C. High curing temperatures were found to have a beneficial effect on the early-age strength, but a detrimental effect on the long-term strength. GGBS was found to be more sensitive to high curing temperatures than PC and FA, as reflected in its higher ‘apparent’ activation energy. The accuracy of strength estimates obtained from maturity functions was examined. The temperature dependence of the Nurse–Saul function (i.e. concrete strength gain rate varies linearly with temperature) was not sufficient to account for the improvement in early-age strengths resulting from high curing temperatures. The Arrhenius-based function, on the other hand, overestimated them because of the detrimental effect of high curing temperature on strength starting from a very early age. Both functions overestimated the long-term strengths, as neither function accounts for the detrimental effect of high curing temperatures on the ultimate compressive strength.
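
The two maturity functions compared in the abstract above can be illustrated with a minimal sketch of their standard textbook forms. The datum temperature (-10 °C), the reference temperature (20 °C) and the activation energy (40 kJ/mol) are assumed illustrative values, not figures taken from the paper, and the function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def nurse_saul_age_factor(temp_c, ref_c=20.0, datum_c=-10.0):
    """Age conversion factor that varies linearly with temperature.
    datum_c is an assumed datum temperature, not a value from the paper."""
    return (temp_c - datum_c) / (ref_c - datum_c)

def arrhenius_age_factor(temp_c, ref_c=20.0, activation_energy=40_000.0):
    """Age conversion factor that varies exponentially with temperature.
    activation_energy (J/mol) is an assumed illustrative value."""
    t_kelvin = temp_c + 273.15
    ref_kelvin = ref_c + 273.15
    return math.exp(-(activation_energy / R) * (1.0 / t_kelvin - 1.0 / ref_kelvin))

def equivalent_age(history, factor):
    """Sum factor(T) * dt over a curing history of (temp_c, hours) pairs."""
    return sum(factor(temp_c) * hours for temp_c, hours in history)

# One day of curing at 50 degrees C, expressed as equivalent hours at 20 degrees C.
history = [(50.0, 24.0)]
ns_hours = equivalent_age(history, nurse_saul_age_factor)
arr_hours = equivalent_age(history, arrhenius_age_factor)
```

With these assumed parameters the Arrhenius factor at 50 °C is larger than the Nurse–Saul factor, which is consistent with the abstract's observation that the linear function credits less early-age acceleration than the exponential one.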

Queen's University Belfast Research Portal

Crossref

A Survey on the Evolution of Stream Processing Systems

Author: Carbone Paris
Fragkoulis Marios
Kalavri Vasiliki
Katsifodimos Asterios
Publication date: 03/08/2020

Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems. Comment: 34 pages, 15 figures, 5 tables

arXiv.org e-Print Archive

Stateful Entities: Object-oriented Cloud Applications as Distributed Dataflows

Author: Fragkoulis Marios
Katsifodimos Asterios
Psarakis Kyriakos
Visser Eelco
Zorgdrager Wouter
Publication date: 17/11/2021

Programming stateful cloud applications remains a very painful experience. Instead of focusing on the business logic, programmers spend most of their time dealing with distributed systems considerations, with the most important being consistency, load balancing, failure management, recovery, and scalability. At the same time, we witness an unprecedented adoption of modern dataflow systems such as Apache Flink, Google Dataflow, and Timely Dataflow. These systems are now performant and fault-tolerant, and they offer excellent state management primitives. With this line of work, we aim to investigate the opportunities and limits of compiling general-purpose programs into stateful dataflows. Given a set of easy-to-follow code conventions, programmers can author stateful entities, a programming abstraction embedded in Python. We present a compiler pipeline named StateFlow that analyzes the abstract syntax tree of a Python application and rewrites it into an intermediate representation based on stateful dataflow graphs. StateFlow compiles that intermediate representation to a target execution system: Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst. Through an experimental evaluation, we demonstrate that the code generated by StateFlow incurs minimal overhead. While developing and deploying our prototype, we came to observe important limitations of current dataflow systems in executing cloud applications at scale.

arXiv.org e-Print Archive

Leveraging Large Language Models for Sequential Recommendation

Author: Fragkoulis Marios
Harte Jesse
Jannach Dietmar
Katsifodimos Asterios
Louridas Panos
Zorgdrager Wouter
Publication date: 01/01/2023

Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we devise and evaluate three approaches to leverage the power of LLMs in different ways. Our results from experiments on two datasets show that initializing the state-of-the-art sequential recommendation model BERT4Rec with embeddings obtained from an LLM improves NDCG by 15-20% compared to the vanilla BERT4Rec model. Furthermore, we find that a simple approach that leverages LLM embeddings for producing recommendations can provide competitive performance by highlighting semantically related items. We publicly share the code and data of our experiments to ensure reproducibility. Comment: 9 pages

arXiv.org e-Print Archive

TU Delft Repository

Valentine: Evaluating Matching Techniques for Dataset Discovery

Author: Bonifati Angela
Brons Jerry
Fragkoulis Marios
Ionescu Andra
Katsifodimos Asterios
Koutras Christos
Lofi Christoph
Psarakis Kyriakos
Siachamis George
Publication date: 01/10/2020

Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies highly on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad-hoc fashion due to the lack of openly available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods, and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight into the strengths and weaknesses of existing techniques that can serve as a guide for employing schema matching in future dataset discovery methods.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

The modified Nurse-Saul (MNS) maturity function for improved strength estimates at elevated curing temperatures

Author: Kanavaris Fragkoulis
Soutsos Marios
Publication venue: 'Elsevier BV'
Publication date: 01/12/2018

Curing temperature significantly affects the compressive strength development of mortar mixtures. Higher curing temperatures accelerate the cement hydration and thus also the early-age compressive strength development. However, the age conversion factors in maturity functions, especially that of the Nurse-Saul function, are not sufficient to account for this acceleration and thus an additional “acceleration” factor is needed. The “acceleration” compresses a certain percentage of hydration or strength development into a smaller time interval. The strength development rate was increased because of the “compression” of the hydration. The “acceleration” factor was not equal to the “compression” factor. The reaction at the higher temperature was therefore less efficient in contributing to the compressive strength than the reaction at the lower temperature. A relationship between concrete strength and the Nurse-Saul maturity index, combined with “acceleration” and “temperature efficiency” factors, is used in an iterative procedure for estimating the strength development at curing temperatures other than the standard 20 °C. Keywords: Curing temperature, Compressive strength, Maturity function, Activation energy

Queen's University Belfast Research Portal

Directory of Open Access Journals

Compressive strength estimates for adiabatically cured concretes with the Modified Nurse-Saul (MNS) maturity function

Author: Kanavaris Fragkoulis
Soutsos Marios
Publication venue: 'Elsevier BV'
Publication date: 20/09/2020

Queen's University Belfast Research Portal

Main memory data analysis technologies (Τεχνολογίες ανάλυσης δεδομένων κυρίας μνήμης)

Author: Fragkoulis Marios
Φραγκούλης Μάριος
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 01/01/2017

Digital data has become a key resource for solving scientific and business problems and achieving competitive advantage. To this end, scientific and business communities worldwide are trying to extract knowledge from the data available to them. The timely use of data significantly affects scientific progress, quality of life, and economic activity.
In the digital age, efficient processing and effective analysis of data are important challenges. Processing data in main memory can boost processing efficiency, especially if it is combined with new software system architectures. At the same time, useful and usable tools are required for analysing main memory data to satisfy important use cases not met by database and programming language technologies.
The unified management of the memory hierarchy can improve the processing of data in main memory. In this architecture the communication between the different parts of the memory hierarchy is transparent to the applications, and optimization techniques are applied holistically. Data flows through the memory hierarchy so that the items that will be processed shortly are closest to the CPUs, and programming languages treat temporary and permanent data of any type uniformly. As a result, new data analysis systems can be developed that take advantage of faster main memory data structures over disk-based ones for processing the data, leaving the memory hierarchy to care for the availability of data.
The absence of suitable analytical tools hinders knowledge extraction in software applications that do not need the support of a database system. Examples are applications whose data have a complex structure and are often stored in files, e.g. scientific applications in areas such as biology, and applications that do not maintain permanent data, such as data visualization applications and diagnostic tools. Databases offer widely used and recognized query interfaces, but applications that do not need the services of a database should not resort to this solution only to satisfy the need to analyze their data.
Programming languages, on the other hand, rarely provide expressive and usable query interfaces. These can be used internally in an application, but usually they do not offer interactive ad-hoc queries at runtime. Therefore the data analysis scenarios they can support are standard, and any additions or modifications to the queries entail recompiling and rerunning the application.
In addition to solving problems modeled by software applications, data analysis techniques are useful for solving problems that occur in the applications themselves. This is possible by analyzing the metadata that applications keep in main memory during their operation. This practice can be applied to any kind of system software, such as an operating system.
This thesis studies the methods and technologies for supporting queries on main memory data and how the widespread architecture of software systems currently affects these technologies. Based on the findings from the literature we develop a method and a technology to perform interactive queries on data that reside in main memory. Our approach is based on the criteria of usefulness and usability. After an overview of the programming languages that fit data analysis we choose SQL, the standard data manipulation language for decades.
The method we develop represents programming data structures in relational terms, as SQL requires. Our method replaces the associations between structures with relationships between relational representations. The result is a virtual relational schema of the programming data model, which we call relational representation.
The method was implemented for the C and C++ programming languages because of their wide use in the development of systems and applications. An additional reason why C++ was chosen is the large number of algorithms and data structures that it offers. The implementation includes a domain-specific language for describing relational representations, a compiler that generates the source code of the relational interface to the programming data structures given a relational specification, and the implementation of SQLite's virtual table API. SQLite is a relational database system that offers the query engine and the ability to run queries on non-relational data through its virtual table API.
The implementation extends to the development of two diagnostic tools for identifying problems in software systems through queries on main memory metadata related to their state. C, as the implementation language of many software systems, is ideal for the application of this idea. For this purpose we incorporate our implementation in the Linux kernel. Important implementation aspects that we address are synchronized access to data and the integrity of query results. We also apply our approach to expand the diagnostic capabilities of Valgrind, a system that checks the way software applications use memory.
The overall evaluation of our approach involves its integration in three C++ software applications, in the Linux kernel, and in Valgrind, where we also perform a user study with students. For the study we combine qualitative analysis through a questionnaire and quantitative analysis using code measurements.
In the context of the C++ applications, the measurements comparing PiCO QL queries with the corresponding queries expressed in C++ show that SQL combined with our relational representation provides greater expressiveness. The same holds when we compare our approach with SQL after importing the data into a MySQL relational database system. The efficiency of our approach is worse than C++ and better than MySQL. Queries with our approach take twice as long to run as in C++, regardless of the problem's size. The SQL queries in MySQL require double, triple, or more time to execute compared to our approach.
In the context of the Linux kernel, where our relational interface functions as a diagnostic tool, we find real problems by executing queries against the kernel's data structures. Access to files without the required privileges, unauthorized execution of processes, the identification of binaries that are used in loading processes but are not used by any, and the direct execution of system calls by processors belonging to a virtual machine are the security problems we identify. In addition we show queries that combine metrics from different subsystems, such as pages in memory, disk files, processor activity, and network data transfers, which can help identify performance problems. The measured query processing time and the overhead added to the system encourage the use of our tool.
The diagnostic tool we developed for Valgrind detects problems, additional to those found by Valgrind, through the use of queries on the collected metadata of the application being tested. The bzip2 tool, for instance, wastes 900 KB of memory, where all the memory cells are consecutive in a single pool. This size is equivalent to twelve percent of the total memory that the application needs to operate. Through queries on the dynamic function call graph formed during an application's execution we find a code path that is performance critical. It is located in the glibc library and is widely used by the sort and uniq Unix tools. This optimization was implemented by glibc's development team and was included in the next version without our contribution.
Finally, in the user study one group expresses analysis tasks with SQL queries and the other with Python code. The results show that the time required to express an analysis task is smaller when SQL is used. In contrast, no statistically significant differences are observed between the two approaches in terms of usefulness, efficiency, and expressiveness, although our approach has a higher rating. For the dimension of usability the evaluations showed no clear winner, but both approaches achieved a very good evaluation. The evaluation of the performance of the students' code shows that the SQL group had more correct replies in less programming time. We consider this metric indicative of our approach's usefulness vis-à-vis Python, which is also widely used for data analysis. We also consider the time required to express an analysis task as a usability factor.
Challenges in data processing continue to emerge at an unabated pace. In this environment software applications require solutions for the analysis of user data, but also for solving problems relating to their own operation. The processing of data in main memory can bring important benefits in combination with other innovations. In this direction, new architectures that benefit the efficient processing of data can play an important role. We hope that this thesis will aid the efficient processing and effective analysis of data expected by users.
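
The idea in the abstract above of representing programming data structures in relational terms can be illustrated with a small Python sketch. It is not the thesis's implementation: the thesis targets C and C++ through SQLite's virtual table API, whereas this sketch copies objects into an in-memory SQLite database with the standard `sqlite3` module. The `Process` class, the table names and the sample data are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Process:
    """Hypothetical in-memory structure; open_files is an association to file paths."""
    pid: int
    name: str
    open_files: list = field(default_factory=list)

def to_relational(processes):
    """Copy the objects into an in-memory SQLite database.
    The object-to-list association becomes a foreign-key relationship
    between two tables (an assumed, simplified mapping)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE process (pid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE open_file (pid INTEGER REFERENCES process(pid), path TEXT)"
    )
    for p in processes:
        conn.execute("INSERT INTO process VALUES (?, ?)", (p.pid, p.name))
        conn.executemany(
            "INSERT INTO open_file VALUES (?, ?)",
            [(p.pid, path) for path in p.open_files],
        )
    return conn

# Hypothetical sample data.
procs = [
    Process(1, "init", ["/etc/fstab"]),
    Process(2, "sshd", ["/etc/ssh/sshd_config", "/var/log/auth.log"]),
]
conn = to_relational(procs)

# Ad-hoc query over the relational view: number of open files per process.
rows = conn.execute(
    "SELECT p.name, COUNT(f.path) FROM process p "
    "JOIN open_file f ON f.pid = p.pid "
    "GROUP BY p.pid ORDER BY p.pid"
).fetchall()
```

Copying the data keeps the sketch short but loses the main property described in the abstract, namely querying the live structures in place without importing them into a database.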

Hellenic National Archive of Doctoral Dissertations