
2 research outputs found

An Efficient Decentralized Streaming Model

Author: GRIGORIOU EVANGELOS
ΓΡΗΓΟΡΙΟΥ ΕΥΑΓΓΕΛΟΣ
Publication date: 01/01/2019

Recently, increasingly large amounts of data are being generated from a variety of sources. Streaming frameworks for Big Data applications help to store, analyze, and extract useful information from such continuously generated data. There are several existing streaming frameworks, such as Apache Storm, Apache Spark, and Apache Flume. In this thesis, we present a decentralized stream processing model. It uses a DHT protocol to achieve a many masters-many workers architecture and assign each job its own master. Even groups are created for each job by utilizing the system's routing properties, resulting in a hierarchical tree formation consisting of agents that are participating in the network. The root of this tree acts as the master of the group and is responsible for synchronizing the group's members. Each agent consumes live data logs, which are parsed into mini batches and stored in a memory-efficient data structure. The agents aggregate their local data, and the results are rolled up the aggregation tree.
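
The roll-up step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Agent` class, the hand-built tree, and the choice of key counting as the aggregate are all assumptions made for illustration, and the DHT routing that would actually form the tree is omitted.

```python
from collections import Counter


class Agent:
    """Hypothetical node in the aggregation tree (name and fields are assumptions)."""

    def __init__(self, name, local_batches):
        self.name = name
        # Each mini batch is a list of keys parsed from this agent's log stream.
        self.local_batches = local_batches
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def aggregate(self):
        """Combine local mini batches, then merge in every child's partial result."""
        total = Counter()
        for batch in self.local_batches:
            total.update(batch)  # counts each key in the batch
        for child in self.children:
            total.update(child.aggregate())  # adds the child's partial counts
        return total


# The tree is wired by hand here; in the described model the overlay's
# routing properties would determine parents and children.
root = Agent("root", [["a", "b"], ["a"]])
left = root.add_child(Agent("left", [["b", "c"]]))
right = root.add_child(Agent("right", [["a"], ["c", "c"]]))
left.add_child(Agent("leaf", [["a", "b"]]))

result = root.aggregate()
print(dict(result))
```

Each agent only ever sends one partial result to its parent, so the root (the group's master) receives already-combined counts rather than raw log records.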

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Memory-Efficient GroupBy-Aggregate using Compressed Buffer Trees

Author: Athula Balach
David G. Andersen
Erik Zawadzki
Hrishikesh Amur
Karsten Schwan
Michael Kaminsky
Wolfgang Richter
Publication date: 01/01/2012

Memory is rapidly becoming a precious resource in many data processing environments. This paper introduces a new data structure called a Compressed Buffer Tree (CBT). Using a combination of buffering, compression, and lazy aggregation, CBTs can improve the memory efficiency of the GroupBy-Aggregate abstraction, which forms the basis of many data processing models such as MapReduce and databases. We evaluate CBTs in the context of MapReduce aggregation and show that CBTs can provide significant advantages over existing hash-based aggregation techniques: up to 2× less memory and 1.5× the throughput, at the cost of 2.5× the CPU.
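
The three ingredients named in the abstract (buffering, compression, lazy aggregation) can be shown together in a deliberately simplified sketch. This is not a Compressed Buffer Tree: it uses a single flat list of compressed chunks instead of a tree, and the class name `LazyBufferedAggregator`, the `buffer_limit` parameter, and the use of `zlib` with JSON encoding are all assumptions chosen for illustration.

```python
import json
import zlib
from collections import defaultdict


class LazyBufferedAggregator:
    """Hypothetical flat stand-in for the buffering idea; not the paper's CBT."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit
        self.buffer = []             # uncompressed (key, value) pairs
        self.compressed_chunks = []  # partially aggregated, compressed spills

    def insert(self, key, value):
        # Inserts only append; no global hash table is updated here.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self._spill()

    def _spill(self):
        # Aggregate within the buffer, then keep the result in compressed form.
        partial = defaultdict(int)
        for key, value in self.buffer:
            partial[key] += value
        payload = json.dumps(sorted(partial.items())).encode("utf-8")
        self.compressed_chunks.append(zlib.compress(payload))
        self.buffer = []

    def finalize(self):
        # Lazy step: the full GroupBy-Aggregate is computed only on demand.
        if self.buffer:
            self._spill()
        totals = defaultdict(int)
        for chunk in self.compressed_chunks:
            for key, value in json.loads(zlib.decompress(chunk).decode("utf-8")):
                totals[key] += value
        return dict(totals)


agg = LazyBufferedAggregator(buffer_limit=3)
for word in "a b a c b a d".split():
    agg.insert(word, 1)
totals = agg.finalize()
print(totals)
```

The sketch trades extra CPU (compressing and decompressing chunks) for a smaller resident footprint, which mirrors the direction of the trade-off the abstract reports, though the figures quoted there apply only to the authors' own structure.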

CiteSeerX

Scholarly Materials And Research @ Georgia Tech

Crossref