
    Data-Driven Intelligent Scheduling For Long Running Workloads In Large-Scale Datacenters

    Cloud computing is becoming a fundamental facility of today's society. Large-scale public and private cloud datacenters, spanning millions of servers and acting as warehouse-scale computers, support most of the business of Fortune 500 companies and serve billions of users around the world. Unfortunately, the modern industry-wide average datacenter utilization is as low as 6% to 12%. Low utilization not only hurts the operational and capital components of cost efficiency, but also becomes a scaling bottleneck due to the limited electricity delivered by nearby utilities. Improving multi-resource efficiency in global datacenters is therefore both critical and challenging. Additionally, with the great commercial success of diverse big data analytics services, enterprise datacenters are evolving to host heterogeneous computation workloads, including online web services, batch processing, machine learning, streaming computing, interactive query and graph computation, on shared clusters. Most of these are long-running workloads that use long-lived containers to execute tasks. We survey datacenter resource scheduling work from the last 15 years: most previous work is designed to maximize cluster efficiency for short-lived tasks in batch processing systems such as Hadoop, and is not suitable for modern long-running workloads in systems such as microservices, Spark, Flink, Pregel, Storm or TensorFlow. New, effective scheduling and resource allocation approaches are urgently needed to improve efficiency in large-scale enterprise datacenters. In this dissertation, we are the first to define and identify the problems, challenges and scenarios of scheduling and resource management for diverse long-running workloads in modern datacenters. Such workloads rely on predictive scheduling techniques to perform reservation, auto-scaling, migration or rescheduling, which pushes us to pursue and explore more intelligent scheduling techniques backed by adequate predictive knowledge. We specify what intelligent scheduling is, which abilities are necessary for it, and how it can be leveraged to transform NP-hard online scheduling problems into solvable offline scheduling problems. We designed and implemented an intelligent cloud datacenter scheduler that automatically performs resource-to-performance modeling, predictive optimal reservation estimation and QoS (interference)-aware predictive scheduling to maximize resource efficiency across multiple dimensions (CPU, memory, network, disk I/O) while strictly guaranteeing service level agreements (SLAs) for long-running workloads. Finally, we introduce large-scale co-location techniques for executing long-running and other workloads on the shared global datacenter infrastructure of Alibaba Group, which effectively improve cluster utilization from 10% to an average of 50%. This goes far beyond scheduling alone, involving technique evolution in IDC, networking, physical datacenter topology, storage, server hardware, operating systems and containerization. We demonstrate its effectiveness through analysis of the latest Alibaba public cluster trace from 2017, and we are the first to reveal a global view of the scenarios, challenges and status of Alibaba's large-scale global datacenters through data, including big promotion events such as Double 11. Data-driven intelligent scheduling methodologies and effective infrastructure co-location techniques are critical and necessary to pursue maximal multi-resource efficiency in modern large-scale datacenters, especially for long-running workloads.
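The predictive reservation step can be illustrated with a tiny sketch: estimate a container's reservation as a high quantile of its recent usage plus headroom. This is a hedged illustration only; the quantile, headroom factor and function name are hypothetical and do not describe the dissertation's actual scheduler.

```python
import numpy as np

def reserve_from_history(cpu_usage_samples, quantile=0.95, headroom=1.1):
    """Estimate a CPU reservation for a long-running container from its
    recent usage history: take a high quantile of observed usage and add
    a small headroom factor so SLA-relevant spikes are still covered."""
    peak = np.quantile(cpu_usage_samples, quantile)
    return peak * headroom

# Example: per-minute CPU usage (in cores) observed for one container.
history = np.array([1.2, 1.4, 1.1, 2.8, 1.3, 1.5, 1.6, 2.9, 1.2, 1.4])
print(f"recommended reservation: {reserve_from_history(history):.2f} cores")
```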

    Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms

    Technological advances of recent years have laid the foundations for the informatisation of society, with economic, political, cultural and social impact. At the peak of this development, more and more everyday devices are connected to the web, giving rise to the term "Internet of Things". The future holds the full connection and interaction of IT and communication systems with the natural world, marking the transition to cyber-physical systems and offering meta-services in the physical world, such as personalized medical care, autonomous transportation, smart energy cities, etc. Outlining the needs of this dynamically evolving market, computer engineers are required to implement computing platforms that incorporate increased systemic complexity while also covering a wide range of meta-characteristics, such as cost and design time, reliability and reuse, which are prescribed by a conflicting set of functional, technical and construction constraints. This thesis aims to address these design challenges by developing methodologies and hardware/software co-design tools that enable the rapid implementation and efficient synthesis of architectural solutions providing the operating meta-features required by the modern market. Specifically, this thesis presents a) methodologies to accelerate the design flow for both reconfigurable and application-specific architectures, b) coarse-grain heterogeneous architectural templates for processing and communication acceleration, and c) efficient multi-objective synthesis techniques both at a high abstraction level of programming and at the physical silicon level. Regarding the acceleration of the design flow, the proposed methodology employs virtual platforms in order to hide architectural details and drastically reduce simulation time. An extension of this framework introduces systemic co-simulation using reconfigurable acceleration platforms as co-emulation intermediate platforms. Thus, the development cycle of a hardware/software product is accelerated by moving from a vertical serial flow to a circular interactive loop. Moreover, the simulation capabilities are enriched with efficient detection and correction techniques for design errors, as well as methods for controlling the system's performance metrics against the desired specifications, during all phases of system development. In orthogonal correlation with the aforementioned methodological framework, a new architectural template is proposed, aiming at bridging the gap between design complexity and technological productivity using specialized hardware accelerators in heterogeneous system-on-chip and network-on-chip platforms. A novel co-design methodology is presented for the hardware accelerators and their respective programming software, including the allocation of tasks to the available resources of the system/network. The introduced framework provides implementation techniques for the accelerators, using either conventional programming flows with a hardware description language or abstract programming model flows using techniques from high-level synthesis.
In any case, the designer is given the option to optimize systemic metrics such as processing speed, throughput, reliability, power consumption and design silicon area. Finally, to address the increased complexity of design tools for reconfigurable systems, novel multi-objective optimization evolutionary algorithms are proposed which exploit modern multicore processors and the coarse-grain nature of multithreaded programming environments (e.g. OpenMP) in order to reduce placement time while, by simultaneously grouping the applications based on their intrinsic characteristics, exploring the design space more effectively. The efficiency of the proposed architectural templates, design tools and methodology flows is evaluated against existing state-of-the-art solutions with applications from typical computing domains, such as digital signal processing, multimedia and arithmetic complexity, as well as in systemic heterogeneous environments, such as a computer vision system for autonomous robotic space navigation and many-accelerator systems for HPC and workstations/datacenters. The results strengthen the author's belief that this thesis provides competitive expertise to address complex modern, and projected future, design challenges.
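The multi-objective synthesis idea can be sketched with a toy Pareto-dominance filter over candidate accelerator placements; the objective tuples and values below are hypothetical, and this is not the thesis' evolutionary placement tool.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on every objective
    (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only non-dominated (delay, power, area) tuples."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Hypothetical placement candidates: (delay_ns, power_mW, area_mm2)
placements = [(120, 450, 3.1), (150, 300, 2.8), (110, 500, 3.5), (150, 320, 2.8)]
print(pareto_front(placements))   # the dominated (150, 320, 2.8) is dropped
```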

    Address Space Layout Randomization Next Generation

    [EN] Systems built from low-power, computationally weak devices force developers to favor performance over security; together with their high connectivity and continuous, autonomous operation, this makes such devices especially appealing to attackers. ASLR (Address Space Layout Randomization) is one of the most effective mitigation techniques against remote code execution attacks, but when it is implemented in a practical system its effectiveness is jeopardized by multiple constraints: the size of the virtual memory space, potential fragmentation problems, compatibility limitations, etc. As a result, most ASLR implementations (especially on 32-bit systems) fail to provide the necessary protection. In this paper we propose a taxonomy of all ASLR elements, which categorizes the entropy in three dimensions: (1) how, (2) when and (3) what; and includes novel forms of entropy. Based on this taxonomy we have created ASLRA, an advanced statistical analysis tool to assess the effectiveness of any ASLR implementation. Our analysis shows that all ASLR implementations suffer from several weaknesses: 32-bit systems provide poor ASLR, and OS X has a broken ASLR in both 32- and 64-bit systems. This jeopardizes not only servers and end-user devices such as smartphones but also the whole IoT ecosystem. To overcome all these issues, we present ASLR-NG, a novel ASLR that provides the maximum possible absolute entropy and removes all correlation attacks, making ASLR-NG the best solution for both 32- and 64-bit systems. We implemented ASLR-NG in the Linux kernel 4.15. The comparative evaluation shows that ASLR-NG outperforms the PaX, Linux and OS X implementations, providing strong protection that prevents attackers from abusing weak ASLRs. Marco-Gisbert, H.; Ripoll-Ripoll, I. (2019). Address Space Layout Randomization Next Generation. Applied Sciences. 9(14):1-25. https://doi.org/10.3390/app9142928
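As a rough analogue of the statistical assessment that a tool like ASLRA performs, the sketch below samples the stack base address of fresh processes on Linux and sums the Shannon entropy of each randomized address bit. It is an illustrative toy that assumes a Linux /proc filesystem; it is not the authors' tool.

```python
import re
import subprocess
from collections import Counter
from math import log2

def sample_stack_base():
    """Spawn a fresh Python process and read its [stack] mapping start
    address from /proc/self/maps (Linux only)."""
    out = subprocess.run(
        ["python3", "-c", "print(open('/proc/self/maps').read())"],
        capture_output=True, text=True, check=True).stdout
    line = next(l for l in out.splitlines() if "[stack]" in l)
    return int(re.match(r"([0-9a-f]+)-", line).group(1), 16)

def total_entropy(samples, bits=48):
    """Sum of per-bit Shannon entropies over the samples; a fully random
    bit contributes 1.0 bit, a fixed bit contributes 0.0."""
    total = 0.0
    for b in range(bits):
        p1 = Counter((s >> b) & 1 for s in samples)[1] / len(samples)
        for p in (p1, 1 - p1):
            if p > 0:
                total -= p * log2(p)   # entropy contribution of this bit
    return total

addrs = [sample_stack_base() for _ in range(200)]
print(f"estimated stack entropy: ~{total_entropy(addrs):.1f} bits")
```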

    Machine Learning for Microcontroller-Class Hardware -- A Review

    Advances in machine learning have opened a new opportunity to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployments have a high memory and compute footprint, hindering their direct deployment on ultra resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful consideration moving forward. Comment: Accepted for publication at IEEE Sensors Journal
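One early step of such a workflow is a feasibility check of the model footprint against the microcontroller's flash and SRAM; a minimal sketch in which the layer sizes and the 1 MB flash / 256 KB SRAM budgets are hypothetical.

```python
def int8_footprint_kib(layer_params, activation_peak):
    """Rough footprint of an int8-quantized model: weights live in flash,
    the peak activation tensor (plus a copy for double buffering) in SRAM."""
    flash_kib = sum(layer_params) / 1024          # 1 byte per int8 weight
    sram_kib = 2 * activation_peak / 1024         # double-buffered activations
    return flash_kib, sram_kib

# Hypothetical CNN: parameter counts per layer and largest activation size.
params = [3 * 3 * 3 * 16, 3 * 3 * 16 * 32, 3 * 3 * 32 * 64, 64 * 10]
flash, sram = int8_footprint_kib(params, activation_peak=32 * 32 * 16)
print(f"flash ~{flash:.0f} KiB, sram ~{sram:.0f} KiB")
print("fits MCU" if flash <= 1024 and sram <= 256 else "needs a smaller model")
```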

    High-Performance Concurrent Memory Allocation

    Memory management takes a sequence of program-generated allocation/deallocation requests and attempts to satisfy them within a fixed-sized block of memory while minimizing the total amount of memory used. A general-purpose dynamic-allocation algorithm cannot anticipate future allocation requests, so its output is rarely optimal. However, memory allocators do take advantage of regularities in allocation patterns for typical programs to produce excellent results, both in time and space (similar to LRU paging). In general, allocators use a number of similar techniques, each optimizing specific allocation patterns. Nevertheless, memory allocators are a series of compromises, occasionally with some static or dynamic tuning parameters to optimize specific program-request patterns. The goal of this thesis is to build a low-latency memory allocator for both kernel and user multi-threaded systems, which is competitive with the best current memory allocators, while extending the feature set of existing and new allocator routines. A new llheap memory-allocator is created that achieves all of these goals, while maintaining and managing sticky allocation properties for zero-filled and aligned allocations without a performance loss. Hence, it becomes possible to use @realloc@ frequently as a safe operation, rather than just occasionally, because it preserves sticky properties when enlarging storage requests. Furthermore, the ability to query sticky properties and information allows programmers to write safer programs, as it is possible to dynamically match allocation styles from unknown library routines that return allocations. The C allocation API is also extended with @resize@, advanced @realloc@, @aalloc@, @amemalign@, and @cmemalign@ so programmers do not make mistakes writing these useful allocation operations. llheap is embedded into the uC++ and C-for-all runtime systems, both of which have user-level threading. The ability to use C-for-all's advanced type-system (and possibly C++'s too) to combine advanced memory operations into one allocation routine using named arguments shows how far the allocation API can be pushed, which increases safety and greatly simplifies the programmer's use of dynamic allocation. The llheap allocator also provides comprehensive statistics for all allocation operations, which are invaluable in understanding and debugging a program's dynamic behaviour. No other memory allocator examined in the thesis provides such comprehensive statistics gathering. As well, llheap provides a debugging mode where allocations are checked with internal pre/post conditions and invariants. It is extremely useful, especially for students. While not as powerful as the @valgrind@ interpreter, a large number of allocation mistakes are detected. Finally, contention-free statistics gathering and debugging have a low enough cost to be used in production code. A micro-benchmark test-suite is started for comparing allocators, rather than relying on a suite of arbitrary programs; developing it has been an interesting challenge. These micro-benchmarks have adjustment knobs to simulate allocation patterns hard-coded into arbitrary test programs. Existing memory allocators, glibc, dlmalloc, hoard, jemalloc, ptmalloc3, rpmalloc, tbmalloc, and the new allocator llheap are all compared using the new micro-benchmark test-suite
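The allocation problem described in the opening sentences can be illustrated with a toy first-fit allocator over a fixed-size arena; this sketch is purely illustrative and bears no relation to llheap's actual design.

```python
class ToyArena:
    """First-fit allocator over a fixed-size arena, tracked as a sorted
    list of (offset, size) free blocks. Illustrative only: no alignment,
    no thread safety, none of llheap's sticky-property handling."""

    def __init__(self, size):
        self.free = [(0, size)]          # one big free block initially
        self.live = {}                   # offset -> size of live allocations

    def malloc(self, size):
        for i, (off, blk) in enumerate(self.free):
            if blk >= size:              # first block big enough wins
                rest = blk - size
                if rest:
                    self.free[i] = (off + size, rest)
                else:
                    del self.free[i]
                self.live[off] = size
                return off
        raise MemoryError("arena exhausted")

    def free_block(self, off):
        size = self.live.pop(off)
        self.free.append((off, size))    # naive: no coalescing of neighbours
        self.free.sort()

arena = ToyArena(1024)
a = arena.malloc(100)
b = arena.malloc(200)
arena.free_block(a)
c = arena.malloc(50)                     # reuses the freed 100-byte hole
print(a, b, c)                           # 0 100 0
```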

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency decreases in each technology generation. Moore's law and Dennard scaling were coupled appropriately for five decades to deliver commensurate exponential performance, first via single-core and later via multi-core designs. However, recalculating Dennard scaling for recent small technology nodes shows that the ongoing multi-core growth demands exponentially increasing thermal design power to achieve a linear performance increase. This process hits a power wall, which raises the amount of dark or dim silicon on future multi-/many-core chips more and more. Furthermore, the increasing number of transistors on a single chip, together with susceptibility to internal defects and aging phenomena, both exacerbated by high chip thermal density, makes monitoring and managing chip reliability before and after its activation a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in the dark silicon era, and the two tracks are later combined. In the first track, the main goal is to increase the returns on the most important features of chip design, such as performance and throughput, while the maximum power limit is honored. In fact, we show that by managing power in the presence of dark silicon, all the traditional benefits of proceeding along Moore's law can still be achieved in the dark silicon era, albeit to a lesser extent. On the reliability-awareness track, we show that dark silicon can be treated as an opportunity to be exploited for different kinds of benefits, namely lifetime extension and online testing. We discuss how dark silicon can be exploited to guarantee that system lifetime stays above a certain target value and how it can be exploited to apply low-cost, non-intrusive online testing to the cores. After demonstrating power and reliability awareness in the presence of dark silicon, two approaches are discussed as case studies in which power and reliability awareness are combined. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management, while the second provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability
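The power-wall arithmetic behind dark silicon can be shown with a back-of-the-envelope estimate; the core count, per-core power and TDP below are hypothetical.

```python
def dark_fraction(n_cores, full_power_per_core_w, tdp_w):
    """Fraction of cores that must stay dark (or be heavily throttled) if
    every active core runs at full frequency within the chip's TDP."""
    active = min(n_cores, int(tdp_w // full_power_per_core_w))
    return 1.0 - active / n_cores

# Hypothetical 64-core chip: 4 W per core at full frequency, 125 W TDP.
print(f"dark silicon fraction: {dark_fraction(64, 4.0, 125.0):.0%}")  # ~52%
```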

    Genetic Improvement of Software for Energy Efficiency in Noisy and Fragmented Eco-Systems

    Software has made its way into every aspect of our daily life. Users of smart devices expect almost continuous availability and uninterrupted service. However, such devices operate on restricted energy resources. As energy efficiency of software is a relatively new concern for software practitioners, there is a lack of knowledge and tools to support the development of energy-efficient software. Optimising the energy consumption of software requires measuring or estimating its energy use and then optimising it. Generalised models of energy behaviour suffer from heterogeneous and fragmented eco-systems (i.e. diverse hardware and operating systems). The nature of such optimisation environments favours in-vivo optimisation, which provides the ground truth for the energy behaviour of an application on a given platform. One key challenge in in-vivo energy optimisation is noisy energy readings. This is because complete isolation of the effects of software optimisation is simply infeasible, owing to random and systematic noise from the platform. In this dissertation we explore in-vivo optimisation using Genetic Improvement of Software (GI) for energy efficiency in noisy and fragmented eco-systems. First, we document expected and unexpected technical challenges and their solutions when conducting energy optimisation experiments. These can be used as guidelines for software practitioners when conducting energy-related experiments. Second, we demonstrate the technical feasibility of in-vivo energy optimisation using GI on smart devices. We implement a new approach for mitigating noisy readings based on simple code rewrites. Third, we propose a new conceptual framework to determine the minimum number of samples required to show significant differences between software variants competing in tournaments. We demonstrate that the number of samples can vary drastically between different platforms as well as from one point in time to another within a single platform. It is crucial to take these observations into consideration when optimising in the wild or across several devices in a controlled environment. Finally, we implement a new validation approach for energy optimisation experiments. Through experiments, we demonstrate that the current validation approaches can mislead software practitioners into drawing wrong conclusions. Our approach outperforms the current validation techniques in terms of specificity and sensitivity in distinguishing differences between validation solutions. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
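The minimum-sample-size idea can be sketched as an empirical power calculation: resample tournaments of growing size from noisy energy readings until a Mann-Whitney test reliably separates two variants. The synthetic noise model, the 80% power target and the 5% significance level are assumptions of this sketch, not the thesis' framework.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def min_samples(readings_a, readings_b, alpha=0.05, power=0.80, trials=300, rng=None):
    """Smallest per-variant sample size n for which resampled tournaments of
    size n detect a difference (p < alpha) in at least `power` of the trials."""
    rng = rng or np.random.default_rng(0)
    for n in range(5, min(len(readings_a), len(readings_b)) + 1):
        hits = 0
        for _ in range(trials):
            a = rng.choice(readings_a, n, replace=True)
            b = rng.choice(readings_b, n, replace=True)
            if mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
                hits += 1
        if hits / trials >= power:
            return n
    return None  # variants not separable with the available data

# Synthetic example: variant B saves ~3% energy but readings are noisy.
rng = np.random.default_rng(1)
a = rng.normal(100.0, 5.0, 500)   # joules per run, variant A
b = rng.normal(97.0, 5.0, 500)    # joules per run, variant B
print("samples needed per variant:", min_samples(a, b, rng=rng))
```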

    Exploring Dynamic Compilation and Cross-Layer Object Management Policies for Managed Language Applications

    Recent years have witnessed the widespread adoption of managed programming languages that are designed to execute on virtual machines. Virtual machine architectures provide several powerful software engineering advantages over statically compiled binaries, such as portable program representations, additional safety guarantees, automatic memory and thread management, and dynamic program composition, which have largely driven their success. To support and facilitate the use of these features, virtual machines implement a number of services that adaptively manage and optimize application behavior during execution. Such runtime services often require tradeoffs between efficiency and effectiveness, and different policies can have major implications for the system's performance and energy requirements. In this work, we extensively explore policies for the two runtime services that are most important for achieving performance and energy efficiency: dynamic (or Just-In-Time (JIT)) compilation and memory management. First, we examine the properties of single-tier and multi-tier JIT compilation policies in order to find strategies that realize the best program performance for existing and future machines. Our analysis performs hundreds of experiments with different compiler aggressiveness and optimization levels to evaluate the performance impact of varying if and when methods are compiled. We later investigate the issue of how to optimize program regions to maximize performance in JIT compilation environments. For this study, we conduct a thorough analysis of the behavior of optimization phases in our dynamic compiler, and construct a custom experimental framework to determine the performance limits of phase selection during dynamic compilation. Next, we explore innovative memory management strategies to improve energy efficiency in the memory subsystem. We propose and develop a novel cross-layer approach to memory management that integrates information and analysis in the VM with fine-grained management of memory resources in the operating system. Using custom as well as standard benchmark workloads, we perform a detailed evaluation that demonstrates the energy-saving potential of our approach. We implement and evaluate all of our studies using the industry-standard Oracle HotSpot Java Virtual Machine to ensure that our conclusions are supported by widely-used, state-of-the-art runtime technology
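The difference between single-tier and multi-tier JIT policies comes down to when invocation counts trigger (re)compilation; the sketch below uses hypothetical thresholds and is not HotSpot's actual tiered policy.

```python
class TieredJitPolicy:
    """Toy multi-tier policy: interpret a method until it is warm, compile it
    quickly at tier 1, then recompile with full optimization at tier 2 once
    it proves hot. Thresholds here are illustrative only."""

    def __init__(self, tier1_threshold=200, tier2_threshold=10_000):
        self.t1, self.t2 = tier1_threshold, tier2_threshold
        self.counts = {}          # method name -> invocation count
        self.tier = {}            # method name -> current tier (0, 1 or 2)

    def on_invoke(self, method):
        n = self.counts[method] = self.counts.get(method, 0) + 1
        tier = self.tier.get(method, 0)
        if tier == 0 and n >= self.t1:
            self.tier[method] = 1     # cheap, fast compile
        elif tier == 1 and n >= self.t2:
            self.tier[method] = 2     # aggressive optimizing compile
        return self.tier.get(method, 0)

policy = TieredJitPolicy()
for _ in range(15_000):
    policy.on_invoke("hot_loop_body")
print(policy.tier["hot_loop_body"])   # 2: promoted through both tiers
```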