62 research outputs found

    Exploiting Adaptive Techniques to Improve Processor Energy Efficiency

    Get PDF
    Rapid device-miniaturization keeps on inducing challenges in building energy efficient microprocessors. As the size of the transistors continuously decreasing, more uncertainties emerge in their operations. On the other hand, integrating more and more transistors on a single chip accentuates the need to lower its supply-voltage. This dissertation investigates one of the primary device uncertainties - timing error, in microprocessor performance bottleneck in NTC era. Then it proposes various innovative techniques to exploit these opportunities to maintain processor energy efficiency, in the context of emerging challenges. Evaluated with the cross-layer methodology, the proposed approaches achieve substantial improvements in processor energy efficiency, compared to other start-of-art techniques

    Cross layer reliability estimation for digital systems

    Get PDF
    Forthcoming manufacturing technologies hold the promise to increase multifuctional computing systems performance and functionality thanks to a remarkable growth of the device integration density. Despite the benefits introduced by this technology improvements, reliability is becoming a key challenge for the semiconductor industry. With transistor size reaching the atomic dimensions, vulnerability to unavoidable fluctuations in the manufacturing process and environmental stress rise dramatically. Failing to meet a reliability requirement may add excessive re-design cost to recover and may have severe consequences on the success of a product. %Worst-case design with large margins to guarantee reliable operation has been employed for long time. However, it is reaching a limit that makes it economically unsustainable due to its performance, area, and power cost. One of the open challenges for future technologies is building ``dependable'' systems on top of unreliable components, which will degrade and even fail during normal lifetime of the chip. Conventional design techniques are highly inefficient. They expend significant amount of energy to tolerate the device unpredictability by adding safety margins to a circuit's operating voltage, clock frequency or charge stored per bit. Unfortunately, the additional cost introduced to compensate unreliability are rapidly becoming unacceptable in today's environment where power consumption is often the limiting factor for integrated circuit performance, and energy efficiency is a top concern. Attention should be payed to tailor techniques to improve the reliability of a system on the basis of its requirements, ending up with cost-effective solutions favoring the success of the product on the market. Cross-layer reliability is one of the most promising approaches to achieve this goal. Cross-layer reliability techniques take into account the interactions between the layers composing a complex system (i.e., technology, hardware and software layers) to implement efficient cross-layer fault mitigation mechanisms. Fault tolerance mechanism are carefully implemented at different layers starting from the technology up to the software layer to carefully optimize the system by exploiting the inner capability of each layer to mask lower level faults. For this purpose, cross-layer reliability design techniques need to be complemented with cross-layer reliability evaluation tools, able to precisely assess the reliability level of a selected design early in the design cycle. Accurate and early reliability estimates would enable the exploration of the system design space and the optimization of multiple constraints such as performance, power consumption, cost and reliability. This Ph.D. thesis is devoted to the development of new methodologies and tools to evaluate and optimize the reliability of complex digital systems during the early design stages. More specifically, techniques addressing hardware accelerators (i.e., FPGAs and GPUs), microprocessors and full systems are discussed. All developed methodologies are presented in conjunction with their application to real-world use cases belonging to different computational domains

    SoK: Understanding BFT Consensus in the Age of Blockchains

    Get PDF
    Blockchain as an enabler to current Internet infrastructure has provided many unique features and revolutionized current distributed systems into a new era. Its decentralization, immutability, and transparency have attracted many applications to adopt the design philosophy of blockchain and customize various replicated solutions. Under the hood of blockchain, consensus protocols play the most important role to achieve distributed replication systems. The distributed system community has extensively studied the technical components of consensus to reach agreement among a group of nodes. Due to trust issues, it is hard to design a resilient system in practical situations because of the existence of various faults. Byzantine fault-tolerant (BFT) state machine replication (SMR) is regarded as an ideal candidate that can tolerate arbitrary faulty behaviors. However, the inherent complexity of BFT consensus protocols and their rapid evolution makes it hard to practically adapt themselves into application domains. There are many excellent Byzantine-based replicated solutions and ideas that have been contributed to improving performance, availability, or resource efficiency. This paper conducts a systematic and comprehensive study on BFT consensus protocols with a specific focus on the blockchain era. We explore both general principles and practical schemes to achieve consensus under Byzantine settings. We then survey, compare, and categorize the state-of-the-art solutions to understand BFT consensus in detail. For each representative protocol, we conduct an in-depth discussion of its most important architectural building blocks as well as the key techniques they used. We aim that this paper can provide system researchers and developers a concrete view of the current design landscape and help them find solutions to concrete problems. Finally, we present several critical challenges and some potential research directions to advance the research on exploring BFT consensus protocols in the age of blockchains

    State-of-the-art Assessment For Simulated Forces

    Get PDF
    Summary of the review of the state of the art in simulated forces conducted to support the research objectives of Research and Development for Intelligent Simulated Forces

    Context-aware task scheduling in distributed computing systems

    Full text link
    These days, the popularity of technologies such as machine learning, augmented reality, and big data analytics is growing dramatically. This leads to a higher demand of computational power not only for IT professionals but also for ordinary device users who benefit from new applications. At the same time, the computational performance of end-user devices increases to meet the demands of these resource-hungry applications. As a result, there is a coexistence of a huge demand of computational power on the one side and a large pool of computational resources on the other side. Bringing these two sides together is the idea of computational resource sharing systems which allow applications to forward computationally intensive workload to remote resources. This technique is often used in cloud computing where customers can rent computational power. However, we argue that not only cloud resources can be used as offloading targets. Rather, idle CPU cycles from end-user administered devices at the edge of the network can be spontaneously leveraged as well. Edge devices, however, are not only heterogeneous in their hardware and software capabilities, they also do not provide any guarantees in terms of reliability or performance. Does it mean that either the applications that require further guarantees or the unpredictable resources need to be excluded from such a sharing system? In this thesis, we propose a solution to this problem by introducing the Tasklet system, our approach for a computational resource sharing system. The Tasklet system supports computation offloading to arbitrary types of devices, including stable cloud instances as well as unpredictable end-user owned edge resources. Therefore, the Tasklet system is structured into multiple layers. The lowest layer is a best-effort resource sharing system which provides lightweight task scheduling and execution. Here, best-effort means that in case of a failure, the task execution is dropped and that tasks are allocated to resources randomly. To provide execution guarantees such as a reliable or timely execution, we add a Quality of Computation (QoC) layer on top of the best-effort execution layer. The QoC layer enforces the guarantees for applications by using a context-aware task scheduler which monitors the available resources in the computing environment and performs the matchmaking between resources and tasks based on the current state of the system. As edge resources are controlled by individuals, we consider the fact that these users need to be able to decide with whom they want to share their resources and for which price. Thus, we add a social layer on top of the system that allows users to establish friendship connections which can then be leveraged for social-aware task allocation and accounting of shared computation

    Methodologies for Accelerated Analysis of the Reliability and the Energy Efficiency Levels of Modern Microprocessor Architectures

    Get PDF
    Η εξέλιξη της τεχνολογίας ημιαγωγών, της αρχιτεκτονικής υπολογιστών και της σχεδίασης οδηγεί σε αύξηση της απόδοσης των σύγχρονων μικροεπεξεργαστών, η οποία επίσης συνοδεύεται από αύξηση της ευπάθειας των προϊόντων. Οι σχεδιαστές εφαρμόζουν διάφορες τεχνικές κατά τη διάρκεια της ζωής των ολοκληρωμένων κυκλωμάτων με σκοπό να διασφαλίσουν τα υψηλά επίπεδα αξιοπιστίας των παραγόμενων προϊόντων και να τα προστατέψουν από διάφορες κατηγορίες σφαλμάτων διασφαλίζοντας την ορθή λειτουργία τους. Αυτή η διδακτορική διατριβή προτείνει καινούριες μεθόδους για να διασφαλίσει τα υψηλά επίπεδα αξιοπιστίας και ενεργειακής απόδοσης των σύγχρονων μικροεπεξεργαστών οι οποίες μπορούν να εφαρμοστούν κατά τη διάρκεια του πρώιμου σχεδιαστικού σταδίου, του σταδίου παραγωγής ή του σταδίου της κυκλοφορίας των ολοκληρωμένων κυκλωμάτων στην αγορά. Οι συνεισφορές αυτής της διατριβής μπορούν να ομαδοποιηθούν στις ακόλουθες δύο κατηγορίες σύμφωνα με το στάδιο της ζωής των μικροεπεξεργαστών στο οποίο εφαρμόζονται: • Πρώιμο σχεδιαστικό στάδιο: Η στατιστική εισαγωγή σφαλμάτων σε δομές που είναι μοντελοποιημένες σε προσομοιωτές οι οποίοι στοχεύουν στην μελέτη της απόδοσης είναι μια επιστημονικά καθιερωμένη μέθοδος για την ακριβή μέτρηση της αξιοπιστίας, αλλά υστερεί στον αργό χρόνο εκτέλεσης. Σε αυτή τη διατριβή, αρχικά παρουσιάζουμε ένα νέο πλήρως αυτοματοποιημένο εργαλείο εισαγωγής σφαλμάτων σε μικροαρχιτεκτονικό επίπεδο που στοχεύει στην ακριβή αξιολόγηση της αξιοπιστίας ενός μεγάλου πλήθους μονάδων υλικού σε σχέση με διάφορα μοντέλα σφαλμάτων (παροδικά, διακοπτόμενα, μόνιμα σφάλματα). Στη συνέχεια, χρησιμοποιώντας το ίδιο εργαλείο και στοχεύοντας τα παροδικά σφάλματα, παρουσιάζουμε διάφορες μελέτες σχετιζόμενες με την αξιοπιστία και την απόδοση, οι οποίες μπορούν να βοηθήσουν τις σχεδιαστικές αποφάσεις στα πρώιμα στάδια της ζωής των επεξεργαστών. Τελικά, προτείνουμε δύο μεθοδολογίες για να επιταχύνουμε τα μαζικά πειράματα στατιστικής εισαγωγής σφαλμάτων. Στην πρώτη, επιταχύνουμε τα πειράματα έπειτα από την πραγματική εισαγωγή των σφαλμάτων στις δομές του υλικού. Στη δεύτερη, επιταχύνουμε ακόμη περισσότερο τα πειράματα προτείνοντας τη μεθοδολογία με όνομα MeRLiN, η οποία βασίζεται στη μείωση της αρχικής λίστας σφαλμάτων μέσω της ομαδοποίησής τους σε ισοδύναμες ομάδες έπειτα από κατηγοριοποίηση σύμφωνα με την εντολή που τελικά προσπελαύνει τη δομή που φέρει το σφάλμα. • Παραγωγικό στάδιο και στάδιο κυκλοφορίας στην αγορά: Οι συνεισφορές αυτής της διδακτορικής διατριβής σε αυτά τα στάδια της ζωής των μικροεπεξεργαστών καλύπτουν δύο σημαντικά επιστημονικά πεδία. Αρχικά, χρησιμοποιώντας το ολοκληρωμένο κύκλωμα των 48 πυρήνων με ονομασία Intel SCC, προτείνουμε μια τεχνική επιτάχυνσης του εντοπισμού μονίμων σφαλμάτων που εφαρμόζεται κατά τη διάρκεια λειτουργίας αρχιτεκτονικών με πολλούς πυρήνες, η οποία εκμεταλλεύεται το δίκτυο υψηλής ταχύτητας μεταφοράς μηνυμάτων που διατίθεται στα ολοκληρωμένα κυκλώματα αυτού του είδους. Δεύτερον, προτείνουμε μια λεπτομερή στατιστική μεθοδολογία με σκοπό την ακριβή πρόβλεψη σε επίπεδο συστήματος των ασφαλών ορίων λειτουργίας της τάσης των πυρήνων τύπου ARMv8 που βρίσκονται πάνω στη CPU X-Gene 2.The evolution in semiconductor manufacturing technology, computer architecture and design leads to increase in performance of modern microprocessors, which is also accompanied by increase in products’ vulnerability to errors. Designers apply different techniques throughout microprocessors life-time in order to ensure the high reliability requirements of the delivered products that are defined as their ability to avoid service failures that are more frequent and more severe than is acceptable. This thesis proposes novel methods to guarantee the high reliability and energy efficiency requirements of modern microprocessors that can be applied during the early design phase, the manufacturing phase or after the chips release to the market. The contributions of this thesis can be grouped in the two following categories according to the phase of the CPUs lifecycle that are applied at: • Early design phase: Statistical fault injection using microarchitectural structures modeled in performance simulators is a state-of-the-art method to accurately measure the reliability, but suffers from low simulation throughput. In this thesis, we firstly present a novel fully-automated versatile microarchitecture-level fault injection framework (called MaFIN) for accurate characterization of a wide range of hardware components of an x86-64 microarchitecture with respect to various fault models (transient, intermittent, permanent faults). Next, using the same tool and focusing on transient faults, we present several reliability and performance related studies that can assist design decision in the early design phases. Moreover, we propose two methodologies to accelerate the statistical fault injection campaigns. In the first one, we accelerate the fault injection campaigns after the actual injection of the faults in the simulated hardware structures. In the second, we further accelerate the microarchitecture level fault injection campaigns by proposing MeRLiN a fault pre-processing methodology that is based on the pruning of the initial fault list by grouping the faults in equivalent classes according to the instruction access patterns to hardware entries. • Manufacturing phase and release to the market: The contributions of this thesis in these phases of microprocessors life-cycle cover two important aspects. Firstly, using the 48-core Intel’s SCC architecture, we propose a technique to accelerate online error detection of permanent faults for many-core architectures by exploiting their high-speed message passing on-chip network. Secondly, we propose a comprehensive statistical analysis methodology to accurately predict at the system level the safe voltage operation margins of the ARMv8 cores of the X- Gene 2 chip when it operates in scaled voltage conditions

    Investigating Client In their Relationships with Advertising

    Get PDF
    This study examines how characteristics of performance, relationship investments, general beliefs about relationships and conditions about the internal and external environment are associated with client tolerance toward their experiences in client-advertising agency relationships (CAR), particularly at the service encounter. A critical incident technique is adopted, collecting both negative incidents and positive incidents. The generic aspects of CAR above were conceptualised from a literature review incorporating buyer-seller behaviour, interorganisational relationships, and CAR research. The literature review was refined by qualitative research, involving depth interviews with 14 agencies and 11 client organisations. Respondents were key informants who had decision-making responsibility for their relationships. The depth interviews collectively sought to identify the range of negative and positive critical incidents experienced by clients and their agencies. Additionally, the points of contact between client and agency were identified from which critical incidents were experienced, together with characteristics considered to influence tolerance. These characteristics were screened to identify 32 independent variables and 5 investment variables for a postal survey of clients. The purpose was to identify the main predictors of tolerance. A unique feature of the study was the development of grouping variables designed to measure tolerance, based on seven dependent variables. The amount of incidents experienced per respondent were accounted for in making decisions about tolerance. Discriminant analysis was used to identify predictor variables for each set of grouping variables. This showed that a number of performance variables, investment criteria, client beliefs and environmental conditions were associated with tolerance. Performance predictors included consistent work processes, proactivity and stability of key account management. Additional predictors included beliefs in compatible working styles and less effort in making changes by clients, supporting the view that processes may become more important under stressful times because they facilitate governance. Procedures and processes might also reflect the need to ensure fairness in relationships

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    Get PDF
    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

    Methods for Robust and Energy-Efficient Microprocessor Architectures

    Get PDF
    Σήμερα, η εξέλιξη της τεχνολογίας επιτρέπει τη βελτίωση τριών βασικών στοιχείων της σχεδίασης των επεξεργαστών: αυξημένες επιδόσεις, χαμηλότερη κατανάλωση ισχύος και χαμηλότερο κόστος παραγωγής του τσιπ, ενώ οι σχεδιαστές επεξεργαστών έχουν επικεντρωθεί στην παραγωγή επεξεργαστών με περισσότερες λειτουργίες σε χαμηλότερο κόστος. Οι σημερινοί επεξεργαστές είναι πολύ ταχύτεροι και διαθέτουν εξελιγμένες λειτουργικές μονάδες συγκριτικά με τους προκατόχους τους, ωστόσο, καταναλώνουν αρκετά μεγάλη ενέργεια. Τα ποσά ηλεκτρικής ισχύος που καταναλώνονται, και η επακόλουθη έκλυση θερμότητας, αυξάνονται παρά τη μείωση του μεγέθους των τρανζίστορ. Αναπτύσσοντας όλο και πιο εξελιγμένους μηχανισμούς και λειτουργικές μονάδες για την αύξηση της απόδοσης και βελτίωση της ενέργειας, σε συνδυασμό με τη μείωση του μεγέθους των τρανζίστορ, οι επεξεργαστές έχουν γίνει εξαιρετικά πολύπλοκα συστήματα, καθιστώντας τη διαδικασία της επικύρωσής τους σημαντική πρόκληση για τη βιομηχανία ολοκληρωμένων κυκλωμάτων. Συνεπώς, οι κατασκευαστές επεξεργαστών αφιερώνουν επιπλέον χρόνο, προϋπολογισμό και χώρο στο τσιπ για να διασφαλίσουν ότι οι επεξεργαστές θα λειτουργούν σωστά κατά τη διάθεσή τους στη αγορά. Για τους λόγους αυτούς, η εργασία αυτή παρουσιάζει νέες μεθόδους για την επιτάχυνση και τη βελτίωση της φάσης της επικύρωσης, καθώς και για τη βελτίωση της ενεργειακής απόδοσης των σύγχρονων επεξεργαστών. Στο πρώτο μέρος της διατριβής προτείνονται δύο διαφορετικές μέθοδοι για την επικύρωση του επεξεργαστή, οι οποίες συμβάλλουν στην επιτάχυνση αυτής της διαδικασίας και στην αποκάλυψη σπάνιων σφαλμάτων στους μηχανισμούς μετάφρασης διευθύνσεων των σύγχρονων επεξεργαστών. Και οι δύο μέθοδοι καθιστούν ευκολότερη την ανίχνευση και τη διάγνωση σφαλμάτων, και επιταχύνουν την ανίχνευση του σφάλματος κατά τη φάση της επικύρωσης. Στο δεύτερο μέρος της διατριβής παρουσιάζεται μια λεπτομερής μελέτη χαρακτηρισμού των περιθωρίων τάσης σε επίπεδο συστήματος σε δύο σύγχρονους ARMv8 επεξεργαστές. Η μελέτη του χαρακτηρισμού προσδιορίζει τα αυξημένα περιθώρια τάσης που έχουν προκαθοριστεί κατά τη διάρκεια κατασκευής του κάθε μεμονωμένου πυρήνα του επεξεργαστή και αναλύει τυχόν απρόβλεπτες συμπεριφορές που μπορεί να προκύψουν σε συνθήκες μειωμένης τάσης. Για την μελέτη και καταγραφή της συμπεριφοράς του συστήματος υπό συνθήκες μειωμένης τάσης, παρουσιάζεται επίσης σε αυτή τη διατριβή μια απλή και ενοποιημένη συνάρτηση: η συνάρτηση πυκνότητας-σοβαρότητας. Στη συνέχεια, παρουσιάζεται αναλυτικά η ανάπτυξη ειδικά σχεδιασμένων προγραμμάτων (micro-viruses) τα οποία υποβάλουν της θεμελιώδεις δομές του επεξεργαστή σε μεγάλο φορτίο εργασίας. Αυτά τα προγράμματα στοχεύουν στην γρήγορη αναγνώριση των ασφαλών περιθωρίων τάσης. Τέλος, πραγματοποιείται ο χαρακτηρισμός των περιθωρίων τάσης σε εκτελέσεις πολλαπλών πυρήνων, καθώς επίσης και σε διαφορετικές συχνότητες, και προτείνεται ένα πρόγραμμα το οποίο εκμεταλλεύεται όλες τις διαφορετικές πτυχές του προβλήματος της κατανάλωσης ενέργειας και παρέχει μεγάλη εξοικονόμηση ενέργειας διατηρώντας παράλληλα υψηλά επίπεδα απόδοσης. Αυτή η μελέτη έχει ως στόχο τον εντοπισμό και την ανάλυση της σχέσης μεταξύ ενέργειας και απόδοσης σε διαφορετικούς συνδυασμούς τάσης και συχνότητας, καθώς και σε διαφορετικό αριθμό νημάτων/διεργασιών που εκτελούνται στο σύστημα, αλλά και κατανομής των προγραμμάτων στους διαθέσιμους πυρήνες.Technology scaling has enabled improvements in the three major design optimization objectives: increased performance, lower power consumption, and lower die cost, while system design has focused on bringing more functionality into products at lower cost. While today's microprocessors, are much faster and much more versatile than their predecessors, they also consume much power. As operating frequency and integration density increase, the total chip power dissipation increases. This is evident from the fact that due to the demand for increased functionality on a single chip, more and more transistors are being packed on a single die and hence, the switching frequency increases in every technology generation. However, by developing aggressive and sophisticated mechanisms to boost performance and to enhance the energy efficiency in conjunction with the decrease of the size of transistors, microprocessors have become extremely complex systems, making the microprocessor verification and manufacturing testing a major challenge for the semiconductor industry. Manufacturers, therefore, choose to spend extra effort, time, budget and chip area to ensure that the delivered products are operating correctly. To meet high-dependability requirements, manufacturers apply a sequence of verification tasks throughout the entire life-cycle of the microprocessor to ensure the correct functionality of the microprocessor chips from the various types of errors that may occur after the products are released to the market. To this end, this work presents novel methods for ensuring the correctness of the microprocessor during the post-silicon validation phase and for improving the energy efficiency requirements of modern microprocessors. These methods can be applied during the prototyping phase of the microprocessors or after their release to the market. More specifically, in the first part of the thesis, we present and describe two different ISA-independent software-based post-silicon validation methods, which contribute to formalization and modeling as well as the acceleration of the post-silicon validation process and expose difficult-to-find bugs in the address translation mechanisms (ATM) of modern microprocessors. Both methods improve the detection and diagnosis of a hardware design bug in the ATM structures and significantly accelerate the bug detection during the post-silicon validation phase. In the second part of the thesis we present a detailed system-level voltage scaling characterization study for two state-of-the-art ARMv8-based multicore CPUs. We present an extensive characterization study which identifies the pessimistic voltage guardbands (the increased voltage margins set by the manufacturer) of each individual microprocessor core and analyze any abnormal behavior that may occur in off-nominal voltage conditions. Towards the formalization of the any abnormal behavior we also present a simple consolidated function; the Severity function, which aggregates the effects of reduced voltage operation. We then introduce the development of dedicated programs (diagnostic micro-viruses) that aim to accelerate the time-consuming voltage margins characterization studies by stressing the fundamental hardware components. Finally, we present a comprehensive exploration of how two server-grade systems behave in different frequency and core allocation configurations beyond nominal voltage operation in multicore executions. This analysis aims (1) to identify the best performance per watt operation points, (2) to reveal how and why the different core allocation options affect the energy consumption, and (3) to enhance the default Linux scheduler to take task allocation decisions for balanced performance and energy efficiency

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Get PDF
    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steady growing and workload applications have been adapted to exploit the increasing parallelism. This changed the importance of efficient on-chip communication significantly and the infrastructure has to keep step with these new requirements. The work at hand makes significant contributions to the state-of-the-art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures
    corecore