91 research outputs found

    Dynamic voltage and frequency scaling with multi-clock distribution systems on SPARC core

    Get PDF
    The current implementation of dynamic voltage and frequency scaling (DVS and DFS) in microprocessors is based on a single clock domain per core. In architectures that adopt Instruction Level Parallelism (ILP), multiple execution units may exist and operate concurrently. Performing DVS and DFS on such cores may result in low utilization and power efficiency. In this thesis, a methodology that implements DVFS with multi Clock distribution Systems (DCS) is applied on a processor core to achieve higher throughput and better power efficiency. DCS replaces the core single clock distribution tree with multi-clock domain systems which, along with dynamic voltage and frequency scaling, creates multiple clock-voltage domains. DCS implements a self-timed interface between the different domains to maintain functionality and ensure data integrity. DCS was implemented on a SPARC core of UltraSPARC T1 architecture, and synthesized targeting TSMC 120nm process technology. Two clock domains were used on SPARC core. The maximum achieved speedup relative to original core was 1.6X. The power consumed by DCS was 0.173mW compared to the core total power of ~ 10W

    Software-Based Self-Test of Set-Associative Cache Memories

    Get PDF
    Embedded microprocessor cache memories suffer from limited observability and controllability creating problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns which must be composed of valid instruction opcodes and 2) test result observability: the results can only be observed through the results of executed instructions. For these reasons, the proposed methodology will concentrate on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads and guaranteeing the detection of typical faults arising in nanometer CMOS technologie

    Fault Detection Methodology for Caches in Reliable Modern VLSI Microprocessors based on Instruction Set Architectures

    Get PDF
    Η παρούσα διδακτορική διατριβή εισάγει μία χαμηλού κόστους μεθοδολογία για την ανίχνευση ελαττωμάτων σε μικρές ενσωματωμένες κρυφές μνήμες που βασίζεται σε σύγχρονες Αρχιτεκτονικές Συνόλου Εντολών και εφαρμόζεται με λογισμικό αυτοδοκιμής. Η προτεινόμενη μεθοδολογία εφαρμόζει αλγορίθμους March μέσω λογισμικού για την ανίχνευση τόσο ελαττωμάτων αποθήκευσης όταν εφαρμόζεται σε κρυφές μνήμες που περιέχουν μόνο στατικές μνήμες τυχαίας προσπέλασης όπως για παράδειγμα κρυφές μνήμες επιπέδου 1, όσο και ελαττωμάτων σύγκρισης όταν εφαρμόζεται σε κρυφές μνήμες που περιέχουν εκτός από SRAM μνήμες και μνήμες διευθυνσιοδοτούμενες μέσω περιεχομένου, όπως για παράδειγμα πλήρως συσχετιστικές κρυφές μνήμες αναζήτησης μετάφρασης. Η προτεινόμενη μεθοδολογία εφαρμόζεται και στις τρεις οργανώσεις συσχετιστικότητας κρυφής μνήμης και είναι ανεξάρτητη της πολιτικής εγγραφής στο επόμενο επίπεδο της ιεραρχίας. Η μεθοδολογία αξιοποιεί υπάρχοντες ισχυρούς μηχανισμούς των μοντέρνων ISAs χρησιμοποιώντας ειδικές εντολές, που ονομάζονται στην παρούσα διατριβή Εντολές Άμεσης Προσπέλασης Κρυφής Μνήμης (Direct Cache Access Instructions - DCAs). Επιπλέον, η προτεινόμενη μεθοδολογία εκμεταλλεύεται τους έμφυτους μηχανισμούς καταγραφής απόδοσης και τους μηχανισμούς χειρισμού παγίδων που είναι διαθέσιμοι στους σύγχρονους επεξεργαστές. Επιπρόσθετα, η προτεινόμενη μεθοδολογία εφαρμόζει την λειτουργία σύγκρισης των αλγορίθμων March όταν αυτή απαιτείται (για μνήμες CAM) και επαληθεύει το αποτέλεσμα του ελέγχου μέσω σύντομης απόκρισης, ώστε να είναι συμβατή με τις απαιτήσεις του ελέγχου εντός λειτουργίας. Τέλος, στη διατριβή προτείνεται μία βελτιστοποίηση της μεθοδολογίας για πολυνηματικές, πολυπύρηνες αρχιτεκτονικές.The present PhD thesis introduces a low cost fault detection methodology for small embedded cache memories that is based on modern Instruction Set Architectures and is applied with Software-Based Self-Test (SBST) routines. The proposed methodology applies March tests through software to detect both storage faults when applied to caches that comprise Static Random Access Memories (SRAM) only, e.g. L1 caches, and comparison faults when applied to caches that apart from SRAM memories comprise Content Addressable Memories (CAM) too, e.g. Translation Lookaside Buffers (TLBs). The proposed methodology can be applied to all three cache associativity organizations: direct mapped, set-associative and full-associative and it does not depend on the cache write policy. The methodology leverages existing powerful mechanisms of modern ISAs by utilizing instructions that we call in this PhD thesis Direct Cache Access (DCA) instructions. Moreover, our methodology exploits the native performance monitoring hardware and the trap handling mechanisms which are available in modern microprocessors. Moreover, the proposed Methodology applies March compare operations when needed (for CAM arrays) and verifies the test result with a compact response to comply with periodic on-line testing needs. Finally, a multithreaded optimization of the proposed methodology that targets multithreaded, multicore architectures is also presented in this thesi

    Improving the effective use of multithreaded architectures : implications on compilation, thread assignment, and timing analysis

    Get PDF
    This thesis presents cross-domain approaches that improve the effective use of multithreaded architectures. The contributions of the thesis can be classified in three groups. First, we propose several methods for thread assignment of network applications running in multithreaded network servers. Second, we analyze the problem of graph partitioning that is a part of the compilation process of multithreaded streaming applications. Finally, we present a method that improves the measurement-based timing analysis of multithreaded architectures used in time-critical environments. The following sections summarize each of the contributions. (1) Thread assignment on multithreaded processors: State-of-the-art multithreaded processors have different level of resource sharing (e.g. between thread running on the same core and globally shared resources). Thus, the way that threads of a given workload are assigned to processors' hardware contexts determines which resources the threads share, which, in turn, may significantly affect the system performance. In this thesis, we demonstrate the importance of thread assignment for network applications running in multithreaded servers. We also present TSBSched and BlackBox scheduler, methods for thread assignment of multithreaded network applications running on processors with several levels of resource sharing. Finally, we propose a statistical approach to the thread assignment problem. In particular, we show that running a sample of several hundred or several thousand random thread assignments is sufficient to capture at least one out of 1% of the best-performing assignments with a very high probability. We also describe the method that estimates the optimal system performance for given workload. We successfull y applied TSBSched, BlackBox scheduler, and the presented statistical approach to a case study of thread assignment of multithreaded network applications running on the UltraSPARC T2 processor. (2) Kernel partitioning of streaming applications: An important step in compiling a stream program to multiple processors is kernel partitioning. Finding an optimal kernel partition is, however, an intractable problem. We propose a statistical approach to the kernel partitioning problem. We describe a method that statistically estimates the performance of the optimal kernel partition. We demonstrate that the sampling method is an important part of the analysis, and that not all methods that generate random samples provide good results. We also show that random sampling on its own can be used to find a good kernel partition, and that it could be an alternative to heuristics-based approaches. The presented statistical method is applied successfully to the benchmarks included in the StreamIt 2.1.1 suite. (3) Multithreaded processors in time-critical environments: Despite the benefits that multithreaded commercial-of-the-shelf (MT COTS) processors may offer in embedded real-time systems, the time-critical market has not yet embraced a shift toward these architectures. The main challenge with MT COTS architectures is the difficulty when predicting the execution time of concurrently-running (co-running) time-critical tasks. Providing a timing analysis for real industrial applications running on MT COTS processors becomes extremely difficult because the execution time of a task, and hence its worst-case execution time (WCET) depends on the interference with co-running tasks in shared processor resources. We show that the measurement-based timing analysis used for single-threaded processors cannot be directly extended for MT COTS architectures. Also, we propose a methodology that quantifies the slowdown that a task may experience because of collision with co-running tasks in shared resources of MT COTS processor. The methodology is applied to a case study in which different time-critical applications were executed on several MT COTS multithreaded processors

    Thermal modeling and analysis of 3D multi-processor chips

    Get PDF
    As 3D chip multi-processors (3D-CMPs) become the main trend in processor development, various thermal management strategies have been recently proposed to optimize system performance while controlling the temperature of the system to stay below a threshold. These thermal-aware policies require the envision of high-level models that capture the complex thermal behavior of (nano)structures that build the 3D stack. Moreover, the floorplanning of the chip strongly determines the thermal profile of the system and a quick exploration of the design space is required to minimize the damage of the thermal effects

    Architectures for dependable modern microprocessors

    Get PDF
    Η εξέλιξη των ολοκληρωμένων κυκλωμάτων σε συνδυασμό με τους αυστηρούς χρονικούς περιορισμούς καθιστούν την επαλήθευση της ορθής λειτουργίας των επεξεργαστών μία εξαιρετικά απαιτητική διαδικασία. Με κριτήριο το στάδιο του κύκλου ζωής ενός επεξεργαστή, από την στιγμή κατασκευής των πρωτοτύπων και έπειτα, οι τεχνικές ελέγχου ορθής λειτουργίας διακρίνονται στις ακόλουθες κατηγορίες: (1) Silicon Debug: Τα πρωτότυπα ολοκληρωμένα κυκλώματα ελέγχονται εξονυχιστικά, (2) Manufacturing Testing: ο τελικό ποιοτικός έλεγχος και (3) In-field verification: Περιλαμβάνει τεχνικές, οι οποίες διασφαλίζουν την λειτουργία του επεξεργαστή σύμφωνα με τις προδιαγραφές του. Η διδακτορική διατριβή προτείνει τα ακόλουθα: (1) Silicon Debug: Η εργασία αποσκοπεί στην επιτάχυνση της διαδικασίας ανίχνευσης σφαλμάτων και στον αυτόματο εντοπισμό τυχαίων προγραμμάτων που δεν περιέχουν νέα -χρήσιμη- πληροφορία σχετικά με την αίτια ενός σφάλματος. Η κεντρική ιδέα αυτής της μεθόδου έγκειται στην αξιοποίηση της έμφυτης ποικιλομορφίας των αρχιτεκτονικών συνόλου εντολών και στην δυνατότητα από-διαμόρφωσης τμημάτων του κυκλώματος, (2) Manufacturing Testing: προτείνεται μία μέθοδο για την βελτιστοποίηση του έλεγχου ορθής λειτουργίας των πολυνηματικών και πολυπύρηνων επεξεργαστών μέσω της χρήση λογισμικού αυτοδοκιμής, (3) Ιn-field verification: Αναλύθηκε σε βάθος η επίδραση που έχουν τα μόνιμα σφάλματα σε μηχανισμούς αύξησης της απόδοσης. Επιπρόσθετα, προτάθηκαν τεχνικές για την ανίχνευση και ανοχή μόνιμων σφαλμάτων υλικού σε μηχανισμούς πρόβλεψης διακλάδωσης.Technology scaling, extreme chip integration and the compelling requirement to diminish the time-to-market window, has rendered microprocessors more prone to design bugs and hardware faults. Microprocessor validation is grouped into the following categories, based on where they intervene in a microprocessor’s lifecycle: (a) Silicon debug: the first hardware prototypes are exhaustively validated, (b) Μanufacturing testing: the final quality control during massive production, and (c) In-field verification: runtime error detection techniques to guarantee correct operation. The contributions of this thesis are the following: (1) Silicon debug: We propose the employment of deconfigurable microprocessor architectures along with a technique to generate self-checking random test programs to avoid the simulation step and triage the redundant debug sessions, (2) Manufacturing testing: We propose a self-test optimization strategy for multithreaded, multicore microprocessors to speedup test program execution time and enhance the fault coverage of hard errors; and (3) In-field verification: We measure the effect of permanent faults performance components. Then, we propose a set of low-cost mechanisms for the detection, diagnosis and performance recovery in the front-end speculative structures. This thesis introduces various novel methodologies to address the validation challenges posed throughout the life-cycle of a chip

    Summary of multi-core hardware and programming model investigations

    Full text link

    How giving away CMT chip hardware implementations creates value for Sun microsystems

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, System Design and Management Program, 2009.Includes bibliographical references (p. 79-81).This thesis uses systems thinking and system dynamics modeling to explore how open source communities such as OpenSPARC can lead to enhancement of the performance of Sun's multithreaded systems and thereby increase its market share by increasing its share of the CMT ecosystem, and the share that CMT systems have within the overall computer server business ecosystem. This study explores Sun's motivation behind its investment in the OpenSPARC community, and explains how OpenSPARC creates value for Sun. The insight into Sun's value creation and capture strategy gained from this study can be generalized and applied to similar companies who are entering a new market where the ecosystem is not yet fully developed. The companies who benefit most from this study are ones which are strategically positioned to disclose relevant knowledge of a critical component within the ecosystem that enables its development without thereby compromising the full potential for value capture within the market. The key findings of this study include: a) Market adoption of multitcore and multithreaded servers is dependent on the rewriting of software applications with parallelization in order to boost the performance of multicore and multithreaded systems. b) The overhaul of these software applications will take the coordination and involvement of all the major players in the computer industry. c) The specific business structure of Sun allows for open sourcing the components of its systems while still gaining revenue on the sale of the system as a whole.(cont.) d) Factors attracting open source developers to write software for a particular platform include a developer's belief that his program will be successful in the market. e) The information leakage to competitors from open sourcing Sun's multithreaded processor implementation does not diminish Sun's value capture of the market. f) The benefits that open source communities can have on market adoption of multicore and multithreaded servers, provided that the disclosure of the chip hardware implementation actually improves the technical performance and economics of the application software. This thesis will explore the reasons that Sun believes the open source community can be a catalyst for the wide-spread adoption of multithreaded processors and multithreaded software required simultaneously by all the major players in the computer industry.by Julie Mitchell.S.M
    corecore