220 research outputs found
Exploiting Adaptive Techniques to Improve Processor Energy Efficiency
Rapid device-miniaturization keeps on inducing challenges in building energy efficient microprocessors. As the size of the transistors continuously decreasing, more uncertainties emerge in their operations. On the other hand, integrating more and more transistors on a single chip accentuates the need to lower its supply-voltage. This dissertation investigates one of the primary device uncertainties - timing error, in microprocessor performance bottleneck in NTC era. Then it proposes various innovative techniques to exploit these opportunities to maintain processor energy efficiency, in the context of emerging challenges. Evaluated with the cross-layer methodology, the proposed approaches achieve substantial improvements in processor energy efficiency, compared to other start-of-art techniques
Revamping Timing Error Resilience to Tackle Choke Points at NTC
The growing market of portable devices and smart wearables has contributed to innovation and development of systems with longer battery-life. While Near Threshold Computing (NTC) systems address the need for longer battery-life, they have certain limitations. NTC systems are prone to be significantly affected by variations in the fabrication process, commonly called process variation (PV). This dissertation explores an intriguing effect of PV, called choke points. Choke points are especially important due to their multifarious influence on the functional correctness of an NTC system. This work shows why novel research is required in this direction and proposes two techniques to resolve the problems created by choke points, while maintaining the reduced power needs
Revamping Timing Error Resilience to Tackle Choke Points at NTC
The growing market of portable devices and smart wearables has contributed to innovation and development of systems with longer battery-life. While Near Threshold Computing (NTC) systems address the need for longer battery-life, they have certain limitations. NTC systems are prone to be significantly affected by variations in the fabrication process, commonly called process variation (PV). This dissertation explores an intriguing effect of PV, called choke points. Choke points are especially important due to their multifarious influence on the functional correctness of an NTC system. This work shows why novel research is required in this direction and proposes two techniques to resolve the problems created by choke points, while maintaining the reduced power needs
REFU: Redundant Execution with Idle Functional Units, Fault Tolerant GPGPU architecture
The General-Purpose Graphics Processing Units (GPGPU) with energy efficient execution are increasingly used in wide range of applications due to high performance. These GPGPUs are fabricated with the cutting-edge technologies. Shrinking transistor feature size and aggressive voltage scaling has increased the susceptibility of devices to intrinsic and extrinsic noise leading to major reliability issues in the form of the transient faults. Therefore, it is essential to ensure the reliable operation of the GPGPUs in the presence of the transient faults. GPGPUs are designed for high throughput and execute the multiple threads in parallel, that brings a new challenge for the fault detection with minimum overheads across all threads. This paper proposes a new fault detection method called REFU, an architectural solution to detect the transient faults by temporal redundant re-execution of instructions using the idle functional execution units of the GPGPU. The performance of the REFU is evaluated with standard benchmarks, for fault free run across different workloads REFU shows mean performance overhead of 2%, average power overhead of 6%, and peak power overhead of 10%
Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery
The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling soft errors in the logic and memories that allow meeting the desired failures-in-time (FIT) target are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes, may soon limit the total number of cores that one may have running at the same time. This paper proposes a light-weight and scalable architecture to eliminate silent data corruption errors (SDC) and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining the errors in the cache hierarchy, allowing us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches and memory arrays) incurring performance overhead as low as 0.60%. © 2014 IEEE.Peer ReviewedPostprint (author's final draft
Architectures for dependable modern microprocessors
Η εξέλιξη των ολοκληρωμένων κυκλωμάτων σε συνδυασμό με τους αυστηρούς χρονικούς
περιορισμούς καθιστούν την επαλήθευση της ορθής λειτουργίας των επεξεργαστών
μία εξαιρετικά απαιτητική διαδικασία. Με κριτήριο το στάδιο του κύκλου ζωής
ενός επεξεργαστή, από την στιγμή κατασκευής των πρωτοτύπων και έπειτα, οι
τεχνικές ελέγχου ορθής λειτουργίας διακρίνονται στις ακόλουθες κατηγορίες: (1)
Silicon Debug: Τα πρωτότυπα ολοκληρωμένα κυκλώματα ελέγχονται εξονυχιστικά, (2)
Manufacturing Testing: ο τελικό ποιοτικός έλεγχος και (3) In-field
verification: Περιλαμβάνει τεχνικές, οι οποίες διασφαλίζουν την λειτουργία του
επεξεργαστή σύμφωνα με τις προδιαγραφές του. Η διδακτορική διατριβή προτείνει
τα ακόλουθα: (1) Silicon Debug: Η εργασία αποσκοπεί στην επιτάχυνση της
διαδικασίας ανίχνευσης σφαλμάτων και στον αυτόματο εντοπισμό τυχαίων
προγραμμάτων που δεν περιέχουν νέα -χρήσιμη- πληροφορία σχετικά με την αίτια
ενός σφάλματος. Η κεντρική ιδέα αυτής της μεθόδου έγκειται στην αξιοποίηση της
έμφυτης ποικιλομορφίας των αρχιτεκτονικών συνόλου εντολών και στην δυνατότητα
από-διαμόρφωσης τμημάτων του κυκλώματος, (2) Manufacturing Testing: προτείνεται
μία μέθοδο για την βελτιστοποίηση του έλεγχου ορθής λειτουργίας των
πολυνηματικών και πολυπύρηνων επεξεργαστών μέσω της χρήση λογισμικού
αυτοδοκιμής, (3) Ιn-field verification: Αναλύθηκε σε βάθος η επίδραση που έχουν
τα μόνιμα σφάλματα σε μηχανισμούς αύξησης της απόδοσης. Επιπρόσθετα, προτάθηκαν
τεχνικές για την ανίχνευση και ανοχή μόνιμων σφαλμάτων υλικού σε μηχανισμούς
πρόβλεψης διακλάδωσης.Technology scaling, extreme chip integration and the compelling requirement to
diminish the time-to-market window, has rendered microprocessors more prone to
design bugs and hardware faults. Microprocessor validation is grouped into the
following categories, based on where they intervene in a microprocessor’s
lifecycle: (a) Silicon debug: the first hardware prototypes are exhaustively
validated, (b) Μanufacturing testing: the final quality control during massive
production, and (c) In-field verification: runtime error detection techniques
to guarantee correct operation. The contributions of this thesis are the
following: (1) Silicon debug: We propose the employment of deconfigurable
microprocessor architectures along with a technique to generate self-checking
random test programs to avoid the simulation step and triage the redundant
debug sessions, (2) Manufacturing testing: We propose a self-test optimization
strategy for multithreaded, multicore microprocessors to speedup test program
execution time and enhance the fault coverage of hard errors; and (3) In-field
verification: We measure the effect of permanent faults performance components.
Then, we propose a set of low-cost mechanisms for the detection, diagnosis and
performance recovery in the front-end speculative structures. This thesis
introduces various novel methodologies to address the validation challenges
posed throughout the life-cycle of a chip
- …