96 research outputs found

    On-line mass storage system functional design document

    Get PDF
    A functional system definition for an on-line high density magnetic tape data storage system is provided. This system can be implemented in a multi-purpose, multi-host environment, and satisfy the requirements of economical data storage in the range of 2 to 50 billion bytes

    Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery

    Get PDF
    The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling soft errors in the logic and memories that allow meeting the desired failures-in-time (FIT) target are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes, may soon limit the total number of cores that one may have running at the same time. This paper proposes a light-weight and scalable architecture to eliminate silent data corruption errors (SDC) and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining the errors in the cache hierarchy, allowing us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches and memory arrays) incurring performance overhead as low as 0.60%. © 2014 IEEE.Peer ReviewedPostprint (author's final draft

    Space station automation of common module power management and distribution, volume 2

    Get PDF
    The new Space Station Module Power Management and Distribution System (SSM/PMAD) testbed automation system is described. The subjects discussed include testbed 120 volt dc star bus configuration and operation, SSM/PMAD automation system architecture, fault recovery and management expert system (FRAMES) rules english representation, the SSM/PMAD user interface, and the SSM/PMAD future direction. Several appendices are presented and include the following: SSM/PMAD interface user manual version 1.0, SSM/PMAD lowest level processor (LLP) reference, SSM/PMAD technical reference version 1.0, SSM/PMAD LLP visual control logic representation's (VCLR's), SSM/PMAD LLP/FRAMES interface control document (ICD) , and SSM/PMAD LLP switchgear interface controller (SIC) ICD

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Get PDF
    Continuous improvements in transistor scaling together with microarchitectural advances have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are haunting, especially for combinational logic, and parity and ECC codes are therefore becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts are warning about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges specially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot offer. Current techniques based on redundant execution were devised in a time when high penalties were assumed for the sake of high reliability levels. Novel light-weight techniques are therefore needed to enable fault protection in the mass market segments. The complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and costly server farms to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime (including transient, intermittent, permanent and also design bugs). Our solutions embrace a paradigm where fault tolerance is built based on exploiting high-level microarchitectural invariants that are reusable across designs, rather than relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that combined enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs, by trading-off area, power and performance. Our fault injection studies reveal that our methods provide high coverage levels while causing significantly lower performance, power and area costs than existing techniques. Then, this thesis extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow by combining our error detection mechanism, a new low-cost logging mechanism and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suite validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug, and also to determine its root cause. Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time consuming and cumbersome. Altogether, the integrated solutions proposed in this thesis capacitate the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.El continuo escalado de los transistores junto con los avances microarquitectónicos han posibilitado la presencia de potentes procesadores en todos los segmentos de mercado. Sin embargo, varios problemas de fiabilidad están desafiando la producción de sistemas robustos. Las predicciones de "soft errors" son inquietantes, especialmente para la lógica combinacional: soluciones como ECC o paridad se están volviendo insuficientes a medida que dicha lógica se convierte en la fuente predominante de soft errors. Además, los expertos están alertando acerca de la necesidad de detectar otras fuentes de fallos (causantes de errores permanentes e intermitentes) durante el tiempo de vida de los procesadores. Los segmentos "commodity" son los más vulnerables, ya que imponen unos requisitos que las técnicas actuales de fiabilidad no ofrecen. Estas soluciones (generalmente basadas en re-ejecución) fueron ideadas en un tiempo en el que con tal de alcanzar altos nivel de fiabilidad se asumían grandes costes. Son por tanto necesarias nuevas técnicas que permitan la protección contra fallos en los segmentos más populares. La complejidad de los diseños está encareciendo la validación "post-silicon". Su coste excede el de diseño, y el número de errores descubiertos está aumentando durante la validación y ya en manos de los clientes. La localización y el diagnóstico de errores son los mayores problemas, empeorados por las altas latencias en la manifestación de errores, por la poca observabilidad interna y por el coste de generar las señales esperadas. Esta tesis explora dos direcciones para tratar algunos de los retos causados por la creciente vulnerabilidad hardware y por las limitaciones de los enfoques de validación. Primero exploramos mecanismos para detectar múltiples fuentes de fallos durante el tiempo de vida de los procesadores (errores transitorios, intermitentes, permanentes y de diseño). Nuestras soluciones son de un paradigma donde la fiabilidad se construye explotando invariantes microarquitectónicos genéricos, en lugar de basarse en re-ejecución o en protección ad-hoc. Para ello descomponemos las funcionalidades básicas de un procesador y proponemos tres soluciones de `runtime verification' que combinadas permiten una detección de errores a nivel global. Estas tres soluciones son: un verificador de flujo de datos de registro y de computación, un verificador de flujo de datos de memoria y un verificador de flujo de control. Nuestras técnicas usan el concepto de firmas y permiten a los diseñadores ajustar los niveles de protección a sus necesidades, mediante compensaciones en área, consumo energético y rendimiento. Nuestros estudios de inyección de errores revelan que los métodos propuestos obtienen altos niveles de protección, a la vez que causan menos costes que las soluciones existentes. A continuación, esta tesis explora la aplicabilidad de estos esquemas a las fases de validación. Proponemos una solución de localización y diagnóstico de errores para el flujo de datos de memoria que combina nuestro mecanismo de detección de errores, junto con un mecanismo de logging de bajo coste y un programa de diagnóstico. Cierta actividad interna es continuamente registrada en una zona de memoria cuya capacidad puede ser expandida para satisfacer las necesidades de validación. La solución permite descubrir bugs, reduciendo la necesidad de calcular los resultados esperados. Al detectar un error, el algoritmo de diagnóstico analiza el registro para automáticamente localizar el bug y determinar su causa. Nuestros estudios muestran un alto grado de localización y de precisión de diagnóstico a un coste muy bajo de rendimiento y área. El resultado es una simplificación de las prácticas actuales de depuración, que son enormemente manuales, incómodas y largas. En conjunto, las soluciones de esta tesis capacitan a la industria a producir procesadores más fiables, a medida que la tecnología evoluciona hacia diseños más complejos y más vulnerables

    Architectures for dependable modern microprocessors

    Get PDF
    Η εξέλιξη των ολοκληρωμένων κυκλωμάτων σε συνδυασμό με τους αυστηρούς χρονικούς περιορισμούς καθιστούν την επαλήθευση της ορθής λειτουργίας των επεξεργαστών μία εξαιρετικά απαιτητική διαδικασία. Με κριτήριο το στάδιο του κύκλου ζωής ενός επεξεργαστή, από την στιγμή κατασκευής των πρωτοτύπων και έπειτα, οι τεχνικές ελέγχου ορθής λειτουργίας διακρίνονται στις ακόλουθες κατηγορίες: (1) Silicon Debug: Τα πρωτότυπα ολοκληρωμένα κυκλώματα ελέγχονται εξονυχιστικά, (2) Manufacturing Testing: ο τελικό ποιοτικός έλεγχος και (3) In-field verification: Περιλαμβάνει τεχνικές, οι οποίες διασφαλίζουν την λειτουργία του επεξεργαστή σύμφωνα με τις προδιαγραφές του. Η διδακτορική διατριβή προτείνει τα ακόλουθα: (1) Silicon Debug: Η εργασία αποσκοπεί στην επιτάχυνση της διαδικασίας ανίχνευσης σφαλμάτων και στον αυτόματο εντοπισμό τυχαίων προγραμμάτων που δεν περιέχουν νέα -χρήσιμη- πληροφορία σχετικά με την αίτια ενός σφάλματος. Η κεντρική ιδέα αυτής της μεθόδου έγκειται στην αξιοποίηση της έμφυτης ποικιλομορφίας των αρχιτεκτονικών συνόλου εντολών και στην δυνατότητα από-διαμόρφωσης τμημάτων του κυκλώματος, (2) Manufacturing Testing: προτείνεται μία μέθοδο για την βελτιστοποίηση του έλεγχου ορθής λειτουργίας των πολυνηματικών και πολυπύρηνων επεξεργαστών μέσω της χρήση λογισμικού αυτοδοκιμής, (3) Ιn-field verification: Αναλύθηκε σε βάθος η επίδραση που έχουν τα μόνιμα σφάλματα σε μηχανισμούς αύξησης της απόδοσης. Επιπρόσθετα, προτάθηκαν τεχνικές για την ανίχνευση και ανοχή μόνιμων σφαλμάτων υλικού σε μηχανισμούς πρόβλεψης διακλάδωσης.Technology scaling, extreme chip integration and the compelling requirement to diminish the time-to-market window, has rendered microprocessors more prone to design bugs and hardware faults. Microprocessor validation is grouped into the following categories, based on where they intervene in a microprocessor’s lifecycle: (a) Silicon debug: the first hardware prototypes are exhaustively validated, (b) Μanufacturing testing: the final quality control during massive production, and (c) In-field verification: runtime error detection techniques to guarantee correct operation. The contributions of this thesis are the following: (1) Silicon debug: We propose the employment of deconfigurable microprocessor architectures along with a technique to generate self-checking random test programs to avoid the simulation step and triage the redundant debug sessions, (2) Manufacturing testing: We propose a self-test optimization strategy for multithreaded, multicore microprocessors to speedup test program execution time and enhance the fault coverage of hard errors; and (3) In-field verification: We measure the effect of permanent faults performance components. Then, we propose a set of low-cost mechanisms for the detection, diagnosis and performance recovery in the front-end speculative structures. This thesis introduces various novel methodologies to address the validation challenges posed throughout the life-cycle of a chip

    3rd EGEE User Forum

    Get PDF
    We have organized this book in a sequence of chapters, each chapter associated with an application or technical theme introduced by an overview of the contents, and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the important number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum

    Framework for the Integration of Mobile Device Features in PLM

    Get PDF
    Currently, companies have covered their business processes with stationary workstations while mobile business applications have limited relevance. Companies can cover their overall business processes more time-efficiently and cost-effectively when they integrate mobile users in workflows using mobile device features. The objective is a framework that can be used to model and control business applications for PLM processes using mobile device features to allow a totally new user experience

    A flexible manufacturing system for lawnmower cutting cylinders

    Get PDF
    The thesis is concerned with the conception and design of a FLEXIBLE MANUFACTURING SYSTEM (FMS) for the automation of the manufacture of lawnmower cutting cylinders at Suffolk Lawnmowers Ltd. A review of FMS definitions, planning methods and current systems is carried out for the development of a suitable FMS configuration for the final stages of manufacture of grass cutting cylinders having 21 different design specifications. This involves examination of the capabilities of robotics and microcontrollers to automate the technologies used in cylinder production. The company's current manual batch production system is analysed to determine the suitable form and requirements of the FMS. This includes analyses of annual volumes, throughputs, batch sizes, product and process mixes. Long term objectives to automate the system are identified from which short term objectives are derived. The FMS recommended for immediate development encompasses the short term objectives for the welding, hardening, grinding and transfer processes of 8 cutting cylinder specifications. It is shown that the MIG (Argon/C02) are welding, progressive flame hardening and wide-face cylindrical grinding processes can be developed successfully to automate cylinder production. The recommended system integrates these processes into an FMS through the'automatic handling of cylinders (through three process routes) by a robotic manipulator utilising a double gripper. 'A robotic welding station, manually loaded, is also recommended. ' The system is controlled overall by a 32K microcontroller with the process machines individually controlled by programmahle logic controllers with up to 6K of memory each. The economic appraisal of the FMS indicates a 4.4 year payback based on direct labour and material cost savings. The company's application for grant aid to implement the FMS design has led to an offer of a Department of Industry grant to cover 50% of all capital and revenue costs. The grant of £166,943 reduces the payback period to 2.3 years

    Adaptive Distributed Architectures for Future Semiconductor Technologies.

    Full text link
    Year after year semiconductor manufacturing has been able to integrate more components in a single computer chip. These improvements have been possible through systematic shrinking in the size of its basic computational element, the transistor. This trend has allowed computers to progressively become faster, more efficient and less expensive. As this trend continues, experts foresee that current computer designs will face new challenges, in utilizing the minuscule devices made available by future semiconductor technologies. Today's microprocessor designs are not fit to overcome these challenges, since they are constrained by their inability to handle component failures by their lack of adaptability to a wide range of custom modules optimized for specific applications and by their limited design modularity. The focus of this thesis is to develop original computer architectures, that can not only survive these new challenges, but also leverage the vast number of transistors available to unlock better performance and efficiency. The work explores and evaluates new software and hardware techniques to enable the development of novel adaptive and modular computer designs. The thesis first explores an infrastructure to quantitatively assess the fallacies of current systems and their inadequacy to operate on unreliable silicon. In light of these findings, specific solutions are then proposed to strengthen digital system architectures, both through hardware and software techniques. The thesis culminates with the proposal of a radically new architecture design that can fully adapt dynamically to operate on the hardware resources available on chip, however limited or abundant those may be.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102405/1/apellegr_1.pd

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    Get PDF
    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information infrastructure (Infobahn), the future for storage technology, and lessons learned from various projects. There also will be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994
    corecore