8 research outputs found

    A Functional Verification Methodology for an Improved Coverage of System-on-Chips

    The increasing popularity of System-on-Chip (SoC) circuits results in many new design challenges. One major challenge is to ensure the functional correctness of such complicated circuits. Functional verification is a verification technique used to verify the functional correctness of SoCs. Coverage Directed Test Generation (CDTG) is an essential part of functional verification, where the objective is to generate input stimuli that maximize the coverage of a design. Coverage helps to determine how well the input stimuli verified the design under verification. CDTG techniques analyze coverage results and adapt the input stimulus generation process to improve the coverage. One of the important components of CDTG-based tools is the constraint solver, and the time efficiency of the verification process depends on the efficiency of this solver. However, the constraint solvers associated with CDTG tools require a large amount of memory and time to generate input stimuli for SoCs, and they cannot generate solutions that are evenly distributed in the search space, as required to attain the desired coverage. The aim of this thesis is to provide a practical framework that enables the generation of evenly distributed input stimuli. A basic feature of the search space (data set) is that it contains k sub-populations or clusters. Partitioning the search space into clusters and generating solutions from the partitions can improve the evenness of the solutions generated by the solver. Hence, one of our main contributions is a novel domain partitioning algorithm. The domain partitioning algorithm relies on solutions generated by a consistency search algorithm developed for this purpose. The number of partitions required by the domain partitioning algorithm is determined by an algorithm that finds the optimal number of clusters present in a data set. To demonstrate the effectiveness of our approach, we apply our methodology to Constraint Satisfaction Problems (CSPs) and to some real-life applications.
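
    As a rough illustration of the partition-then-sample idea described above (not the thesis's actual domain partitioning or consistency search algorithms), the following Python sketch splits a one-dimensional integer domain into k contiguous sub-domains and draws constraint-satisfying values from them in round-robin order; the domain bounds, the value of k, and the toy constraint are all invented for the example.

        import random

        def partition_domain(lo, hi, k):
            """Split the integer domain [lo, hi] into k contiguous sub-domains."""
            width = (hi - lo + 1) // k
            parts = []
            for i in range(k):
                p_lo = lo + i * width
                p_hi = hi if i == k - 1 else p_lo + width - 1
                parts.append((p_lo, p_hi))
            return parts

        def generate_stimuli(lo, hi, k, n, constraint):
            """Draw n constraint-satisfying values, cycling over the k partitions
            so the solutions are spread evenly across the search space."""
            parts = partition_domain(lo, hi, k)
            stimuli = []
            i = 0
            while len(stimuli) < n:
                p_lo, p_hi = parts[i % k]
                candidate = random.randint(p_lo, p_hi)
                if constraint(candidate):          # keep only satisfying assignments
                    stimuli.append(candidate)
                i += 1
            return stimuli

        # Toy constraint: values divisible by 3, drawn evenly from 8 partitions of [0, 2^16).
        print(generate_stimuli(0, 2**16 - 1, 8, 10, lambda v: v % 3 == 0))

    A real CDTG flow would replace the rejection sampling above with constraint-solver calls restricted to each partition, which is where the proposed partitioning and consistency-search algorithms come into play.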

    An Effective Verification Solution for Modern Microprocessors.

    Over the past four decades microprocessors have come to be a vital and inseparable part of the modern world, becoming the digital brain of numerous electronic devices and gadgets that make today's lifestyle possible. Processors are capable of performing computation at astonishingly high speeds and are extremely integrated, occupying only a few square centimeters of silicon die. However, this computational power comes at a price: the task of verifying a modern microprocessor and guaranteeing correctness of its operation is increasingly challenging, even for the most established processor vendors. Always attempting to deliver higher performance to end-users, processor manufacturers are forced to design progressively more complex circuits and employ immense verification teams to eliminate critical design bugs in a timely manner. Unfortunately, too often size doesn't seem to matter in verification, as schedules continue to slip and microprocessors find their way to the marketplace with design errors. This work describes a novel verification framework targeting specifically today's complex microprocessors. The scope of the work spans many levels of verification and different phases of the processor life-cycle, from validation of individual sub-modules to complete multi-core systems, and from pre-silicon design verification to in-the-field hardware patching. In particular, our StressTest and MCjammer approaches enable efficient generation of high-quality tests at the pre-silicon level for individual cores and multi-core systems, respectively, using machine learning techniques and making the process as automatic as possible. On the other hand, Reversi and Dacota enable low-cost validation in post-silicon, while delivering even higher coverage than pre-silicon techniques. Finally, the Field-repairable control logic (FRCL) and Caspar techniques allow designers to patch different classes of escaped errors in processors that are deployed in the field. The integrated set of solutions that we introduce with this thesis empowers processor vendors to drastically shorten their development timeline and, at the same time, to deliver more reliable and correct systems to their customers at a lower cost. Altogether, this work has the potential to solve the long-standing challenge of guaranteeing the complete functional correctness of modern microprocessors.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61656/1/ivagner_1.pd
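
    For a flavor of how coverage feedback can drive stimulus generation in closed-loop tools of this kind, here is a deliberately generic Python sketch. It is not the StressTest or MCjammer algorithm (the thesis describes its own learning-based generators); `simulate`, `mutate`, and the toy coverage model below are placeholders invented for the example.

        import random

        def coverage_loop(seed_tests, simulate, mutate, total_points, target=0.95, budget=1000):
            """Keep tests that hit new coverage points and mutate them into new tests."""
            covered = set()
            pool = list(seed_tests)
            for _ in range(budget):
                test = random.choice(pool)
                new_points = simulate(test) - covered     # coverage points newly hit by this test
                if new_points:                            # reward productive tests
                    covered |= new_points
                    pool.append(mutate(test))
                if len(covered) >= target * total_points:
                    break
            return covered

        # Toy model: a "test" is an 8-bit word and a coverage point is a bit position it sets.
        hit = coverage_loop(
            seed_tests=[0x01],
            simulate=lambda t: {b for b in range(8) if (t >> b) & 1},
            mutate=lambda t: t ^ random.getrandbits(8),
            total_points=8,
        )
        print(f"covered {len(hit)} of 8 toy coverage points")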

    New techniques for functional testing of microprocessor based systems

    Electronic devices may be affected by failures, for example due to physical defects. These defects may be introduced during the manufacturing process, as well as during the normal operating life of the device due to aging. Detecting all these defects is not a trivial task, especially in complex systems such as processor cores. Nevertheless, safety-critical applications do not tolerate failures; this is the reason why testing such devices is needed, so as to guarantee correct behavior at any time. Moreover, testing is a key parameter for assessing the quality of a manufactured product. Consolidated testing techniques are based on special Design for Testability (DfT) features added to the original design to facilitate test effectiveness. Design, integration, and usage of the available DfT for testing purposes are fully supported by commercial EDA tools, hence approaches based on DfT are the standard solutions adopted by silicon vendors for testing their devices. Tests exploiting the available DfT, such as scan chains, manipulate the internal state of the system differently from the normal functional mode, passing through configurations that are otherwise unreachable. Alternative solutions that do not violate the functional mode are defined as functional tests. In microprocessor-based systems, functional testing techniques include software-based self-test (SBST), i.e., a piece of software (referred to as a test program) which is uploaded into the available system memory and executed, with the purpose of exciting a specific part of the system and observing the effects of possible defects affecting it. SBST has been widely studied by the research community for years, but its adoption by industry is quite recent. My research activities have been mainly focused on the industrial perspective of SBST. The problem of providing an effective development flow and guidelines for integrating SBST into the available operating systems has been tackled, and results have been provided on microprocessor-based systems for the automotive domain. Remarkably, new algorithms have also been introduced with respect to state-of-the-art approaches, which can be systematically implemented to enrich SBST suites of test programs for modern microprocessor-based systems. The proposed development flow and algorithms are currently employed in real electronic control units for automotive products. Moreover, a special hardware infrastructure purposely embedded in modern devices for interconnecting the numerous on-board instruments has also been a focus of my research. This solution is known as reconfigurable scan networks (RSNs), and its practical adoption is growing fast as new standards have been created. Test and diagnosis methodologies have been proposed targeting specific RSN features, aimed at checking whether the reconfigurability of such networks has been corrupted by defects and, in this case, at identifying the defective elements of the network. The contribution of my work in this field has also been included in the first suite of public-domain benchmark networks.
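
    The host-side view of an SBST run can be summarized with a short sketch like the one below: upload the test program, execute it, then read back and compare a result signature. This is only an illustration of the flow; `load_program`, `run`, `read_word`, the result address, and the golden value are hypothetical stand-ins rather than any real target or simulator API (real test programs are processor-specific code developed per the flow described above).

        GOLDEN_SIGNATURE = 0xDEADBEEF      # expected signature, fixed when the test program is developed
        RESULT_ADDR = 0x20000000           # memory location where the test program stores its result

        def sbst_pass(target, program_image):
            """Return True if the core under test produced the expected signature."""
            target.load_program(program_image)     # upload the test program into system memory
            target.run()                           # execute it on the processor core
            signature = target.read_word(RESULT_ADDR)
            return signature == GOLDEN_SIGNATURE   # mismatch => a fault was activated and observed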

    Observation mechanisms for in-field software-based self-test

    When electronic systems are used in safety-critical applications, as in the space, avionic, automotive or biomedical areas, it is required to maintain a very low probability of failures due to faults of any kind. Standards and regulations play a significant role, forcing companies to devise and adopt solutions able to achieve predefined targets in terms of dependability. Different techniques can be used to reduce fault occurrence or to minimize the probability that those faults produce critical failures (e.g., by introducing redundancy). Unfortunately, most of these techniques have a severe impact on the cost of the resulting product and, in some cases, the probability of failures remains too large anyway. Hence, a solution commonly used in several scenarios lies in periodically performing a test able to detect the occurrence of any fault before it produces a failure (in-field test). This solution is normally based on forcing the processor inside the Device Under Test to execute a properly written test program, which is able to activate possible faults and to make their effects visible at some observable locations. This approach is also called Software-Based Self-Test, or SBST. Compared with testing in an end-of-manufacturing scenario, in-field testing has strong limitations in terms of access to the system inputs and outputs, because Design for Testability structures and testing equipment are usually not available. As a consequence, there are reduced possibilities to activate the faults and to observe their effects. This reduced observability particularly affects the ability to detect performance faults, i.e., faults that modify the timing but not the final value of computations. Faults of this kind are hard to detect by only observing the final content of predefined memory locations, which is the usual test result observation method used in-field. Initially, the present work was focused on fault tolerance techniques against transient faults induced by ionizing radiation, the so-called Single Event Upsets (SEUs). The main contribution of this early stage of the thesis lies in the experimental validation of the feasibility of achieving a safe system by using an architecture that combines task-level redundancy with already available IP cores, thus minimizing the development time. Task execution is replicated, and Memory Protection is used to guarantee that any SEU may affect one and only one of the replicas. A proof-of-concept implementation was developed and validated using fault injection. Results outline the effectiveness of the architecture, and the overhead analysis shows that the proposed architecture is effective in reducing the resource occupation with respect to N-modular redundancy, at an affordable cost in terms of application execution time. The main part of the thesis is focused on in-field software-based self-test of permanent faults. A set of observation methods exploiting existing or ad-hoc hardware is proposed, aimed at obtaining a better coverage, in particular of performance faults. An extensive quantitative evaluation of the proposed methods is presented, including a comparison with the observation methods traditionally used in end-of-manufacturing and in-field testing. Results show that the proposed methods are a good complement to the traditionally used final memory content observation.
Moreover, they show that an adequate combination of these complementary methods allows achieving nearly the same fault coverage obtained when continuously observing all the processor outputs, an observation method commonly used for production test but usually not available in-field. A very interesting by-product of the above is a detailed description of how to compute the fault coverage achieved by functional in-field tests using a conventional fault simulator, a tool that is usually applied in an end-of-manufacturing testing scenario. Finally, another relevant result in the testing area is a method to detect permanent faults inside the cache coherence logic integrated in each cache controller of a multi-core system, based on the concurrent execution of a test program by the different cores in a coordinated manner. By construction, the method achieves full fault coverage of the static faults in the addressed logic.
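
    As one concrete example of what an observation mechanism can add beyond the final memory content, the Python sketch below models a simple LFSR-based signature compactor: a stream of observed output words is folded into a single signature, so a single wrong or reordered value changes the result with high probability. This is a generic textbook mechanism used only for illustration, not the specific existing or ad-hoc hardware proposed in the thesis; the polynomial and word width are arbitrary.

        def compact_signature(words, width=32, poly=0x04C11DB7, seed=0):
            """Fold a stream of observed output words into one LFSR-style signature."""
            mask = (1 << width) - 1
            sig = seed
            for w in words:
                sig ^= w & mask                    # mix the observed word into the state
                msb = (sig >> (width - 1)) & 1
                sig = (sig << 1) & mask            # shift the register
                if msb:
                    sig ^= poly                    # apply the feedback polynomial
            return sig

        # A single changed value yields a different signature.
        print(hex(compact_signature([0x11, 0x22, 0x33])))
        print(hex(compact_signature([0x11, 0x23, 0x33])))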

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Continuous improvements in transistor scaling, together with microarchitectural advances, have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are daunting, especially for combinational logic, and parity and ECC codes are therefore becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts are warning about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges especially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot meet. Current techniques based on redundant execution were devised at a time when high penalties were accepted for the sake of high reliability levels. Novel lightweight techniques are therefore needed to enable fault protection in the mass-market segments. The complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and the costly server farms needed to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime (including transient, intermittent, and permanent faults, as well as design bugs). Our solutions embrace a paradigm where fault tolerance is built by exploiting high-level microarchitectural invariants that are reusable across designs, rather than relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that, combined, enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs by trading off area, power, and performance. Our fault injection studies reveal that our methods provide high coverage levels while causing significantly lower performance, power, and area costs than existing techniques. Then, this thesis extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow by combining our error detection mechanism, a new low-cost logging mechanism, and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suit validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug and to determine its root cause.
Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time-consuming and cumbersome. Altogether, the integrated solutions proposed in this thesis enable the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.
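
    To make the control-flow checking idea concrete, here is a heavily simplified Python model: the statically known control-flow graph defines the legal successors of each basic block, and the checker flags any executed transition outside that set. The thesis implements this as low-cost microarchitectural hardware using end-to-end signatures; the explicit successor sets and block names below are an illustrative simplification, not the proposed design.

        # Static control-flow graph: basic block -> set of legal successor blocks (made-up example).
        LEGAL_SUCCESSORS = {
            "entry": {"loop"},
            "loop": {"loop", "exit"},
            "exit": set(),
        }

        def check_trace(executed_blocks):
            """Flag any executed transition that the static CFG does not allow."""
            for cur, nxt in zip(executed_blocks, executed_blocks[1:]):
                if nxt not in LEGAL_SUCCESSORS[cur]:
                    return f"control-flow error: {cur} -> {nxt}"
            return "trace consistent with the CFG"

        print(check_trace(["entry", "loop", "loop", "exit"]))   # legal trace
        print(check_trace(["entry", "exit"]))                   # illegal jump is detected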

    Kiel Declarative Programming Days 2013

    This report contains the papers presented at the Kiel Declarative Programming Days 2013, held in Kiel (Germany) during September 11-13, 2013. The Kiel Declarative Programming Days 2013 unified the following events:
    * 20th International Conference on Applications of Declarative Programming and Knowledge Management (INAP 2013)
    * 22nd International Workshop on Functional and (Constraint) Logic Programming (WFLP 2013)
    * 27th Workshop on Logic Programming (WLP 2013)
    All these events are centered around declarative programming, an advanced paradigm for the modeling and solving of complex problems. These specification and implementation methods have attracted increasing attention over the last decades, e.g., in the domains of databases and natural language processing, for modeling and processing combinatorial problems, and for high-level programming of complex, in particular, knowledge-based systems.

    Improving the effective use of multithreaded architectures: implications on compilation, thread assignment, and timing analysis

    This thesis presents cross-domain approaches that improve the effective use of multithreaded architectures. The contributions of the thesis can be classified into three groups. First, we propose several methods for thread assignment of network applications running in multithreaded network servers. Second, we analyze the problem of graph partitioning that is part of the compilation process of multithreaded streaming applications. Finally, we present a method that improves the measurement-based timing analysis of multithreaded architectures used in time-critical environments. The following sections summarize each of the contributions. (1) Thread assignment on multithreaded processors: State-of-the-art multithreaded processors have different levels of resource sharing (e.g., resources shared between threads running on the same core and globally shared resources). Thus, the way that threads of a given workload are assigned to the processors' hardware contexts determines which resources the threads share, which, in turn, may significantly affect the system performance. In this thesis, we demonstrate the importance of thread assignment for network applications running in multithreaded servers. We also present TSBSched and BlackBox scheduler, methods for thread assignment of multithreaded network applications running on processors with several levels of resource sharing. Finally, we propose a statistical approach to the thread assignment problem. In particular, we show that running a sample of several hundred or several thousand random thread assignments is sufficient to capture, with very high probability, at least one assignment from the best-performing 1%. We also describe a method that estimates the optimal system performance for a given workload. We successfully applied TSBSched, BlackBox scheduler, and the presented statistical approach to a case study of thread assignment of multithreaded network applications running on the UltraSPARC T2 processor. (2) Kernel partitioning of streaming applications: An important step in compiling a stream program to multiple processors is kernel partitioning. Finding an optimal kernel partition is, however, an intractable problem. We propose a statistical approach to the kernel partitioning problem. We describe a method that statistically estimates the performance of the optimal kernel partition. We demonstrate that the sampling method is an important part of the analysis, and that not all methods that generate random samples provide good results. We also show that random sampling on its own can be used to find a good kernel partition, and that it could be an alternative to heuristics-based approaches. The presented statistical method is applied successfully to the benchmarks included in the StreamIt 2.1.1 suite. (3) Multithreaded processors in time-critical environments: Despite the benefits that multithreaded commercial off-the-shelf (MT COTS) processors may offer in embedded real-time systems, the time-critical market has not yet embraced a shift toward these architectures. The main challenge with MT COTS architectures is the difficulty of predicting the execution time of concurrently-running (co-running) time-critical tasks. Providing a timing analysis for real industrial applications running on MT COTS processors becomes extremely difficult because the execution time of a task, and hence its worst-case execution time (WCET), depends on the interference with co-running tasks in shared processor resources.
    We show that the measurement-based timing analysis used for single-threaded processors cannot be directly extended to MT COTS architectures. Also, we propose a methodology that quantifies the slowdown that a task may experience because of collisions with co-running tasks in the shared resources of an MT COTS processor. The methodology is applied to a case study in which different time-critical applications were executed on several MT COTS processors.
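
    The sampling claim in contribution (1) above can be sanity-checked with a one-line probability model: if assignments are drawn independently and uniformly at random, the chance that a sample of n assignments contains at least one of the best-performing 1% is 1 - 0.99^n. The Python snippet below simply evaluates this expression; it is a simplified i.i.d. model, not the statistical analysis developed in the thesis.

        # P(at least one of n uniform random assignments falls in the best-performing 1%)
        for n in (100, 300, 500, 1000):
            print(n, round(1 - 0.99 ** n, 4))
        # 100 -> 0.634, 300 -> 0.951, 500 -> 0.9934, 1000 -> ~1.0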