Delay Measurements and Self Characterisation on FPGAs
This thesis examines new timing measurement methods for self delay characterisation of Field-Programmable Gate Array (FPGA) components and delay measurement of complex circuits
on FPGAs. Two novel measurement techniques, based on analysis of a circuit's output failure
rate and transition probability, are proposed for accurate, precise and efficient measurement of
propagation delays. The transition-probability-based method is especially attractive, since
it requires no modification of the circuit under test and few hardware resources,
making it an ideal method for physical delay analysis of FPGA circuits.
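The transition-probability approach can be illustrated with a toy model. The sketch below is plain Python, assuming a Gaussian jitter model; the hypothetical `measure_transition_probability` harness stands in for the on-chip measurement circuitry described in the thesis. It sweeps the sampling clock period downward and takes the point where the output transition is captured in only half of the trials as the delay estimate:

```python
import random

def measure_transition_probability(true_delay_ns, clock_period_ns,
                                   jitter_ns=0.05, trials=1000):
    """Simulate sampling a circuit-under-test output at each clock edge.
    The transition is captured only if the (jittered) propagation delay
    fits within the clock period."""
    captured = sum(
        1 for _ in range(trials)
        if random.gauss(true_delay_ns, jitter_ns) <= clock_period_ns
    )
    return captured / trials

def estimate_delay(true_delay_ns, step_ns=0.01):
    """Sweep the clock period downward; the delay estimate is the shortest
    period at which transitions are still captured at least half the time."""
    period = 2 * true_delay_ns
    while measure_transition_probability(true_delay_ns, period) >= 0.5:
        period -= step_ns
    return period + step_ns
```

On real hardware the probability would come from counters attached to the circuit-under-test output rather than from a simulation, but the estimation loop has the same shape.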
The relentless advancements in process technology have led to smaller and denser transistors
in integrated circuits. While FPGA users benefit from this in terms of increased hardware
resources for more complex designs, actual productivity with FPGAs in terms of timing
performance (operating frequency, latency and throughput) has lagged behind the potential
improvements of the improved technology, due to delay variability in FPGA components
and the inaccuracy of the timing models used in FPGA timing analysis. The ability to measure
the delay of any arbitrary circuit on an FPGA offers many opportunities for on-chip characterisation
and physical timing analysis, allowing delay variability to be accurately tracked and variation-aware optimisations to be developed, reducing the productivity gap observed in today's FPGA
designs.
The measurement techniques are developed into complete self-measurement and characterisation platforms in this thesis, demonstrating their practical use in actual FPGA hardware for
cross-chip delay characterisation and accurate delay measurement of both complex combinatorial and sequential circuits, further reinforcing their role in solving the delay variability
problem in FPGAs.
Observation mechanisms for in-field software-based self-test
When electronic systems are used in safety-critical applications, as in the space,
avionics, automotive or biomedical areas, a very low probability of failures
due to faults of any kind must be maintained. Standards and regulations play
a significant role, forcing companies to devise and adopt solutions able to achieve
predefined targets in terms of dependability. Different techniques can be used to
reduce fault occurrence or to minimize the probability that those faults produce
critical failures (e.g., by introducing redundancy).
Unfortunately, most of these techniques have a severe impact on the cost of
the resulting product and, in some cases, the probability of failures is too large
anyway. Hence, a solution commonly used in several scenarios lies in periodically
performing a test able to detect the occurrence of any fault before it produces
a failure (in-field test). This solution is normally based on forcing the processor
inside the Device Under Test to execute a properly written test program, which is
able to activate possible faults and to make their effects visible in some observable
locations. This approach is also called Software-Based Self-Test, or SBST.
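As a rough illustration of the SBST idea (not the thesis's specific test programs), the sketch below runs a set of test vectors through an ALU operation and compacts the results into a single signature, standing in for the value an SBST routine would leave in an observable memory location. All names (`compact`, `sbst_alu_test`, `faulty_adder`) are illustrative:

```python
def compact(results):
    """MISR-style compaction: fold a stream of results into one 32-bit
    signature, so that any faulty result perturbs the final value."""
    signature = 0
    for r in results:
        # Rotate left by one, then XOR in the next result.
        signature = ((signature << 1) | (signature >> 31)) & 0xFFFFFFFF
        signature ^= r & 0xFFFFFFFF
    return signature

def sbst_alu_test(alu_op, test_vectors, golden_signature):
    """Apply the vectors to the (possibly faulty) ALU operation and compare
    the compacted signature against the fault-free golden signature."""
    return compact(alu_op(a, b) for a, b in test_vectors) == golden_signature

VECTORS = [(3, 5), (0xFFFF, 1), (123, 456), (7, 0)]
GOLDEN = compact(a + b for a, b in VECTORS)  # reference for a fault-free adder

def faulty_adder(a, b):
    return (a + b) | 0b100  # hypothetical stuck-at-1 fault on result bit 2
```

A fault-free adder reproduces the golden signature, while the injected stuck-at fault changes it, so the test flags the fault.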
Compared with testing in an end-of-manufacturing scenario, in-field testing
has strong limitations in terms of access to the system inputs and outputs,
because Design for Testability structures and testing equipment are usually not
available. As a consequence, there are fewer possibilities to activate the faults
and to observe their effects.
This reduced observability particularly affects the ability to detect performance
faults, i.e., faults that modify the timing but not the final value of computations.
Faults of this kind are hard to detect by observing only the final content of
predefined memory locations, which is the usual test-result observation method used
in-field.
Initially, the present work was focused on fault-tolerance techniques against
transient faults induced by ionizing radiation, the so-called Single Event Upsets
(SEUs). The main contribution of this early stage of the thesis lies in the experimental
validation of the feasibility of achieving a safe system by using an
architecture that combines task-level redundancy with already available IP cores,
thus minimizing the development time. Task execution is replicated and Memory
Protection is used to guarantee that any SEU may affect one and only one
of the replicas. A proof of concept implementation was developed and validated
using fault injection. Results show the effectiveness of the architecture, and the
overhead analysis shows that the proposed architecture is effective in reducing
resource occupation with respect to N-modular redundancy, at an affordable cost
in terms of application execution time.
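The task-level redundancy scheme can be sketched in a few lines. The snippet below is illustrative Python, with a hypothetical `BitFlipOnce` wrapper playing the role of an SEU that memory protection confines to a single replica; it replicates a task's execution and flags any mismatch between the replicas:

```python
def run_replicated(task, inputs, replicas=2):
    """Execute the task once per replica on a private copy of the input
    (standing in for memory protection, which keeps an SEU confined to
    one replica) and compare the results to detect an upset."""
    results = [task(list(inputs)) for _ in range(replicas)]
    if all(r == results[0] for r in results):
        return results[0]  # agreement: accept the result
    raise RuntimeError("replica mismatch: SEU detected, re-execute task")

def sum_task(data):
    return sum(data)

class BitFlipOnce:
    """Hypothetical fault injector: flips one bit of the result of the
    first invocation only, modelling an SEU hitting a single replica."""
    def __init__(self, task):
        self.task, self.hit = task, False
    def __call__(self, data):
        r = self.task(data)
        if not self.hit:
            self.hit = True
            return r ^ 0b1
        return r
```

With two replicas the scheme detects a single upset; recovery is by re-execution, trading execution time for the much lower resource occupation compared with N-modular redundancy.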
The main part of the thesis is focused on in-field software-based self-test of
permanent faults. A set of observation methods exploiting existing or ad hoc
hardware is proposed, aimed at obtaining better coverage, in particular of performance
faults. An extensive quantitative evaluation of the proposed methods
is presented, including a comparison with the observation methods traditionally
used in end of manufacturing and in-field testing.
Results show that the proposed methods are a good complement to the traditionally
used final memory content observation. Moreover, they show that an
adequate combination of these complementary methods achieves nearly
the same fault coverage as continuous observation of all the processor
outputs, an observation method commonly used for production test but
usually not available in-field.
A very interesting by-product of what is described above is a detailed description
of how to compute the fault coverage achieved by functional in-field tests
using a conventional fault simulator, a tool that is usually applied in an end of
manufacturing testing scenario.
Finally, another relevant result in the testing area is a method to detect permanent
faults inside the cache coherence logic integrated in each cache controller
of a multi-core system, based on the concurrent execution of a test program by
the different cores in a coordinated manner. By construction, the method achieves
full fault coverage of the static faults in the addressed logic.
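The coordinated multi-core execution can be sketched with threads and a barrier. This is an illustrative coordination pattern only, not the thesis's actual test algorithm: in each phase one designated core writes a known value, the barrier synchronises all cores, and every other core checks that it observes the fresh value. On real hardware the shared variable would map to cache lines whose coherence states the phases drive through known transitions:

```python
import threading

def coherence_test(num_cores=4, phases=8):
    """Run the same test program on every (simulated) core in lockstep;
    a barrier separates the write step of each phase from the read step."""
    barrier = threading.Barrier(num_cores)
    shared = [0]
    errors = []

    def core(cid):
        for phase in range(phases):
            writer = phase % num_cores
            if cid == writer:
                shared[0] = (phase << 8) | writer  # unique per-phase value
            barrier.wait()   # writer done: everyone may read
            if cid != writer and shared[0] != (phase << 8) | writer:
                errors.append((cid, phase))
            barrier.wait()   # readers done: safe to start the next phase

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not errors
```

A real SBST version would compact each core's observations into a signature rather than checking them inline, but the barrier-driven phase structure is what lets the cores exercise the coherence logic in a known global state.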