29 research outputs found

    Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Real-time trace decoding and monitoring for safety and security in embedded systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Técnicas híbridas de tolerancia a fallos en microprocesadores

    Get PDF
    Este trabajo de tesis doctoral presenta tres nuevas técnicas de detección de errores de control de flujo en microprocesadores. Estas técnicas han sido implementadas de manera no intrusiva en un módulo hardware externo y se han combinado con técnicas de software para obtener una elevada capacidad de detección de errores. Se ha utilizado la interfaz de traza como medio de observación de la ejecución permitiendo la detección de errores de flujo de una manera no intrusiva y sin penalización en las prestaciones. La primera técnica propuesta ha sido denominada Predicción del Contador de Programa. Está técnica está basada en el cálculo del contador de programa a partir del código de instrucción actual y el valor previo del contador de programa. Esta técnica es capaz de detectar el subconjunto de errores que afectan al contador de programa de una manera muy eficiente y con un coste en términos de recursos necesarios para su implementación muy reducido. Adicionalmente, es importante destacar que la técnica Predicción del Contador de Programa es complementaria a las otras dos técnicas descritas en la tesis, Monitorización de firmas y Monitorización dual, y es necesaria para que éstas dos alcancen elevadas tasas de detección de errores de control de flujo. La técnica de Monitorización de Firmas realiza el cálculo de una firma online con el objetivo de verificar la ejecución de un bloque de instrucciones. Cada bloque tiene asignada una firma de referencia que es calculada en tiempo de compilación. La firma de referencia y la calculada en tiempo de ejecución son comparadas al final de la ejecución de cada bloque. Una de las mejoras que presenta esta técnica respecto a técnicas basadas en el cálculo de firmas propuestas por otros autores es que consigue reducir el tamaño de la memoria necesaria para almacenar las firmas de referencia utilizando una tabla denominada CFC-ST. Se han propuesto dos métodos de almacenamiento y acceso a la tabla CFC-ST, un método estático y un método dinámico. En ambos casos el impacto sobre el rendimiento del sistema es reducido al conseguir reducir el tamaño de la tabla necesaria para almacenar las firmas de referencia. La tercera técnica desarrollada en esta tesis doctoral es la técnica de Monitorización Dual. Esta técnica monitoriza la ejecución utilizando dos puntos de observación diferentes, la interfaz de traza y el bus de memoria. Gracias a estos dos puntos de observación se obtiene información de la ejecución en diferentes etapas del pipeline del microprocesador. En concreto se realiza la comparación del contador de programa y el código de instrucción obtenidos en la etapa de búsqueda con el valor de los mismos justo después de la etapa de ejecución. Uno de los puntos a destacar de las técnicas Monitorización Dual y Predicción del Contador de Programa es que pueden ser implementadas sin necesidad de almacenar información de la ejecución calculada en tiempo de compilación, reduciendo de una manera considerable el impacto sobre el rendimiento del sistema, ya que el número de recursos necesarios para su implementación es muy reducido. Se han realizado extensas campañas de inyección de fallos utilizando la herramienta AMUSE, lo cual nos ha permitido evaluar la efectividad de las tres técnicas propuestas en esta tesis doctoral. Los resultados obtenidos en cada una de las campañas de inyección demuestran que las técnicas presentadas en esta tesis detectan de una manera eficiente aquellos errores que afectan al flujo de programa alcanzando una buena solución de compromiso entre la cobertura a fallos y el impacto sobre el rendimiento del sistema.Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y AutomáticaPresidente: María Luisa López Vallejo.- Secretario: Enrique San Millán Heredia.- Vocal: Eduardo de la Torre Arnan

    E-QED: Electrical Bug Localization During Post-Silicon Validation Enabled by Quick Error Detection and Formal Methods

    Full text link
    During post-silicon validation, manufactured integrated circuits are extensively tested in actual system environments to detect design bugs. Bug localization involves identification of a bug trace (a sequence of inputs that activates and detects the bug) and a hardware design block where the bug is located. Existing bug localization practices during post-silicon validation are mostly manual and ad hoc, and, hence, extremely expensive and time consuming. This is particularly true for subtle electrical bugs caused by unexpected interactions between a design and its electrical state. We present E-QED, a new approach that automatically localizes electrical bugs during post-silicon validation. Our results on the OpenSPARC T2, an open-source 500-million-transistor multicore chip design, demonstrate the effectiveness and practicality of E-QED: starting with a failed post-silicon test, in a few hours (9 hours on average) we can automatically narrow the location of the bug to (the fan-in logic cone of) a handful of candidate flip-flops (18 flip-flops on average for a design with ~ 1 Million flip-flops) and also obtain the corresponding bug trace. The area impact of E-QED is ~2.5%. In contrast, deter-mining this same information might take weeks (or even months) of mostly manual work using traditional approaches

    PARALLEL EXECUTION TRACING: AN ALTERNATIVE SOLUTION TO EXPLOIT UNDER-UTILIZED RESOURCES IN MULTI-CORE ARCHITECTURES FOR CONTROL-FLOW CHECKING

    Get PDF
    In this paper, a software behavior-based technique is presented to detect control-flow errors in multi-core architectures. The analysis of a key point leads to introduce the proposed technique: employing under-utilized CPU resources in multi-core processors to check the execution flow of the programs concurrently and in parallel with the main executions. To evaluate the proposed technique, a quad-core processor system was used as the simulation environment, and the behavior of SPEC CPU2006 benchmarks were studied as the target to compare with conventional techniques. The experimental results, with regard to both detection coverage and performance overhead, demonstrate that on average about 94% of the control-flow errors can be detected by the proposed technique, more efficiently. This article has been retracted. Link to the retraction: http://casopisi.junis.ni.ac.rs/index.php/FUElectEnerg/article/view/337

    Using Context to Improve Network-based Exploit Kit Detection

    Get PDF
    Today, our computers are routinely compromised while performing seemingly innocuous activities like reading articles on trusted websites (e.g., the NY Times). These compromises are perpetrated via complex interactions involving the advertising networks that monetize these sites. Web-based compromises such as exploit kits are similar to any other scam -- the attacker wants to lure an unsuspecting client into a trap to steal private information, or resources -- generating 10s of millions of dollars annually. Exploit kits are web-based services specifically designed to capitalize on vulnerabilities in unsuspecting client computers in order to install malware without a user's knowledge. Sadly, it only takes a single successful infection to ruin a user's financial life, or lead to corporate breaches that result in millions of dollars of expense and loss of customer trust. Exploit kits use a myriad of techniques to obfuscate each attack instance, making current network-based defenses such as signature-based network intrusion detection systems far less effective than in years past. Dynamic analysis or honeyclient analysis on these exploits plays a key role in identifying new attacks for signature generation, but provides no means of inspecting end-user traffic on the network to identify attacks in real time. As a result, defenses designed to stop such malfeasance often arrive too late or not at all resulting in high false positive and false negative (error) rates. In order to deal with these drawbacks, three new detection approaches are presented. To deal with the issue of a high number of errors, a new technique for detecting exploit kit interactions on a network is proposed. The technique capitalizes on the fact that an exploit kit leads its potential victim through a process of exploitation by forcing the browser to download multiple web resources from malicious servers. This process has an inherent structure that can be captured in HTTP traffic and used to significantly reduce error rates. The approach organizes HTTP traffic into tree-like data structures, and, using a scalable index of exploit kit traces as samples, models the detection process as a subtree similarity search problem. The technique is evaluated on 3,800 hours of web traffic on a large enterprise network, and results show that it reduces false positive rates by four orders of magnitude over current state-of-the-art approaches. While utilizing structure can vastly improve detection rates over current approaches, it does not go far enough in helping defenders detect new, previously unseen attacks. As a result, a new framework that applies dynamic honeyclient analysis directly on network traffic at scale is proposed. The framework captures and stores a configurable window of reassembled HTTP objects network wide, uses lightweight content rendering to establish the chain of requests leading up to a suspicious event, then serves the initial response content back to the honeyclient in an isolated network. The framework is evaluated on a diverse collection of exploit kits as they evolve over a 1 year period. The empirical evaluation suggests that the approach offers significant operational value, and a single honeyclient can support a campus deployment of thousands of users. While the above approaches attempt to detect exploit kits before they have a chance to infect the client, they cannot protect a client that has already been infected. The final technique detects signs of post infection behavior by intrusions that abuses the domain name system (DNS) to make contact with an attacker. Contemporary detection approaches utilize the structure of a domain name and require hundreds of DNS messages to detect such malware. As a result, these detection mechanisms cannot detect malware in a timely manner and are susceptible to high error rates. The final technique, based on sequential hypothesis testing, uses the DNS message patterns of a subset of DNS traffic to detect malware in as little as four DNS messages, and with orders of magnitude reduction in error rates. The results of this work can make a significant operational impact on network security analysis, and open several exciting future directions for network security research.Doctor of Philosoph

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Get PDF
    Continuous improvements in transistor scaling together with microarchitectural advances have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are haunting, especially for combinational logic, and parity and ECC codes are therefore becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts are warning about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges specially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot offer. Current techniques based on redundant execution were devised in a time when high penalties were assumed for the sake of high reliability levels. Novel light-weight techniques are therefore needed to enable fault protection in the mass market segments. The complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and costly server farms to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime (including transient, intermittent, permanent and also design bugs). Our solutions embrace a paradigm where fault tolerance is built based on exploiting high-level microarchitectural invariants that are reusable across designs, rather than relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that combined enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs, by trading-off area, power and performance. Our fault injection studies reveal that our methods provide high coverage levels while causing significantly lower performance, power and area costs than existing techniques. Then, this thesis extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow by combining our error detection mechanism, a new low-cost logging mechanism and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suite validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug, and also to determine its root cause. Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time consuming and cumbersome. Altogether, the integrated solutions proposed in this thesis capacitate the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.El continuo escalado de los transistores junto con los avances microarquitectónicos han posibilitado la presencia de potentes procesadores en todos los segmentos de mercado. Sin embargo, varios problemas de fiabilidad están desafiando la producción de sistemas robustos. Las predicciones de "soft errors" son inquietantes, especialmente para la lógica combinacional: soluciones como ECC o paridad se están volviendo insuficientes a medida que dicha lógica se convierte en la fuente predominante de soft errors. Además, los expertos están alertando acerca de la necesidad de detectar otras fuentes de fallos (causantes de errores permanentes e intermitentes) durante el tiempo de vida de los procesadores. Los segmentos "commodity" son los más vulnerables, ya que imponen unos requisitos que las técnicas actuales de fiabilidad no ofrecen. Estas soluciones (generalmente basadas en re-ejecución) fueron ideadas en un tiempo en el que con tal de alcanzar altos nivel de fiabilidad se asumían grandes costes. Son por tanto necesarias nuevas técnicas que permitan la protección contra fallos en los segmentos más populares. La complejidad de los diseños está encareciendo la validación "post-silicon". Su coste excede el de diseño, y el número de errores descubiertos está aumentando durante la validación y ya en manos de los clientes. La localización y el diagnóstico de errores son los mayores problemas, empeorados por las altas latencias en la manifestación de errores, por la poca observabilidad interna y por el coste de generar las señales esperadas. Esta tesis explora dos direcciones para tratar algunos de los retos causados por la creciente vulnerabilidad hardware y por las limitaciones de los enfoques de validación. Primero exploramos mecanismos para detectar múltiples fuentes de fallos durante el tiempo de vida de los procesadores (errores transitorios, intermitentes, permanentes y de diseño). Nuestras soluciones son de un paradigma donde la fiabilidad se construye explotando invariantes microarquitectónicos genéricos, en lugar de basarse en re-ejecución o en protección ad-hoc. Para ello descomponemos las funcionalidades básicas de un procesador y proponemos tres soluciones de `runtime verification' que combinadas permiten una detección de errores a nivel global. Estas tres soluciones son: un verificador de flujo de datos de registro y de computación, un verificador de flujo de datos de memoria y un verificador de flujo de control. Nuestras técnicas usan el concepto de firmas y permiten a los diseñadores ajustar los niveles de protección a sus necesidades, mediante compensaciones en área, consumo energético y rendimiento. Nuestros estudios de inyección de errores revelan que los métodos propuestos obtienen altos niveles de protección, a la vez que causan menos costes que las soluciones existentes. A continuación, esta tesis explora la aplicabilidad de estos esquemas a las fases de validación. Proponemos una solución de localización y diagnóstico de errores para el flujo de datos de memoria que combina nuestro mecanismo de detección de errores, junto con un mecanismo de logging de bajo coste y un programa de diagnóstico. Cierta actividad interna es continuamente registrada en una zona de memoria cuya capacidad puede ser expandida para satisfacer las necesidades de validación. La solución permite descubrir bugs, reduciendo la necesidad de calcular los resultados esperados. Al detectar un error, el algoritmo de diagnóstico analiza el registro para automáticamente localizar el bug y determinar su causa. Nuestros estudios muestran un alto grado de localización y de precisión de diagnóstico a un coste muy bajo de rendimiento y área. El resultado es una simplificación de las prácticas actuales de depuración, que son enormemente manuales, incómodas y largas. En conjunto, las soluciones de esta tesis capacitan a la industria a producir procesadores más fiables, a medida que la tecnología evoluciona hacia diseños más complejos y más vulnerables

    Virginia Commonwealth University Undergraduate Bulletin

    Get PDF
    Undergraduate bulletin for Virginia Commonwealth University for the academic year 2005-2006. It includes information on academic regulations, degree requirements, course offerings, faculty, academic calendar, and tuition and expenses for undergraduate programs
    corecore