
    Precise and Scalable Detection of Double-Fetch Bugs in OS Kernels

    During system call execution, it is common for operating system kernels to read userspace memory multiple times (multi-reads). A critical bug may exist if the fetched userspace memory is subject to change across these reads, i.e., a race condition, which is known as a double-fetch bug. Prior works have attempted to detect these bugs both statically and dynamically. However, due to their improper assumptions and imprecise definitions regarding double-fetch bugs, their multi-read detection is inherently limited and suffers from significant false positives and false negatives. For example, their approaches are unable to support device emulation, inter-procedural analysis, loop handling, etc. More importantly, they completely leave the task of finding real double-fetch bugs in the haystack of multi-reads to manual verification, which is expensive, if possible at all. In this paper, we first present a formal and precise definition of double-fetch bugs and then implement a static analysis system, DEADLINE, to automatically detect double-fetch bugs in OS kernels. DEADLINE uses static program analysis techniques to systematically find multi-reads throughout the kernel and employs specialized symbolic checking to vet each multi-read for double-fetch bugs. We apply DEADLINE to the Linux and FreeBSD kernels and find 23 new bugs in Linux and one new bug in FreeBSD. We further propose four generic strategies to patch and prevent double-fetch bugs, based on our study and on discussions with kernel maintainers.
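    The race is easiest to see in miniature. Below is a hedged, user-space C++ simulation of the pattern DEADLINE hunts for; a real double-fetch lives in kernel code (e.g., two copy_from_user calls on the same userspace struct), and all names here are illustrative rather than taken from the paper. A handler validates a length on its first fetch, an attacker thread mutates it, and the handler then uses the changed value on its second fetch.

```cpp
// Hedged sketch: user-space simulation of a kernel double-fetch bug.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>

std::atomic<size_t> user_len{8};  // stands in for attacker-controlled userspace memory
char kernel_buf[16];

void kernel_handler() {
    size_t checked = user_len.load();            // first fetch: validate
    if (checked <= sizeof(kernel_buf)) {
        // widen the race window so the demo triggers reliably
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        size_t used = user_len.load();           // second fetch: use (the bug)
        if (used != checked)
            std::printf("double fetch: checked %zu bytes, used %zu bytes\n",
                        checked, used);
    }
}

int main() {
    std::thread attacker([] { user_len.store(4096); });  // flip between the two fetches
    kernel_handler();
    attacker.join();
}
```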

    Cali: Compiler Assisted Library Isolation

    Software libraries can freely access the program's entire address space, and they also inherit its system-level privileges. This lack of separation regularly leads to security-critical incidents once libraries contain vulnerabilities or turn rogue. We present Cali, a compiler-assisted library isolation system that fully automatically shields a program from a given library. Cali is fully compatible with mainline Linux and does not require supervisor privileges to execute. We compartmentalize libraries into their own processes with well-defined security policies. To preserve the functionality of the interactions between program and library, Cali uses a Program Dependence Graph to track data flow between the program and the library at link time. We evaluate our open-source prototype with three popular libraries: Ghostscript, OpenSSL, and SQLite. Cali successfully reduced the amount of memory shared between program and library to 0.08% (ImageMagick) to 0.4% (Socat), while retaining acceptable program performance.
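    As a loose, hedged illustration of the compartmentalization idea (not Cali's actual generated code or its policy mechanism), the C++ sketch below runs a stand-in "library" in a child process and exchanges only explicit messages over pipes, so the library code never touches the parent's address space directly.

```cpp
// Hedged sketch: process-based library compartmentalization via pipes.
#include <cctype>
#include <cstdio>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int to_lib[2], from_lib[2];
    if (pipe(to_lib) || pipe(from_lib)) return 1;

    if (fork() == 0) {                       // child: the isolated "library"
        char buf[64] = {0};
        read(to_lib[0], buf, sizeof(buf) - 1);
        for (char* p = buf; *p; ++p)         // stand-in for a real library call
            *p = (char)toupper((unsigned char)*p);
        write(from_lib[1], buf, strlen(buf));
        _exit(0);
    }

    const char* msg = "hello";               // parent: the shielded program
    write(to_lib[1], msg, strlen(msg));      // only explicit data crosses the boundary
    char out[64] = {0};
    read(from_lib[0], out, sizeof(out) - 1);
    std::printf("library returned: %s\n", out);
    wait(nullptr);
}
```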

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Continuous improvements in transistor scaling together with microarchitectural advances have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are alarming, especially for combinational logic, and parity and ECC codes are becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts are warning about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges especially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot meet. Current techniques based on redundant execution were devised at a time when high penalties were accepted for the sake of high reliability levels. Novel lightweight techniques are therefore needed to enable fault protection in the mass-market segments.

    The complexity of designs is also making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and costly server farms to generate test outputs.

    This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime, including transient, intermittent, and permanent faults as well as design bugs. Our solutions embrace a paradigm where fault tolerance is built by exploiting high-level microarchitectural invariants that are reusable across designs, rather than relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that, combined, enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs by trading off area, power, and performance. Our fault injection studies reveal that our methods provide high coverage levels while incurring significantly lower performance, power, and area costs than existing techniques.

    This thesis then extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow that combines our error detection mechanism, a new low-cost logging mechanism, and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suit validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug and determine its root cause. Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time-consuming, and cumbersome. Altogether, the integrated solutions proposed in this thesis equip the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.
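    As a hedged software analogy of the end-to-end signature idea (the thesis's checkers are microarchitectural, implemented in hardware, and their encoding is not specified here), the C++ sketch below has each basic block fold a compile-time ID into a running signature, which a consumer-side check then compares against the statically expected value; a mismatch flags a control-flow error.

```cpp
// Hedged sketch: software analogy of signature-based control-flow checking.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static uint32_t sig = 0;

void enter_block(uint32_t id) { sig ^= id; }  // producer side: update signature

void check_sig(uint32_t expected) {           // consumer side: end-to-end check
    if (sig != expected) {
        std::fprintf(stderr, "control-flow error: sig=%#x expected=%#x\n",
                     sig, expected);
        std::abort();
    }
}

int main() {
    enter_block(0xA1);       // block A
    enter_block(0xB2);       // block B, a legal successor of A
    check_sig(0xA1 ^ 0xB2);  // statically known signature for the path A -> B
    std::puts("path verified");
}
```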

    Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

    © ACM, 2022. This is the author's version of the work "Anzt, H., Cojean, T., Flegar, G., Göbel, F., Grützmacher, T., Nayak, P., Ribizel, T., ... & Quintana-Ortí, E. S. (2022). Ginkgo: A modern linear operator algebra framework for high performance computing. ACM Transactions on Mathematical Software (TOMS), 48(1), 1-33". It is posted here by permission of ACM for your personal use, not for redistribution. The definitive version was published in ACM Transactions on Mathematical Software, 48(1), March 2022, https://doi.org/10.1145/3480935.

    In this article, we present GINKGO, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, GINKGO's design principle abstracts all functionality as "linear operators," motivating the notion of a "linear operator algebra library." GINKGO's current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can easily be extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends, and we provide details on extensibility and sustainability measures. We also demonstrate GINKGO's usability by providing examples of how to use its functionality inside the MFEM and deal.II finite element ecosystems. Finally, we offer a practical demonstration of GINKGO's high performance on state-of-the-art GPU architectures.

    This work was supported by the "Impuls und Vernetzungsfond" of the Helmholtz Association under grant VH-NG-1241. G. Flegar and E. S. Quintana-Ortí were supported by project TIN2017-82972-R of the MINECO and FEDER and the H2020 EU FETHPC project 732631 "OPRECOMP". This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The experiments on the NVIDIA A100 GPU were performed on the HAICORE@KIT partition, funded by the "Impuls und Vernetzungsfond" of the Helmholtz Association. The experiments on the AMD MI100 GPU were performed on Tulip, an early-access platform hosted by HPE.
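    For a flavor of the "everything is a linear operator" design, the hedged C++ sketch below follows the shape of Ginkgo's documented simple-solver example; the file names are placeholders and exact API details vary across library versions, so treat this as a sketch rather than a definitive usage guide. Matrices, vectors, and even the CG solver itself are LinOps, and apply() is the uniform entry point for "operator applied to vector."

```cpp
// Hedged sketch following Ginkgo's simple-solver example (API may vary by version).
#include <ginkgo/ginkgo.hpp>
#include <fstream>

int main() {
    // Executor selects the backend; swap for gko::CudaExecutor on a GPU.
    auto exec = gko::ReferenceExecutor::create();

    using mtx = gko::matrix::Csr<double, int>;
    using vec = gko::matrix::Dense<double>;

    // Placeholder input files, assumed to exist in MatrixMarket format.
    auto A = gko::share(gko::read<mtx>(std::ifstream("A.mtx"), exec));
    auto b = gko::read<vec>(std::ifstream("b.mtx"), exec);
    auto x = gko::read<vec>(std::ifstream("x0.mtx"), exec);

    // The generated CG solver is itself a LinOp representing A^{-1}.
    auto solver = gko::solver::Cg<double>::build()
                      .with_criteria(gko::stop::Iteration::build()
                                         .with_max_iters(100u)
                                         .on(exec))
                      .on(exec)
                      ->generate(A);

    solver->apply(b.get(), x.get());  // x ~= A^{-1} b
}
```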

    Time is money, friend! Timing Side-channel Attack against Garbled Circuit Constructions

    With the advent of secure function evaluation (SFE), distrustful parties can jointly compute on their private inputs without disclosing anything besides the results. Yao’s garbled circuit protocol has become an integral part of secure computation thanks to considerable efforts made to make it feasible, practical, and more efficient. These efforts have resulted in multiple optimizations of this primitive that have improved its performance by orders of magnitude over the last years. The advancement in protocols has also led to the development of general-purpose compilers and tools made available to academia and industry. For decades, the security of the protocols offered in those tools has been assured by sound proofs and the promise that, during the computation, no information on the parties' inputs would leak. In a parallel effort, however, side-channel analysis (SCA) has gained momentum in connection with the real-world implementation of cryptographic primitives. Timing side-channel attacks have proven effective in retrieving secrets from implementations, even through remote access to them. Nevertheless, the vulnerability of garbled circuit frameworks to timing attacks has, surprisingly, never been discussed in the literature. This paper introduces Goblin, the first timing attack against commonly employed garbled circuit frameworks. Goblin is a machine learning-assisted, non-profiling, single-trace timing SCA, which successfully recovers the garbler's input during the computation under different scenarios, including various GC frameworks, benchmark functions, and numbers of garbler input bits. Furthermore, we discuss Goblin's success factors and countermeasures against it. In doing so, Goblin hopefully paves the way for further research in this matter.
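    The principle behind such timing leakage is easy to demonstrate, though the C++ sketch below is only a hedged illustration and not Goblin's actual attack (which targets timing differences in GC framework execution): when a secret-dependent branch changes the amount of work performed, a single high-resolution time measurement already separates the two secret values.

```cpp
// Hedged sketch: a secret-dependent branch leaks the secret through timing.
#include <chrono>
#include <cstdio>

volatile long sink = 0;  // prevents the compiler from removing the loop

void process(int secret_bit) {
    long n = secret_bit ? 2000000 : 1000000;  // secret-dependent workload
    for (long i = 0; i < n; ++i) sink += i;
}

int main() {
    for (int bit : {0, 1, 0, 1}) {
        auto t0 = std::chrono::steady_clock::now();
        process(bit);
        auto t1 = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
        // The observer recovers the bit from the duration alone.
        std::printf("bit=%d took %lld us\n", bit, (long long)us.count());
    }
}
```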