10 research outputs found

    Towards providing low-overhead data race detection for large OpenMP applications

    Pre-print. Neither static nor dynamic data race detection methods, by themselves, have proven sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations in practice requires attention to many details. Specifically, existing state-of-the-art dynamic race detectors are aimed at low-level threading models and cannot handle high-level models such as OpenMP. Further, they do not provide mechanisms by which static analysis methods can target selected regions of code with sufficient precision. In this paper, we present our solutions to both challenges. We identify patterns within OpenMP runtimes that tend to mislead existing dynamic race checkers and provide mechanisms that help establish an explicit happens-before relation to prevent such misleading checks. We also implement a fine-grained blacklist mechanism that allows a runtime analyzer to exclude regions of code at line-number granularity. We support race checking by adapting ThreadSanitizer, a mature data race checker developed at Google that is now an integral part of Clang and GCC, and we have implemented our techniques within the state-of-the-art Intel OpenMP Runtime. Our results demonstrate that these techniques can significantly improve runtime analysis accuracy and overhead in the context of data race checking of OpenMP applications.
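    The role of an explicit happens-before relation can be illustrated with a minimal vector-clock sketch. This is not ThreadSanitizer's API or the authors' implementation; the class `HBDetector` and its methods `write` and `sync` are hypothetical names, and only write-write conflicts are tracked. The `sync` call stands in for an ordering edge (for example, one implied by a runtime-internal barrier) that a detector would otherwise not see.

```python
def happens_before(a, b):
    """True if vector clock a is strictly ordered before vector clock b."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


class HBDetector:
    """Hypothetical write-write race checker based on vector clocks."""

    def __init__(self):
        self.clocks = {}      # thread id -> vector clock (dict: thread id -> counter)
        self.last_write = {}  # variable -> (thread id, clock snapshot at the write)
        self.races = []       # reported (variable, earlier thread, later thread)

    def _clock(self, tid):
        return self.clocks.setdefault(tid, {tid: 1})

    def write(self, tid, var):
        now = dict(self._clock(tid))
        prev = self.last_write.get(var)
        if prev is not None:
            prev_tid, prev_clock = prev
            # Two writes by different threads race unless they are ordered.
            if prev_tid != tid and not happens_before(prev_clock, now):
                self.races.append((var, prev_tid, tid))
        self.last_write[var] = (tid, now)
        self._clock(tid)[tid] += 1

    def sync(self, src, dst):
        """Record an explicit edge: all of src's past precedes dst's future."""
        src_clock = self._clock(src)
        dst_clock = self._clock(dst)
        for k, v in src_clock.items():
            dst_clock[k] = max(dst_clock.get(k, 0), v)
        src_clock[src] += 1


# Without the edge, the two writes look concurrent and are reported.
unordered = HBDetector()
unordered.write(0, "x")
unordered.write(1, "x")

# With the edge made explicit, the same accesses are ordered.
ordered = HBDetector()
ordered.write(0, "x")
ordered.sync(0, 1)
ordered.write(1, "x")
```

    In this sketch the first trace yields one report and the second yields none, which is the effect an explicit happens-before annotation is meant to have on misleading checks.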

    A toolchain to verify the parallelization of OmpSs-2 applications

    Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications. This project is supported by the European Union's Horizon 2020 research and innovation programme under grant agreements No 754304 (DEEP-EST) and No 871669 (AMPERE), by the Project HPCEUROPA3 (INFRAIA-2016-1-730897), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by the Generalitat de Catalunya (2017-SGR-1481). Peer Reviewed. Postprint (author's final draft).

    Hybrid Data Race Detection for Multicore Software

    Multithreaded programs are prone to concurrency errors such as deadlocks, race conditions and atomicity violations. These errors are notoriously difficult to detect due to the non-deterministic nature of concurrent software running on multicore hardware. Data races result from the concurrent access of shared data by multiple threads and can result in unexpected program behaviors. The main dynamic data race detection techniques in the literature are the happens-before and lockset algorithms, which suffer from high execution time and memory overhead, miss many data races, or produce a high number of false alarms. Our goal is to improve the performance of dynamic data race detection while at the same time improving its accuracy by generating fewer false alarms. We develop a hybrid data race detection algorithm that combines the happens-before and lockset algorithms in a single tool. Rather than focusing on individual memory accesses by each thread, we focus on sequences of memory accesses by each thread, called segments. This allows us to improve the performance of data race detection. We implement several optimizations in our hybrid data race detector and compare our technique with traditional happens-before and lockset detectors. The experiments are performed with C/C++ multithreaded benchmarks using the Pthreads library from the PARSEC suite and with large applications such as the Apache web server. Our experiments showed that our hybrid detector is 15% faster than the happens-before detector and reports 50% fewer potential data races than the lockset detector. Ultimately, a hybrid data race detector can improve the performance and accuracy of data race detection, enhancing its usability in practice.
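    For readers unfamiliar with the lockset side of such a hybrid, the following is a minimal sketch of Eraser-style lockset refinement. It is not the tool described in the abstract: the class `LocksetChecker` and its method names are hypothetical, accesses are checked one at a time rather than per segment, and reads and writes are not distinguished. A variable is flagged once at least two threads have accessed it and no single lock was held on every access.

```python
class LocksetChecker:
    """Hypothetical Eraser-style lockset checker (illustrative only)."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.threads = {}     # variable -> set of threads that accessed it
        self.warnings = []    # variables flagged as potential races

    def acquire(self, tid, lock):
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid, lock):
        self.held.setdefault(tid, set()).discard(lock)

    def access(self, tid, var):
        locks = self.held.get(tid, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = set(locks)
        else:
            # Later accesses: keep only locks held on all accesses.
            self.candidates[var] &= locks
        self.threads.setdefault(var, set()).add(tid)
        shared = len(self.threads[var]) > 1
        if shared and not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)


checker = LocksetChecker()

# "x" is consistently protected by lock "m": no warning.
checker.acquire(0, "m")
checker.access(0, "x")
checker.release(0, "m")
checker.acquire(1, "m")
checker.access(1, "x")
checker.release(1, "m")

# "y" is touched by two threads with no common lock: flagged.
checker.access(1, "y")
checker.access(0, "y")
```

    Because this check ignores ordering, it can flag accesses that are in fact ordered by other synchronization. That is the kind of false alarm a hybrid detector can filter out by also consulting a happens-before relation.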

    CARISMA: a context-sensitive approach to race-condition sample-instance selection for multithreaded applications

    Dynamic race detectors can explore multiple thread schedules of a multithreaded program over the same input to detect data races. Although existing sampling-based precise race detectors reduce overheads effectively so that lightweight precise race detection can be performed in testing or post-deployment environments, they are ineffective in detecting races if the sampling rates are low. This paper presents CARISMA to address this problem. CARISMA exploits the insight that along an execution trace, a program may potentially handle many accesses to the memory locations created at the same site for similar purposes. Iterating over multiple execution trials of the same input, CARISMA estimates and distributes the sampling budgets among such location creation sites, and probabilistically collects a fraction of all accesses to the memory locations associated with such sites for subsequent race detection. Our experiment shows that, compared with PACER on the same platform and at the same sampling rate (such as 1%), CARISMA is significantly more effective. © 2012 ACM. Postprint.

    TachoRace: Exploiting Performance Counters for Run-Time Race Detection


    Doctor of Philosophy

    Dissertation. High Performance Computing (HPC) on-node parallelism is of extreme importance to guarantee and maintain scalability across large clusters of hundreds of thousands of multicore nodes. HPC programming is dominated by the hybrid model "MPI + X", with MPI exploiting the parallelism across nodes and "X" being some shared-memory parallel programming model that accomplishes multicore parallelism across CPUs or GPUs. OpenMP has become the de facto standard "X" in HPC for exploiting the multicore architectures of modern CPUs. Data races are among the most common and insidious concurrency errors in shared-memory programming models, and OpenMP programs are not immune to them. The ease with which OpenMP lets developers parallelize programs can make those programs prone to data races, which become hard to find in large applications with thousands of lines of code. Unfortunately, prior tools are unable to impact practice owing to their poor coverage or poor scalability. In this work, we develop several new approaches for low-overhead data race detection. Our approaches aim to guarantee high precision and accuracy of race checking while maintaining a low runtime and memory overhead. We present two race checkers for C/C++ OpenMP programs that target two different classes of programs. The first, ARCHER, is fast but requires a large amount of memory, so it ideally targets applications that require only a small portion of the available on-node memory. SWORD, on the other hand, combines fast data collection with zero memory overhead and an offline analysis that can take a long time but often reports most races quickly. Given that race checking was impossible for large OpenMP applications, our contributions are the best available advances in what is known to be a difficult NP-complete problem. We performed an extensive evaluation of the tools on existing OpenMP programs and HPC benchmarks. Results show that both tools guarantee to identify all the races of a program in a given run without reporting any false alarms. The tools are user-friendly and hence serve as an important instrument in the daily work of programmers, helping them identify data races early during development and production testing. Furthermore, our demonstrated success on real-world applications puts these tools at the top of the list of debugging tools for scientists at large.

    Veröffentlichungen und Vorträge 2009 der Mitglieder der Fakultät für Informatik (Publications and Talks 2009 by Members of the Faculty of Informatics)


    High-level compiler analysis for OpenMP

    Nowadays, applications from dissimilar domains, such as high-performance computing and high-integrity systems, require levels of performance that can only be achieved by means of sophisticated heterogeneous architectures. However, the complex nature of such architectures hinders the production of efficient code at acceptable levels of time and cost. Moreover, the need for exploiting parallelism adds complications of its own (e.g., deadlocks and race conditions). In this context, compiler analysis is fundamental for optimizing parallel programs. There is, however, a trade-off between complexity and profit: low-complexity analyses (e.g., reaching definitions) provide information that may be insufficient for many relevant transformations, and complex analyses based on mathematical representations (e.g., the polyhedral model) give accurate results at a high computational cost. A range of parallel programming models providing different levels of programmability, performance and portability enable the exploitation of current architectures. However, OpenMP has shown many advantages over its competitors: 1) it delivers levels of performance comparable to highly tunable models such as CUDA and MPI, and better robustness than low-level libraries such as Pthreads; 2) the extensions included in the latest specification meet the characteristics of current heterogeneous architectures (i.e., the coupling of a host processor to one or more accelerators, and the capability of expressing fine-grained, both structured and unstructured, and highly dynamic task parallelism); 3) OpenMP is widely implemented by several chip (e.g., Kalray MPPA, Intel) and compiler (e.g., GNU, Intel) vendors; and 4) although the model currently lacks resiliency and reliability mechanisms, many works, including this thesis, pursue their introduction in the specification.
This thesis addresses the study of compiler analysis techniques for OpenMP with two main purposes: 1) enhance the programmability and reliability of OpenMP, and 2) prove OpenMP as a suitable model to exploit parallelism in safety-critical domains. Particularly, the thesis focuses on the tasking model because it offers the flexibility to tackle the parallelization of algorithms with load imbalance, recursiveness and uncountable loop based kernels. Additionally, current works have proved the time-predictability of this model, shortening the distance towards its introduction in safety-critical domains. To enable the analysis of applications using the OpenMP tasking model, the first contribution of this thesis is the extension of a set of classic compiler techniques with support for OpenMP. As a basis for including reliability mechanisms, the second contribution consists of the development of a series of algorithms to statically detect situations involving OpenMP tasks, which may lead to a loss of performance, non-deterministic results or run-time failures. A well-known problem of parallel processing related to compilers is the static scheduling of a program represented by a directed graph. Although the literature is extensive in static scheduling techniques, the work related to the generation of the task graph at compile-time is very scant. Compilers are limited by the knowledge they can extract, which depends on the application and the programming model. The third contribution of this thesis is the generation of a predicated task dependency graph for OpenMP that can be interpreted by the runtime in such a way that the cost of solving dependences is reduced to the minimum. 
With the previous contributions as a basis for determining the functional safety of OpenMP, the final contribution of this thesis is the adaptation of OpenMP to the safety-critical domain considering two directions: 1) indicating how OpenMP can be safely used in such a domain, and 2) integrating OpenMP into Ada, a language widely used in the safety-critical domain. Postprint (published version).

    Lightweight Web-Tool for C Concurrent Programming

    Tools for computer-aided teaching and learning provide multiple benefits from a teaching standpoint because they make it possible to emphasize or illustrate issues that are sometimes difficult to convey without such support. This is also the case for tools that detect whether there is any kind of problem in a concurrent C program. These tools offer interfaces that can complement the information given by a compiler with additional information about different types of race conditions or memory leaks that appear in the code. This article describes how an existing validation core for C was integrated as a web application, making it accessible over the network. The tool has been evaluated in an existing programming course, where it proved able to provide additional information useful to both the learner and the teacher. A series of performance measurements was also carried out to establish the operational limits of the tool within a course that teaches concurrent C programming. Partially funded by ARTEMIS JTU and the Spanish Ministry of Industry, Commerce and Tourism, and also in part by REM4VSS (TIN2011-28339) of the Ministry of Science and Innovation and e-Madrid (S2009/TIC-1650). Basanta Val, P.; García Valls, M.; López Anastasio, P. (2013). Herramienta Web Ligera para La Programación en C-Concurrente. Revista Iberoamericana de Automática e Informática industrial. 10(4):465-476. https://doi.org/10.1016/j.riai.2013.05.010