Worst case execution time and power estimation of multicore and GPU software: a pedestrian detection use case
Worst Case Execution Time (WCET) estimation of software running on parallel platforms is a challenging task, due to resource interference from other tasks and the complexity of the underlying CPU and GPU hardware architectures. Similarly, the increased complexity of the hardware challenges the estimation of worst-case power consumption. In this paper, we employ Measurement Based Probabilistic Timing Analysis (MBPTA), which is capable of managing complex architectures such as multicores. We enable its use by software randomisation, which we show for the first time to be possible on GPUs as well. We demonstrate our method on a pedestrian detection use case on an embedded multicore and GPU platform for the automotive domain, the NVIDIA Xavier. Moreover, we extend our measurement-based probabilistic method to predict the worst-case power consumption of the software on the same platform. This work was funded by the Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033 and IJC-2020-045931-I), the European Commission's Horizon 2020 programme under the UP2DATE project (grant agreement 871465), an ERC grant (No. 772773) and the HiPEAC Network of Excellence.
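MBPTA derives a probabilistic WCET (pWCET) by fitting an extreme-value distribution to execution-time measurements and reading off a quantile at a target exceedance probability. The sketch below is a minimal illustration of that idea, assuming hypothetical timing data and a simple method-of-moments Gumbel fit rather than the estimator or tooling used in the paper:

```cpp
// Minimal MBPTA-style sketch: fit a Gumbel distribution to execution-time
// measurements by the method of moments and report a pWCET at a target
// exceedance probability. Illustrative only; real MBPTA also applies
// i.i.d. tests and stronger estimators (e.g., maximum likelihood).
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // Hypothetical per-run execution times in microseconds.
    std::vector<double> t = {102, 99, 108, 101, 113, 97, 105, 110, 104, 119};

    const double n    = static_cast<double>(t.size());
    const double mean = std::accumulate(t.begin(), t.end(), 0.0) / n;
    double var = 0.0;
    for (double x : t) var += (x - mean) * (x - mean);
    var /= (n - 1.0);

    // Method-of-moments Gumbel parameters.
    const double pi    = 3.14159265358979;
    const double gamma = 0.57721566;               // Euler-Mascheroni constant
    const double beta  = std::sqrt(6.0 * var) / pi; // scale
    const double mu    = mean - gamma * beta;       // location

    // pWCET: the Gumbel quantile exceeded with probability p per run.
    const double p = 1e-6;
    const double pwcet = mu - beta * std::log(-std::log(1.0 - p));
    std::printf("pWCET at exceedance 1e-6: %.1f us\n", pwcet);
    return 0;
}
```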
Evaluation of the parallel computational capabilities of embedded platforms for critical systems
Modern critical systems need higher performance, which cannot be delivered by the simple architectures used so far. The latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and compare the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite to the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project, the NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+, in order to drive the selection of the research platform to be used in the next phases of the project. Our results indicate that, in terms of CPU and GPU performance, the NVIDIA Xavier is the highest-performing platform.
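For illustration, the following is a minimal sketch of what such an OpenMP port looks like for one representative GPU4S-style kernel, matrix multiplication; the pragma placement and problem size are assumptions, not the project's actual code:

```cpp
// Hypothetical sketch of an OpenMP port of a GPU4S-style kernel: a dense
// matrix multiplication whose i-j iteration space is spread across cores.
// Build with: g++ -O2 -fopenmp matmul.cpp
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int N = 512;
    std::vector<float> A(N * N, 1.0f), B(N * N, 2.0f), C(N * N, 0.0f);

    const double t0 = omp_get_wtime();
    // collapse(2) lets OpenMP distribute both outer loops.
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    const double t1 = omp_get_wtime();
    std::printf("%dx%d matmul: %.3f s, C[0]=%.0f\n", N, N, t1 - t0, C[0]);
    return 0;
}
```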
Real-time high-performance computing for embedded control systems
The real-time control systems industry is moving towards the consolidation of multiple computing systems into fewer and more powerful ones, aiming for a reduction in size, weight, and power. The increasing demand for higher performance in other critical domains, such as autonomous driving, has recently led the industry to include embedded GPUs for the implementation of advanced functionalities. The highly parallel architecture of GPUs could also be leveraged in the control systems industry to develop more advanced, energy-efficient, and scalable control systems. However, the closed-source and non-deterministic nature of GPUs complicates the resource provisioning analysis required for the implementation of critical real-time systems. Moreover, GPUs have not yet been integrated into the traditional development cycle of control systems, which is oriented towards a model-based design approach. Recently, some model-based design tool vendors have extended their development frameworks with GPU code generation capabilities targeting hybrid computing platforms, so that the model-based design environment now enables the concurrent analysis of more complex and diverse functions by simulation while automating the deployment to the final target. However, it is unclear whether these tools are well-suited for the design and development of time-sensitive systems.
Motivated by these challenges, in this thesis we contribute to the state of the art of real-time control systems towards the adoption of embedded GPUs by providing tools to facilitate the resource provisioning analysis and the integration into the model-based design development cycle. First, we present a methodology and an automated tool to extract the properties of GPU memory allocators. This tool allows the computation of the real amount of memory used by GPU applications, facilitating a correct resource provisioning analysis. Then, we present a library which allows the characterization of the use of dynamic memory in GPU applications. We use this library to characterize GPU benchmarks and we identify memory allocation patterns that could be modified to improve performance and memory consumption when targeting embedded GPUs. Based on these results, we present a tool to optimize the use of dynamic memory in legacy GPU applications executed on embedded platforms. This tool allows us to minimize the memory consumption and memory management overhead of GPU applications without rewriting them. Afterwards, we analyze the timing of control algorithms executed on embedded GPUs and we identify techniques to achieve acceptable real-time behavior. Finally, we evaluate model-based design tools in terms of integration with GPU hardware and GPU code generation, propose improvements for the generated GPU code, and present a source-to-source transformation tool to apply the proposed improvements automatically.
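As an illustration of the allocator-analysis idea, the hedged sketch below probes one allocator property, the effective allocation granularity of cudaMalloc, by observing how much free device memory each request actually consumes; the thesis's real methodology and tooling are not reproduced here:

```cpp
// Hedged sketch of one allocator-property probe in the spirit described
// above: request growing allocation sizes and watch the free-memory delta
// reported by cudaMemGetInfo, exposing the allocator's real block size.
// Host-only CUDA runtime API; build with: nvcc probe.cu
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    for (size_t req = 1; req <= (1u << 22); req *= 16) {
        size_t free_before = 0, free_after = 0, total = 0;
        cudaMemGetInfo(&free_before, &total);

        void* p = nullptr;
        if (cudaMalloc(&p, req) != cudaSuccess) break;
        cudaMemGetInfo(&free_after, &total);
        cudaFree(p);

        // The delta is the memory the allocator actually reserved.
        std::printf("requested %8zu B -> consumed %8zu B\n",
                    req, free_before - free_after);
    }
    return 0;
}
```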
Evaluation of SYCL’s suitability for high-performance critical systems
Upcoming safety critical systems require high-performance processing, which can be provided by the multi-cores and embedded GPUs found in several Systems-on-Chip (SoC) targeting these domains. So far, only low-level programming models and APIs, such as CUDA or OpenCL, have been evaluated. In this paper, we evaluate the effectiveness of a higher-level programming model, SYCL, for critical applications executed on such embedded platforms. In particular, we are interested in two aspects: performance and programmability. To conduct our study, we use the open source GPU4S Bench benchmarking suite for space and an open source pedestrian detection application representing the automotive sector, which we port to SYCL and whose behaviour we analyse. We perform our evaluation on a high-performance platform featuring an NVIDIA GTX 1080Ti as well as a representative embedded platform, the NVIDIA Xavier AGX, which is considered a good candidate for future safety critical systems in both domains, and we compare our results with other programming models. Our results show that in several cases SYCL is able to obtain performance close to highly optimised code using CUDA or NVIDIA libraries, with significantly lower development effort and complexity, which confirms the suitability of SYCL for programming high-performance safety critical systems. This work was funded by the Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación (PID2019-107255GB-C21 and IJC-2020-045931-I MCIN/AEI/10.13039/501100011033), the European Commission's Horizon 2020 programme under the UP2DATE project (grant agreement 871465) and the HiPEAC Network of Excellence.
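The programmability argument rests on SYCL's single-source C++ model. A minimal vector-addition sketch (an illustrative example, not one of the ported benchmarks) shows what that model looks like:

```cpp
// Minimal SYCL sketch of the single-source model evaluated above: a vector
// addition the runtime offloads to the default device (e.g., an NVIDIA GPU
// with a suitable SYCL backend). Illustrative example only.
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // selects the default device
    {
        sycl::buffer<float> ba(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bb(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bc(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor A(ba, h, sycl::read_only);
            sycl::accessor B(bb, h, sycl::read_only);
            sycl::accessor C(bc, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffer destructors copy results back to the host vectors

    std::printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    return 0;
}
```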
Evaluation of the Ada-SPARK Language Effectiveness in Graphics Processing Units for Safety Critical Systems
Modern safety critical systems require high levels of performance for the implementation of advanced functionalities, which is not possible with the simple conventional architectures currently used in them. Embedded General Purpose Graphics Processing Units (GPGPUs) are among the hardware technologies which can provide the high performance required in these domains. However, their massively parallel nature complicates the verification of their software and increases its cost, because verification usually involves code coverage through extensive human-driven testing. The Ada-SPARK language has traditionally been used in highly critical environments for its formal verification capabilities and powerful type system. The use of such tools, especially those backed by theorem provers, has significantly lowered the effort needed to validate the functionality of safety-critical systems. In this work, we utilize AdaCore's CUDA backend for Ada (currently in closed beta) in conjunction with the SPARK language subset to assess the state of static verification for GPU kernels. We show how common programming mistakes in GPU kernels can be prevented, formulate a pattern for buffer overflow detection, and close with a few GPU case studies.
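The buffer-overflow pattern in question can be sketched outside Ada: GPU grids are usually launched with the thread count rounded up to whole blocks, so every kernel body needs an index guard, which SPARK can prove present (or flag as missing) statically. The following plain C++ analogue is hand-written for illustration and is not output of AdaCore's backend; the guard here is a runtime check standing in for a static proof:

```cpp
// Index-guard pattern for GPU kernels, sketched in plain C++: the launch
// rounds the thread count up to whole blocks, so the "kernel body" must
// guard against out-of-range thread ids. SPARK discharges this bound
// check at compile time; this analogue checks it at run time.
#include <cstddef>
#include <cstdio>
#include <vector>

// Stand-in for a GPU kernel body, invoked once per "thread" index.
void scale_kernel_body(std::size_t tid, float* buf, std::size_t n, float k) {
    if (tid >= n) return;  // the guard SPARK would prove or flag as missing
    buf[tid] *= k;
}

int main() {
    const std::size_t n = 1000, block = 256;
    const std::size_t threads = ((n + block - 1) / block) * block;  // 1024
    std::vector<float> buf(n, 1.0f);

    // Without the guard, tids 1000..1023 would overflow buf.
    for (std::size_t tid = 0; tid < threads; ++tid)
        scale_kernel_body(tid, buf.data(), n, 2.0f);

    std::printf("buf[999] = %.1f\n", buf[999]);  // 2.0
    return 0;
}
```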
Hardware-software co-design for low-cost AI processing in space processors
In recent years there has been increasing interest in artificial intelligence (AI) and machine learning (ML). The advantages of such applications are widespread across many areas and have drawn the attention of different sectors, such as aerospace. However, these applications require much more performance than space processors provide. In space, the environment is not ideal for high-performance cutting-edge processors, due to radiation. For this reason, radiation-hardened or radiation-tolerant processors are required, which use older technologies and redundant logic, reducing the available die resources that can be exploited. In order to accelerate demanding AI applications on space processors, this thesis presents SPARROW, a low-cost SIMD accelerator for AI operations. SPARROW has been designed following a hardware-software co-design approach, analyzing the requirements of common AI applications in order to improve the efficiency of the module. The design does not use any existing vector extension and instead has in its portability one of its key advantages over other implementations. Furthermore, SPARROW reuses the integer register file of the processor, avoiding complex data management while significantly reducing the hardware cost of the module, which is especially interesting in the space domain due to the constraints on processor area. SPARROW operates on 8-bit integer vector components in two different stages, performing parallel computations in the first and reduction operations in the second. This design is integrated within the baseline processor, requiring neither an additional pipeline stage nor a modification of the processor frequency. SPARROW also includes swizzling and masking capabilities for the input vectors, as well as saturation to work with 8 bits without overflow. SPARROW has been integrated with the LEON3 and NOEL-V space-grade processors, both distributed by Cobham Gaisler. Since each of the baseline processors has a different instruction set architecture, software support for SPARROW has been provided for both the SPARC v8 and RISC-V ISAs, showing the portability of the design. Software support has been developed using two well-established compilers, LLVM and GCC, allowing for a comparison of the cost of developing support for each of them. The modifications include the SPARROW instructions in the assembly language of each architecture and, through inline assembly and macros, allow a programming model similar to SIMD intrinsics. LEON3 and NOEL-V extended with SPARROW have been implemented on an FPGA to evaluate the performance increase provided by our proposal. In order to compare performance with the scalar version of the processor, different AI-related applications have been tested, such as matrix multiplication and image filters, which are essential building blocks for convolutional neural networks. With the use of SPARROW, speed-ups of 6x and up to 15x have been achieved.
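To make the two-stage design concrete, here is a scalar C++ emulation of SPARROW-like semantics: four signed 8-bit lanes packed into a 32-bit integer register, a first stage of per-lane saturating multiplies, and a second reduction stage summing the lanes (one dot-product step of a convolution). The lane layout, saturation points and operation names are illustrative assumptions, not the actual SPARROW ISA:

```cpp
// Scalar emulation of SPARROW-like two-stage SIMD semantics on a 32-bit
// integer register holding four signed 8-bit lanes.
#include <cstdint>
#include <cstdio>

static int8_t lane(uint32_t r, int i) {           // extract signed lane i
    return static_cast<int8_t>((r >> (8 * i)) & 0xFF);
}

static int8_t sat8(int v) {                       // saturate to [-128, 127]
    return static_cast<int8_t>(v > 127 ? 127 : (v < -128 ? -128 : v));
}

// Stage 1: per-lane saturating multiply. Stage 2: reduce lanes to a scalar.
int32_t dot4_sat(uint32_t ra, uint32_t rb) {
    int32_t acc = 0;
    for (int i = 0; i < 4; ++i)
        acc += sat8(lane(ra, i) * lane(rb, i));   // stage 1, per lane
    return acc;                                    // stage 2, reduction
}

int main() {
    // Lanes a = {1, 2, 3, 4}, b = {10, 10, 10, 10} -> 10+20+30+40 = 100.
    uint32_t a = 0x04030201, b = 0x0A0A0A0A;
    std::printf("dot4 = %d\n", dot4_sat(a, b));
    return 0;
}
```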
Iodine maps from spectral detector computed tomography: definition of reference values for use in oncological imaging
Dissertation on the physiological distribution of iodine-based contrast agent in "healthy" tissue, with the aim of defining reference values and identifying influencing factors. By normalising the data to the abdominal aorta, BMI can be neglected as an influencing factor on iodine concentrations and the scatter is reduced. Age- and sex-specific differences can thus be examined in isolation.
Digital Interventions for Internet Use Disorders
The term Internet use disorders (IUD) covers addictive behaviors related to the use of Internet applications and contents such as online computer games, social networks, online pornography and shopping platforms. Research findings have demonstrated that IUD lead to significant psychological, societal and social impairments. Based on scientific evidence and clinical requirements, Internet Gaming Disorder (IGD) has been included in the 11th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-11). The formal recognition of IGD as a disorder due to behavioral addiction has been highlighted as an achievement regarding the nosological classification and the development of treatment approaches. In addition to IGD, other IUD appear to be of comparable clinical relevance considering their addictive potential and aversive consequences. Scientific research on interventions for IUD is still at an early stage and further studies are required, although systematic reviews and meta-analyses suggest that Cognitive Behavioral Therapy (CBT) may be an effective treatment approach for IUD. Even though the availability of specific interventions for the treatment of IUD is steadily increasing, access barriers and low utilization of outpatient services lead to shortages in adequate health care. Digital interventions have the potential to facilitate access to healthcare services and to address treatment barriers.
Based on the current state of research, digital interventions for IUD were investigated in this cumulative dissertation. The research project was embedded in the online short-term therapy of the randomized controlled trial Stepped Care Approach for Problematic Internet use Treatment (SCAPIT; German: Stepped Care Ansatz zur Versorgung Internetbezogener Störungen, SCAVIS). The first study synthesized the systematic evidence for treatment interventions for IUD in order to critically appraise the quality of reporting according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The evaluation of systematic reviews and meta-analyses of interventions for IUD revealed considerable heterogeneity in terms of the classification of addictive Internet behaviors and treatment approaches. In addition, a lack of studies on IUD manifestations other than IGD was highlighted. The assessment of the quality of reporting indicated deviations from the PRISMA guidelines in terms of missing information on the methodological and scientific approach. The second study was conducted as part of the scientific monitoring of the pilot phase of the online short-term therapy. Baseline data collected from the pilot sample were analyzed to provide preliminary information on clinical characteristics of treatment recipients. A descriptive analysis of sociodemographic, psychopathological and individual variables was performed and correlations with symptoms of IUD were analyzed. Results of the study revealed that comorbid psychopathologies were associated with IUD. The majority of participants reported at least one comorbid mental disorder at the time of inclusion in the online short-term therapy, with affective disorders being most prevalent. In the third study, the effects of psychopathological symptoms and impairments of functioning on the symptom severity of IUD and the duration of Internet use were examined in the sample of patients. Based on a dichotomous classification of the symptom severity of IUD, differences between participants presenting moderate versus severe addictive Internet behavior were analyzed. Besides psychosocial impairments, depressive, social phobic and somatoform symptoms were assessed. Results of the regression analyses confirmed that psychopathological symptoms and impairments of functioning had a significant effect on symptoms of IUD. Participants with severe addictive Internet behavior exhibited significantly higher psychopathological strain compared to patients with moderate addictive Internet behavior.
In summary, the findings of the three studies contribute to the scientific evidence base for digital interventions for IUD. Results of the first study highlight that the current evidence on treatment interventions for IUD is limited by considerable heterogeneity and shortcomings in reporting. Further studies are urgently required to investigate digital treatment approaches. Findings of the second study underline the relevance of studies on particular manifestations of IUD and confirm that those affected commonly exhibit additional psychopathologies in clinical settings. Further, results of the third study point to opportunities for expanding the health care system by considering digital interventions as a promising way to minimize access barriers and to provide evidence-based treatments for those affected. Additional research studies with high methodological quality should be conducted to confirm the findings and to strengthen the evidence base for digital interventions for the treatment of IUD.
Rotating or fixed tibial insert? A randomised clinical and radiological comparison of rotating and fixed tibial inserts in computer-assisted bicondylar total knee arthroplasty
Background:
Mobile tibial inserts in knee arthroplasty are credited with theoretical properties that are claimed to lead to improved functional outcomes and, in the longer term, to reduced polyethylene wear. This study examined short-term clinical results of two patient groups that differed systematically only in the type of tibial platform used. One group received a fixed insert, while the other was implanted with a mobile one.
Methods:
100 knees in 97 patients were stratified by age and sex and randomised into two groups: fixed-bearing (FB) with 52 knees and rotating-platform (RP) with 48 knees. All patients received the posterior-cruciate-retaining bicondylar Columbus® knee system (B. Braun Aesculap, Tuttlingen, Germany) with either a fixed or a rotating polyethylene insert. The procedures were performed by two experienced surgeons, and both groups followed an identical rehabilitation protocol. Clinical follow-up examinations were carried out in a double-blinded manner before surgery and three, six and twelve months afterwards. The established Knee Society Score (KSS) and the Oxford Knee Score (OKS) were used for clinical assessment. Statistical analysis used the Mann-Whitney U test with alpha = 0.05 and beta = 0.15 for the primary endpoint, a KSS difference of more than eight points assumed to be clinically relevant, as well as analyses of variance (ANOVA) for exploratory testing of the influence of the baseline scores as covariates and of the follow-up time points on the results.
Results:
The primary endpoint, the KSS, and the secondary endpoints, the OKS and range of motion (ROM), did not differ statistically significantly between the groups. Both the demographic data of the study groups and their pre- and postoperative radiological alignment were homogeneous. When the preoperative baseline scores were included as covariates in the statistical analysis and their influence examined, no significant differences between the groups could be demonstrated either.
Conclusion:
In our study there was no difference in KSS of more than eight points between the FB and RP groups in the first year after bicondylar total knee arthroplasty. The study design additionally controlled for various other differences between the groups and for the influence of variables other than the type of tibial insert on the results. Here, too, no significant differences could be demonstrated. For the first postoperative year and for patients who met our inclusion criteria, no advantage could accordingly be established for either the fixed or the mobile tibial platform. According to current scientific evidence, there is thus no conclusive argument for preferring either implant design.
Validation of training effectiveness and influence on subjective confidence, care strategies and communication in prehospital care of the severely injured. A prospective longitudinal mixed-methods study
The aim of this work was to examine, in a prospective, longitudinal mixed-methods study, the objective and subjective changes in participants of team training, using Pre-Hospital Trauma Life Support (PHTLS) courses as an example.
The teaching statements of PHTLS and the key recommendations of the DGU S3 polytrauma guideline are largely consistent, with 88% agreement. In 236 questionnaire datasets it could be shown that the expectations of course participants were fully met (p = 0.002). Subjective confidence in trauma care was significantly better in the longitudinal analysis (p < 0.001). Structure in patient care was decisive for this (p = 0.036), as was confidence in rare and frequent skills (p < 0.001). From the videos, the "Performance Assessment of Emergency Teams and Communication in Trauma Care" (PERFECT) checklist was first developed to assess the recordings. Inter-rater reliability (ICC = 0.99) and internal consistency (α = 0.99) were high. Concurrent validity was moderate to high (r = 0.65-0.93, p < 0.001). All experts rated the recorded scenarios at t0 with the lowest sum score (mean 31 ± 8) and at t1 with a clearly better team performance (mean 69 ± 7). 640 analysed mission protocols show that the training leads to a significant increase in documentation quality (p < 0.001). The subgroup analysis of "allergies" (+47.2%), "regular medication" (+38.1%) and "medical history" (+27.8%) before and after the PHTLS course showed a significant increase in recorded information.
The investigations showed a high level of agreement between the PHTLS teaching statements and the S3 polytrauma guideline, and thus good applicability of the courses as a training concept in Germany. With regard to subjective confidence, the significant increase could also be shown in the longitudinal analysis, and the importance of sufficient skills training and of teaching structure in patient care was demonstrated. For the objective assessment of training effects, the PERFECT checklist was developed with high reliability and validity; initial analyses show an objective improvement in the care of simulated trauma patients. The improved surrogate endpoint of documentation quality in mission documentation confirms, as an indicator, effective training and a sensitisation of the participants.