
    MPI-Semantic Memory Checking Tools for Parallel Applications

    The Message Passing Interface (MPI) is a language-independent application programming interface that provides a standard for communication among the processes of programs running on parallel computers, clusters or heterogeneous networks. However, writing correct and portable MPI applications is difficult: parameters may be used inconsistently or incorrectly, and even expert programmers may misuse MPI calls because of the subtle semantic differences between them. To achieve the highest possible performance, MPI implementations typically perform only minimal sanity checks. Although many interactive debuggers have been developed or extended to handle the concurrent processes of MPI applications, numerous classes of bugs remain hard or even impossible to find with a conventional debugger. In many cases, memory conflicts or errors, such as overlapping accesses or segmentation faults, do not give the programmer enough useful information to solve the problem. The situation is even worse for MPI applications: the MPI standard is flexible and memory is used intensively and in parallel, which makes memory problems harder to observe in the traditional way. Currently, no available debugger specifically addresses MPI-semantic memory errors, that is, memory problems or potential errors detected according to the MPI standard. For this specific purpose, memory checking tools have been implemented in this dissertation, and the corresponding frameworks for parallel applications based on MPI semantics have been developed in Open MPI, one of the most widely used open-source MPI implementations, using different existing memory debugging tool interfaces. With them, developers are able to detect hard-to-find bugs such as memory violations, buffer overruns and inconsistent parameters. The memory checking tools provide detailed, comprehensible error messages intended to be especially helpful for MPI developers.
Furthermore, the memory checking frameworks may also help improve the performance of MPI-based parallel applications by detecting whether the communicated data is actually used. The new memory checking tools may also be integrated into other projects or debuggers to perform different memory checks. They apply not only to MPI parallel applications but also to other kinds of applications that require memory checking. The technology lets programmers implement their own memory checking functionality flexibly: they may define what information they want to know about memory and how memory in the application should be checked and reported. The world of high-performance computing is dominated by Linux and open source. However, Microsoft is playing an increasingly important role in this domain, establishing its foothold with Windows HPC Server 2008 R2. In this work, the advantages and disadvantages of these two HPC operating systems are discussed. To improve programmability and portability for existing users of Windows systems, we introduce a version of Open MPI for Windows with several newly developed key components. Correspondingly, an implementation of the memory checking tool on Windows is also introduced. This dissertation has five main chapters: after an introduction and a review of the state of the art, the development of Open MPI for the Windows platform is described, including InfiniBand network support. Chapter four presents the methods and possibilities explored for the error analysis of memory accesses. It also describes the two tools implemented for this work, based on Intel Pin and Valgrind, and their integration into the Open MPI library. In chapter five, the methods are evaluated with several benchmarks (NetPIPE, IMB and NPB), and their usefulness is analyzed with real applications (a heat conduction application and the MD package Gromacs).
It is shown that the instrumentation generated by the tool has no significant overhead (1.2% to 2.5% additional latency in NetPIPE) and accordingly no major impact on application benchmarks such as NPB or Gromacs. When the application is run under the memory access tools for analysis, its execution time naturally increases by up to 30x; with the MemPin tool presented here, the slowdown is only half as large. The methods prove successful in that unnecessarily communicated data is found in both the heat conduction application and Gromacs; in the first case, this allows the communication time of the application to be reduced by 12%.
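
The abstract describes checking memory accesses against MPI semantics and finding communicated data that is never used, but gives no implementation details. The Python sketch below is a toy model of that idea under explicit assumptions, not the thesis's Valgrind- or Pin-based tools: it keeps per-byte shadow state for one receive buffer, flags accesses made while a non-blocking receive is still pending (the MPI standard forbids touching a receive buffer before the receive completes), and reports delivered bytes that the application never read. The class `ShadowBuffer` and its methods are hypothetical; `irecv` and `wait` only mimic the buffer-ownership rules of `MPI_Irecv` and `MPI_Wait`, and byte granularity is an assumption.

```python
class ShadowBuffer:
    """Toy per-byte shadow state for one receive buffer (hypothetical model)."""

    def __init__(self, size: int) -> None:
        self.pending = [False] * size    # owned by an in-flight non-blocking receive
        self.delivered = [False] * size  # holds data from a completed receive
        self.used = [False] * size       # delivered byte was read before being overwritten
        self.wasted: set[int] = set()    # delivered bytes overwritten without being read
        self.errors: list[str] = []

    def irecv(self, start: int, count: int) -> None:
        """Mimic MPI_Irecv: the range must not be touched until the request completes."""
        for i in range(start, start + count):
            if self.delivered[i] and not self.used[i]:
                self.wasted.add(i)       # old received data is about to be replaced unread
            self.delivered[i] = False
            self.pending[i] = True

    def wait(self, start: int, count: int) -> None:
        """Mimic MPI_Wait on that request: the data now belongs to the application."""
        for i in range(start, start + count):
            self.pending[i] = False
            self.delivered[i] = True
            self.used[i] = False

    def load(self, i: int) -> None:
        if self.pending[i]:
            self.errors.append(f"read of byte {i} before the receive completed")
        elif self.delivered[i]:
            self.used[i] = True

    def store(self, i: int) -> None:
        if self.pending[i]:
            self.errors.append(f"write to byte {i} while the receive is pending")
            return
        if self.delivered[i] and not self.used[i]:
            self.wasted.add(i)
        self.delivered[i] = False

    def unused_bytes(self) -> list[int]:
        """Bytes that were communicated but never read by the application."""
        never_read = {i for i, d in enumerate(self.delivered) if d and not self.used[i]}
        return sorted(self.wasted | never_read)


buf = ShadowBuffer(8)
buf.irecv(0, 8)
buf.load(2)      # flagged: the receive is still pending
buf.wait(0, 8)
buf.load(0)
buf.load(1)
buf.store(5)     # byte 5 is overwritten without ever being read
print(buf.errors)          # ['read of byte 2 before the receive completed']
print(buf.unused_bytes())  # [2, 3, 4, 5, 6, 7]
```

In the thesis, this kind of shadow state is maintained by the binary instrumentation layer (Valgrind or Pin) rather than by explicit `load`/`store` calls; the sketch only shows the bookkeeping that lets such a tool report both semantic violations and communicated-but-unused data.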

    Fathers’ presence and adolescents’ interpersonal relationship quality: Moderated mediation model

    Introduction: Most previous studies focused on the effects of fathers’ presence on adolescent development but rarely examined the mechanisms underlying these effects. Moreover, previous studies ignored the impact of the way fathers are present on adolescents’ interpersonal relationships. Based on social identity theory, the present study introduced adolescents’ social responsibility as a mediating variable to explore the influence of fathers’ style of presence on adolescents’ interpersonal relationships. The study examined the mechanism linking fathers’ presence, adolescents’ social responsibility and the quality of their interpersonal relationships, and whether teenagers whose fathers are present in a democratic way are more likely to develop a stronger sense of social responsibility and achieve harmonious interpersonal relationships. Methods: Participants were 1,942 senior high school and college students who completed the Fatherhood Questionnaire, the Social Responsibility Questionnaire and the Interpersonal Relationship Quality Diagnosis Scale. The hypotheses were tested with the PROCESS macro for SPSS 24.0 and with Amos 26.0. Results: The empirical results demonstrated that (a) fathers’ presence is directly and positively related to adolescents’ social responsibility, (b) fathers’ presence is indirectly and positively related to the quality of adolescents’ interpersonal relationships through social responsibility, and (c) parenting style moderated the first half of this indirect path, that is, the link from fathers’ presence to social responsibility and, through it, to the quality of interpersonal relationships.
Results demonstrated that teenagers’ interpersonal relationships were more harmonious when fathers adopted a democratic upbringing, and that this interaction effect on interpersonal relationships was mediated by teenagers’ sense of social responsibility. Discussion: The findings of this study enrich the literature by exploring the significance of fathers’ democratic presence for teenagers’ sense of social responsibility and interpersonal relationships. The practical implication of this study is that society should encourage more fathers to be present and guide them to adopt a democratic parenting style, which will benefit adolescents’ development and family well-being.
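
Assuming the standard first-stage moderated mediation structure that the results describe (parenting style moderating the path from fathers’ presence to social responsibility; the abstract does not say which PROCESS model was used), the relationships can be written as follows, with X = fathers’ presence, M = social responsibility, W = parenting style and Y = interpersonal relationship quality. The coefficient names are illustrative and are not taken from the paper.

```latex
\begin{aligned}
M &= i_M + a_1 X + a_2 W + a_3 X W + e_M \\
Y &= i_Y + c' X + b_1 M + e_Y \\
\theta_{X \to M \to Y}(W) &= (a_1 + a_3 W)\, b_1 \\
\text{index of moderated mediation} &= a_3 b_1
\end{aligned}
```

A nonzero index a_3 b_1 means the indirect effect of fathers’ presence through social responsibility changes with parenting style, which is the pattern reported in result (c).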

    Neural-Singular-Hessian: Implicit Neural Representation of Unoriented Point Clouds by Enforcing Singular Hessian

    Neural implicit representation is a promising approach for reconstructing surfaces from point clouds. Existing methods combine various regularization terms, such as the Eikonal and Laplacian energy terms, to make the learned neural function possess the properties of a Signed Distance Function (SDF). However, inferring the actual topology and geometry of the underlying surface from poor-quality unoriented point clouds remains challenging. According to differential geometry, the Hessian of the SDF is singular for points within the differential thin-shell space surrounding the surface. Our approach enforces the Hessian of the neural implicit function to have a zero determinant for points near the surface. This technique aligns the gradients of a near-surface point and its on-surface projection point, producing a rough but faithful shape within just a few iterations. By annealing the weight of the singular-Hessian term, our approach ultimately produces a high-fidelity reconstruction. Extensive experimental results demonstrate that our approach effectively suppresses ghost geometry and recovers details from unoriented point clouds with better expressiveness than existing fitting-based methods.
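
To make the singular-Hessian property concrete, the Python sketch below evaluates a finite-difference Hessian for two scalar fields. It is an illustration under stated assumptions, not the authors' implementation, which applies the constraint to a neural implicit function. For the exact SDF of a sphere, the Hessian at a near-surface point has a zero eigenvalue along the gradient direction, so its determinant vanishes; for f(x) = |x|^2 - 1, which has the same zero level set but is not an SDF, the Hessian is 2I and its determinant is 8. The helper names, the penalty form `singular_hessian_penalty` (mean absolute determinant over samples), the step size `h` and the sample points are all hypothetical choices for illustration.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def sphere_sdf(p: Point, radius: float = 1.0) -> float:
    """Exact signed distance to a sphere of the given radius centred at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def paraboloid(p: Point) -> float:
    """Same zero level set as the unit sphere, but not a distance function."""
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 1.0


def hessian_fd(f: Callable[[Point], float], p: Point, h: float = 1e-3) -> list[list[float]]:
    """Central finite-difference Hessian of f: R^3 -> R at p.
    For i == j the stencil reduces to a second difference with step 2h."""
    def f_shifted(i: int, si: int, j: int, sj: int) -> float:
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(q)

    return [
        [
            (f_shifted(i, 1, j, 1) - f_shifted(i, 1, j, -1)
             - f_shifted(i, -1, j, 1) + f_shifted(i, -1, j, -1)) / (4 * h * h)
            for j in range(3)
        ]
        for i in range(3)
    ]


def det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def singular_hessian_penalty(f: Callable[[Point], float], points: list[Point],
                             h: float = 1e-3) -> float:
    """Mean |det(Hessian)| over near-surface samples; about zero for an exact SDF."""
    return sum(abs(det3(hessian_fd(f, p, h))) for p in points) / len(points)


near_surface = [(1.05, 0.0, 0.0), (0.0, 0.97, 0.02), (0.6, 0.6, 0.55)]
print(singular_hessian_penalty(sphere_sdf, near_surface))  # close to 0
print(singular_hessian_penalty(paraboloid, near_surface))  # close to 8 (Hessian is 2I)
```

A vanishing determinant is a necessary property of an SDF near the surface, not a sufficient one; the sketch only checks the property and does not train anything or model the weight annealing the abstract describes.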

    RDMA-Based Deterministic Communication Architecture for Autonomous Driving

    Autonomous driving is a big challenge for next-generation vehicles and requires multiple computationally intensive deep neural networks (DNNs) to be implemented on distributed automotive platforms. Distributed software, which enables autonomous functionalities, has strict timing requirements, e.g., low and deterministic end-to-end latency. Such timings depend on the communication technologies used in the automotive platform as much as on the computation performance of CPUs, GPUs, TPUs, and FPGAs. Hence, we advocate the use of Remote Direct Memory Access (RDMA) technology, typically used in data centers, in automotive platforms. As shown by our experiments with real hardware, Soft-RoCE (a software implementation of RDMA) offers low-latency communication because of minimal CPU involvement and reduced memory copies. At the same time, we show that the native implementation of RDMA does not support determinism, i.e., communication delays vary strongly in the presence of interfering data packets. To mitigate this issue, we propose a multi-layer communication stack comprising a deterministic scheduler on top of the Soft-RoCE layer. Further, we have developed a C++ library that offers easy-to-use communication interfaces for distributed applications while implementing the proposed architecture. Experiments show that our library (i) reduces the end-to-end latency of distributed object detection by nearly 9% with an implementation overhead of less than 1.5% and (ii) minimizes the effects of other data traffic on the delay of high-priority communication.
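
The abstract does not describe how the deterministic scheduler orders traffic. As a hedged illustration only, the Python sketch below assumes a static time-slotted (TDMA-style) policy: each message stream gets a fixed slot in a repeating cycle, so a stream's release time depends only on the slot table and not on interfering traffic. The `Stream` and `Slot` types, `build_schedule`, `release_time_us`, the stream names and the timing values are hypothetical and are not part of the authors' C++ library; the actual RDMA send operations are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    priority: int    # lower value = more urgent (assumption)
    tx_time_us: int  # worst-case time to transmit one message of this stream


@dataclass(frozen=True)
class Slot:
    stream: str
    offset_us: int   # slot start, relative to the beginning of each cycle
    length_us: int


def build_schedule(streams: list[Stream], cycle_us: int) -> dict[str, Slot]:
    """Give every stream one fixed, non-overlapping slot per cycle,
    placing more urgent streams earlier in the cycle."""
    table: dict[str, Slot] = {}
    offset = 0
    for st in sorted(streams, key=lambda s: s.priority):
        table[st.name] = Slot(st.name, offset, st.tx_time_us)
        offset += st.tx_time_us
    if offset > cycle_us:
        raise ValueError(f"streams need {offset} us per cycle, but the cycle is {cycle_us} us")
    return table


def release_time_us(table: dict[str, Slot], cycle_us: int, stream: str, cycle_index: int) -> int:
    """Absolute time at which `stream` may post its send in a given cycle.
    It depends only on the static table, never on how much other traffic is queued."""
    return cycle_index * cycle_us + table[stream].offset_us


streams = [
    Stream("logging", priority=2, tx_time_us=300),
    Stream("object_list", priority=0, tx_time_us=120),
    Stream("lidar_chunk", priority=1, tx_time_us=400),
]
table = build_schedule(streams, cycle_us=1000)
print(release_time_us(table, 1000, "object_list", 3))  # 3000
print(release_time_us(table, 1000, "lidar_chunk", 3))  # 3120
print(release_time_us(table, 1000, "logging", 0))      # 520
```

A static table trades bandwidth utilization for predictability: an unused slot is wasted, but a high-priority stream can never be delayed by bulk traffic queued in other slots.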