267 research outputs found

    Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

    Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This, coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft CNTK, poses additional design constraints due to very large message communication of GPU buffers during the training phase. In this context, special-purpose libraries like NVIDIA NCCL have been proposed for GPU-based collective communication on dense GPU systems. In this paper, we propose a pipelined chain (ring) design for the MPI_Bcast collective operation along with an enhanced collective tuning framework in MVAPICH2-GDR that enables efficient intra-/inter-node multi-GPU communication. We present an in-depth performance landscape for the proposed MPI_Bcast schemes along with a comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement, compared to NCCL-based solutions, for intra- and inter-node broadcast latency, respectively. In addition, the proposed designs provide up to 7% improvement over NCCL-based solutions for data parallel training of the VGG network on 128 GPUs using Microsoft CNTK. Comment: 8 pages, 3 figures

    Chlorination and oxidation of the extracellular matrix protein laminin and basement membrane extracts by hypochlorous acid and myeloperoxidase

    Basement membranes are specialized extracellular matrices that underlie arterial wall endothelial cells, with laminin being a key structural and biologically-active component. Hypochlorous acid (HOCl), a potent oxidizing and chlorinating agent, is formed in vivo at sites of inflammation via the enzymatic action of myeloperoxidase (MPO), released by activated leukocytes. Considerable data support a role for MPO-derived oxidants in cardiovascular disease and particularly atherosclerosis. These effects may be mediated via damage to the extracellular matrix, to which MPO binds. Herein we detect and quantify sites of oxidation and chlorination on isolated laminin-111, and laminin in basement membrane extracts (BME), by use of mass spectrometry. Increased modification was detected with increasing oxidant exposure. Mass mapping indicated selectivity in the sites and extent of damage; Met residues were most heavily modified. Fewer modifications were detected with BME, possibly due to shielding effects. HOCl oxidised 30 (of 56 total) Met and 7 (of 24) Trp residues, and chlorinated 33 (of 99) Tyr residues; 3 Tyr were dichlorinated. An additional 8 Met and 10 Trp oxidations, 14 chlorinations, and 18 dichlorinations were detected with the MPO/H2O2/Cl- system when compared to reagent HOCl. Interestingly, chlorination was detected at Tyr2415 in the integrin-binding region; this may decrease cellular adhesion. Co-localization of MPO-damaged epitopes and laminin was detected in human atherosclerotic lesions. These data indicate that laminin is extensively modified by MPO-derived oxidants, with structural and functional changes. These modifications, and compromised cell-matrix interactions, may promote endothelial cell dysfunction, weaken the structure of atherosclerotic lesions, and enhance lesion rupture. Keywords: Extracellular matrix, Hypochlorous acid, Laminin, Protein oxidation, 3-chlorotyrosine, Myeloperoxidase

    Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

    Asynchronous programming models (APM) are gaining more and more traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs, as correctly and efficiently handling MPI communication in different models is still a challenge. Meanwhile, new low-level implementations of light-weight, cooperatively scheduled execution contexts (fibers, aka user-level threads (ULT)) are meant to serve as a basis for higher-level APMs, and their integration in MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges. In this paper, we first establish a taxonomy in an attempt to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and instead is detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard to provide callback-based notifications on completed operations, leading to a clear separation of concerns by providing a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots. Comment: 12 pages, 7 figures. Published in proceedings of EuroMPI/USA '20, September 21-24, 2020, Austin, TX, USA

    Increased neutrophil-lymphocyte ratio is a poor prognostic factor in patients with primary operable and inoperable pancreatic cancer

    Background: The neutrophil-lymphocyte ratio (NLR) has been proposed as an indicator of systemic inflammatory response. Previous findings from small-scale studies revealed conflicting results about its independent prognostic significance with regard to different clinical end points in pancreatic cancer (PC) patients. Therefore, the aim of our study was the external validation of the prognostic significance of NLR in a large cohort of PC patients. Methods: Data from 371 consecutive PC patients, treated between 2004 and 2010 at a single centre, were evaluated retrospectively. The whole cohort was stratified into two groups according to the treatment modality. Group 1 comprised 261 patients with inoperable PC at diagnosis and group 2 comprised 110 patients with surgically resected PC. Cancer-specific survival (CSS) was assessed using the Kaplan–Meier method. To evaluate the independent prognostic significance of the NLR, the modified Glasgow prognostic score (mGPS) and the platelet-lymphocyte ratio, univariate and multivariate Cox regression models were applied. Results: Multivariate analysis identified increased NLR as an independent prognostic factor for inoperable PC patients (hazard ratio (HR)=2.53, confidence interval (CI)=1.64–3.91, P<0.001) and surgically resected PC patients (HR=1.61, CI=1.02–2.53, P=0.039). In inoperable PC patients, the mGPS was associated with poor CSS only in univariate analysis (HR=1.44, CI=1.04–1.98). Conclusion: Risk prediction for cancer-related end points using NLR does add independent prognostic information to other well-established prognostic factors in patients with PC, regardless of the therapeutic modality. Thus, the NLR should be considered for future individual risk assessment in patients with PC.

    A phase 1 study of mTORC1/2 inhibitor BI 860585 as a single agent or with exemestane or paclitaxel in patients with advanced solid tumors

    This phase 1 trial (NCT01938846) determined the maximum tolerated dose (MTD) of the mTOR serine/threonine kinase inhibitor, BI 860585, as monotherapy and with exemestane or paclitaxel in patients with advanced solid tumors. This 3+3 dose-escalation study assessed BI 860585 monotherapy (5–300 mg/day; Arm A), BI 860585 (40–220 mg/day; Arm B) with 25 mg/day exemestane, and BI 860585 (80–220 mg/day; Arm C) with 60–80 mg/m²/week paclitaxel, in 28-day cycles. Primary endpoints were the number of patients with dose-limiting toxicities (DLTs) in cycle 1 and the MTD. Forty-one, 25, and 24 patients were treated (Arms A, B, and C). DLTs were observed in four (rash (n = 2), elevated alanine aminotransferase/aspartate aminotransferase, diarrhea), four (rash (n = 3), stomatitis, and increased gamma-glutamyl transferase), and two (diarrhea, increased blood creatine phosphokinase) patients in cycle 1. The BI 860585 MTD was 220 mg/day (Arm A) and 160 mg/day (Arms B and C). Nine patients achieved an objective response (Arm B: four partial responses (PRs); Arm C: four PRs; one complete response). The disease control rate was 20%, 28%, and 58% (Arms A, B, and C). The most frequent treatment-related adverse events (AEs) were hyperglycemia (54%) and diarrhea (39%) (Arm A); diarrhea (40%) and stomatitis (40%) (Arm B); fatigue (58%) and diarrhea (58%) (Arm C). The MTD was determined in all arms. Antitumor activity was observed with BI 860585 monotherapy and in combination with exemestane or paclitaxel.