Ramping fermions in optical lattices across a Feshbach resonance
We study the properties of ultracold Fermi gases in a three-dimensional
optical lattice when crossing a Feshbach resonance. By using a zero-temperature
formalism, we show that three-body processes are enhanced in a lattice system
in comparison to the continuum case. This poses one possible explanation for
the short molecule lifetimes found when decreasing the magnetic field across a
Feshbach resonance. Effects of finite temperatures on the molecule formation
rates are also discussed by computing the fraction of double-occupied sites.
Our results show that current experiments are performed at temperatures
considerably higher than expected: lower temperatures are required for
fermionic systems to be used to simulate quantum Hamiltonians. In addition, by
relating the double occupancy of the lattice to the temperature, we provide a
means for thermometry in fermionic lattice systems, previously not accessible
experimentally. The effects of ramping a filled lowest band across a Feshbach
resonance when increasing the magnetic field are also discussed: fermions are
lifted into higher bands due to entanglement of Bloch states, in good agreement
with recent experiments. Comment: 9 pages, 7 figures
TABARNAC: Tools for Analyzing Behavior of Applications Running on NUMA Architecture
In modern parallel architectures, memory accesses represent a common bottleneck. Thus, optimizing the way applications access the memory is an important way to improve performance and energy consumption. Memory accesses are even more important with NUMA machines, as the access time to data depends on its location in the memory. Many efforts were made to develop adaptive tools to improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools are not able to change the memory access pattern of the original application; therefore, code written without considering memory performance might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis. In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by structure. Such visualization allows the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping. Moreover, we propose several code modifications to improve the memory access behavior of several parallel applications.
TABARNAC: Visualizing and Resolving Memory Access Issues on NUMA Architectures
In modern parallel architectures, memory accesses represent a common bottleneck. Thus, optimizing the way applications access the memory is an important way to improve performance and energy consumption. Memory accesses are even more important with NUMA machines, as the access time to data depends on its location in the memory. Many efforts were made to develop adaptive tools to improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools are not able to change the memory access pattern of the original application; therefore, code written without considering memory performance might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis. In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by structure. Such visualization allows the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping. Moreover, we propose several code modifications to improve the memory access behavior of several parallel applications.
Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py
Python is rapidly becoming the lingua franca of machine learning and
scientific computing. With the broad use of frameworks such as Numpy, SciPy,
and TensorFlow, scientific computing and machine learning are seeing a
productivity boost on systems without a requisite loss in performance. While
high-performance libraries often provide adequate performance within a node,
distributed computing is required to scale Python across nodes and make it
genuinely competitive in large-scale high-performance computing. Many
frameworks, such as Charm4Py, DaCe, Dask, Legate Numpy, mpi4py, and Ray, scale
Python across nodes. However, little is known about these frameworks' relative
strengths and weaknesses, leaving practitioners and scientists without enough
information about which frameworks are suitable for their requirements. In this
paper, we seek to narrow this knowledge gap by studying the relative
performance of two such frameworks: Charm4Py and mpi4py.
We perform a comparative performance analysis of Charm4Py and mpi4py using
CPU and GPU-based microbenchmarks and other representative mini-apps for scientific
computing. Comment: 7 pages, 7 figures. To appear at "Sixth International IEEE Workshop
on Extreme Scale Programming Models and Middleware"
Acceleration of dynamic ice loss in Antarctica from satellite gravimetry
The dynamic stability of the Antarctic Ice Sheet is one of the largest uncertainties in projections of future global sea-level rise. Essential for improving projections of the ice sheet evolution is the understanding of the ongoing trends and accelerations of mass loss in the context of ice dynamics. Here, we examine accelerations of mass change of the Antarctic Ice Sheet from 2002 to 2020 using data from the GRACE (Gravity Recovery and Climate Experiment; 2002–2017) and its follow-on GRACE-FO (2018–present) satellite missions. By subtracting estimates of net snow accumulation provided by re-analysis data and regional climate models from GRACE/GRACE-FO mass changes, we isolate variations in ice-dynamic discharge and compare them to direct measurements based on the remote sensing of the surface-ice velocity (2002–2017). We show that variations in the GRACE/GRACE-FO time series are modulated by variations in regional snow accumulation caused by large-scale atmospheric circulation. We show for the first time that, after removal of these surface effects, accelerations of ice-dynamic discharge from GRACE/GRACE-FO agree well with those independently derived from surface-ice velocities. For 2002–2020, we recover a discharge acceleration of −5.3 ± 2.2 Gt yr⁻² for the entire ice sheet; these increasing losses originate mainly in the Amundsen and Bellingshausen Sea Embayment regions (68%), with additional significant contributions from Dronning Maud Land (18%) and the Filchner-Ronne Ice Shelf region (13%). Under the assumption that the recovered rates and accelerations of mass loss persisted independent of any external forcing, Antarctica would contribute 7.6 ± 2.9 cm to global mean sea-level rise by the year 2100, more than two times the amount of 2.9 ± 0.6 cm obtained by linear extrapolation of current GRACE/GRACE-FO mass loss trends.
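As a rough plausibility check on the extrapolation quoted above, the sketch below evaluates only the quadratic term of a constant-acceleration mass-loss model. Several inputs are assumptions and do not come from the abstract: that the acceleration enters as 0.5·a·t², that the horizon runs 80 years from 2020 to 2100, and that about 361.8 Gt of ice corresponds to 1 mm of global mean sea level. The function name `acceleration_term_cm` is hypothetical. This is not the authors' procedure, only an order-of-magnitude illustration.

```python
# Assumed conversion (not from the abstract): ~361.8 Gt of ice per 1 mm of global mean sea level
GT_PER_MM_SLE = 361.8


def acceleration_term_cm(accel_gt_per_yr2, years):
    """Sea-level equivalent (cm) of the quadratic term 0.5 * a * t**2."""
    mass_gt = 0.5 * accel_gt_per_yr2 * years ** 2
    return mass_gt / GT_PER_MM_SLE / 10.0  # Gt -> mm -> cm


# Magnitude of the reported discharge acceleration, extrapolated over an assumed 2020-2100 horizon
extra_cm = acceleration_term_cm(5.3, 2100 - 2020)
print(f"acceleration term: {extra_cm:.1f} cm")
```

Under these assumptions the quadratic term alone comes to a few centimetres, the same order as the gap between the two extrapolations quoted in the abstract (7.6 cm versus 2.9 cm).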
Gamma-Irradiated Influenza Virus Uniquely Induces IFN-I Mediated Lymphocyte Activation Independent of the TLR7/MyD88 Pathway
Background: We have previously shown in mice that infection with live viruses, including influenza A and Semliki Forest virus (SFV), induces systemic partial activation of lymphocytes, characterized by cell surface expression of CD69 and CD86, but not CD25. This partial lymphocyte activation is mediated by type-I interferons (IFN-I). Importantly, we have shown that γ-irradiated SFV does not induce IFN-I and the associated lymphocyte activation. Principal Findings: Here we report that, in contrast to SFV, γ-irradiated influenza A virus elicits partial lymphocyte activation in vivo. Furthermore, we show that when using influenza viruses inactivated by a variety of methods (UV, ionising radiation and formalin treatment), as well as commercially available influenza vaccines, only γ-irradiated influenza virus is able to trigger IFN-I-dependent partial lymphocyte activation in the absence of the TLR7/MyD88 signalling pathways. Conclusions: Our data suggest an important mechanism for the recognition of γ-irradiated influenza vaccine by cytosolic receptors, which corresponds with the ability of γ-irradiated influenza virus to induce cross-reactive and cross-protective cytotoxic T cell responses.
Yoichi Furuya, Jennifer Chan, En-Chi Wan, Aulikki Koskinen, Kerrilyn R. Diener, John D. Hayball, Matthias Regner, Arno Müllbacher, Mohammed Alsharif
Teres Ligament Patch Reduces Relevant Morbidity After Distal Pancreatectomy (the DISCOVER Randomized Controlled Trial)
Objective: The aim of this study was to analyze the impact of teres ligament covering on pancreatic fistula rate after distal pancreatectomy (DP). Background: Postoperative pancreatic fistula (POPF) represents the most significant complication after DP. Retrospective studies suggested a benefit of covering the resection margin by a teres ligament patch. Methods: This prospective randomized controlled study (DISCOVER trial) included 152 patients undergoing DP between October 2010 and July 2014. Patients were randomized to undergo closure of the pancreatic cut margin without (control, n = 76) or with teres ligament coverage (teres, n = 76). The primary endpoint was the rate of POPF, and the secondary endpoints included postoperative morbidity and mortality, length of hospital stay, and readmission rate. Results: Both groups were comparable regarding epidemiology (age, sex, body mass index), operative parameters (operation [OP] time, blood loss, method of pancreas transection, additional operative procedures), and histopathological findings. Overall in-hospital mortality was 0.6% (1/152 patients). In the group of patients with teres ligament patch, the rate of reoperations (1.3% vs 13.0%; P = 0.009) and the rate of readmission (13.1% vs 31.5%; P = 0.011) were significantly lower. Clinically relevant POPF rate (grade B/C) was 32.9% (control) versus 22.4% (teres, P = 0.20). Multivariable analysis showed teres ligament coverage to be a protective factor for clinically relevant POPF (P = 0.0146). Conclusions: Coverage of the pancreatic remnant after DP is associated with fewer reinterventions, reoperations, and readmissions. Although the overall fistula rate is not reduced by the coverage procedure, it should be considered a valid measure for complication prevention due to its clinical benefit.
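The univariate comparison of clinically relevant POPF rates can be roughly reproduced from the reported percentages. The sketch below makes two assumptions the abstract does not state. First, the counts are 25/76 (control) and 17/76 (teres), inferred from 32.9% and 22.4%. Second, the test is a chi-square test with Yates' continuity correction. The helper `yates_chi2_p` is a hypothetical name, and the trial's actual statistical method may differ.

```python
import math


def yates_chi2_p(a, b, c, d):
    """Two-sided p-value for a 2x2 table [[a, b], [c, d]] using a
    chi-square test with Yates' continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


# Counts inferred from the reported percentages (assumption, not stated in the abstract)
control_popf, control_n = 25, 76
teres_popf, teres_n = 17, 76

p_value = yates_chi2_p(control_popf, control_n - control_popf,
                       teres_popf, teres_n - teres_popf)
print(f"control {control_popf / control_n:.1%}, "
      f"teres {teres_popf / teres_n:.1%}, p = {p_value:.2f}")
```

With these inferred counts the corrected test gives a p-value close to the reported 0.20, which is consistent with the abstract's statement that the overall fistula rate was not significantly reduced.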
Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications
Public cloud providers offer a wide range of instance types, with different processing and interconnection speeds, as well as varying prices. Furthermore, the tasks of many parallel applications show different computational demands due to load imbalance. These differences can be exploited for improving the cost efficiency of parallel applications in many cloud environments by matching application requirements to instance types. In this paper, we introduce the concept of heterogeneous cloud systems consisting of different instance types to leverage the different computational demands of large parallel applications for improved cost efficiency. We present a mechanism that automatically suggests a suitable combination of instances based on a characterization of the application and the instance types. With such a heterogeneous cloud, we are able to improve cost efficiency significantly for a variety of MPI-based applications, while maintaining a similar performance. Peer Reviewed. Postprint (author's final draft)
Performance Evaluation of Multiple Cloud Data Centers Allocations for HPC
This paper evaluates the behavior of the Microsoft Azure G5 cloud instance type over multiple data centers. The purpose is to identify whether there are major differences between them and to help users choose the best option for their needs. Our results show that there are differences at the network level for the same instance type in different locations, and inside the same location at different times. The network performance causes interference at the application level, as our results confirm. This research received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E project, grant agreement no. 689772. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr). Additional funding was provided by CAPES and Microsoft. Peer Reviewed. Postprint (author's final draft)