1,279 research outputs found
Adjoint computations by algorithmic differentiation of a parallel solver for time-dependent PDEs
A computational fluid dynamics code is differentiated using algorithmic
differentiation (AD) in both tangent and adjoint modes. The two novelties of
the present approach are 1) the adjoint code is obtained by letting the AD tool
Tapenade invert the complete layer of message passing interface (MPI)
communications, and 2) the adjoint code integrates time-dependent, non-linear
and dissipative (hence physically irreversible) PDEs with an explicit time
integration loop running for ca. time steps. The approach relies on
using the Adjoinable MPI library to reverse the non-blocking communication
patterns in the original code, and on controlling the memory overhead induced
by the time-stepping loop with binomial checkpointing. A description of the
necessary code modifications is provided along with the validation of the
computed derivatives and a performance comparison of the tangent and adjoint
codes. Comment: Submitted to Journal of Computational Science
Ending Extreme Poverty and Sharing Prosperity: Progress and Policies
To guide its work toward a "world free of poverty," the World Bank Group in 2013 established two clear goals: end extreme poverty by 2030 and promote shared prosperity. Along with the requirement to pursue these goals sustainably -- economically, environmentally, and socially -- the two goals are comprehensive in nature. They are fully aligned to support the Sustainable Development Goals (SDGs) set by the United Nations to replace the Millennium Development Goals (MDGs). To evaluate progress, the two goals are measured by two overall indicators: a reduction in the global headcount ratio of extreme poverty (the population share of those whose income is below the international poverty line) to 3 percent by 2030, and the promotion of income growth in the bottom 40 percent (B40) of the population in each country. This Policy Research Note updates the assessment of progress toward these two goals in a sustainable manner. The poverty goal is examined through three lenses: the evolution of income poverty based on the new international poverty line that has been re-estimated at $1.90 a day; an assessment of person-equivalent income poverty, a new intuitive indicator that combines the incidence with the depth of poverty; and a review of the breadth of poverty, recognizing that income shortfalls often coexist with multiple non-income deprivations. The shared prosperity goal is examined on the basis of the latest comparison of (comparable) household data on B40 income growth. As part of its analysis of the two goals, this note also comments on the status of defining and monitoring sustainability in its economic, environmental and social aspects.
CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach
In the landscape of High-Performance Computing (HPC), the quest for efficient
and scalable memory solutions remains paramount. The advent of Compute Express
Link (CXL) introduces a promising avenue with its potential to function as a
Persistent Memory (PMem) solution in the context of disaggregated HPC systems.
This paper presents a comprehensive exploration of CXL memory's viability as a
candidate for PMem, supported by physical experiments conducted on cutting-edge
multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not
only benchmarks the performance of CXL memory but also illustrates the seamless
transition from traditional PMem programming models to CXL, reinforcing its
practicality.
To substantiate our claims, we establish a tangible CXL prototype using an
FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP).
Performance evaluations, executed through the STREAM and STREAM-PMem
benchmarks, showcase CXL memory's ability to mirror PMem characteristics in
App-Direct and Memory Mode while achieving impressive bandwidth metrics with
Intel 4th generation Xeon (Sapphire Rapids) processors.
The results elucidate the feasibility of CXL memory as a persistent memory
solution, outperforming previously established benchmarks. In contrast to
published DCPMM results, our CXL-DDR4 memory module offers comparable bandwidth
to local DDR4 memory configurations, albeit with a moderate decrease in
performance. The modified STREAM-PMem application demonstrates the ease of
transitioning programming models from PMem to CXL, underscoring the
practicality of adopting CXL memory. Comment: 12 pages, 9 figures
Exploring Fully Offloaded GPU Stream-Aware Message Passing
Modern heterogeneous supercomputing systems comprise CPUs, GPUs, and
high-speed network interconnects. Communication libraries supporting efficient
data transfers involving memory buffers from the GPU memory typically require
the CPU to orchestrate the data transfer operations. A new offload-friendly
communication strategy, stream-triggered (ST) communication, was explored to
allow offloading the synchronization and data movement operations from the CPU
to the GPU. A Message Passing Interface (MPI) one-sided active target
synchronization based implementation was used as an exemplar to illustrate the
proposed strategy. A latency-sensitive nearest neighbor microbenchmark was used
to explore the various performance aspects of the implementation. The offloaded
implementation shows significant on-node performance advantages over standard
MPI active RMA (36%) and point-to-point (61%) communication. The current
multi-node improvement is smaller (23% faster than standard active RMA but 11%
slower than point-to-point), but plans are in progress to pursue further
improvements. Comment: 12 pages, 17 figures
Early Holocene ritual complexity in South America: the archaeological record of Lapa do Santo (east-central Brazil)
Early Archaic human skeletal remains found in a burial context in Lapa do Santo in east-central Brazil provide a rare glimpse into the lives of hunter-gatherer communities in South America, including their rituals for dealing with the dead. These included the reduction of the body by means of mutilation, defleshing, tooth removal, exposure to fire and possibly cannibalism, followed by the secondary burial of the remains according to strict rules. In a later period, pits were filled with disarticulated bones of a single individual without signs of body manipulation, demonstrating that the region was inhabited by dynamic groups in constant transformation over a period of centuries.