    Adjoint computations by algorithmic differentiation of a parallel solver for time-dependent PDEs

    A computational fluid dynamics code is differentiated using algorithmic differentiation (AD) in both tangent and adjoint modes. The two novelties of the present approach are 1) the adjoint code is obtained by letting the AD tool Tapenade invert the complete layer of message passing interface (MPI) communications, and 2) the adjoint code integrates time-dependent, non-linear and dissipative (hence physically irreversible) PDEs with an explicit time integration loop running for ca. $10^{6}$ time steps. The approach relies on using the Adjoinable MPI library to reverse the non-blocking communication patterns in the original code, and on controlling the memory overhead induced by the time-stepping loop with binomial checkpointing. A description of the necessary code modifications is provided along with the validation of the computed derivatives and a performance comparison of the tangent and adjoint codes. Comment: Submitted to Journal of Computational Science
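
To give a feel for why binomial checkpointing keeps the memory overhead of a long time-stepping loop manageable, the following minimal sketch uses the standard bound from the checkpointing literature: with `s` stored snapshots and at most `r` forward re-evaluations of any time step, up to C(s + r, s) steps can be reversed. The function names and the snapshot counts tried below are illustrative assumptions; they are not taken from the paper or from Tapenade.

```python
from math import comb


def max_reversible_steps(snapshots: int, repeats: int) -> int:
    """Binomial bound: steps reversible with `snapshots` checkpoints
    when no step is recomputed forward more than `repeats` times."""
    return comb(snapshots + repeats, snapshots)


def min_repeats(n_steps: int, snapshots: int) -> int:
    """Smallest repeat count whose binomial bound covers `n_steps`."""
    repeats = 0
    while max_reversible_steps(snapshots, repeats) < n_steps:
        repeats += 1
    return repeats


if __name__ == "__main__":
    n_steps = 10**6  # order of magnitude quoted in the abstract
    for snapshots in (10, 50, 200):  # hypothetical memory budgets
        r = min_repeats(n_steps, snapshots)
        print(f"{snapshots} snapshots -> at most {r} forward re-evaluations per step")
```

The sketch only evaluates the bound; it does not schedule the checkpoints themselves, which is the job of a checkpointing library or of the AD tool.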

    Ending Extreme Poverty and Sharing Prosperity: Progress and Policies

    To guide its work toward a "world free of poverty," the World Bank Group in 2013 established two clear goals: end extreme poverty by 2030 and promote shared prosperity. Along with the requirement to pursue these goals sustainably -- economically, environmentally, and socially -- the two goals are comprehensive in nature. They are fully aligned to support the Sustainable Development Goals (SDGs) set by the United Nations to replace the Millennium Development Goals (MDGs). To evaluate progress, the two goals are measured by two overall indicators: a reduction in the global headcount ratio of extreme poverty (the population share of those whose income is below the international poverty line) to 3 percent by 2030, and the promotion of income growth in the bottom 40 (B40) percent of the population in each country. This Policy Research Note updates the assessment of progress toward these two goals in a sustainable manner. The poverty goal is examined through three lenses: the evolution of income poverty based on the new international poverty line that has been re-estimated at $1.90 a day; an assessment of person-equivalent income poverty, a new intuitive indicator that combines the incidence with the depth of poverty; and a review of the breadth of poverty, recognizing that income shortfalls often coexist with multiple non-income deprivations. The shared prosperity goal is examined on the basis of the latest comparison of (comparable) household data on B40 income growth. As part of its analysis of the two goals, this note also comments on the status of defining and monitoring sustainability in its economic, environmental and social aspects.

    CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach

    In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a Persistent Memory (PMem) solution in the context of disaggregated HPC systems. This paper presents a comprehensive exploration of CXL memory's viability as a candidate for PMem, supported by physical experiments conducted on cutting-edge multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not only benchmarks the performance of CXL memory but also illustrates the seamless transition from traditional PMem programming models to CXL, reinforcing its practicality. To substantiate our claims, we establish a tangible CXL prototype using an FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP). Performance evaluations, executed through the STREAM and STREAM-PMem benchmarks, showcase CXL memory's ability to mirror PMem characteristics in App-Direct and Memory Mode while achieving impressive bandwidth metrics with Intel 4th generation Xeon (Sapphire Rapids) processors. The results elucidate the feasibility of CXL memory as a persistent memory solution, outperforming previously established benchmarks. In contrast to published DCPMM results, our CXL-DDR4 memory module offers comparable bandwidth to local DDR4 memory configurations, albeit with a moderate decrease in performance. The modified STREAM-PMem application demonstrates the ease of transitioning programming models from PMem to CXL, underscoring the practicality of adopting CXL memory. Comment: 12 pages, 9 figures

    Exploring Fully Offloaded GPU Stream-Aware Message Passing

    Modern heterogeneous supercomputing systems are comprised of CPUs, GPUs, and high-speed network interconnects. Communication libraries supporting efficient data transfers involving memory buffers from the GPU memory typically require the CPU to orchestrate the data transfer operations. A new offload-friendly communication strategy, stream-triggered (ST) communication, was explored to allow offloading the synchronization and data movement operations from the CPU to the GPU. A Message Passing Interface (MPI) one-sided active target synchronization based implementation was used as an exemplar to illustrate the proposed strategy. A latency-sensitive nearest neighbor microbenchmark was used to explore the various performance aspects of the implementation. The offloaded implementation shows significant on-node performance advantages over standard MPI active RMA (36%) and point-to-point (61%) communication. The current multi-node improvement is less (23% faster than standard active RMA but 11% slower than point-to-point), but plans are in progress to pursue further improvements. Comment: 12 pages, 17 figures

    Early Holocene ritual complexity in South America: the archaeological record of Lapa do Santo (east-central Brazil)

    Early Archaic human skeletal remains found in a burial context in Lapa do Santo in east-central Brazil provide a rare glimpse into the lives of hunter-gatherer communities in South America, including their rituals for dealing with the dead. These included the reduction of the body by means of mutilation, defleshing, tooth removal, exposure to fire and possibly cannibalism, followed by the secondary burial of the remains according to strict rules. In a later period, pits were filled with disarticulated bones of a single individual without signs of body manipulation, demonstrating that the region was inhabited by dynamic groups in constant transformation over a period of centuries.