253 research outputs found

    C Language Extensions for Hybrid CPU/GPU Programming with StarPU

    Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and "accelerators", often in the form of graphics processing units (GPUs). StarPU is a C library to exploit such platforms. It provides users with ways to define "tasks" to be executed on CPUs or GPUs, along with the dependencies among them, and automatically schedules them over all the available processing units. In doing so, it also relieves programmers from the need to know the underlying architecture details: it adapts to the available CPUs and GPUs, and automatically transfers data between main memory and GPUs as needed. While StarPU's approach is successful at addressing run-time scheduling issues, being a C library makes for a poor and error-prone programming interface. This paper presents an effort started in 2011 to promote some of the concepts exported by the library as C language constructs, by means of an extension of the GCC compiler suite. Our main contribution is the design and implementation of language extensions that map to StarPU's task programming paradigm. We argue that the proposed extensions make it easier to get started with StarPU, eliminate errors that can occur when using the C library, and help diagnose possible mistakes. We conclude with future work.

    Distinction of the Steinberg representation III: the tamely ramified case

    Let $F$ be a nonarchimedean local field, let $E$ be a Galois quadratic extension of $F$ and let $G$ be a quasisplit group defined over $F$; a conjecture by Dipendra Prasad states that the Steinberg representation of $G(E)$ is then $\chi$-distinguished for a given unique character $\chi$ of $G(F)$. In the first two papers of the series, Broussous and the author have proved that result when $G$ is $F$-split and $E/F$ is unramified; this paper deals with the tamely ramified case, still with $G$ $F$-split. Comment: 74 pages; proof of $\chi$-distinction improved + some minor corrections

    Multi-spectral piston sensor for co-phasing giant segmented mirrors and multi-aperture interferometric arrays

    This paper presents the optical design of a multi-spectral piston sensor suitable for co-phasing the giant segmented mirrors of future Extremely Large Telescopes (ELTs). The general theory of the sensor is described in detail, and numerical simulations demonstrate that direct piston and tip-tilt measurements are feasible with accuracies close to 20 nm and 10 nano-radians, respectively. Those values are compatible with the co-phasing requirements, although the method seems to be perturbed by uncorrected atmospheric seeing.

    Distinction of representations via Bruhat-Tits buildings of p-adic groups

    Introductory and pedagogical treatment of the article: P. Broussous, "Distinction of the Steinberg representation", with an appendix by François Courtès, IMRN 2014, no. 11, 3140-3157. To appear in Proceedings of Chaire Jean Morlet, Dipendra Prasad and Volker Heiermann, Eds., 2017. Contains modified and simplified proofs of loc. cit. This article is written in memory of François Courtès, who passed away in September 2016. Comment: 33 pages, 4 figures

    The magnetic field along the jets of NGC 4258 as deduced from high frequency radio observations

    We present 2.4" resolution, high sensitivity radio continuum observations of the nearby spiral galaxy NGC 4258 in total intensity and linear polarization obtained with the Very Large Array at 3.6 cm (8.44 GHz). The radio emission along the northern jet and the center of the galaxy is polarized and allows investigation of the magnetic field. Assuming energy equipartition between the magnetic field and the relativistic particles and distinguishing between (1) a relativistic electron-proton jet and (2) a relativistic electron-positron jet, we obtain average magnetic field strengths of about (1) 310 μG and (2) 90 μG. The rotation measure is determined to range from -400 to -800 rad/m^2 in the northern jet. Correcting the observed E-vectors of polarized intensity for Faraday rotation, the magnetic field along the jet turns out to be orientated mainly along the jet axis. An observed tilt with respect to the jet axis may also indicate a toroidal magnetic field component or a slightly helical magnetic field around the northern jet. Comment: 9 pages with 9 figures. Accepted for publication in A&A

    Blasts and shocks in the disc of NGC 4258

    We present integral field spectroscopic observations of the central region of the active galaxy NGC 4258 obtained with the fibre IFU system INTEGRAL. We have been able to detect cold neutral gas by means of the interstellar NaD doublet absorption and to trace its distribution and kinematics with respect to the underlying disc. The neutral gas is blue-shifted with projected velocities in the 120-370 km/s range. We have also detected peculiar kinematics in part of the ionized gas in this region by means of a careful kinematic decomposition. The bipolar spatial distribution of the broader component is roughly coincident with the morphology of the X-ray diffuse emission. The kinematics of this gas can be explained in terms of expansion at very high (projected) velocities of up to 300 km/s. The observations also reveal the existence of a strip of neutral gas, parallel to the major kinematic axis, that is nearly coincident with a region of very high [SII]/Hα ratio tracing the shocked gas. Our observations are consistent with the jet model presented by Wilson et al. (2001), in which a cocoon originating from the nuclear jet is shocking the gas in the galaxy disc. Alternatively, our observations are also consistent with the bipolar hypershell model of Sofue (1980) and Sofue & Vogler (2001). On balance, we prefer the latter model as the most likely explanation for the puzzling features of this peculiar object. Comment: 7 pages, 10 colour figures. Accepted for publication in MNRAS

    Reproductibilité et performance : pourquoi choisir ?

    Research processes often rely on high-performance computing (HPC), but HPC is often seen as antithetical to "reproducibility": one would have to choose between software that achieves high performance and software that can be deployed in a reproducible fashion. However, by giving up on reproducibility we would give up on verifiability, a foundation of the scientific process. How can we reconcile performance and reproducibility? This article looks at two performance-critical aspects in HPC: message passing (MPI) and CPU micro-architecture tuning. Engineering work that has gone into performance portability has already proved fruitful, but some areas remain unaddressed when it comes to CPU tuning. We propose package multi-versioning, a technique developed for GNU Guix, a tool for reproducible software deployment, and show that it allows us to implement CPU tuning without compromising on reproducibility and provenance tracking.