9 research outputs found

    Towards a self-consistent orbital evolution for EMRIs

    We intend to develop part of the theoretical tools needed for the detection of gravitational waves coming from the capture of a compact object (1-100 solar masses) by a supermassive black hole (up to 10 billion solar masses), of the kind located at the centre of most galaxies. The analysis of the accretion activity unveils the stellar population around galactic nuclei and tests the physics of black holes and general relativity. The captured small mass is considered a probe of the gravitational field of the massive body, allowing a precise measurement of the particle motion up to the final absorption. Knowledge of the gravitational signal, strongly affected by the self-force (the orbital displacement due to the captured mass and the emitted radiation), is imperative for a successful detection. The results include a strategy for wave equations with a singular source term for all types of orbits. We are now tackling the evolution problem, first for radial fall in the Regge-Wheeler gauge, and later for generic orbits in the harmonic or de Donder gauge for Schwarzschild-Droste black holes. In the Extreme Mass Ratio Inspiral, the determination of the orbital evolution demands that the motion of the small mass be continuously corrected by the self-force, i.e. a self-consistent evolution. At each integration step, the self-force must be computed over an adequate number of modes; further, a differential-integral system of general relativistic equations has to be solved and the outputs regularised to suppress divergences. Finally, to provide the required computational power, parallelisation is under examination. Comment: IX LISA Conference (held 21-25 May 2012 in Paris), proceedings by the Astronomical Society of the Pacific Conference Series.
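    The self-consistent evolution described above is, at its core, a loop that alternates between assembling a regularised mode-sum self-force and advancing the orbit. The C++ toy below is entirely hypothetical: the mode solver, the regularisation parameter and the equations of motion are placeholders rather than the actual relativistic system, and it only illustrates the structure of such a loop.

        // Toy sketch of a self-consistent evolution loop (not the authors' code):
        // at each step the self-force is assembled from a finite number of
        // "regularised" multipole modes, then the orbit is advanced.
        #include <cmath>
        #include <cstdio>

        struct OrbitState { double r, phi, ur, uphi; };   // toy phase-space variables

        // Stub: in the real scheme each mode requires solving a wave equation with
        // a singular source (Regge-Wheeler gauge for radial fall).
        double mode_contribution(const OrbitState& s, int l) {
            return std::exp(-0.5 * l) / (s.r * s.r);       // decaying toy mode sum
        }

        // Mode-sum "self-force" with a schematic regularisation term subtracted;
        // the sign is chosen so the toy force is dissipative and the orbit decays.
        double self_force(const OrbitState& s, int lmax, double reg_param) {
            double f = 0.0;
            for (int l = 0; l <= lmax; ++l)
                f += mode_contribution(s, l) - reg_param;
            return -f;
        }

        int main() {
            OrbitState s{10.0, 0.0, 0.0, 0.032};           // toy initial data (M = 1)
            const double dt = 0.1, M = 1.0;
            for (int step = 0; step < 1000; ++step) {
                double F = self_force(s, /*lmax=*/20, /*reg_param=*/1e-6);
                // Newtonian-like toy update, continuously corrected by the self-force.
                s.ur   += (-M / (s.r * s.r) + s.r * s.uphi * s.uphi + F) * dt;
                s.r    += s.ur * dt;
                s.phi  += s.uphi * dt;
                if (s.r < 2.0 * M) break;                  // reached the horizon
            }
            std::printf("final r = %.3f\n", s.r);
        }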

    Task-to-processor allocation for distributed heterogeneous applications on SMP clusters


    Modèles et outils pour le déploiement d'applications de Réalité Virtuelle sur des architectures distribuées (Models and tools for deploying Virtual Reality applications on distributed architectures)

    Virtual Reality applications require a huge amount of computational power that clusters (sets of computers connected by networks) can provide. To take advantage of these architectures, applications can be split into several parts, called components, which are then mapped onto different cluster nodes. The performance of such applications depends on the hardware, on the mapping, and on the synchronization and communication schemes between components. To determine whether a VR application can run interactively, we can map and run it on the architecture; if it does not perform as expected, we have to try another mapping. However, finding a mapping with the expected performance this way is often a long and tedious process. To speed it up, we define a performance model that evaluates the performance of a given mapping of a distributed application on a cluster from descriptions of the architecture, the application, and the mapping. We then propose an approach based on constraint programming to automatically generate mappings. Constraints are derived from our model, from the performance of the architecture, and from the performance expected by the user. This approach answers the following questions: Does at least one mapping with the expected performance exist on the given architecture? If so, what are these mappings? Does the application perform better if we increase the number of nodes?
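    The constraint-based mapping generation can be pictured as a search over component-to-node assignments filtered by a performance model. The C++ sketch below is a deliberately naive stand-in (exhaustive enumeration with invented component costs, node speeds, link penalty and interactivity target), not the constraint-programming formulation used in the thesis; it only shows what "generate the mappings that satisfy the performance constraints" means.

        // Brute-force illustration of mapping generation under a toy performance
        // model; all numbers and the model itself are hypothetical.
        #include <algorithm>
        #include <cstdio>
        #include <utility>
        #include <vector>

        struct Component { double work_ms; };   // per-frame work on a reference node
        struct Node      { double speed;   };   // relative speed factor

        // Toy model: per-node time = accumulated work / speed; a fixed penalty is
        // paid for every pair of communicating components placed on different nodes.
        double frame_time(const std::vector<int>& map,
                          const std::vector<Component>& comps,
                          const std::vector<Node>& nodes,
                          const std::vector<std::pair<int,int>>& links,
                          double link_cost_ms) {
            std::vector<double> load(nodes.size(), 0.0);
            for (std::size_t c = 0; c < comps.size(); ++c)
                load[map[c]] += comps[c].work_ms / nodes[map[c]].speed;
            double t = *std::max_element(load.begin(), load.end());
            for (auto [a, b] : links)
                if (map[a] != map[b]) t += link_cost_ms;
            return t;
        }

        int main() {
            std::vector<Component> comps{{8}, {5}, {12}};          // three components
            std::vector<Node> nodes{{1.0}, {2.0}};                 // two nodes
            std::vector<std::pair<int,int>> links{{0,1}, {1,2}};   // dataflow graph
            const double target_ms = 16.0;                         // ~60 Hz interactivity

            // Enumerate the 2^3 mappings (the thesis uses constraint programming to
            // prune this space; enumeration is only for illustration).
            std::vector<int> map(comps.size());
            for (int code = 0; code < 8; ++code) {
                for (std::size_t c = 0; c < comps.size(); ++c)
                    map[c] = (code >> c) & 1;
                double t = frame_time(map, comps, nodes, links, 2.0);
                if (t <= target_ms)
                    std::printf("mapping %d%d%d -> %.1f ms (ok)\n",
                                map[0], map[1], map[2], t);
            }
        }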

    Performance prediction for mappings of distributed applications on pc clusters

    Distributed applications running on clusters may be composed of several components with very different performance requirements. The FlowVR middleware allows the developer to deploy such applications and to define communication and synchronization schemes between components without modifying the code. While it eases the creation of mappings, FlowVR does not come with a performance model, so the optimization of mappings is left to the developer's skills. This task becomes difficult as the number of components and cluster nodes grows, and even more complex if the cluster is composed of heterogeneous nodes and networks. In this paper we propose an approach to predict the performance of FlowVR distributed applications given a mapping and a cluster. We also give some advice to help the developer create efficient mappings and avoid configurations which may lead to unexpected performance. Since the FlowVR model is very close to the underlying models of many distributed codes, our approach can be useful for all designers of such applications.
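    As a rough illustration of what such a prediction looks like, the sketch below charges each network link a latency plus a message-size-over-bandwidth cost and bounds the iteration time by the slowest node or link. The model, the numbers and the two-node scenario are assumptions made for illustration, not the FlowVR model from the paper.

        // Hypothetical per-mapping prediction: iteration time is bound by the
        // slowest compute node or the slowest communication.
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct Link { double latency_ms, bandwidth_MBps; };

        double comm_time_ms(double message_MB, const Link& l) {
            return l.latency_ms + 1000.0 * message_MB / l.bandwidth_MBps;
        }

        int main() {
            // One producer on node 0, one consumer on node 1, 4 MB exchanged per frame.
            std::vector<double> node_compute_ms{6.0, 9.0};     // assumed measurements
            Link gigabit{0.05, 120.0};                         // ~1 Gb/s Ethernet
            double comm = comm_time_ms(4.0, gigabit);
            double iteration = std::max({node_compute_ms[0], node_compute_ms[1], comm});
            std::printf("predicted iteration: %.1f ms (comm %.1f ms)\n", iteration, comm);
        }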

    Multiple networks for heterogeneous distributed applications

    We have experienced in our distributed applications that the network is the main limiting factor for performance on clusters. Indeed, clusters are cheap, and it is easier to add more nodes to extend the computing capacity than to switch to costly high-performance networks. Consequently, the developer should take particular care over communications and synchronizations in the application design. The FlowVR middleware offers a way to build distributed applications independently of a particular communication or synchronization scheme, which eases the design of distributed applications independently of their coupling and mapping on clusters. Moreover, we propose a performance prediction model for FlowVR applications that is adapted to heterogeneous SMP clusters with multiple networks. In this paper we present an analysis of communication schemes based on our performance prediction model. We give some advice to the developer on optimizing communications in their mappings. We also show how to use multiple networks on heterogeneous clusters to balance network load and decrease communication times. Since the FlowVR model is very close to the underlying models of many distributed codes, our approach can be useful for all developers of such applications.
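    One way to picture the use of several networks is a greedy router that sends each message over the network expected to finish it earliest, which naturally balances load between a slow and a fast interconnect. The sketch below is a hypothetical illustration of that idea with invented bandwidths and message sizes, not the scheme evaluated in the paper.

        // Greedy assignment of per-frame messages to the least-loaded network.
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct Network { const char* name; double bandwidth_MBps; double busy_ms; };

        int main() {
            std::vector<Network> nets{{"eth0 (GigE)", 120.0, 0.0},
                                      {"ib0 (InfiniBand)", 900.0, 0.0}};
            std::vector<double> messages_MB{8, 2, 2, 16, 4, 1};   // assumed traffic

            for (double m : messages_MB) {
                // Pick the network that would finish this transfer the earliest.
                auto best = std::min_element(nets.begin(), nets.end(),
                    [m](const Network& a, const Network& b) {
                        return a.busy_ms + 1000.0 * m / a.bandwidth_MBps
                             < b.busy_ms + 1000.0 * m / b.bandwidth_MBps;
                    });
                best->busy_ms += 1000.0 * m / best->bandwidth_MBps;
            }
            for (const auto& n : nets)
                std::printf("%s: %.1f ms of traffic\n", n.name, n.busy_ms);
        }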

    A multi-level optimization strategy to improve the performance of the stencil computation

    Stencil computation represents an important numerical kernel in scientific computing. Leveraging multicore or manycore parallelism to optimize such operations represents a major challenge, due both to the bandwidth demand and to the low arithmetic intensity. The situation is worsened by the complexity of current architectures and the potential impact of various mechanisms (cache memory, vectorization, compilation). In this paper, we describe a multi-level optimization strategy that combines manual vectorization, space tiling and stencil composition. A major effort of this study is the comparison of our results with the Pochoir stencil compiler framework. We evaluate our methodology with a set of three different compilers (Intel, Clang and GCC) on two recent generations of Intel multicore platforms. Our results show a good match with theoretical performance models (i.e. roofline models). We also outperform Pochoir by a factor of 2.5 in the best cases.
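    For readers unfamiliar with space tiling, the toy kernel below sweeps a 2D 5-point stencil tile by tile so that the working set stays cache resident, leaving the innermost loop in a form the compiler can auto-vectorize. It is a minimal illustration of one level of the strategy (tile and grid sizes are arbitrary), not the optimized kernels compared against Pochoir.

        // Space-tiled 5-point stencil on a 2D grid (single sweep, toy sizes).
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        int main() {
            const int N = 1024, TILE = 64;
            std::vector<float> in(N * N, 1.0f), out(N * N, 0.0f);

            for (int ti = 1; ti < N - 1; ti += TILE)          // tile loops: keep the
                for (int tj = 1; tj < N - 1; tj += TILE)      // working set in cache
                    for (int i = ti; i < std::min(ti + TILE, N - 1); ++i)
                        // Unit-stride inner loop, amenable to auto-vectorization.
                        for (int j = tj; j < std::min(tj + TILE, N - 1); ++j)
                            out[i * N + j] = 0.25f * (in[(i - 1) * N + j]
                                                    + in[(i + 1) * N + j]
                                                    + in[i * N + j - 1]
                                                    + in[i * N + j + 1]);
            std::printf("out[N/2][N/2] = %f\n", out[(N / 2) * N + N / 2]);
        }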

    Data layout and SIMD abstraction layers: decoupling interfaces from implementations

    From a high-level point of view, developers define the objects they manipulate in terms of structures or classes. For example, a pixel may be represented as a structure of three color components (red, green, blue) and an image as an array of pixels. In such cases, the data layout is said to be organized as an array of structures (AoS). However, developing efficient applications on modern processors and accelerators often requires organizing data in different ways. An image may also be stored as a structure of three arrays, one for each component. This data layout is called a structure of arrays (SoA) and is also mandatory to take advantage of the SIMD units embedded in all modern processors. In this paper, we propose a lightweight C++ template-based framework to provide the high-level representation most programmers use (AoS) on top of different data layouts suited to SIMD vectorization. Templated containers are provided for each proposed layout with a uniform AoS-like interface to access elements. Containers are transformed into different combinations of tuples and vectors from the C++ Standard Template Library (STL) at compile time. This way, we provide more optimization opportunities for the code, especially automatic vectorization. We study the performance of our data layouts and compare them to their explicit versions, based on structures and vectors, for different algorithms and architectures (x86 and ARM). Results show that compilers do not always perform automatic vectorization on our data layouts as they do on their explicit versions, even if the underlying containers and access patterns are similar. Thus, we investigate the use of SIMD intrinsics and of the Boost.SIMD/bSIMD libraries to vectorize the codes. We show that combining our approach with the Boost.SIMD/bSIMD libraries delivers performance similar to manual vectorization with intrinsics and, in almost all cases, better performance than automatic vectorization, without increasing code complexity.
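    A stripped-down illustration of the AoS/SoA duality: the same pixel data stored as an array of structures and as a structure of arrays, the latter exposed through a small proxy so that element access still looks AoS-like. This is only a sketch of the idea; the paper's framework relies on templated containers built on STL tuples and vectors, which is not reproduced here.

        // AoS versus SoA storage of pixels, with an AoS-like accessor for the SoA case.
        #include <cstddef>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        struct Pixel { std::uint8_t r, g, b; };             // AoS element

        struct ImageSoA {                                    // SoA layout: one
            std::vector<std::uint8_t> r, g, b;               // contiguous array
            struct Ref { std::uint8_t &r, &g, &b; };         // per component
            Ref operator[](std::size_t i) { return {r[i], g[i], b[i]}; }
        };

        int main() {
            const std::size_t n = 8;
            std::vector<Pixel> aos(n, Pixel{10, 20, 30});
            ImageSoA soa{std::vector<std::uint8_t>(n, 10),
                         std::vector<std::uint8_t>(n, 20),
                         std::vector<std::uint8_t>(n, 30)};

            // Same element-wise interface; the SoA layout keeps each channel
            // contiguous, which is what SIMD units want.
            aos[3].g = 99;
            soa[3].g = 99;
            std::printf("aos: %d, soa: %d\n", aos[3].g, soa[3].g);
        }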

    Vectorization of a spectral finite-element numerical kernel


    An out-of-core GPU approach for accelerating geostatistical interpolation

    Geostatistical methods provide a powerful tool to understand the complexity of data arising from Earth sciences. Since the mid-1970s, this numerical approach has been widely used to understand the spatial variation of natural phenomena in domains such as the oil and gas, mining and environmental industries. Considering the huge amount of data available, standard implementations of these numerical methods are not efficient enough to tackle current challenges in geosciences. Moreover, most of the software packages available to geostatisticians are designed for use on a desktop computer because of the trial-and-error procedure used during interpolation. The Geological Data Management (GDM) software package developed by the French geological survey (BRGM) is widely used to build reliable three-dimensional geological models that require a large amount of memory and computing resources. Focusing on the most time-consuming phase of the kriging methodology, we introduce an efficient out-of-core algorithm that fully benefits from graphics card acceleration on a desktop computer. This way we are able to accelerate kriging on the GPU with data sets 4 times larger than a classical in-core GPU algorithm can handle, with a limited loss of performance.
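    The out-of-core pattern can be illustrated independently of the GPU: the estimation points are processed in blocks small enough to fit in device memory, each block being transferred and reduced on its own. In the sketch below the covariance model, the precomputed kriging weights and the block size are assumptions, and the GPU transfer and kernel launch are replaced by plain CPU loops marked in comments; it is not GDM's algorithm.

        // Out-of-core style blocking of a kriging-like estimation over many targets.
        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        struct Point { double x, y, z; };

        // Toy exponential covariance model (range 100, unit sill).
        double cov(const Point& a, const Point& b) {
            double d = std::hypot(a.x - b.x, std::hypot(a.y - b.y, a.z - b.z));
            return std::exp(-d / 100.0);
        }

        int main() {
            const std::size_t n_data = 2000, n_targets = 10000, block = 1024;
            std::vector<Point> data(n_data, {0, 0, 0});
            std::vector<Point> targets(n_targets, {1, 2, 3});
            std::vector<double> weights(n_data, 1.0 / n_data);  // assumed precomputed
            std::vector<double> estimate(n_targets, 0.0);

            // Out-of-core loop: only one block of targets is "resident" at a time.
            for (std::size_t t0 = 0; t0 < n_targets; t0 += block) {
                std::size_t t1 = std::min(t0 + block, n_targets);
                // In the GPU version this is where the block would be copied to the
                // device and the covariance-times-weight reduction launched as a kernel.
                for (std::size_t t = t0; t < t1; ++t)
                    for (std::size_t i = 0; i < n_data; ++i)
                        estimate[t] += weights[i] * cov(targets[t], data[i]);
            }
            std::printf("estimate[0] = %f\n", estimate[0]);
        }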