9 research outputs found

    Towards a self-consistent orbital evolution for EMRIs

    We intend to develop part of the theoretical tools needed for the detection of gravitational waves coming from the capture of a compact object (1-100 solar masses) by a supermassive black hole (up to 10 billion solar masses), of the kind located at the centre of most galaxies. The analysis of the accretion activity unveils the stellar population around galactic nuclei and tests the physics of black holes and general relativity. The captured small mass is considered a probe of the gravitational field of the massive body, allowing a precise measurement of the particle motion up to the final absorption. Knowledge of the gravitational signal, strongly affected by the self-force (the orbital displacement due to the captured mass and the emitted radiation), is imperative for a successful detection. The results include a strategy for wave equations with a singular source term for all types of orbits. We are now tackling the evolution problem, first for radial fall in the Regge-Wheeler gauge, and later for generic orbits in the harmonic or de Donder gauge for Schwarzschild-Droste black holes. In the Extreme Mass Ratio Inspiral, the determination of the orbital evolution demands that the motion of the small mass be continuously corrected by the self-force, i.e. a self-consistent evolution. At each integration step, the self-force must be computed over an adequate number of modes; further, a differential-integral system of general relativistic equations has to be solved and the outputs regularised to suppress divergences. Finally, to provide the required computational power, parallelisation is under examination. Comment: IX LISA Conference (held 21-25 May 2012 in Paris), proceedings by the Astronomical Society of the Pacific Conference Series.
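    The self-consistent evolution described above is, at its core, a loop that alternates between assembling a regularised mode-sum self-force and advancing the orbit. The C++ toy below is entirely hypothetical: the mode solver, the regularisation parameter and the equations of motion are placeholders rather than the actual relativistic system, and it only illustrates the structure of such a loop.

        // Toy sketch of a self-consistent evolution loop (not the authors' code):
        // at each step the self-force is assembled from a finite number of
        // "regularised" multipole modes, then the orbit is advanced.
        #include <cmath>
        #include <cstdio>

        struct OrbitState { double r, phi, ur, uphi; };   // toy phase-space variables

        // Stub: in the real scheme each mode requires solving a wave equation with
        // a singular source (Regge-Wheeler gauge for radial fall).
        double mode_contribution(const OrbitState& s, int l) {
            return std::exp(-0.5 * l) / (s.r * s.r);       // decaying toy mode sum
        }

        // Mode-sum "self-force" with a schematic regularisation term subtracted;
        // the sign is chosen so the toy force is dissipative and the orbit decays.
        double self_force(const OrbitState& s, int lmax, double reg_param) {
            double f = 0.0;
            for (int l = 0; l <= lmax; ++l)
                f += mode_contribution(s, l) - reg_param;
            return -f;
        }

        int main() {
            OrbitState s{10.0, 0.0, 0.0, 0.032};           // toy initial data (M = 1)
            const double dt = 0.1, M = 1.0;
            for (int step = 0; step < 1000; ++step) {
                double F = self_force(s, /*lmax=*/20, /*reg_param=*/1e-6);
                // Newtonian-like toy update, continuously corrected by the self-force.
                s.ur   += (-M / (s.r * s.r) + s.r * s.uphi * s.uphi + F) * dt;
                s.r    += s.ur * dt;
                s.phi  += s.uphi * dt;
                if (s.r < 2.0 * M) break;                  // reached the horizon
            }
            std::printf("final r = %.3f\n", s.r);
        }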

    Task-to-processor allocation for distributed heterogeneous applications on SMP clusters


    Modèles et outils pour le déploiement d'applications de Réalité Virtuelle sur des architectures distribuées (Models and tools for deploying Virtual Reality applications on distributed architectures)

    Virtual Reality applications require a huge amount of computational power that clusters (sets of computers connected by networks) can provide. To take advantage of these architectures, applications can be split into several parts, called components, which are then mapped onto different cluster nodes. The performance of such applications depends on the hardware, on the mapping, and on the synchronization and communication schemes between components. To determine whether a VR application can run interactively, we can map and run it on the architecture; if it does not perform as expected, we have to try another mapping. However, finding a mapping with the expected performance this way is often a long and tedious process. To speed it up, we define a performance model that evaluates the performance of a given mapping of a distributed application on a cluster from descriptions of the architecture, the application, and the mapping. We then propose an approach based on constraint programming to automatically generate mappings. Constraints are derived from our model, from the performance of the architecture, and from the performance expected by the user. This approach answers the following questions: Does at least one mapping with the expected performance exist on the given architecture? If so, what are these mappings? Does the application perform better if we increase the number of nodes?
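    The constraint-based mapping generation can be pictured as a search over component-to-node assignments filtered by a performance model. The C++ sketch below is a deliberately naive stand-in (exhaustive enumeration with invented component costs, node speeds, link penalty and interactivity target), not the constraint-programming formulation used in the thesis; it only shows what "generate the mappings that satisfy the performance constraints" means.

        // Brute-force illustration of mapping generation under a toy performance
        // model; all numbers and the model itself are hypothetical.
        #include <algorithm>
        #include <cstdio>
        #include <utility>
        #include <vector>

        struct Component { double work_ms; };   // per-frame work on a reference node
        struct Node      { double speed;   };   // relative speed factor

        // Toy model: per-node time = accumulated work / speed; a fixed penalty is
        // paid for every pair of communicating components placed on different nodes.
        double frame_time(const std::vector<int>& map,
                          const std::vector<Component>& comps,
                          const std::vector<Node>& nodes,
                          const std::vector<std::pair<int,int>>& links,
                          double link_cost_ms) {
            std::vector<double> load(nodes.size(), 0.0);
            for (std::size_t c = 0; c < comps.size(); ++c)
                load[map[c]] += comps[c].work_ms / nodes[map[c]].speed;
            double t = *std::max_element(load.begin(), load.end());
            for (auto [a, b] : links)
                if (map[a] != map[b]) t += link_cost_ms;
            return t;
        }

        int main() {
            std::vector<Component> comps{{8}, {5}, {12}};          // three components
            std::vector<Node> nodes{{1.0}, {2.0}};                 // two nodes
            std::vector<std::pair<int,int>> links{{0,1}, {1,2}};   // dataflow graph
            const double target_ms = 16.0;                         // ~60 Hz interactivity

            // Enumerate the 2^3 mappings (the thesis uses constraint programming to
            // prune this space; enumeration is only for illustration).
            std::vector<int> map(comps.size());
            for (int code = 0; code < 8; ++code) {
                for (std::size_t c = 0; c < comps.size(); ++c)
                    map[c] = (code >> c) & 1;
                double t = frame_time(map, comps, nodes, links, 2.0);
                if (t <= target_ms)
                    std::printf("mapping %d%d%d -> %.1f ms (ok)\n",
                                map[0], map[1], map[2], t);
            }
        }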

    Performance prediction for mappings of distributed applications on pc clusters

    Distributed applications running on clusters may be composed of several components with very different performance requirements. The FlowVR middleware allows the developer to deploy such applications and to define communication and synchronization schemes between components without modifying the code. While it eases the creation of mappings, FlowVR does not come with a performance model, so the optimization of mappings is left to the developer's skills. This task becomes difficult as the number of components and cluster nodes grows, and even more complex if the cluster is composed of heterogeneous nodes and networks. In this paper we propose an approach to predict the performance of FlowVR distributed applications given a mapping and a cluster. We also give some advice to help the developer create efficient mappings and avoid configurations which may lead to unexpected performance. Since the FlowVR model is very close to the underlying models of many distributed codes, our approach can be useful for all designers of such applications.
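    As a rough illustration of what such a prediction looks like, the sketch below charges each network link a latency plus a message-size-over-bandwidth cost and bounds the iteration time by the slowest node or link. The model, the numbers and the two-node scenario are assumptions made for illustration, not the FlowVR model from the paper.

        // Hypothetical per-mapping prediction: iteration time is bound by the
        // slowest compute node or the slowest communication.
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct Link { double latency_ms, bandwidth_MBps; };

        double comm_time_ms(double message_MB, const Link& l) {
            return l.latency_ms + 1000.0 * message_MB / l.bandwidth_MBps;
        }

        int main() {
            // One producer on node 0, one consumer on node 1, 4 MB exchanged per frame.
            std::vector<double> node_compute_ms{6.0, 9.0};     // assumed measurements
            Link gigabit{0.05, 120.0};                         // ~1 Gb/s Ethernet
            double comm = comm_time_ms(4.0, gigabit);
            double iteration = std::max({node_compute_ms[0], node_compute_ms[1], comm});
            std::printf("predicted iteration: %.1f ms (comm %.1f ms)\n", iteration, comm);
        }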

    Multiple networks for heterogeneous distributed applications

    We have experienced in our distributed applications that the network is the main limiting factor for performance on clusters. Indeed, clusters are cheap, and it is easier to add more nodes to extend the computing capacity than to switch to costly high-performance networks. Consequently, the developer should take particular care over communications and synchronizations in the application design. The FlowVR middleware offers a way to build distributed applications independently of a particular communication or synchronization scheme, which eases the design of distributed applications independently of their coupling and mapping on clusters. Moreover, we propose a performance prediction model for FlowVR applications that is adapted to heterogeneous SMP clusters with multiple networks. In this paper we present an analysis of communication schemes based on our performance prediction model. We give some advice to the developer on optimizing communications in their mappings. We also show how to use multiple networks on heterogeneous clusters to balance network load and decrease communication times. Since the FlowVR model is very close to the underlying models of many distributed codes, our approach can be useful for all developers of such applications.
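    One way to picture the use of several networks is a greedy router that sends each message over the network expected to finish it earliest, which naturally balances load between a slow and a fast interconnect. The sketch below is a hypothetical illustration of that idea with invented bandwidths and message sizes, not the scheme evaluated in the paper.

        // Greedy assignment of per-frame messages to the least-loaded network.
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct Network { const char* name; double bandwidth_MBps; double busy_ms; };

        int main() {
            std::vector<Network> nets{{"eth0 (GigE)", 120.0, 0.0},
                                      {"ib0 (InfiniBand)", 900.0, 0.0}};
            std::vector<double> messages_MB{8, 2, 2, 16, 4, 1};   // assumed traffic

            for (double m : messages_MB) {
                // Pick the network that would finish this transfer the earliest.
                auto best = std::min_element(nets.begin(), nets.end(),
                    [m](const Network& a, const Network& b) {
                        return a.busy_ms + 1000.0 * m / a.bandwidth_MBps
                             < b.busy_ms + 1000.0 * m / b.bandwidth_MBps;
                    });
                best->busy_ms += 1000.0 * m / best->bandwidth_MBps;
            }
            for (const auto& n : nets)
                std::printf("%s: %.1f ms of traffic\n", n.name, n.busy_ms);
        }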

    A multi-level optimization strategy to improve the performance of the stencil computation

    Stencil computation represents an important numerical kernel in scientific computing. Leveraging multicore or manycore parallelism to optimize such operations represents a major challenge, due both to the bandwidth demand and to the low arithmetic intensity. The situation is worsened by the complexity of current architectures and the potential impact of various mechanisms (cache memory, vectorization, compilation). In this paper, we describe a multi-level optimization strategy that combines manual vectorization, space tiling and stencil composition. A major effort of this study is the comparison of our results with the Pochoir stencil compiler framework. We evaluate our methodology with a set of three different compilers (Intel, Clang and GCC) on two recent generations of Intel multicore platforms. Our results show a good match with theoretical performance models (i.e. roofline models). We also outperform Pochoir by a factor of 2.5 in the best cases.
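    For readers unfamiliar with space tiling, the toy kernel below sweeps a 2D 5-point stencil tile by tile so that the working set stays cache resident, leaving the innermost loop in a form the compiler can auto-vectorize. It is a minimal illustration of one level of the strategy (tile and grid sizes are arbitrary), not the optimized kernels compared against Pochoir.

        // Space-tiled 5-point stencil on a 2D grid (single sweep, toy sizes).
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        int main() {
            const int N = 1024, TILE = 64;
            std::vector<float> in(N * N, 1.0f), out(N * N, 0.0f);

            for (int ti = 1; ti < N - 1; ti += TILE)          // tile loops: keep the
                for (int tj = 1; tj < N - 1; tj += TILE)      // working set in cache
                    for (int i = ti; i < std::min(ti + TILE, N - 1); ++i)
                        // Unit-stride inner loop, amenable to auto-vectorization.
                        for (int j = tj; j < std::min(tj + TILE, N - 1); ++j)
                            out[i * N + j] = 0.25f * (in[(i - 1) * N + j]
                                                    + in[(i + 1) * N + j]
                                                    + in[i * N + j - 1]
                                                    + in[i * N + j + 1]);
            std::printf("out[N/2][N/2] = %f\n", out[(N / 2) * N + N / 2]);
        }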

    Data layout and SIMD abstraction layers: decoupling interfaces from implementations

    From a high-level point of view, developers define the objects they manipulate in terms of structures or classes. For example, a pixel may be represented as a structure of three color components (red, green, blue) and an image as an array of pixels. In such cases, the data layout is said to be organized as an array of structures (AoS). However, developing efficient applications on modern processors and accelerators often requires organizing data in different ways. An image may also be stored as a structure of three arrays, one for each component. This data layout is called a structure of arrays (SoA) and is also mandatory to take advantage of the SIMD units embedded in all modern processors. In this paper, we propose a lightweight C++ template-based framework to provide the high-level representation most programmers use (AoS) on top of different data layouts suited to SIMD vectorization. Templated containers are provided for each proposed layout with a uniform AoS-like interface to access elements. Containers are transformed into different combinations of tuples and vectors from the C++ Standard Template Library (STL) at compile time. This way, we provide more optimization opportunities for the code, especially automatic vectorization. We study the performance of our data layouts and compare them to their explicit versions, based on structures and vectors, for different algorithms and architectures (x86 and ARM). Results show that compilers do not always perform automatic vectorization on our data layouts as they do on their explicit versions, even if the underlying containers and access patterns are similar. Thus, we investigate the use of SIMD intrinsics and of the Boost.SIMD/bSIMD libraries to vectorize the codes. We show that combining our approach with the Boost.SIMD/bSIMD libraries delivers performance similar to manual vectorization with intrinsics and, in almost all cases, better performance than automatic vectorization, without increasing code complexity.
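    A stripped-down illustration of the AoS/SoA duality: the same pixel data stored as an array of structures and as a structure of arrays, the latter exposed through a small proxy so that element access still looks AoS-like. This is only a sketch of the idea; the paper's framework relies on templated containers built on STL tuples and vectors, which is not reproduced here.

        // AoS versus SoA storage of pixels, with an AoS-like accessor for the SoA case.
        #include <cstddef>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        struct Pixel { std::uint8_t r, g, b; };             // AoS element

        struct ImageSoA {                                    // SoA layout: one
            std::vector<std::uint8_t> r, g, b;               // contiguous array
            struct Ref { std::uint8_t &r, &g, &b; };         // per component
            Ref operator[](std::size_t i) { return {r[i], g[i], b[i]}; }
        };

        int main() {
            const std::size_t n = 8;
            std::vector<Pixel> aos(n, Pixel{10, 20, 30});
            ImageSoA soa{std::vector<std::uint8_t>(n, 10),
                         std::vector<std::uint8_t>(n, 20),
                         std::vector<std::uint8_t>(n, 30)};

            // Same element-wise interface; the SoA layout keeps each channel
            // contiguous, which is what SIMD units want.
            aos[3].g = 99;
            soa[3].g = 99;
            std::printf("aos: %d, soa: %d\n", aos[3].g, soa[3].g);
        }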

    Vectorization of a spectral finite-element numerical kernel


    An out-of-core GPU approach for accelerating geostatistical interpolation

    Geostatistical methods provide a powerful tool to understand the complexity of data arising from Earth sciences. Since the mid-1970s, this numerical approach has been widely used to understand the spatial variation of natural phenomena in domains such as the oil and gas, mining and environmental industries. Considering the huge amount of data available, standard implementations of these numerical methods are not efficient enough to tackle current challenges in geosciences. Moreover, most of the software packages available to geostatisticians are designed for use on a desktop computer because of the trial-and-error procedure used during interpolation. The Geological Data Management (GDM) software package developed by the French geological survey (BRGM) is widely used to build reliable three-dimensional geological models that require a large amount of memory and computing resources. Focusing on the most time-consuming phase of the kriging methodology, we introduce an efficient out-of-core algorithm that fully benefits from graphics card acceleration on a desktop computer. This way we are able to accelerate kriging on the GPU with data sets 4 times larger than a classical in-core GPU algorithm can handle, with a limited loss of performance.
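    The out-of-core pattern can be illustrated independently of the GPU: the estimation points are processed in blocks small enough to fit in device memory, each block being transferred and reduced on its own. In the sketch below the covariance model, the precomputed kriging weights and the block size are assumptions, and the GPU transfer and kernel launch are replaced by plain CPU loops marked in comments; it is not GDM's algorithm.

        // Out-of-core style blocking of a kriging-like estimation over many targets.
        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        struct Point { double x, y, z; };

        // Toy exponential covariance model (range 100, unit sill).
        double cov(const Point& a, const Point& b) {
            double d = std::hypot(a.x - b.x, std::hypot(a.y - b.y, a.z - b.z));
            return std::exp(-d / 100.0);
        }

        int main() {
            const std::size_t n_data = 2000, n_targets = 10000, block = 1024;
            std::vector<Point> data(n_data, {0, 0, 0});
            std::vector<Point> targets(n_targets, {1, 2, 3});
            std::vector<double> weights(n_data, 1.0 / n_data);  // assumed precomputed
            std::vector<double> estimate(n_targets, 0.0);

            // Out-of-core loop: only one block of targets is "resident" at a time.
            for (std::size_t t0 = 0; t0 < n_targets; t0 += block) {
                std::size_t t1 = std::min(t0 + block, n_targets);
                // In the GPU version this is where the block would be copied to the
                // device and the covariance-times-weight reduction launched as a kernel.
                for (std::size_t t = t0; t < t1; ++t)
                    for (std::size_t i = 0; i < n_data; ++i)
                        estimate[t] += weights[i] * cov(targets[t], data[i]);
            }
            std::printf("estimate[0] = %f\n", estimate[0]);
        }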