200 research outputs found

    TreeMatch: A process placement algorithm for multicore architectures

    ComPAR/RenPAR 2013 conference. National audience. Over the past few years, clusters of NUMA nodes built from multicore processors have become widespread. Programming these architectures efficiently is a real challenge given their complex hierarchy. To take full advantage of them, this structure must be taken into account precisely and the application's communication pattern mapped onto it. Doing so reduces communication costs and yields gains in the application's overall execution time. We present here how we use the communication pattern on one side and a faithful representation of the architecture on the other to produce a permutation of the processes of a given application, thereby reducing communication costs.
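    The abstract does not reproduce the TreeMatch algorithm itself; the sketch below only illustrates the general idea of communication-aware placement, i.e. grouping heavily communicating processes onto the same node. The greedy grouping, the flat node/cores-per-node model and the function name are illustrative assumptions, not the paper's recursive, tree-based method.

```python
# Hypothetical sketch of communication-aware process placement
# (not the actual TreeMatch algorithm): greedily group the most
# heavily communicating processes onto the same node.

def greedy_placement(comm_matrix, n_nodes, cores_per_node):
    """Return permutation[rank] = core index, packing communicating ranks together."""
    n = len(comm_matrix)
    assert n <= n_nodes * cores_per_node
    unplaced = set(range(n))
    groups = []
    while unplaced:
        # Seed a new node with the unplaced process that communicates the most.
        seed = max(unplaced, key=lambda p: sum(comm_matrix[p]))
        group = [seed]
        unplaced.remove(seed)
        while len(group) < cores_per_node and unplaced:
            # Add the process exchanging the most data with the current group.
            best = max(unplaced,
                       key=lambda p: sum(comm_matrix[p][q] for q in group))
            group.append(best)
            unplaced.remove(best)
        groups.append(group)

    permutation = [0] * n
    for node, group in enumerate(groups):
        for slot, rank in enumerate(group):
            permutation[rank] = node * cores_per_node + slot
    return permutation

# Toy example: ranks 0-1 and ranks 2-3 communicate heavily, so each pair
# lands on the same node.  Output: [0, 1, 2, 3]
comm = [[0, 9, 1, 1],
        [9, 0, 1, 1],
        [1, 1, 0, 8],
        [1, 1, 8, 0]]
print(greedy_placement(comm, n_nodes=2, cores_per_node=2))
```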

    TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers

    International audience. Reading and writing data efficiently from the storage system is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to reduce the I/O bottleneck. One well-known strategy, in the context of collective I/O operations, is the two-phase I/O scheme, in which a subset of processes is selected to aggregate contiguous pieces of data before performing the reads or writes. In this paper, we present TAPIOCA, an MPI-based library implementing an efficient topology-aware two-phase I/O algorithm. We show how TAPIOCA takes advantage of double buffering and one-sided communication to reduce the idle time during data aggregation as much as possible. We also introduce the cost model behind a topology-aware aggregator placement that optimizes data movement. We validate our approach at large scale on two leadership-class supercomputers, Mira (IBM BG/Q) and Theta (Cray XC40), and present results obtained with TAPIOCA on a micro-benchmark and on the I/O kernel of a large-scale simulation. On both architectures, we show a substantial improvement in I/O performance compared with the default MPI I/O implementation. On BG/Q with GPFS, for instance, our algorithm yields a twelvefold performance improvement, while on the Cray XC40 system with a Lustre filesystem we achieve a fourfold improvement.
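    TAPIOCA's own API is not given in the abstract, so the sketch below only illustrates the underlying two-phase collective I/O scheme using standard MPI-IO through mpi4py, with ROMIO's collective-buffering hints standing in for explicit aggregator selection; the hint values, file name and data layout are assumptions.

```python
# Minimal two-phase (collective) I/O sketch with standard MPI-IO via
# mpi4py; the hints below are ROMIO collective-buffering hints, used
# here only to stand in for aggregator placement.
# Run with e.g.: mpiexec -n 8 python collective_write.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

info = MPI.Info.Create()
info.Set("cb_nodes", "2")             # number of aggregator processes (illustrative)
info.Set("romio_cb_write", "enable")  # force collective buffering on writes

local = np.full(1024, rank, dtype=np.float64)   # this rank's contiguous block
fh = MPI.File.Open(comm, "out.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY, info)
fh.Write_at_all(rank * local.nbytes, local)     # phase 1: exchange to aggregators,
fh.Close()                                      # phase 2: aggregators write to storage
info.Free()
```

    The exchange performed inside the collective write is the aggregation step whose idle time TAPIOCA targets with double buffering and one-sided communication.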

    Matching communication pattern with underlying hardware architecture

    International audience.

    Topology and affinity aware hierarchical and distributed load-balancing in Charm++

    International audience. The evolution of massively parallel supercomputers makes two issues particularly visible: load imbalance and poor management of data locality in applications. With the growing number of cores and the drastic decrease in the amount of memory per core, meeting high performance requirements demands particular care for load balancing and, as much as possible, for data locality. One way to take this locality issue into account relies on the placement of the processing entities, and load-balancing techniques are relevant for improving application performance. With large-scale platforms in mind, we developed a hierarchical and distributed algorithm whose aim is to perform topology-aware load balancing tailored for Charm++ applications. This algorithm relies on LibTopoMap for network awareness and on TREEMATCH to determine a relevant placement of the processing entities. We show that the proposed algorithm improves the overall execution time both for real applications and for a synthetic benchmark. For the latter, we demonstrate scalability up to one million processing entities.
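    As a concrete illustration of the kind of decision such a load balancer makes, here is a minimal, hypothetical rebalancing step within a single node; it is not the paper's algorithm (which relies on LibTopoMap and TREEMATCH for topology awareness), and the data structures, threshold and function name are assumptions.

```python
# Hypothetical intra-node rebalancing step: migrate tasks from
# overloaded cores to the least-loaded core, heaviest tasks first,
# only when the move strictly reduces the imbalance.

def rebalance_node(task_loads, assignment, cores):
    """task_loads: {task: load}; assignment: {task: core}; cores: list of core ids."""
    core_load = {c: 0.0 for c in cores}
    for task, core in assignment.items():
        core_load[core] += task_loads[task]
    target = sum(task_loads.values()) / len(cores)

    for core in cores:
        tasks = sorted((t for t, c in assignment.items() if c == core),
                       key=task_loads.get, reverse=True)
        for task in tasks:
            if core_load[core] <= target:
                break                       # this core is no longer overloaded
            dest = min(cores, key=core_load.get)
            if dest == core or core_load[dest] + task_loads[task] >= core_load[core]:
                continue                    # migrating this task would not help
            assignment[task] = dest
            core_load[core] -= task_loads[task]
            core_load[dest] += task_loads[task]
    return assignment

# Toy example: core 0 starts with load 7 and core 1 with load 2;
# the result is a 4/5 split across the two cores.
loads = {"a": 4.0, "b": 3.0, "c": 1.0, "d": 1.0}
print(rebalance_node(loads, {"a": 0, "b": 0, "c": 1, "d": 1}, [0, 1]))
```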

    Formal Detection of Attentional Tunneling in Human Operator-Automation Interactions

    The allocation of visual attention is a key factor for humans operating complex systems under time pressure with multiple information sources. In some situations, attentional tunneling is likely to appear, leading to excessive focus and poor decision making. In this study, we propose a formal approach, based on machine learning techniques, to detect the occurrence of such an attentional impairment. An experiment was conducted to provoke attentional tunneling, during which psycho-physiological and oculomotor data were collected from 23 participants. Data from 18 participants were used to train an adaptive neuro-fuzzy inference system (ANFIS). From a machine learning point of view, the classification performance of the trained ANFIS demonstrated the validity of this approach. Furthermore, the resulting classification rules were consistent with the attentional tunneling literature. Finally, the classifier robustly detected attentional tunneling on test data from four participants.

    Fleury-sur-Orne – Rue Louise-Michel, tramway maintenance centre

    The excavation revealed the ditches of two partial Neolithic funerary monuments and one complete double monument of the Passy type. They lie in the continuity of the Fleury-sur-Orne « Les Hauts de l'Orne » necropolis, one of the monuments from the evaluation having already been partially excavated in 2014 (mon. 24). The other partial monument, no. 7, lies in the northern part of the excavation area. It measures more than 118 m long by 15 m wide. Its ditches are sub-parallel...

    Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers

    International audience. Reading and writing data efficiently from storage systems is critical for high-performance data-centric applications. These I/O systems are increasingly characterized by complex topologies and deeper memory hierarchies. Effective parallel I/O solutions are needed to scale applications on current and future supercomputers. Data aggregation is an efficient approach in which selected processes are put in charge of aggregating data from a set of neighbors and writing the aggregated data to storage; bandwidth use can thus be optimized while contention is reduced. In this work, we take the network topology into account when mapping aggregators and propose an optimized buffering system to reduce the aggregation cost. We validate our approach using micro-benchmarks and the I/O kernel of a large-scale cosmology simulation, and show I/O operations up to 15× faster than a standard MPI I/O implementation.
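    As a rough illustration of aggregator mapping, the sketch below elects one aggregator per compute node using MPI's shared-memory communicator split and gathers node-local data onto it before writing (via mpi4py). The equal block sizes, the file name and the use of Split_type as the sole topology criterion are assumptions; the paper's cost-model-driven placement and optimized buffering system are not reproduced here.

```python
# Hypothetical node-level aggregation sketch: one aggregator per node
# gathers its neighbors' data, and only the aggregators perform writes.
# Run with e.g.: mpiexec -n 8 python aggregate_write.py
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()

# Ranks sharing a node end up in the same communicator; its rank 0 aggregates.
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED, key=rank)
is_aggregator = node_comm.Get_rank() == 0

local = np.full(256, float(rank))
gathered = node_comm.gather(local, root=0)      # intra-node aggregation

# Communicator containing only the aggregators (COMM_NULL on other ranks).
agg_comm = world.Split(0 if is_aggregator else MPI.UNDEFINED, key=rank)

if is_aggregator:
    block = np.concatenate(gathered)            # one contiguous block per node
    fh = MPI.File.Open(agg_comm, "aggregated.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    # Assumes equally sized node blocks so the offsets are disjoint.
    fh.Write_at_all(agg_comm.Get_rank() * block.nbytes, block)
    fh.Close()
```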