
    A New Approach to Configurable Dynamic Scheduling in Clusters based on Single System Image Technologies

    Clusters are now considered an alternative to parallel machines for executing workloads made up of sequential and/or parallel applications. For efficient application execution on clusters, dynamic global process scheduling is of prime importance. Various dynamic scheduling policies studied for distributed systems or parallel machines may be used in clusters, and the choice of a particular policy depends on the kind of workload to be executed. In a cluster, it is thus highly desirable to implement a configurable global scheduler, so that the dynamic scheduling policy can be adapted to the workload characteristics, all cluster resources can be exploited, and node shutdowns and reboots can be tolerated. In this paper, we present the architecture of the global scheduler and the process management mechanisms of Kerrighed, a single system image operating system designed for high performance computing on clusters. Kerrighed provides a development framework that allows dynamic scheduling policies to be implemented easily without kernel modification. In Kerrighed, the global scheduling policy can be changed dynamically while applications execute on the cluster. Kerrighed's process management mechanisms make it easy to deploy parallel applications in the cluster and to efficiently migrate or checkpoint processes, including processes sharing memory. Kerrighed has been implemented as a set of modules extending the Linux kernel. Preliminary performance results are presented.
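    A configurable global scheduler along these lines typically exposes a small policy interface that loadable modules implement. The C sketch below is a minimal, hypothetical illustration of such a pluggable policy; the struct, field, and function names (ksched_policy, ksched_register_policy, node_load) are assumptions for illustration, not Kerrighed's actual API.

        /* Hypothetical sketch of a pluggable cluster-scheduling policy interface.
         * All names are illustrative assumptions, not Kerrighed's real kernel API. */
        #include <stddef.h>
        #include <stdio.h>

        struct node_load {
            int    node_id;
            double cpu_load;      /* normalized load, 0.0 .. 1.0 */
            long   free_mem_kb;
        };

        struct ksched_policy {
            const char *name;
            /* Pick a target node for a newly created or migrated process. */
            int (*select_node)(const struct node_load *nodes, size_t n_nodes);
        };

        /* Example policy: choose the node with the lowest CPU load. */
        static int least_loaded_select(const struct node_load *nodes, size_t n)
        {
            size_t best = 0;
            for (size_t i = 1; i < n; i++)
                if (nodes[i].cpu_load < nodes[best].cpu_load)
                    best = i;
            return nodes[best].node_id;
        }

        static const struct ksched_policy least_loaded = {
            .name = "least-loaded",
            .select_node = least_loaded_select,
        };

        int main(void)
        {
            struct node_load nodes[] = { {0, 0.8, 1 << 20}, {1, 0.2, 1 << 21} };
            /* A module-based scheduler would register the policy at load time,
             * e.g. ksched_register_policy(&least_loaded), and could swap it at
             * runtime without kernel modification. */
            printf("chosen node: %d\n", least_loaded.select_node(nodes, 2));
            return 0;
        }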

    Image and Video Segmentation of Appearance-Volatile Objects

    Segmentation is the process of partitioning a digital image or frame into multiple regions or objects. The goal of segmentation is to identify and locate the objects of interest along with their boundaries. Recent segmentation approaches often follow the same pipeline: they first train a model on a collected dataset and then evaluate the trained model on a given image or video, assuming that an object's appearance is consistent between the training and testing sets. However, an object's appearance may change under different photography conditions, and how to effectively segment objects with volatile appearance remains under-explored. In this work, we present a framework for image and video segmentation of appearance-volatile objects that includes two novel modules: uncertain region refinement and a feature bank. For image segmentation, we design a new confidence loss and a fine-grained segmentation module to enhance segmentation accuracy in uncertain regions. For video segmentation, we propose a matching-based algorithm in which feature banks are created to store features for region matching and classification, and we introduce an adaptive feature bank update scheme to dynamically absorb new features and discard obsolete ones. We compare our algorithm with state-of-the-art methods on public benchmarks; our algorithm outperforms the existing methods and produces more reliable and accurate segmentation results.
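    As a rough illustration of the "absorb new, discard obsolete" idea behind an adaptive feature bank, the C sketch below keeps a fixed-capacity bank with per-entry usage scores and evicts the lowest-scoring entry when full. It is a generic data-structure sketch under assumed dimensions and names, not the paper's actual update scheme.

        /* Generic sketch of a fixed-capacity feature bank with usage-based
         * eviction; illustrative only, not the paper's adaptive update rule. */
        #include <stdlib.h>
        #include <string.h>

        #define FEAT_DIM 256
        #define BANK_CAP 128

        struct feature_bank {
            float  feats[BANK_CAP][FEAT_DIM];
            double score[BANK_CAP];   /* usage/similarity score per stored feature */
            size_t count;
        };

        /* Insert a new feature; if the bank is full, overwrite the least useful one. */
        void bank_insert(struct feature_bank *b, const float *feat)
        {
            size_t slot;
            if (b->count < BANK_CAP) {
                slot = b->count++;
            } else {
                slot = 0;
                for (size_t i = 1; i < BANK_CAP; i++)
                    if (b->score[i] < b->score[slot])
                        slot = i;      /* discard the obsolete (lowest-score) entry */
            }
            memcpy(b->feats[slot], feat, sizeof(float) * FEAT_DIM);
            b->score[slot] = 1.0;      /* fresh features start with full score */
        }

        /* Decay all scores each frame so rarely matched features become
         * eviction candidates over time. */
        void bank_decay(struct feature_bank *b, double decay)
        {
            for (size_t i = 0; i < b->count; i++)
                b->score[i] *= decay;
        }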

    Impacts des effets NUMA sur les communications haute performance dans les grappes de calcul

    The multiplication of processors and cores in machines has led architects to abandon centralized memory buses. NUMA (Non-Uniform Memory Access) effects, best known for their impact on the efficiency of process scheduling, also have a noticeable influence on I/O performance. In this article, we present an evaluation of their impact on network performance in compute clusters, showing their magnitude and their sometimes asymmetric effect on throughput. We propose an automatic and portable placement of communication tasks in the NewMadeleine library, using topology information collected from the system, which achieves performance comparable to manual placement.
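    Placement of this kind is typically driven by system topology information such as that exposed by the hwloc library; the C sketch below shows the general idea of binding a communication thread to the CPUs of a chosen NUMA node. It is only an illustration of the approach (hwloc is an assumed choice here), not NewMadeleine's actual placement code.

        /* Sketch: bind the calling (communication) thread to the cores of one
         * NUMA node using hwloc topology information. Compile with -lhwloc. */
        #include <hwloc.h>
        #include <stdio.h>

        int bind_to_numa_node(unsigned node_index)
        {
            hwloc_topology_t topo;
            hwloc_obj_t node;
            int err;

            hwloc_topology_init(&topo);
            hwloc_topology_load(topo);

            node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, node_index);
            if (!node) {
                hwloc_topology_destroy(topo);
                return -1;
            }

            /* Restrict the current thread to the CPUs attached to that NUMA node. */
            err = hwloc_set_cpubind(topo, node->cpuset, HWLOC_CPUBIND_THREAD);
            if (err)
                perror("hwloc_set_cpubind");

            hwloc_topology_destroy(topo);
            return err;
        }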

    A Flexible Thread Scheduler for Hierarchical Multiprocessor Machines

    With the current trend of multiprocessor machines towards increasingly hierarchical architectures, exploiting the full computational power requires careful distribution of execution threads and data so as to limit expensive remote memory accesses. Existing multi-threaded libraries provide only limited facilities for applications to express distribution hints, so programmers end up explicitly distributing tasks according to the underlying architecture, which is difficult and not portable. In this article, we present: (1) a model for dynamically expressing the structure of the computation; (2) a scheduler that interprets this model to make judicious hierarchical distribution decisions; (3) an implementation within the Marcel user-level thread library. We evaluated our proposal on a scientific application running on a ccNUMA Bull NovaScale machine with 16 Intel Itanium II processors; the results show a 30% gain compared to a classical scheduler and are similar to what a handmade, non-portable scheduler achieves.
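    To give a flavor of expressing a computation's structure as nested groups of threads and mapping it level by level onto a hierarchical machine, the C sketch below distributes a small group tree over the units of each machine level. The types and function are hypothetical illustrations of the idea, not Marcel's actual bubble-scheduler API.

        /* Hypothetical sketch: a tree of thread groups mapped level by level
         * onto machine units (NUMA nodes, then sockets, then cores). */
        #include <stdio.h>
        #include <stddef.h>

        struct task_group {
            const char        *name;
            struct task_group *children;    /* nested groups of threads sharing data */
            size_t             n_children;
        };

        /* Recursively distribute the groups of one level over the units of that
         * level, keeping sibling groups (which share data) close together.
         * Assumes the group tree is no deeper than the units_per_level array. */
        void distribute(const struct task_group *g, const unsigned *units_per_level,
                        unsigned level)
        {
            for (size_t i = 0; i < g->n_children; i++) {
                unsigned unit = (unsigned)(i % units_per_level[level]);
                printf("level %u: map group '%s' onto unit %u\n",
                       level, g->children[i].name, unit);
                distribute(&g->children[i], units_per_level, level + 1);
            }
        }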

    Memory Migration on Next-Touch

    NUMA abilities such as explicit migration of memory buffers enable flexible placement of data buffers at runtime near the tasks that actually access them. The move_pages system call may be invoked manually, but it achieves limited throughput and requires strong cooperation from the application: the location of threads and their memory access patterns must be known precisely in order to decide when to migrate the right memory buffer at the right time. We present the implementation of a Next-Touch memory placement policy that enables automatic dynamic migration of pages when they are actually accessed by a task. We introduce a new PTE flag set up by madvise, and the corresponding Copy-on-Touch code path in the page-fault handler, which allocates the new page near the accessing task. We then examine the performance and overheads of this model and compare it to using the move_pages system call.
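    For reference, the manual baseline mentioned above looks roughly like the following: migrating the pages backing a buffer to a target NUMA node with the move_pages system call from libnuma. The helper function and its error handling are illustrative.

        /* Explicit migration with move_pages(2): move the pages backing `buf`
         * to NUMA node `target_node`. Requires libnuma (link with -lnuma). */
        #define _GNU_SOURCE
        #include <numaif.h>
        #include <unistd.h>
        #include <stdlib.h>
        #include <stdio.h>

        int migrate_buffer(void *buf, size_t len, int target_node)
        {
            long page_size = sysconf(_SC_PAGESIZE);
            unsigned long n_pages = (len + page_size - 1) / page_size;

            void **pages  = malloc(n_pages * sizeof(*pages));
            int   *nodes  = malloc(n_pages * sizeof(*nodes));
            int   *status = malloc(n_pages * sizeof(*status));

            for (unsigned long i = 0; i < n_pages; i++) {
                pages[i] = (char *)buf + i * page_size;
                nodes[i] = target_node;
            }

            /* pid 0 = current process; MPOL_MF_MOVE moves pages owned by it. */
            long rc = move_pages(0, n_pages, pages, nodes, status, MPOL_MF_MOVE);
            if (rc < 0)
                perror("move_pages");

            free(pages); free(nodes); free(status);
            return (int)rc;
        }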

    Enabling High-Performance Memory Migration for Multithreaded Applications on Linux

    As the number of cores per machine increases, memory architectures are being redesigned to avoid bus contention and sustain higher throughput needs. The emergence of Non-Uniform Memory Access (NUMA) constraints has made affinities between threads and buffers an important decision criterion for schedulers. Memory migration enables work and data to be redistributed together dynamically across the machine, but it requires high-performance data transfers as well as a convenient programming interface. We present improvements to the Linux migration primitives and the implementation of a Next-Touch policy in the kernel that give multithreaded applications an easy way to dynamically maintain thread-data affinity. Microbenchmarks show that our work enables high-performance synchronous and lazy memory migration within multithreaded applications. A threaded LU factorization then reveals the large improvement that our Next-Touch policy may bring to applications with complex access patterns.
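    The intended usage pattern of such a Next-Touch policy in a multithreaded application is sketched below: mark a region once, then let each worker's first access pull pages to its local node. The MADV_NEXT_TOUCH advice value is a hypothetical placeholder for the kernel extension described here; it does not exist in mainline Linux, so the madvise call would fail on a stock kernel.

        /* Usage pattern of a Next-Touch policy in a multithreaded application.
         * MADV_NEXT_TOUCH is a HYPOTHETICAL advice value for illustration only. */
        #include <pthread.h>
        #include <sys/mman.h>
        #include <stddef.h>

        #ifndef MADV_NEXT_TOUCH
        #define MADV_NEXT_TOUCH 100   /* placeholder value, not in mainline Linux */
        #endif

        struct chunk { double *base; size_t len; };

        static void *touch_and_compute(void *arg)
        {
            struct chunk *c = arg;
            /* The first write to each page would trigger migration to this
             * thread's local NUMA node under a Next-Touch policy. */
            for (size_t i = 0; i < c->len; i++)
                c->base[i] *= 2.0;
            return NULL;
        }

        /* Assumes n_threads <= 64. */
        void redistribute(double *data, size_t n, size_t n_threads)
        {
            /* 1. Mark the whole buffer migrate-on-next-touch. */
            madvise(data, n * sizeof(double), MADV_NEXT_TOUCH);

            /* 2. Let each worker touch its own partition; pages follow the toucher. */
            pthread_t tids[64];
            struct chunk chunks[64];
            size_t per = n / n_threads;
            for (size_t t = 0; t < n_threads; t++) {
                chunks[t].base = data + t * per;
                chunks[t].len  = (t == n_threads - 1) ? n - t * per : per;
                pthread_create(&tids[t], NULL, touch_and_compute, &chunks[t]);
            }
            for (size_t t = 0; t < n_threads; t++)
                pthread_join(tids[t], NULL);
        }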

    Faculty Publications and Creative Works 2004

    Faculty Publications & Creative Works is an annual compendium of the scholarly and creative activities of University of New Mexico faculty during the noted calendar year. Published by the Office of the Vice President for Research and Economic Development, it serves to illustrate the robust and active intellectual pursuits conducted by the faculty in support of teaching and research at UNM.

    Um sistema de comunicação configurável e extensível baseado em metaprogramação estática

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science.

    Exploring Computational Chemistry on Emerging Architectures

    Emerging architectures, such as next-generation microprocessors, graphics processing units, and Intel MIC cards, are being used with increasing popularity in high performance computing. Each of these architectures has advantages over previous generations, including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications are able to attack larger, more complicated problems. However, since applications perform differently on each architecture, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, this work implements the computational chemistry variational path integral application, QSATS, on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, this work explores the use of analytical performance modeling to predict the runtime and scalability of the application on these architectures, allowing the architectures to be compared when choosing one for a given set of program input parameters; the models presented in this dissertation are accurate to within 6%. Third, this work combines novel approaches to the algorithm with exploration of the architectural features to make the application perform at its peak. In addition, it expands the understanding of computational science applications and their implementation on emerging architectures while providing insight into performance, scalability, and programmer productivity.
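    Analytical performance models of this kind typically combine a serial term, a parallel term divided by the number of processing elements, and a per-element overhead term. The small C example below evaluates such a generic model for a range of core counts; the form and coefficients are illustrative assumptions, not the dissertation's actual model.

        /* Generic analytical runtime model:
         * T(p) = t_serial + t_parallel / p + t_overhead * p
         * Coefficients below are made up for illustration. */
        #include <stdio.h>

        struct perf_model {
            double t_serial;     /* seconds of inherently serial work */
            double t_parallel;   /* seconds of perfectly parallel work on one core */
            double t_overhead;   /* per-processing-element overhead (sync, transfer) */
        };

        static double predict_runtime(const struct perf_model *m, int p)
        {
            return m->t_serial + m->t_parallel / p + m->t_overhead * p;
        }

        int main(void)
        {
            struct perf_model example = { .t_serial = 2.0, .t_parallel = 480.0,
                                          .t_overhead = 0.01 };
            for (int p = 1; p <= 256; p *= 4)
                printf("p=%3d  predicted %.2f s\n", p, predict_runtime(&example, p));
            return 0;
        }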