316 research outputs found

    A self-mobile skeleton in the presence of external loads

    Multicore clusters provide cost-effective platforms for running CPU-intensive and data-intensive parallel applications. To use these platforms effectively, their resources must be shared among applications rather than dedicated to a single one. When such platforms are shared, applications compete for the same resources at runtime, so demand is irregular and the load is changeable and unpredictable. This thesis explores a mechanism for exploiting shared multicore clusters while taking the external load into account. The mechanism seeks to reduce runtime by finding the best computing locations for the running computations. We propose a generic algorithmic data-parallel skeleton that is aware of both its computations and the load state of the computing environment. The skeleton is structured using the Master/Worker pattern, with the master and workers distributed over the nodes of the cluster. It divides the problem into computations, all initiated by the master and coordinated by the distributed workers. Moreover, the skeleton has built-in mobility to implicitly move parallel computations between workers; this mobility is data mobility, controlled by the application, i.e. the skeleton itself. The skeleton is not problem-specific and can therefore execute different kinds of problems. Our experiments suggest that it efficiently compensates for unpredictable load variations. We also propose a performance cost model that estimates the continuation time of the running computations both locally and remotely. The model takes the network delay, data size, and load state as inputs to estimate the transfer time of a potential movement. Our experiments demonstrate that this model makes accurate, estimate-based decisions under different load patterns, reducing the total execution time. The model is problem-independent because it considers the progress of all current computations. It is based on measurements, so it does not depend on the programming language. It also takes into account the load state of the nodes on which the computations run; since this state includes the characteristics of the nodes, the model is architecture-independent. Because scheduling has a direct impact on system performance, we support the skeleton with a cost-informed scheduler that uses a hybrid scheduling policy to improve the skeleton's dynamicity and adaptivity. This scheduler has agents distributed over the participating workers to keep load information up to date, trigger the estimations, and facilitate the mobility operations. At runtime, the skeleton co-schedules its computations over the computational resources without interfering with the native operating system scheduler. We demonstrate that, using this hybrid approach, the system makes mobility decisions that improve performance and scalability over a large number of computational resources. Our experiments suggest that the skeleton's adaptivity in a shared environment improves performance and reduces resource contention on heavily loaded nodes, allowing other applications to acquire more resources. Finally, our experiments show that the load scheduler incurs a low overhead, not exceeding 0.6% of the total execution time.
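    The mobility decision described above can be sketched as a simple comparison: move a computation only when its estimated remote continuation time plus the estimated transfer time beats its local continuation time. The function and parameter names below are illustrative assumptions, not the thesis's actual model; the transfer-time estimate uses a generic latency-plus-bandwidth form.

```python
# Hypothetical sketch of a cost-model mobility decision. Names and the
# linear transfer-time formula are assumptions, not the author's model.

def transfer_time(data_bytes, bandwidth_bps, network_delay_s):
    """Estimated time to move a computation's data to a remote worker."""
    return network_delay_s + data_bytes / bandwidth_bps

def should_move(local_continuation_s, remote_continuation_s,
                data_bytes, bandwidth_bps, network_delay_s):
    """Move only if finishing remotely, including the transfer, is cheaper."""
    move_cost = transfer_time(data_bytes, bandwidth_bps, network_delay_s)
    return remote_continuation_s + move_cost < local_continuation_s
```

    For example, with 10 s of work left locally, 4 s left on an idle remote worker, and a transfer of 8 MB over a 100 Mb/s-class link, the move pays off; when the margin is smaller than the transfer cost, the computation stays put.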

    Proto-Plasm: parallel language for adaptive and scalable modelling of biosystems

    This paper discusses the design goals and the first developments of Proto-Plasm, a novel computational environment for producing libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the Proto-Plasm platform is still in its infancy. Its computational framework—language, model library, integrated development environment and parallel engine—is intended to provide patient-specific computational modelling and simulation of organs and biosystems, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. Proto-Plasm may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a Proto-Plasm program. Here we exemplify the basic functionalities of Proto-Plasm by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions.

    Geographic Variation in Rock Wren (Salpinctes obsoletus) Song Complexity

    Birds sing to advertise for mates and repel rivals, but there is enormous variety in how they do this. One of the best-studied and most intriguing questions in the field is how song varies in complexity from one bird to the next, at all taxonomic levels. Several studies have found associations between migratory behavior or latitudinal gradients and song complexity, but it remains unclear how universal this pattern is or what factors may be driving it. This small body of literature suffers from several problems, perhaps the most glaring of which is the lack of systematic, population-level studies. The main goals of this dissertation were to determine what evidence there is for the hypothesis that song complexity is influenced by latitude and/or migratory behavior, and whether such a pattern can be detected in a single species, the rock wren (Salpinctes obsoletus). I recorded rock wren song at 11 sites along a latitudinal transect containing both migratory and sedentary populations, and used morphological measurements and genome-level SNP scans to test my classification of populations as migratory versus sedentary. Song repertoire size was larger in sedentary rock wrens and did not vary with latitude, while migratory wrens had smaller mean repertoire sizes that increased with increasing latitude. Morphological measurements differed between migratory and sedentary populations, suggesting life history differences between the two groups. Population genetic structure was apparent only using outlier loci, and the resulting structure was not concordant with migratory behavior or site membership. Taken together, these results suggest that migration does not pose a barrier to gene flow between migratory and sedentary populations, and that migratory and sedentary behavior is associated with differences in song complexity and morphology, although in a way inconsistent with any previously published hypothesis.

    Efficiently and Transparently Maintaining High SIMD Occupancy in the Presence of Wavefront Irregularity

    Demand is increasing for high throughput processing of irregular streaming applications; examples of such applications from scientific and engineering domains include biological sequence alignment, network packet filtering, automated face detection, and big graph algorithms. With wide SIMD, lightweight threads, and low-cost thread-context switching, wide-SIMD architectures such as GPUs allow considerable flexibility in the way application work is assigned to threads. However, irregular applications are challenging to map efficiently onto wide SIMD because data-dependent filtering or replication of items creates an unpredictable data wavefront of items ready for further processing. Straightforward implementations of irregular applications on a wide-SIMD architecture are prone to load imbalance and reduced occupancy, while more sophisticated implementations require advanced use of parallel GPU operations to redistribute work efficiently among threads. This dissertation will present strategies for addressing the performance challenges of wavefront- irregular applications on wide-SIMD architectures. These strategies are embodied in a developer framework called Mercator that (1) allows developers to map irregular applications onto GPUs ac- cording to the streaming paradigm while abstracting from low-level data movement and (2) includes generalized techniques for transparently overcoming the obstacles to high throughput presented by wavefront-irregular applications on a GPU. Mercator forms the centerpiece of this dissertation, and we present its motivation, performance model, implementation, and extensions in this work
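    One generic way to counter the occupancy loss described above is to queue the survivors of a data-dependent filter and re-pack them into full-width SIMD batches, deferring partially filled batches until enough work accumulates. The sketch below is only an illustration of that queueing idea in plain Python; Mercator's actual remapping runs on the GPU using parallel primitives, and the names and width here are invented.

```python
# Illustrative re-packing of an irregular wavefront into full-width batches.
# SIMD_WIDTH and the host-side queue are assumptions for this sketch.

SIMD_WIDTH = 8

def repack(queue, survivors):
    """Append filter survivors to a pending queue; emit only full batches."""
    queue.extend(survivors)
    batches = []
    while len(queue) >= SIMD_WIDTH:
        # Drain one full-width batch so every lane has an item to process.
        batches.append([queue.pop(0) for _ in range(SIMD_WIDTH)])
    return batches
```

    A naive mapping would launch each input batch with however many lanes survived the filter; deferring the remainder keeps lane occupancy high at the cost of some added latency for queued items.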

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, a range of tools aims at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we argue that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, on top of a stack of layers that forms a prototypical framework for Big Data analytics. The contribution of this thesis is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit into each level. Second, we propose a programming environment based on this layered model, in the form of a Domain-Specific Language (DSL) for processing data collections called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG composition of processing elements. The model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and letting the user focus only on the operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and will be implemented in C++11/14 with the aim of porting C++ into the Big Data world.
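    As a toy illustration of the Pipeline abstraction, the sketch below mimics stage composition restricted to a linear chain of a filter and a map over any iterable, so the same pipeline object serves batch and stream-like inputs. PiCo itself is a C++11/14 DSL on FastFlow; the class and method names here are invented for this sketch.

```python
# Toy Pipeline in the spirit of (but not identical to) PiCo's model.

class Pipeline:
    def __init__(self, stages=None):
        self.stages = stages or []

    def add(self, stage):
        # Composition returns a new Pipeline; the DAG is restricted
        # to a linear chain here for brevity.
        return Pipeline(self.stages + [stage])

    def run(self, data):
        # Each stage consumes and produces an iterable of items.
        for stage in self.stages:
            data = stage(data)
        return list(data)

double_evens = (Pipeline()
                .add(lambda xs: (x for x in xs if x % 2 == 0))  # filter stage
                .add(lambda xs: (2 * x for x in xs)))           # map stage
```

    Because stages are lazy generators, the same composed pipeline can be applied to a finite batch or driven incrementally over a stream, which is the unified-interface idea the abstract describes.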

    Architecture aware parallel programming in Glasgow parallel Haskell (GPH)

    General-purpose computing architectures are quickly evolving to become manycore and hierarchical: a core can communicate more quickly locally than globally. To be effective on such architectures, programming models must be aware of the communication hierarchy. This thesis investigates a programming model that aims to share the responsibility for task placement, load balancing, thread creation, and synchronisation between the application developer and the runtime system. The main contribution of this thesis is the development of four new architecture-aware constructs for Glasgow parallel Haskell that exploit information about task size and aim to reduce communication for small tasks, preserve data locality, or distribute large units of work. We define a semantics for the constructs that specifies the sets of PEs each construct identifies, and we check four properties of the semantics using QuickCheck. We report a preliminary investigation of architecture-aware programming models that abstract over the new constructs; in particular, we propose architecture-aware evaluation strategies and skeletons. We investigate three common paradigms (data parallelism, divide-and-conquer, and nested parallelism) on hierarchical architectures with up to 224 cores. The results show that the architecture-aware programming model consistently delivers better speedup and scalability than existing constructs, together with a dramatic reduction in execution time variability. We present a comparison of functional multicore technologies that reports some of the first multicore results for the Feedback Directed Implicit Parallelism (FDIP) and semi-explicit parallelism (GpH and Eden) languages. The comparison reflects the growing maturity of the field by systematically evaluating four parallel Haskell implementations on a common multicore architecture, and it contrasts the programming effort each language requires with the parallel performance delivered. We investigate the minimum thread granularity required to achieve satisfactory performance for three parallel functional language implementations on a multicore platform. The results show that GHC-GUM requires a larger thread granularity than Eden and GHC-SMP, and that the required granularity rises as the number of cores rises.
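    The idea behind architecture-aware placement can be illustrated outside Haskell as well. The sketch below is a hypothetical Python stand-in, not GpH: the names, the pairwise distance function, and the size threshold are all assumptions. It selects a candidate PE set by communication distance, keeping small tasks close to preserve locality while letting large tasks spread across the hierarchy.

```python
# Hypothetical sketch of choosing a PE set by communication distance.
# distance(a, b) returns a hierarchy level: 0 = same node, higher = farther.

def candidate_pes(all_pes, here, distance, max_distance):
    """PEs within a communication radius of the current PE."""
    return [p for p in all_pes if distance(here, p) <= max_distance]

def place(task_size, all_pes, here, distance, threshold=100):
    """Small tasks stay local; large tasks may go anywhere in the hierarchy."""
    if task_size < threshold:
        radius = 0  # avoid communication for cheap tasks
    else:
        radius = max(distance(here, p) for p in all_pes)
    return candidate_pes(all_pes, here, distance, radius)
```

    The runtime would then spark the task on one of the returned PEs, which is the kind of shared developer/runtime responsibility the thesis describes.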

    Évaluation du potentiel de liquéfaction à l’aide du nouvel essai triaxial en cisaillement simple (TxSS)

    In recent decades, several total- and effective-stress constitutive models have been developed in geotechnical engineering practice to perform one-dimensional site response analysis. These constitutive models have been incorporated into either finite difference or finite element dynamic analysis programs. Numerous research efforts have also aimed to predict earthquake-induced pore water pressure more accurately. Hitherto, the developed pore pressure models can be classified into stress-, strain-, and energy-based models. In the current study, strain-based and energy-based pore pressure models were proposed based on a series of strain-controlled cyclic TxSS tests performed on reconstituted specimens of Baie Saint Paul, Carignan, Ottawa C-109, Ottawa F-65, and Quebec CF6B sands. The proposed strain-based model was used to investigate the equivalent-number-of-cycles concept and to assess pore pressure as a damage metric. The energy-based model was combined with the Sigmoidal model in FLAC 2D to introduce a simplified coupled energy-based pore pressure model. The proposed model was calibrated and verified, in terms of shear stress-strain response and excess pore pressure development, against a series of strain-controlled cyclic TxSS test results. At the element level, the model was validated under cyclic strain-controlled and stress-controlled tests, and fair agreement was observed between the energy-based model and DSS results in terms of liquefaction resistance. In addition, the proposed model was validated by incorporating the energy model into the FLAC3D platform to study cyclic behavior under triaxial and simple shear conditions. The numerical simulations clarify the difference between cyclic triaxial and simple shear conditions, as well as between the loading conditions (i.e. stress-controlled versus strain-controlled). Further validation was performed by numerical simulation of a centrifuge model conducted by Ramirez et al. (2017) at the University of Colorado Boulder, using both the well-established Finn model and the proposed energy-based model. The comparison shows the capability of the proposed energy-based model, in conjunction with the Sigmoidal model, to simulate the seismic response in liquefaction analysis very well. As part of this study, the proposed simplified coupled energy-based pore pressure model was implemented to assess the compatibility of the liquefaction charts for eastern and western North America. Hypothetical sand deposits with different fundamental periods were subjected to two scaled-up earthquakes for 1-D site response analysis: one compatible with the National Building Code of Canada 2005 (a synthetic earthquake) and one incompatible real earthquake from the western region (the Northridge earthquake). The comparison in terms of generated pore pressure, equivalent number of cycles, and the incorporated liquefaction charts (CRR-(N1)60CS) highlights the inaccuracy of using current liquefaction charts in eastern regions.
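    The general shape of an energy-based pore pressure model can be illustrated with a generic power-law form, in which the excess pore pressure ratio grows with cumulative dissipated energy normalized by the energy required to reach liquefaction. This is a common form from the energy-based literature, not the thesis's calibrated TxSS model; the exponent and normalization below are placeholders.

```python
# Generic energy-based pore pressure accumulation (illustrative only):
# r_u = (W / W_liq) ** alpha, capped at 1.0 (initial liquefaction).

def pore_pressure_ratio(dissipated_energy, w_liq, alpha=0.5):
    """Excess pore pressure ratio r_u from cumulative dissipated energy W."""
    if dissipated_energy <= 0:
        return 0.0
    return min(1.0, (dissipated_energy / w_liq) ** alpha)
```

    Under cyclic loading, each cycle's hysteretic energy is added to the running total, so r_u rises monotonically and saturates at 1.0, the conventional trigger for initial liquefaction in such models.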

    An Automatic Fusion Mechanism for Variable-Length List Skeletons in SkeTo


    Autonomic behavioural framework for structural parallelism over heterogeneous multi-core systems.

    With the continuous advancement of hardware technologies, significant research has been devoted to designing and developing high-level parallel programming models that allow programmers to exploit the latest developments in heterogeneous multi-core/many-core architectures. Structured programming paradigms offer a viable solution for efficiently programming modern heterogeneous multi-core architectures equipped with one or more programmable Graphics Processing Units (GPUs). Applying structured programming paradigms, it is possible to subdivide a system into building blocks (modules, skeletons or components) that can be created independently and then used in different systems to derive multiple functionalities. Exploiting such systematic divisions, it is possible to address extra-functional features such as application performance, portability and resource utilisation at the component level in a heterogeneous multi-core architecture. While the computing function of a building block can vary across applications, the behaviour (semantics) of the block remains intact. Therefore, by understanding the behaviour of building blocks and their structural composition into parallel patterns, the process of constructing and coordinating a structured application can be automated. In this thesis we propose the Structural Composition and Interaction Protocol (SKIP) as a systematic methodology for exploiting the structural programming paradigm (here, the building-block approach) to construct a structured application and to extract information from, and inject information into, that application. Using the SKIP methodology, we have designed and developed the Performance Enhancement Infrastructure (PEI), a SKIP-compliant autonomic behavioural framework that automatically coordinates structured parallel applications based on extracted extra-functional properties related to the parallel computation patterns. We used 15 different PEI-based applications (ranging from large-scale applications with heavy input workloads that take hours to execute to small-scale applications that take seconds) to evaluate PEI in terms of overhead and performance improvement. The experiments were carried out on 3 different heterogeneous (CPU/GPU) multi-core architectures: one cluster machine with 4 symmetric nodes and one GPU per node, and 2 single machines with one GPU per machine. Our results demonstrate that, with less than 3% overhead, PEI can achieve up to one order of magnitude speed-up when used to enhance application performance.
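    The autonomic coordination idea can be sketched as follows: time a small calibration slice on each available backend and route the remaining work to the fastest one. This is a hypothetical stand-in, not PEI's actual mechanism or API; the backend names and timing harness are invented for the sketch.

```python
# Hypothetical sketch of measurement-driven backend selection.
import time

def pick_backend(backends, calibration_input):
    """Return the name of the backend with the lowest measured runtime
    on a small calibration slice of the workload."""
    timings = {}
    for name, run in backends.items():
        start = time.perf_counter()
        run(calibration_input)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

    A coordinating framework would repeat such measurements periodically so that decisions track changes in load, which is the adaptive behaviour the abstract attributes to PEI's distributed agents.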