3,508 research outputs found

    Optimal pre-scheduling of problem remappings

    A large class of scientific computational problems can be characterized as a sequence of steps in which a significant amount of computation occurs at each step, but the work performed at each step is not necessarily identical. Two good examples of this type of computation are (1) regridding methods, which change the problem discretization during the course of the computation, and (2) methods for solving sparse triangular systems of linear equations. Recent work has investigated a means of mapping such computations onto parallel processors; the method defines a family of static mappings with differing degrees of importance placed on the conflicting goals of good load balance and low communication/synchronization overhead. The performance tradeoffs are controllable by adjusting the parameters of the mapping method. To achieve good performance it may be necessary to change these parameters dynamically at run-time, but such changes can impose additional costs. If the computation's behavior can be determined prior to its execution, it may be possible to construct an optimal parameter schedule using a low-order-polynomial-time dynamic programming algorithm. Since the latter can be expensive, the effect of a linear-time scheduling heuristic is studied on one of the model problems, and the heuristic is shown to be effective and nearly optimal.
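
The dynamic-programming idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the per-step cost of every mapping parameter is known in advance (`step_cost`, a hypothetical input) and that every parameter change costs the same fixed amount (`remap_cost`, also hypothetical).

```python
from typing import List, Sequence, Tuple


def optimal_schedule(step_cost: Sequence[Sequence[float]],
                     remap_cost: float) -> Tuple[float, List[int]]:
    """Return (total cost, parameter index chosen at each step).

    step_cost[t][p] is the assumed cost of running step t with mapping
    parameter p; remap_cost is an assumed uniform price for changing the
    parameter between two consecutive steps.
    """
    n_params = len(step_cost[0])
    # best[p] = cheapest cost of the steps seen so far, ending with parameter p
    best = list(step_cost[0])
    back: List[List[int]] = []  # back[t-1][p] = parameter used at step t-1
    for costs in step_cost[1:]:
        cheapest_prev = min(range(n_params), key=best.__getitem__)
        new_best: List[float] = []
        choice: List[int] = []
        for p in range(n_params):
            stay = best[p]
            switch = best[cheapest_prev] + remap_cost
            if stay <= switch:
                new_best.append(stay + costs[p])
                choice.append(p)
            else:
                new_best.append(switch + costs[p])
                choice.append(cheapest_prev)
        back.append(choice)
        best = new_best
    # Walk the back-pointers from the cheapest final state.
    p = min(range(n_params), key=best.__getitem__)
    total = best[p]
    schedule = [p]
    for choice in reversed(back):
        p = choice[p]
        schedule.append(p)
    schedule.reverse()
    return total, schedule
```

With `step_cost = [[1, 5], [1, 5], [5, 1], [5, 1]]` and `remap_cost = 2`, the sketch returns a total cost of 6 with schedule `[0, 0, 1, 1]`: one remapping pays for itself. Because the remapping cost is uniform here, each step only needs the cheapest previous state, so the running time is linear in steps times parameters.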

    A High-performance, Energy-efficient Modular DMA Engine Architecture

    Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: In high-performance systems, we achieve speedups of up to 15.8x with only 1 % additional area compared to a base system without a DMAE. We achieve an area reduction of 10 % while improving ML inference performance by 23 % in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems. Comment: 14 pages, 14 figures, accepted by an IEEE journal for publication.
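
To make the mid-end's role concrete, the following Python sketch shows one way a multi-dimensional transfer could be unrolled into the one-dimensional transfers a back-end would execute. The descriptor layout (`dims` as `(repetitions, src_stride, dst_stride)` tuples, innermost dimension first) and the function name are assumptions for illustration, not iDMA's actual interface.

```python
from typing import Iterator, List, Tuple

# One 1D transfer: (source address, destination address, length in bytes)
Burst = Tuple[int, int, int]


def nd_to_1d(src: int, dst: int, length: int,
             dims: List[Tuple[int, int, int]]) -> Iterator[Burst]:
    """Unroll an N-dimensional transfer into contiguous 1D transfers.

    dims lists (repetitions, src_stride, dst_stride) per extra dimension,
    innermost first. This layout is hypothetical.
    """
    if not dims:
        yield (src, dst, length)
        return
    # Peel off the outermost dimension and recurse into the inner ones.
    *inner, (reps, src_stride, dst_stride) = dims
    for i in range(reps):
        yield from nd_to_1d(src + i * src_stride,
                            dst + i * dst_stride,
                            length, inner)
```

For example, `list(nd_to_1d(0x1000, 0x2000, 16, [(4, 64, 16)]))` describes gathering four 16-byte rows from a source with a 64-byte pitch into a contiguous destination buffer.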

    Active-Routing: Parallelization and Scheduling of 3D-Memory Vault Computations

    In an age where big data is more available than ever, new high-bandwidth, low-latency memory technologies, such as Hybrid Memory Cubes (HMC), have extended into the third dimension to narrow the growing gap between memory and CPU speeds. Processing power built into these new 3D memory technologies allows CPU cores to offload computations to memory, leading to recent interest in the design space of Processing-In-Memory (PIM) when several HMC units are chained together in a network. Using the topology-oblivious Active-Routing technique in such a network, computations like dot products over a large set of data can be distributed across a virtual "tree" such that partial results are compounded at every branch "on the way" back to the CPU. We propose improving the performance of Active-Routing by offloading computations to memory with high-throughput offloading techniques. We present Vault-Level Parallelism to further parallelize computations by strategically dispatching computations to DRAM vault controllers within each HMC. Our new implementation distributes the resources of Active-Routing to each of the vault controllers in the HMC so as to reduce contention for compute resources. We simulate our implemented techniques and assess their performance using previously developed micro-benchmarks and a widely accepted benchmark in scientific computing. The evaluation results show an increase in overall data throughout the Active-Routing Tree with an aggregate 23x speedup.
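
The tree-shaped reduction described in this abstract can be pictured with a small Python sketch. It only models the arithmetic, under the assumption that each node holds a local slice of both operand vectors; the `Node` class and its fields are hypothetical and say nothing about the real HMC network or vault controllers.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One memory node in a hypothetical reduction tree."""
    a: Sequence[float]                     # locally stored slice of vector a
    b: Sequence[float]                     # locally stored slice of vector b
    children: List["Node"] = field(default_factory=list)

    def reduce(self) -> float:
        # Compute the local partial dot product next to the data ...
        local = sum(x * y for x, y in zip(self.a, self.b))
        # ... and fold in the partial results returned by each child branch.
        return local + sum(child.reduce() for child in self.children)
```

Calling `reduce()` on the root returns the full dot product, while only one partial sum travels up each link instead of the raw operands.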

    Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

    The simulation of large numbers of colliding rigid bodies of non-analytical shapes or vastly varying sizes is computationally challenging. The fundamental problem is the identification of all contact points between all particles at every time step. In the Discrete Element Method (DEM), this is particularly difficult for particles of arbitrary geometry that exhibit sharp features (e.g. rock granulates). While most codes avoid non-spherical or non-analytical shapes due to the computational complexity, we introduce an iterative contact detection method for triangulated geometries. The new method is an improvement over a naive brute force approach, which checks all possible geometric constellations of contact and thus exhibits a lot of execution branching. Our iterative approach has limited branching and a high number of floating point operations per processed byte. It is thus suitable for modern Single Instruction Multiple Data (SIMD) CPU hardware. As only the naive brute force approach is robust and always yields a correct solution, we propose a hybrid solution that combines the best of the two worlds to produce fast and robust contacts. In terms of the DEM workflow, we furthermore propose a multilevel tree-based data structure strategy that holds all particles in the domain on multiple scales in grids. Grids reduce the total computational complexity of the simulation. The data structure is combined with the DEM phases to form a single-touch tree-based traversal that both identifies contact points between particle pairs and introduces concurrency to the system during particle comparisons in one multiscale grid sweep. Finally, a reluctant adaptivity variant is introduced which enables us to realise an improved time stepping scheme with larger time steps than standard adaptivity while we still minimise the grid administration overhead. 
Four different parallelisation strategies that exploit multicore architectures are discussed for the triad of methodological ingredients. Each parallelisation scheme exhibits unique behaviour depending on the grid and particle geometry at hand. The fusion of them into a task-based parallelisation workflow yields promising speedups. Our work shows that new computer architecture can push the boundary of DEM computability but this is only possible if the right data structures and algorithms are chosen.
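
Why grids reduce the cost of contact detection can be seen in a deliberately simplified Python sketch. It uses a single-level uniform grid and bounding spheres rather than the multiscale grids and triangulated particles of the abstract, and it assumes the cell size is at least the largest sphere diameter so that only neighbouring cells need to be compared.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Sphere = Tuple[float, float, float, float]  # centre x, y, z and radius


def overlaps(s: Sphere, t: Sphere) -> bool:
    """True if the two bounding spheres touch or intersect."""
    dist_sq = sum((p - q) ** 2 for p, q in zip(s[:3], t[:3]))
    return dist_sq <= (s[3] + t[3]) ** 2


def candidate_pairs(spheres: List[Sphere], cell: float) -> Set[Tuple[int, int]]:
    """Find overlapping pairs by comparing only particles in adjacent cells.

    Assumes cell >= largest sphere diameter (simplification for this sketch).
    """
    grid: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for i, (x, y, z, _) in enumerate(spheres):
        grid[(int(x // cell), int(y // cell), int(z // cell))].append(i)
    pairs: Set[Tuple[int, int]] = set()
    for (cx, cy, cz), members in list(grid.items()):
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells in the defaultdict
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and overlaps(spheres[i], spheres[j]):
                        pairs.add((i, j))
    return pairs
```

In a real DEM code the surviving pairs would then be handed to the expensive triangle-level contact check; the grid only prunes pairs that cannot possibly touch.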

    Discrete element method simulation of packing and rheological properties of coke and coke/pitch mixtures

    Global aluminum production, which relies on the Hall-Héroult process, is now around 60 000 metric tonnes annually. The process has mostly kept the original concept developed in 1886. Pre-baked carbon anodes are an important part of the design of aluminum smelting cells. Anodes take part in the chemical reaction of alumina reduction and are consumed during the process. Thus, the quality and properties of anodes have direct effects on the performance and economy of aluminum production in today's highly competitive market. Although the design of anodes goes back 130 years, the effects of raw material properties on the final quality of anodes still need to be investigated. Anodes are composed of granulated calcined coke, binder pitch and recycled anode butts. Pitch is a liquid at the temperatures of the mixing and forming steps. Hence the mixture is a paste of coke and butt aggregates with pitch acting as binder. The flow and compaction behavior of this mixture is complicated because of the co-existence of a variety of physical, chemical and mechanical parameters. Given the importance of high-quality, long-lasting anodes to the performance, and so the economy, of the reduction cells, understanding and predicting the final properties of anodes is very important for smelters. 
Numerical modeling of such complicated problems can provide a virtual laboratory where the effects of different materials or process parameters on the anode quality index can be studied without risking the pot performance. However, the choice of the numerical framework is a critical decision which needs to be taken according to the physics of the problem and the geometrical scale of the investigated problems. The Discrete Element Method (DEM) is used in this research work to model the anode paste. In the first step, DEM models of coke aggregates are used to simulate the vibrated bulk density test of coke particles and to reveal the parameters involved. As a micromechanical model, DEM provides a unique opportunity to investigate the particle-particle contacts. The developed DEM models of coke aggregates were then used to propose a new dry aggregate recipe exhibiting higher packing density. The packing density of coke aggregates has a direct effect on the baked density of anodes. High density is a very favorable anode quality index as it has positive effects on the mechanical strength and consumption rate of anodes in the cell. The electrical resistivity of beds of particles was measured experimentally. Particle-particle contact information obtained from the numerical models was used to explain the electrical resistivity of samples with different size distributions. Results showed that an increase in the number of contacts per unit volume of a sample increases the electrical resistivity of the particle bed. Packing density also influences the electrical current transfer in granular systems. According to the obtained results, keeping the contact density as low as possible is beneficial for electrical conductivity if it does not have a negative effect on packing density. Pitch is a viscoelastic material at elevated temperatures. 
In the present work, the rheological properties of pitch and binder matrix (pitch + fine coke particles) were measured experimentally using a dynamic shear rheometer at 135, 140, 145 and 150 °C. A four-element Burger's model is then used to model the mechanical behavior of pitch and binder matrix. The verified model is then used to investigate the rheological properties of pitch and coke/pitch mixtures at 150 °C. The calibrated Burger's model gave a good prediction of the viscoelastic properties of pitch and binder matrix at different temperatures. The numerical results showed that the empirical equations available in the literature fail to predict the complex modulus of mixtures of pitch and coke particles. As pitch has a viscoelastic response and coke particles have irregular shapes, the rheology of this mixture is more complicated and needs well-tailored mathematical models. The complex modulus of pitch decreases as the temperature increases from 135 to 150 °C, which makes it easier for coke/pitch mixtures to flow. DEM modeling showed that the mixture reaches better compaction, and so lower porosity, when vibro-compacted at higher temperatures. The ability of pitch to penetrate into inter-particle voids in the porous structure of a bed of coke particles was also shown to improve with temperature. A final anode structure with less porosity, and so higher density, is favorable for its mechanical strength as well as its chemical reaction in the cell. Based on the obtained results and considering the physics of the problem, it can be said that the discrete element method is an appropriate numerical simulation technique to study the effects of raw materials and the anode paste formulation on the mechanical and physical properties of coke/pitch mixtures. 
The platform created in the course of this research effort provides a unique opportunity to study the effect of a variety of parameters, such as the size distribution, shape and content of coke particles and the content and rheological properties of pitch, on the densification of coke/pitch mixtures in the vibro-compaction process. The outputs of this thesis provide a better understanding of the complicated response of anode paste in the forming process.
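
As a rough illustration of the four-element Burger's model mentioned in this abstract, the Python sketch below evaluates its complex modulus at a given angular frequency, the quantity a dynamic shear rheometer reports. It uses the standard series arrangement (a Maxwell spring and dashpot in series with a Kelvin-Voigt unit), in which the compliances of the parts add. The parameter names and any values passed in are hypothetical; the thesis's calibrated values are not given here.

```python
def burgers_complex_modulus(omega: float, e_m: float, eta_m: float,
                            e_k: float, eta_k: float) -> complex:
    """Complex modulus G*(omega) of a four-element Burgers model.

    e_m, eta_m: stiffness and viscosity of the Maxwell spring and dashpot.
    e_k, eta_k: stiffness and viscosity of the Kelvin-Voigt unit.
    All parameters are placeholders for illustration.
    """
    # Elements in series: their complex compliances add.
    j_star = (1.0 / e_m                          # Maxwell spring
              + 1.0 / (1j * omega * eta_m)       # Maxwell dashpot
              + 1.0 / (e_k + 1j * omega * eta_k))  # Kelvin-Voigt unit
    return 1.0 / j_star
```

Taking `abs()` of the result gives the magnitude of the complex modulus. Lowering both viscosities, as one would expect at a higher temperature, lowers that magnitude at a fixed frequency, which is consistent with the trend reported in the abstract.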