10 research outputs found

    Energy-efficiency evaluation of Intel KNL for HPC workloads

    Energy consumption is increasingly becoming a limiting factor in the design of faster large-scale parallel systems, and the development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on the energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel into the HPC market. We consider the 64-core Xeon Phi 7230 and analyze its energy performance using both the on-chip MCDRAM and the regular DDR4 system memory as main storage for the application data domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice. We then assess the energy consumption for different memory data layouts, memory types (DDR4 or MCDRAM), and numbers of threads per core.
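
    As a hedged illustration of what such layout choices look like (this is not the paper's code, and NumPy stands in for the optimized C kernels), the sketch below contrasts an Array-of-Structures (AoS) layout with a Structure-of-Arrays (SoA) layout for the lattice populations. On KNL, placement of the data in MCDRAM versus DDR4 is commonly controlled outside the application, e.g. via numactl or the memkind library.

        # Illustrative sketch only: AoS vs. SoA layouts for NPOP lattice
        # populations per site; all sizes are invented.
        import numpy as np

        NX, NY, NPOP = 128, 128, 9

        # AoS: the populations of one lattice site are contiguous in memory.
        aos = np.zeros((NX, NY, NPOP), dtype=np.float64)

        # SoA: each population is stored as a contiguous plane, which tends
        # to vectorize better on wide-SIMD cores such as those of the KNL.
        soa = np.zeros((NPOP, NX, NY), dtype=np.float64)

        # The same (site, population) element under the two layouts:
        aos[3, 5, 2] = 1.0
        soa[2, 3, 5] = 1.0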

    A PETSc parallel-in-time solver based on MGRIT algorithm

    We address the development of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm to solve linear and nonlinear systems arising from the discretization of evolutionary models with a parallel-in-time approach, in the context of the PETSc (Portable, Extensible Toolkit for Scientific computing) library. Our aim is to make it possible to predict the performance gain achievable when using the MGRIT approach instead of the Time Stepping integrator (TS). To this end, we analyze the performance parameters of the algorithm that provide, a priori, the best number of processing elements and grid levels to use when addressing the scaling of MGRIT, regarded as a parallel iterative algorithm proceeding along the time dimension.
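
    As a hedged sketch of the idea behind MGRIT (not the PETSc-based implementation described above; all parameters are invented), the following applies a two-level scheme to the scalar test problem u' = lam*u with backward Euler: F-relaxation propagates the solution inside each coarse interval, a sequential coarse solve corrects the C-points, and the cycle repeats until the coarse residual vanishes.

        # Two-level MGRIT sketch for u' = lam*u; illustrative only.
        import numpy as np

        lam, T, n, m = -1.0, 4.0, 64, 8       # n fine steps, coarsening factor m
        dt = T / n
        phi  = 1.0 / (1.0 - lam * dt)         # backward-Euler fine propagator
        phic = 1.0 / (1.0 - lam * m * dt)     # rediscretized coarse propagator

        u = np.zeros(n + 1)
        u[0] = 1.0                            # only the initial value is known

        for it in range(20):
            # F-relaxation: from each C-point, advance across its F-interval
            for c in range(0, n, m):
                for i in range(c, c + m - 1):
                    u[i + 1] = phi * u[i]
            # residual of the C-point system (one extra fine step per interval)
            nc = n // m
            r = np.array([phi * u[k * m - 1] - u[k * m] for k in range(1, nc + 1)])
            if np.max(np.abs(r)) < 1e-12:
                break
            # coarse solve: sequential time stepping on the error equation
            e = 0.0
            for k in range(1, nc + 1):
                e = phic * e + r[k - 1]
                u[k * m] += e                 # correct the C-point

        # final F-relaxation so F-points match the corrected C-points
        for c in range(0, n, m):
            for i in range(c, c + m - 1):
                u[i + 1] = phi * u[i]
        print(u[-1], phi ** n)                # MGRIT result vs. sequential stepping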

    Storage QoS provisioning for execution programming of data-intensive applications

    Abstract. In this paper a method for execution programming of data-intensive applications is presented. The method is based on storage Quality of Service (SQoS) provisioning. SQoS provisioning uses semantics-based storage monitoring, built on a storage resources model, together with storage performance management. Test results show the gain in execution time when using the QStorMan toolkit, which implements the presented method. Taking into account the SQoS provisioning opportunity on the one hand, and steadily growing user demands on the other, we believe that execution programming of data-intensive applications can bring a new quality to application execution.
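
    A minimal, hypothetical sketch of the provisioning idea (QStorMan's actual interface is not shown here; all resource names, metrics and thresholds below are invented): pick a storage resource whose monitored performance characteristics satisfy the requirements declared by a data-intensive application.

        # Hypothetical SQoS-style resource selection; not the QStorMan API.
        monitored = {                      # invented monitoring snapshot
            "nfs-pool-a": {"read_mbps": 420,  "write_mbps": 180, "free_gb": 900},
            "lustre-b":   {"read_mbps": 1500, "write_mbps": 800, "free_gb": 120},
        }
        required = {"read_mbps": 500, "write_mbps": 200, "free_gb": 100}

        def select_storage(required, monitored):
            """Return the best resource meeting every declared requirement."""
            ok = [name for name, metrics in monitored.items()
                  if all(metrics[key] >= value for key, value in required.items())]
            return max(ok, key=lambda n: monitored[n]["read_mbps"], default=None)

        print(select_storage(required, monitored))   # -> lustre-b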

    Massively parallel lattice–Boltzmann codes on large GPU clusters

    This paper describes a massively parallel code for a state-of-the-art thermal lattice–Boltzmann method. Our code has been carefully optimized for performance on one GPU and for good scaling behavior on large numbers of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. The outcome is a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used for the development of other high-performance applications for computational physics.
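
    One key design choice such codes face is hiding node-to-node communication behind computation. The hedged sketch below (not the paper's code; mpi4py stands in for the real CUDA/MPI implementation, and all sizes and the kernel are invented) shows the common pattern: post non-blocking halo exchanges, update the bulk of the local lattice while messages are in flight, then update the borders once the halos have arrived.

        # Overlap of halo exchange and bulk computation; illustrative only.
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = (rank - 1) % size, (rank + 1) % size

        NX = 1024                            # local slab width, plus 2 halo rows
        f = np.random.rand(NX + 2, 256)

        def collide_and_stream(block):       # stand-in for the real LBM kernel
            block *= 0.99

        reqs = [comm.Isend(f[1],      dest=left,    tag=0),
                comm.Isend(f[NX],     dest=right,   tag=1),
                comm.Irecv(f[NX + 1], source=right, tag=0),
                comm.Irecv(f[0],      source=left,  tag=1)]

        collide_and_stream(f[2:NX])          # bulk work overlaps communication
        MPI.Request.Waitall(reqs)            # halo rows are now valid
        collide_and_stream(f[1:2])           # finally update the two borders
        collide_and_stream(f[NX:NX + 1])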

    Sustainability performance assessment of municipal solid waste management utilising aggregated indicators approach

    There is a need for an effective and sustainable municipal solid waste (MSW) management system to be implemented in Malaysia, especially in urban areas. Indicators have often been chosen as a tool to evaluate the performance of the current MSW management system in Malaysia. From the literature reviewed, no index was found to be similar to the one proposed by this study. This study was conducted to produce a set of indicators that evaluates the MSW management system throughout its entire life cycle. The development of these indicators involved intensive literature reviews, discussion meetings with stakeholders, and a workshop with solid waste management experts. Weightages were assigned to the established indicators using the analytic hierarchy process, and the weighted indicators were then incorporated into a performance index, known as the municipal solid waste management performance index (MSWMPI). Data collection was done in five cities: Muar, Rembau, Putrajaya, Langkawi and Pekan. As a result, a total of nine indicators under four criteria, C1 (MSW Generation and Segregation), C2 (MSW Collection and Transportation), C3 (MSW Treatment) and C4 (MSW Disposal), were finalised. The weightages for the four criteria were found to be 32.17% for C1, 19.82% for C2, 25.41% for C3, and 22.60% for C4. Among the five cities, Pekan had the highest MSWMPI, with a value of 74.85, and was rated as performing well. On the other hand, the MSW management system in Muar had the lowest MSWMPI, with a value of 51.23. Langkawi had an MSWMPI of 59.89, followed closely by Rembau (58.12), and Putrajaya had an MSWMPI of 52.43. City profiling of the respective cities was also done to identify hotspots in the MSW management systems. It was found that most cities performing well in C1 and C2 did not perform as well in C3 and C4, and vice versa.
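
    To make the aggregation step concrete, here is a hedged sketch assuming a simple linear weighted sum (the study's exact MSWMPI formula is not reproduced here); the criterion weights come from the abstract, while the criterion scores are invented for illustration.

        # Weighted aggregation into an MSWMPI-style index; scores invented.
        weights = {"C1": 0.3217, "C2": 0.1982, "C3": 0.2541, "C4": 0.2260}
        scores  = {"C1": 85.0, "C2": 78.0, "C3": 66.0, "C4": 70.0}  # hypothetical

        mswmpi = sum(weights[c] * scores[c] for c in weights)
        print(f"MSWMPI = {mswmpi:.2f}")      # -> MSWMPI = 75.39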

    A computationally efficient Branch-and-Bound algorithm for the permutation flow-shop scheduling problem

    In this work we propose an efficient branch-and-bound (B&B) algorithm for the permutation flow-shop problem (PFSP) with makespan objective. We present a new node decomposition scheme that combines dynamic branching and lower-bound refinement strategies in a computationally efficient way. To alleviate the computational burden of the two-machine bound used in the refinement stage, we propose an online learning-inspired mechanism to predict promising couples of bottleneck machines. The algorithm offers multiple choices for branching and bounding operators and can explore the search tree either sequentially or in parallel on multi-core CPUs. In order to empirically determine the most efficient combination of these components, a series of computational experiments with 600 benchmark instances is performed. A main insight is that the problem size, as well as interactions between branching and bounding operators, substantially modifies the trade-off between the computational requirements of a lower bound and the achieved tree-size reduction. Moreover, we demonstrate that parallel tree search is a key ingredient for the resolution of large problem instances, as strong super-linear speedups can be observed. An overall evaluation using two well-known benchmarks indicates that the proposed approach is superior to previously published B&B algorithms. For the first benchmark we report the exact resolution, within less than 20 minutes, of two instances defined by 500 jobs and 20 machines that remained open for more than 25 years; for the second, a total of 89 improved best-known upper bounds, including proofs of optimality for 74 of them.
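
    As a hedged sketch of the underlying technique (deliberately naive: the paper's dynamic branching, two-machine bound refinement, learning mechanism and parallel tree exploration are not reproduced), the following solves a tiny invented PFSP instance exactly by depth-first branch-and-bound with a simple one-machine lower bound.

        # Minimal B&B for the PFSP with makespan objective; instance invented.
        def lower_bound(p, C, remaining):
            """Machine k is busy until C[k] and must still process every
            remaining job, so Cmax >= C[k] + sum of their times on k."""
            return max(C[k] + sum(p[j][k] for j in remaining)
                       for k in range(len(p[0])))

        def branch_and_bound(p):
            n, m = len(p), len(p[0])
            best = {"cmax": float("inf"), "seq": None}

            def dfs(seq, C, remaining):
                if not remaining:
                    best["cmax"], best["seq"] = C[-1], seq
                    return
                for j in remaining:
                    C2 = C[:]                 # completion times with j appended
                    C2[0] += p[j][0]
                    for k in range(1, m):
                        C2[k] = max(C2[k], C2[k - 1]) + p[j][k]
                    rest = [x for x in remaining if x != j]
                    if lower_bound(p, C2, rest) < best["cmax"]:  # else prune
                        dfs(seq + [j], C2, rest)

            dfs([], [0] * m, list(range(n)))
            return best["cmax"], best["seq"]

        p = [[5, 9, 8], [9, 3, 10], [9, 4, 5], [4, 8, 8]]   # 4 jobs x 3 machines
        print(branch_and_bound(p))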

    Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016) Sofia, Bulgaria

    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October 6-7, 2016.

    Scalable Algorithms for the Analysis of Massive Networks

    Network analysis aims to unveil non-trivial insights from networked data by studying relationship patterns between the entities of a network. Among these insights, a popular one is to quantify the importance of an entity with respect to the others according to some criteria. Another is to find the most suitable matching partner for each participant of a network, knowing the pairwise preferences of the participants to be matched with each other, a problem known as Maximum Weighted Matching (MWM). Since the notion of importance is tied to the application under consideration, numerous centrality measures have been introduced. Many of these measures, however, were conceived in a time when computing power was very limited and networks were much smaller than today's, and thus scalability to large datasets was not considered. Today, massive networks with millions of edges are ubiquitous, and a complete exact computation of traditional centrality measures is often too time-consuming. This issue is amplified if our objective is to find the group of k vertices that is most central as a group. Scalable algorithms to identify highly central (groups of) vertices on massive graphs are thus of pivotal importance for large-scale network analysis. In addition to their size, today's networks often evolve over time, which poses the challenge of efficiently updating results after a change occurs. Hence, efficient dynamic algorithms are essential for modern network analysis pipelines. In this work, we propose scalable algorithms for identifying important vertices in a network, and for efficiently updating them in evolving networks. On real-world graphs with up to hundreds of millions of edges, most of our algorithms require seconds to a few minutes to perform these tasks, a significant improvement over the state of the art. Further, we extend a state-of-the-art algorithm for MWM to dynamic graphs. Experiments show that our dynamic MWM algorithm handles updates in graphs with billions of edges in milliseconds.
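
    As a hedged illustration of the group-centrality problems the thesis targets (this naive sketch recomputes shortest paths from scratch and uses none of the thesis' scalable or dynamic algorithms), the following greedily grows a group of k vertices maximizing group harmonic closeness with NetworkX.

        # Greedy group harmonic closeness on a toy graph; illustrative only.
        import networkx as nx

        def group_harmonic(G, S):
            """Sum over vertices outside S of 1 / d(v, S)."""
            dist = nx.multi_source_dijkstra_path_length(G, list(S))
            return sum(1.0 / d for d in dist.values() if d > 0)

        def greedy_group(G, k):
            S = set()
            for _ in range(k):               # add the vertex with the best gain
                v = max((u for u in G if u not in S),
                        key=lambda u: group_harmonic(G, S | {u}))
                S.add(v)
            return S

        G = nx.karate_club_graph()
        print(greedy_group(G, 3))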

    Vol. 14, No. 1 (Full Issue)
